BUG: Series stealing references from CategoricalIndex is invalid for read-only arrays #63306

@vyasr

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

    import pandas as pd

    # Not necessary for pandas 3.0, but I validated this in both the 3.0 rc and 2.3.3
    # and this setting makes the MRE work for both.
    pd.set_option("mode.copy_on_write", True)

    # We must use an int8 array here or pandas will make a (writeable) copy of the array in
    # https://github.com/pandas-dev/pandas/blob/499c5d4dd52a8645bf96c39bad60613097e84c06/pandas/core/dtypes/cast.py#L878
    # We also must convert the codes to a numpy array since that produces a read-only array,
    # whereas pandas has more internal logic to handle an Index correctly in CoW mode.
    codes = pd.Index([0, 1, 2, 3], dtype="int8").to_numpy()
    cats = pd.Index(["a", "b", "c", "d"])
    data = pd.Categorical.from_codes(codes, cats)

    # We can't create a series directly from the Categorical data because the
    # implementation details in pandas prevent copies in this case. When we construct a
    # Series from a CategoricalIndex pandas tries to steal references to optimize
    # copying in CoW mode, which is necessary to observe the error.
    s = pd.Series(pd.Index(data))
    s[[False, False, True, True]] = cats[2:4]

Issue Description

The above example fails with the following error:

      File "${SITE}/pandas/core/arrays/_mixins.py", line 269, in __setitem__
        self._ndarray[key] = value
        ~~~~~~~~~~~~~^^^^^
    ValueError: assignment destination is read-only

The issue arises under the following circumstances:

  1. mode.copy_on_write is enabled
  2. A Series (or column in a DataFrame) is constructed from a read-only array
  3. The Series is constructed from an input that is determined not to have any other outstanding references such that CoW will not force a copy the first time an operation occurs.

Under these circumstances, pandas will not currently realize that the input data is read-only and will attempt to modify it, resulting in the above error. This example is a fairly specific case where this occurs, but I suspect that there are other similar cases where it is possible to end up with a read-only array inside a pandas object in CoW mode. The challenge is that such cases are easily obscured by any references floating around.

Point 3 above is particularly delicate. While debugging this issue in my original example, it took a lot of work to distill it into a minimal example because the reference counting logic in pandas will result in copies at various points in CoW mode if any foreign references exist, and I found it quite easy to wind up in cases where such references were preserved in orphaned reference cycles or hidden in other variables. In such cases, pandas will defensively make copies that would cover up issues with a read-only input array.
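To isolate the underlying failure from the CoW reference tracking, here is a minimal NumPy-only sketch. It assumes a plain int8 array standing in for the categorical codes and marks it read-only by hand. It does not reproduce the pandas code path, only the error that pandas eventually hits when it writes into such a buffer.

```python
import numpy as np

# Stand-in for the categorical codes; marked read-only by hand for illustration.
codes = np.array([0, 1, 2, 3], dtype="int8")
codes.flags.writeable = False

try:
    # In-place assignment into a read-only buffer is rejected by NumPy.
    codes[2:4] = [1, 0]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(codes.flags.writeable)
print(error_message)
```

The `flags.writeable` attribute is what any guard inside pandas would presumably need to consult before writing in place.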

Expected Behavior

When input data is read-only, in CoW mode pandas should check that blocks were constructed from a read-only input and make a copy when writing if necessary. That check could probably be inserted around here in setitem, but I don't know if that is the best place for it.
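A rough sketch of the kind of check I have in mind, written against plain NumPy arrays rather than pandas internals. The helper name `setitem_with_readonly_guard` is hypothetical and is not a pandas API. It only illustrates the "copy before writing if the buffer is read-only" idea.

```python
import numpy as np


def setitem_with_readonly_guard(values, key, new_values):
    """Hypothetical helper: copy `values` before writing if it is read-only."""
    if not values.flags.writeable:
        # The copy owns its own data and is writeable.
        values = values.copy()
    values[key] = new_values
    return values


# Read-only stand-in for the codes backing the Series.
codes = np.array([0, 1, 2, 3], dtype="int8")
codes.flags.writeable = False

mask = np.array([False, False, True, True])
result = setitem_with_readonly_guard(codes, mask, np.array([1, 0], dtype="int8"))

print(result)
print(codes)
```

With a guard like this, the write succeeds on a fresh copy and the original read-only input is left untouched. Where such a check belongs in pandas (block level vs. array level) is an open question.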

Installed Versions

    ❯ python
    Python 3.13.11 | packaged by conda-forge | (main, Dec 6 2025, 11:24:03) [GCC 14.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pandas as pd
    >>> pd.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit                : 1a3230dc5be4c87b8356765ea3b6568d37cb82fd
    python                : 3.13.11
    python-bits           : 64
    OS                    : Linux
    OS-release            : 5.4.0-208-generic
    Version               : #228-Ubuntu SMP Fri Feb 7 19:41:33 UTC 2025
    machine               : x86_64
    processor             : x86_64
    byteorder             : little
    LC_ALL                : None
    LANG                  : en_US.UTF-8
    LOCALE                : en_US.UTF-8

    pandas                : 3.0.0rc0
    numpy                 : 2.4.0rc1
    dateutil              : 2.9.0.post0
    pip                   : 25.3
    Cython                : None
    sphinx                : None
    IPython               : None
    adbc-driver-postgresql: None
    adbc-driver-sqlite    : None
    bs4                   : None
    bottleneck            : None
    fastparquet           : None
    fsspec                : None
    html5lib              : None
    hypothesis            : None
    gcsfs                 : None
    jinja2                : None
    lxml.etree            : None
    matplotlib            : None
    numba                 : None
    numexpr               : None
    odfpy                 : None
    openpyxl              : None
    psycopg2              : None
    pymysql               : None
    pyarrow               : None
    pyiceberg             : None
    pyreadstat            : None
    pytest                : None
    python-calamine       : None
    pytz                  : None
    pyxlsb                : None
    s3fs                  : None
    scipy                 : None
    sqlalchemy            : None
    tables                : None
    tabulate              : None
    xarray                : None
    xlrd                  : None
    xlsxwriter            : None
    zstandard             : None
    qtpy                  : None
    pyqt5                 : None
