Skip to content

Reference counting bug with manually allocated heap types#128923

@colesbury

Description

@colesbury

Bug report

Found by @vfdev-5.

This is specific to the free threading build and 3.14.

XLA/Jax uses the following code to create a heap type:

// We need to use heap-allocated type objects because we want to add// additional methods dynamically. ... nb::str name = nb::str("PmapFunction"); nb::str qualname = nb::str("PmapFunction"); PyHeapTypeObject* heap_type = reinterpret_cast<PyHeapTypeObject*>( PyType_Type.tp_alloc(&PyType_Type, 0)); // Caution: we must not call any functions that might invoke the GC until// PyType_Ready() is called. Otherwise the GC might see a half-constructed// type object.CHECK(heap_type) << "Unable to create heap type object"; heap_type->ht_name = name.release().ptr(); heap_type->ht_qualname = qualname.release().ptr(); ...

https://github.com/openxla/xla/blob/19a8e8e05fb34c5c4b8c38c9a8225e89f008c8c1/xla/python/pmap_lib.cc#L1027-L1058

In other words, the heap type is created by by calling PyType_Type.tp_alloc and filling in the fields, instead of the more common use of PyType_FromSpec. This leaves unique_id zero initialized. The problem is that unique_id=0 currently looks like a valid unique id for per-thread reference counting, which leads to reference counting errors and use-after-frees.

I think we should change the per-thread reference counting so that unique_id=0 is the sentinel value indicating that it's not assigned instead of the current unique_id=-1 convention.

Full repro

Linked PRs

Metadata

Metadata

Assignees

Labels

3.14bugs and security fixestopic-free-threadingtype-bugAn unexpected behavior, bug, or errortype-crashA hard crash of the interpreter, possibly with a core dump

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions