gh-107137: Add _PyTuple_NewNoTrack() internal C API#107183

vstinner · 2023-07-24T17:05:14Z

Fix a crash in PySequence_Tuple() when the tuple which is being initialized was accessed via GC introspection, like the gc.get_referrers() function. Now the function only tracks the tuple by the GC once it's fully initialized.

Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() functions to the internal C API: similar to PyTuple_New() and _PyTuple_Resize(), but don't track the tuple by the GC.

Modify PySequence_Tuple() to use these functions.

Issue: C API: Add internal C API to build a tuple: _PyTupleBuilder #107137

Fix a crash in PySequence_Tuple() when the tuple which is being initialized was accessed via GC introspection, like the gc.get_referrers() function. Now the function only tracks the tuple by the GC once it's fully initialized. Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() functions to the internal C API: similar to PyTuple_New() and _PyTuple_Resize(), but don't track the tuple by the GC. Modify PySequence_Tuple() to use these functions.

vstinner · 2023-07-24T17:07:26Z

cc @corona10 @gvanrossum

vstinner · 2023-07-24T17:18:22Z

gc.get_referrers() documentation has a warning added in 2003 by issue #39117:

Care must be taken when using objects returned by get_referrers() because some of them could still be under construction and hence in a temporarily invalid state. Avoid using get_referrers() for any purpose other than debugging.

vstinner · 2023-07-24T17:23:40Z

@pablogsal: in 2021, you wrote:

I don't know if is worth to add a new API just for tuples, given that this problem happens with many other objects

Are you still against the GC bug in PySequence_Tuple()?

Note: I'm only proposing to add functions to the internal C API. Make such API public would deserves its own discussion. Also, I proposed a different API to fix the issue, an internal "tuple builder" API: PR #107139.

vstinner · 2023-07-24T17:27:51Z

I mark this PR as a draft since in 2021, the issue #59313 was closed as "not a bug" by @pablogsal who wrote:

I mainly agree on closing this issue as we already have a warning about this behaviour in gc.get_referrers and other friends.
On the other hand I am still a bit afraid of a crash that could happen if the GC does a pass and one of these tuples is in an inconsistent state. Is possible that this crashes Python without the user calling ever any function from the GC module. I need to do some investigation around this, but I agree that this is a separate issue, so I will open a new one if it happens to be relevant.

gvanrossum · 2023-07-25T17:22:28Z

This leaves me worried. The issue where the crash due to incomplete tuples (gh-59313) was closed, basically telling people that this is an acceptable risk, and the risk is due to the existence of cg.get_referrers() (see also gh-39117).

It seems that using PyTuple_New() with a nonzero size argument is always unsafe, because the tuple is immediately tracked by GC.

An alternative way to deal with this is to not call _PyObject_GC_TRACK in PyTuple_New(), and document to users that if they create a tuple that contains mutable objects they should call PyObject_GC_Track() after fully initializing the tuple.

Introducing a new internal API, _PyTuple_NewNoTrack(), does nothing for extensions calling PyTuple_New().

I feel we have two choices here:

Either we remove the _PyObject_GC_TRACK() call from PyTuple_New() and document that users must call PyObject_GC_Track() or risk that the tuple becomes part of an immortal cycle.
Or we accept that the problem is with gc.get_referrers() and we refuse to fix it.

I don't feel that just fixing tuple(<iterator that calls gc.get_referrers()> but not doing anything about the public API (PyTuple_New()) is not an effective solution.

(This discussion should probably go into an issue, but the issues are all closed.)

vstinner · 2023-07-25T18:02:16Z

It seems that using PyTuple_New() with a nonzero size argument is always unsafe, because the tuple is immediately tracked by GC.

Correct

An alternative way to deal with this is to not call _PyObject_GC_TRACK in PyTuple_New(),

It sounds like a backward incompatible changes. Such subtle change reminds me when instances of heap types should now call Py_DECREF(Py_TYPE(self)) in their tp_dealloc function, implement tp_traverse and tp_clear, visit the type in tp_traverse, and clear the type in tp_clear :-(

I'm not saying that so many changes are needed. My worry is that such incompatible change can be silently ignored, and "suddenly" people will get new GC problems that they didn't have before since their tuples were tracked by the GC.

Not tracking tuples by the GC sounds worse then the risk of race condition of code calling gc.get_referrers().

Introducing a new internal API, _PyTuple_NewNoTrack(), does nothing for extensions calling PyTuple_New().

That's why I have a bigger plan: provide a new API to replace PyTuple_New() + PyTuple_SET_ITEM(): a "tuple builder" API, issue #107137.

The migration plan doesn't imply to remove immediately the old API. It can be slow. Moreover, at this stage, I only propose an internal C API and so how it goes in Python internals first, how the API is used, and see if it does help fixing the race conditions for us at least.

Or we accept that the problem is with gc.get_referrers() and we refuse to fix it.

Sure, that's also an option. But even if we decide to not provide a solution to fix 3rd party C extensions, does it mean that we should not fix known bugs in Python internals if there is a simple solution for that?

(This discussion should probably go into an issue, but the issues are all closed.)

I created issue #107137. These days, most discussions happen on pull requests, issues are just created to get "an issue number" for the Changelog and What's New entries :-)

I don't understand well the rationale against fixing PySequence_Tuple(). Do you mean that my internal API is too complicated to use?

gvanrossum · 2023-07-25T18:20:24Z

I don't understand well the rationale against fixing PySequence_Tuple(). Do you mean that my internal API is too complicated to use?

APIs, even internal ones, are forever. I would like you to ask other core devs to chime in here rather than it just being you and me. After all the problem has existed for decades, so why hurry now?

vstinner · 2023-07-25T18:51:23Z

APIs, even internal ones, are forever.

The difference is that we don't provide any backward compatibility warranty on the internal C API. We have greater freedom to change it (rename functions, add parameters, remove functions, etc.)

I would like you to ask other core devs to chime in here rather than it just being you and me. After all the problem has existed for decades, so why hurry now?

Sure, I would like to hear more voices. I pinged multiple core devs on the issue, on my first PR and this second PR. I only created this issue very recently, other people seem to be busy (the EuroPython is just over ;-)).

Why makes you think that there is a hurry?

gvanrossum · 2023-07-25T20:34:08Z

Why makes you think that there is a hurry?

Because you have a pattern of making changes, requesting a review (or not!), and merging before you actually get a review. See e.g. gh-106871.

davidhewitt · 2023-07-30T09:28:39Z

From PyO3's perspective, if you want us to replace PyTuple_New with something which doesn't risk GC crashes then making this stable with a name like PyTuple_UnsafeNewNoTrack (if we want unsafe to hint that this needs to be tracked by the GC after) would be helpful.

PyO3 can then use either PyTuple_SetItem or PyTuple_SET_ITEM depending on whether users are building for the stable API or the version-specific API.

vstinner · 2023-08-14T20:10:22Z

@gvanrossum:

Because you have a pattern of making changes, requesting a review (or not!), and merging before you actually get a review. See e.g. #106871.

Well. I created this PR to get reviews. 3 weeks later: no review. I pinged 3 core devs (including you). I tried to explain my problem with reviews a few times:

How can I get a review on this internal C API to fix a GC crash? :-)

gvanrossum · 2023-08-14T20:23:39Z

3 weeks later: no review.

Not true. I wrote two separate review comments, but stating concerns about API bloat (even internal) and pointing out that the issue was closed as "won't fix" for a good (IMO) reason. What more do you want? We have one core dev (me) still against and nobody else in favor. In that case the status quo wins.

vstinner · 2023-08-14T20:36:13Z

Since issue #59313 got closed as "wont fix" two years ago, the C API was discussed and a working group is trying to make it better and address C API Problems. I'm proposing a concrete solution to one of these problems. I wrote a concrete PR to discuss the problem and my specific solution.

As explained in the issue related to this PR, issue #107137, the problem is that the C API allows to create incomplete and inconsistent Python objects: capi-workgroup/problems#56 We can incrementally fix the C API to reduce this problem and design a migration path to a safer API which doesn't allow that. Fixing the problem in the internal C API is a first step, and IMO it's worth it even if the work stops here (since it does fix a concrete crash).

What more do you want?

Well, it would be nice to hear other voices. It seems like @davidhewitt is interested by a fix for this C API issue for PyO3.

gvanrossum · 2023-08-14T20:41:38Z

It might also make sense to just wait until the core dev sprint in Brno, where the future of the C API will be discussed.

nascheme · 2023-08-14T23:47:31Z

FYI, I don't think the problem is just with gc.get_referrers(). If the GC "sees" an object with invalid pointer in it, it could crash. E.g. it calls tp_traverse() and that follows a NULL pointer or invalid pointer. My guess is that PyTuple_New() is mostly okay since the GC doesn't get a chance to run until the tuple becomes valid. The original design intention of PyObject_GC_Track() and PyObject_GC_UnTrack() is that you only reveal objects to the GC when they are valid and you untrack them before they become invalid.

gvanrossum · 2023-08-15T01:48:56Z

But for tuple objects, tp_traverse is initialized with tupletraverse which uses the standard Py_VISIT macro which skips NULL. I assume that other types that leave NULL values after construction use this too. I don't think we have to worry about other garbage values, so I don't think there's any a risk from tp_traverse.

My guess is that PyTuple_New() is mostly okay since the GC doesn't get a chance to run until the tuple becomes valid.

This I'm not so sure about -- IIUC the bug reported in gh-59313 shows how to get the GC to run (note it was closed because it was deemed an acceptable risk, not because it was fixed). FWIW This PR should reference that issue in its title, not gh-107137 (which is about adding a different tuple building API).

davidhewitt · 2023-08-15T05:15:33Z

While I'm not aware of crashes in the wild caused by this in PyO3, the issue in #59313 probably does affect Rust code using PyO3. We allow tuples to be built from Rust iterators. Users have the flexibility to run arbitrary Python code while the uninitialized tuple is under construction. We could work around this by first collecting the iterator into a Rust vector of PyObject*, from some measurements that was a 20-50% slowdown.

If the fix to avoid the crash and keep current performance "just" exposing a new API which is PyTuple_New but where users have to manually call PyObject_GC_Track once the tuple is fully initialised, that'd be extremely easy for PyO3 to migrate to. So my ask is that if you merge this PR, please add this to the public API.

An alternative way to deal with this is to not call _PyObject_GC_TRACK in PyTuple_New(),

I think that's not feasible because stable API extensions built before this change will start producing tuples not tracked by GC, which sounds bad to me.

markshannon · 2023-08-15T08:35:46Z

As @vstinner said in capi-workgroup/problems#56 creating incomplete tuples is unsafe.

Rather than an add another unsafe API, would it not be better to promote the safe ones and deprecate the unsafe ones?

From smaller tuples, PyTuple_Pack() is reasonably efficient, and _PyTuple_FromArray() more so.
We should probably make PyTuple_FromArray() public.

For larger tuples, the safe thing to do is build a list, then call PyList_AsTuple().
If that is inefficient, we should fix that, rather than playing with broken tuples.

erlend-aasland · 2023-08-15T08:53:00Z

From smaller tuples, PyTuple_Pack() is reasonably efficient, and _PyTuple_FromArray() more so. We should probably make PyTuple_FromArray() public.

I'd be in favour of making _PyTuple_FromArray() public. It is a neatly implemented API, and it has been used with great success in the code base. Also, I think Raymond's idea of filling newly created tuples with Py_None should be considered now that None is immortal.

corona10 · 2023-08-15T09:57:04Z

I'd be in favour of making _PyTuple_FromArray() public. It is a neatly implemented API, and it has been used with great success in the code base.

Looks good to me also, because I am not sure how people want to use APIs based on elements of size.
For example, recommending using API in small cases and using B API for other scenarios is not good for me in most cases.
PyTuple_FromArray will be a good single way to create a tuple in most of cases.

vstinner · 2023-08-16T21:19:22Z

@markshannon:

Rather than an add another unsafe API, would it not be better to promote the safe ones and deprecate the unsafe ones?

This change is related to issue #107137 which proposes adding a new safe API. But for the implementation, I need to add internal unsafe functions, right.

vstinner · 2023-08-16T21:30:05Z

Right, initializing tuple times to None would prevent crashes on repr(tuple) when devs play with GC introspection functions.

But my concern is more general than that. For me, PyTuple_SET_ITEM() is too different from all other C API functions: it doesn't clear the reference to the old item, since it makes the assumption that the function must only be called on newly created tuple with uninitialized items.

PyTuple_SET_ITEM() doc:

Like PyTuple_SetItem(), but does no error checking, and should only be used to fill in brand new tuples.

I would prefer to move towards an API where it's impossible to create unitialized/incomplete objects and so a function like PyTuple_SET_ITEM() would make no sense. Reference counting would be more regular and the API would be less error-prone.

PyTuple_SET_ITEM(tuple, index, item) is really just tuple->ob_items[index] = item;. It doesn't call Py_DECREF(tuple->ob_items[index]) (or Py_XDECREF()).

PyTuple_SetItem() is more regular: it clears the old item before setting the new item using Py_XSETREF().

On the implementation side, because PyTuple_New() API exists, most Objects/tupleobject.c functions have to handle the case of NULL items. For example, PyTuple_SetItem() uses Py_XSETREF(), instead of Py_SETREF(). tupledealloc() uses Py_XDECREF(), instead of Py_DECREF(). For me, it's weird to have to care about partially initialized tuple just for the uncommon code path (tuple partially initialized, and oops, an error occurred).

Other APIs like PyTuple_Pack() or proposed "tuple builder" API doesn't have such problem: if we manage to remove PyTuple_New() in the long term, it will be possible to cleanup Objects/tupleobject.c to avoid the NULL items case.

vstinner · 2023-08-22T14:15:13Z

By the way, this change also fix Lib/test/crashers/gc_inspection.py which can be removed in that case.

vstinner · 2023-08-26T02:59:21Z

I close my issue, see: #107139 (comment)

vstinner · 2023-08-26T02:59:58Z

Ooops, wrong window, I wanted to close the issue #107137

vstinner · 2023-09-13T16:13:08Z

Apparently, I failed to convinced that _PyTuple_NewNoTrack() is the good fix for the issue, and some people consider that this crash is not important enough to be fixed, so I close just my PR.

markshannon · 2023-09-13T16:18:38Z

Could you open an issue for the crash?
The issue for the crash shouldn't be tied to a new API.
We should definitely fix the crash.

vstinner · 2023-09-13T16:40:29Z

Could you open an issue for the crash?

There was an issue closed by @pablogsal and @rhettinger was also against it. I asked @pablogsal if he can reconsider his decision but he didn't replied. So I give up. If you feel ready to reconsider this design choice, please go ahead and open a new issue :-) I'm done on this topic for now ;-)

markshannon · 2023-09-13T16:54:14Z

I think we should fix this. It is a crash in legal Python code.
#59313 (comment)

I don't think we need a new API, we need to fix PySequence_Tuple()

vstinner · 2023-09-13T16:57:24Z

I don't think we need a new API, we need to fix PySequence_Tuple()

My PR only added an internal C API. Is it now wrong to add internal C API functions?

markshannon · 2023-09-14T11:02:24Z

Is it now wrong to add internal C API functions?

No, but I don't think this is the way to go. _PyTuple_NewNoTrack might be less bug-prone than PyTuple_New, but it is still bug-prone.
We should be creating tuples all at once, so we don't have worry about how to handle invalid tuples.
PyTuple_FromArray and PyTupleFromList are the way to go.

markshannon · 2023-09-14T11:04:42Z

And PyTuple_Pack.

If the loss of efficiency in refcount handling is an obstacle to adoption, we could add variants that consume the references.

markshannon · 2024-04-11T09:17:56Z

See #117747

bedevere-bot mentioned this pull request Jul 24, 2023
C API: Add internal C API to build a tuple: _PyTupleBuilder #107137
Closed

bedevere-bot added the awaiting core review label Jul 24, 2023

vstinner mentioned this pull request Jul 24, 2023
gh-107137: Add _PyTupleBuilder API to the internal C API #107139
Closed

vstinner force-pushed the tuple_new_notrack branch from 9c9f4a7 to 55d1b7aCompare July 24, 2023 17:07

vstinner marked this pull request as draft July 24, 2023 17:26

bedevere-bot removed the awaiting core review label Jul 24, 2023

nascheme mentioned this pull request Aug 15, 2023
Incomplete tuple created by PyTuple_New() and accessed via the GC can trigged a crash #59313
Closed

vstinner closed this Aug 26, 2023

vstinner reopened this Aug 26, 2023

eendebakpt mentioned this pull request Aug 29, 2023
gh-72793: C implementation of parts of copy.deepcopy #91610
Open

vstinner closed this Sep 13, 2023

vstinner deleted the tuple_new_notrack branch September 13, 2023 16:13

Uh oh!

gh-107137: Add _PyTuple_NewNoTrack() internal C API#107183

gh-107137: Add _PyTuple_NewNoTrack() internal C API #107183

Uh oh!

Conversation

vstinner commented Jul 24, 2023• edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Jul 24, 2023

Uh oh!

vstinner commented Jul 24, 2023

Uh oh!

vstinner commented Jul 24, 2023• edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Jul 24, 2023

Uh oh!

gvanrossum commented Jul 25, 2023

Uh oh!

vstinner commented Jul 25, 2023

Uh oh!

gvanrossum commented Jul 25, 2023

Uh oh!

vstinner commented Jul 25, 2023

Uh oh!

gvanrossum commented Jul 25, 2023

Uh oh!

davidhewitt commented Jul 30, 2023

Uh oh!

vstinner commented Aug 14, 2023

Uh oh!

gvanrossum commented Aug 14, 2023

Uh oh!

vstinner commented Aug 14, 2023

Uh oh!

gvanrossum commented Aug 14, 2023

Uh oh!

nascheme commented Aug 14, 2023

Uh oh!

gvanrossum commented Aug 15, 2023

Uh oh!

davidhewitt commented Aug 15, 2023• edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markshannon commented Aug 15, 2023

Uh oh!

erlend-aasland commented Aug 15, 2023

Uh oh!

corona10 commented Aug 15, 2023

Uh oh!

vstinner commented Aug 16, 2023

Uh oh!

vstinner commented Aug 16, 2023

Uh oh!

vstinner commented Aug 22, 2023

Uh oh!

vstinner commented Aug 26, 2023

Uh oh!

vstinner commented Aug 26, 2023

Uh oh!

vstinner commented Sep 13, 2023

Uh oh!

markshannon commented Sep 13, 2023

Uh oh!

vstinner commented Sep 13, 2023

Uh oh!

markshannon commented Sep 13, 2023

Uh oh!

vstinner commented Sep 13, 2023

Uh oh!

markshannon commented Sep 14, 2023

Uh oh!

markshannon commented Sep 14, 2023

Uh oh!

markshannon commented Apr 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

vstinner commented Jul 24, 2023•
edited by bedevere-bot
Loading

vstinner commented Jul 24, 2023•
edited
Loading

davidhewitt commented Aug 15, 2023•
edited
Loading