Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-115999: Add free-threaded specialization for FOR_ITER#128798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
Yhg1s commented Jan 13, 2025 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
thread-safe as without spcialization (i.e. not much to none at all).
Yhg1s commented Jan 13, 2025
I still need to add tests for the specialization (rather than the thread-safety, which are in #128637), but otherwise the PR is good enough to review I think. |
mpage commented Jan 14, 2025
Why not use the approach that you suggested in discord where we only specialize for the case where the iterator is uniquely referenced by the current thread? It seems like that should cover the overwhelmingly common case, would be simpler to implement, and the resulting instructions would be faster. |
Yhg1s commented Jan 14, 2025
I realised that approach wouldn't work for the list referenced by the list iterator (which is the bulk of the work for list iteration) because we're not doing any refcount operations on it (on purpose) and |
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
…ly to uniquely referenced iterators. This handles the common case of 'for item in seq' (where 'seq' is a list, tuple or range object) and 'for item in generator_function()', but not, for example, 'g = gen(...); for item in g:'.
Yhg1s commented Jan 21, 2025
I've added a test that exercises the deopt paths, although for it to pass with ThreadSanitizer relies on #128637 going in first. The test is a little wacky because for list/range/tuple iterators it's not easy to leak the uniquely referenced iterators. (For generators it could just use a weakref.) I wrote the test to make sure the deopt paths were actually safe, but I'm not sure if long-term those tests make sense. We could just keep them around while we fiddle with specialization, but drop them once we're sure the iterators can't leak (or we change the entire approach to iterator specialization.) |
FOR_ITER_LIST specialization.
Python/bytecodes.c Outdated
| // For free-threaded Python, the loop exit can happen at any point during item | ||
| // retrieval, so separate ops don't make much sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like this comment applies to the tuple specialization. Can you delete it if not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Python/bytecodes.c Outdated
| // For free-threaded Python, the loop exit can happen at any point during item | ||
| // retrieval, so separate ops don't make much sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand a little bit on why this doesn't make sense? It'd be nice to keep the structure of the ops the same between the two builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
| [FOR_ITER] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG }, | ||
| [FOR_ITER_GEN] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG }, | ||
| [FOR_ITER_LIST] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG }, | ||
| [FOR_ITER_LIST] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need the HAS_ESCAPES_FLAG? How does it escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mechanism for fetching an item from a shared list includes calling _Py_TryIncrefCompare, which has a path which DECREFs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which path? I'm confused as why an incref would need to decref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the reference is removed from the array that backs the list between the time it's retrieved and it's returned.
cpython/Include/internal/pycore_object.h
Lines 595 to 610 in 180ee43
| /* Tries to incref the object op and ensures that *src still points to it. */ | |
| staticinlineint | |
| _Py_TryIncrefCompare(PyObject**src, PyObject*op) | |
| { | |
| if (_Py_TryIncrefFast(op)){ | |
| return1; | |
| } | |
| if (!_Py_TryIncRefShared(op)){ | |
| return0; | |
| } | |
| if (op!=_Py_atomic_load_ptr(src)){ | |
| Py_DECREF(op); | |
| return0; | |
| } | |
| return1; | |
| } |
Added in #114512.
markshannonJan 28, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that works. What happens if *src is modified after op != _Py_atomic_load_ptr(src) but before the function returns?
Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesbury do you remember why lists use _Py_TryXGetRef (in list_get_item_ref) or why it matters that the src ptr still refers to the same item? Can the object be invalid in some way that still requires us to DECREF it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two hazards that _Py_TryXGetRef handles:
- The object
opmay be deallocated between the initial load and the incref. The try-incref, in cooperation with the allocator, handles this case. It returns zero ifophas zero refcount. - The memory block for
opmay be deallocated and reallocated for a different object in between the initial load and incref. In that case the try-incref succeeds, but the subsequent check fails and we need to decrefop.
It does not matter if *src is modified after op != _Py_atomic_load_ptr(src). op will still be a valid reference to an object that was the next element in the list at some point during the operation. It's always been possible for the list to be concurrently modified between the execution of FOR_ITER and subsequent code that uses the result.
See https://peps.python.org/pep-0703/#optimistically-avoiding-locking
Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build
Is there any performance reason to do so?
brandtbucherJan 29, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build
Is there any performance reason to do so?
Just to follow up: this does make JIT code slightly worse for traces covering for loops over lists. Since the _ITER_NEXT_LIST at the top of the loop is now escaping, this will add a new _CHECK_VALIDITY uop to the start of each loop body where none exists currently.
Consider this loop:
deff(): x=0foriinlist(range(10_000)): x+=iThis change increases the number of validity checks per loop from 2 to 3. Before:
_MAKE_WARM _SET_IP _CHECK_PERIODIC _CHECK_VALIDITY <------------- OLD _ITER_CHECK_LIST _GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST _SET_IP _STORE_FAST_1 _CHECK_VALIDITY <------------- OLD _LOAD_FAST_0 _LOAD_FAST_1 _GUARD_BOTH_INT _BINARY_OP_ADD_INT _SET_IP _STORE_FAST_0 _JUMP_TO_TOP After:
_MAKE_WARM _SET_IP _CHECK_PERIODIC _CHECK_VALIDITY <------------- OLD _ITER_CHECK_LIST _GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST _CHECK_VALIDITY_AND_SET_IP <-- NEW _STORE_FAST_1 _CHECK_VALIDITY <------------- OLD _LOAD_FAST_0 _LOAD_FAST_1 _GUARD_BOTH_INT _BINARY_OP_ADD_INT _SET_IP _STORE_FAST_0 _JUMP_TO_TOP There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.
Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.
Yhg1s commented Mar 6, 2025
@markshannon This is the FOR_ITER specialization you wanted to take another look at. |
markshannon left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame about FOR_ITER_LIST becoming escaping.
We need to make _Py_TryIncrefCompareStackRef non-escaping, but not in this PR.
Otherwise we'll end up with everything escaping, as more code uses _Py_TryIncrefCompareStackRef.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
| [FOR_ITER] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG }, | ||
| [FOR_ITER_GEN] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG }, | ||
| [FOR_ITER_LIST] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG }, | ||
| [FOR_ITER_LIST] ={true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_DEOPT_FLAG | HAS_EXIT_FLAG | HAS_ESCAPES_FLAG }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:FOR_ITER_LIST is escaping because _PyList_GetItemRefNoLock is escaping, and _PyList_GetItemRefNoLock is escaping because _Py_TryIncrefCompareStackRef is escaping.
Either we need to find an alternative approach or make _Py_TryIncrefCompareStackRef non-escaping.
Yhg1s commented Mar 8, 2025
I think we discussed this in the weekly meeting a month or two ago... I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt. For other cases where we need to TryIncref, I'm not sure of the impact of delaying (plus, we'd need some mechanism for the delay). |
Yhg1s commented Mar 10, 2025
@markshannon let me know if you want that to happen in this PR (but I'd prefer it not). |
markshannon commented Mar 11, 2025
No need to do it in this PR. |
de2f7da into python:mainUh oh!
There was an error while loading. Please reload this page.
…n#128798) Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
--disable-gilbuilds #115999