
Conversation

@tom-pytel (Contributor) commented Mar 29, 2025:

This PR exists because of #129069 and Tools/tsan/suppressions_free_threading.txt:

```
# List resizing happens through different paths ending in memcpy or memmove
# (for efficiency), which will probably need to rewritten as explicit loops
# of ptr-sized copies to be thread-safe.
```

It seems to hit the requirements: performance is preserved, atomicity is assured (per pointer), and TSAN shuts up. I'm fairly sure the list_ass_slice_lock_held and list_inplace_repeat_lock_held suppressions can be removed, and if not yet, it should be possible to add the atomic memcpy to list_resize where needed so that they can be.
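Assuming the standard ThreadSanitizer suppression syntax, the entries in question would look something like this (hypothetical; the actual lines in Tools/tsan/suppressions_free_threading.txt may use a different prefix):

```
race:list_ass_slice_lock_held
race:list_inplace_repeat_lock_held
```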

The "atomic memcpy" (atomic per ptr, not whole memcpy) functions are in the atomic wrappers header because they can be reused, if want otherwise though can move into listobject.c, or somewhere else. Can also make non-inline callable real functions. Or the inline funcs could go into pyatomic.h?

@colesbury, you may recognize this; I did something similar for the lock-free array module PR, but this is simpler. PyObject pointers are a bit more fragile than pure values, though, so I assume you want this here?

Here is an example of what the inner loop of FT_ATOMIC_MEMMOVE_PTR_RELAXED compiles to. It's no rep movsq or other specialized move instruction, but it's fast:

```asm
.L2971:
# ./Include/cpython/pyatomic_gcc.h:387: { return (void *)__atomic_load_n((void * const *)obj, __ATOMIC_RELAXED); }
        mov     rax, QWORD PTR [rdi]    # _160, *s
# ./Include/internal/pycore_pyatomic_ft_wrappers.h:143: for (void **d = (void **)dest + n, **s = (void **)src + n, **e = (void **)dest - 1; d != e; d--, s--) {
        sub     rsi, 64                 # d,
        sub     rdi, 64                 # s,
# ./Include/cpython/pyatomic_gcc.h:509: { __atomic_store_n((void **)obj, value, __ATOMIC_RELAXED); }
        mov     QWORD PTR 64[rsi], rax  #,, _160
        mov     rax, QWORD PTR 56[rdi]  # _160,
        mov     QWORD PTR 56[rsi], rax  #,, _160
        mov     rax, QWORD PTR 48[rdi]  # _160,
        mov     QWORD PTR 48[rsi], rax  #,, _160
        mov     rax, QWORD PTR 40[rdi]  # _160,
        mov     QWORD PTR 40[rsi], rax  #,, _160
        mov     rax, QWORD PTR 32[rdi]  # _160,
        mov     QWORD PTR 32[rsi], rax  #,, _160
        mov     rax, QWORD PTR 24[rdi]  # _160,
        mov     QWORD PTR 24[rsi], rax  #,, _160
        mov     rax, QWORD PTR 16[rdi]  # _160,
        mov     QWORD PTR 16[rsi], rax  #,, _160
        mov     rax, QWORD PTR 8[rdi]   # _160,
        mov     QWORD PTR 8[rsi], rax   #,, _160
        cmp     QWORD PTR 32[rsp], rsi  # %sfp, d
        jne     .L2971                  #,
```

Simple benchmark. Note that the decrementing-address copy case in FT_ATOMIC_MEMMOVE_PTR_RELAXED seems to hurt a bit cache-wise in the 'tins' simple bench, but it doesn't seem to make a difference in the large move or overall in pyperformance. I've had this one jump around from parity with current main to the numbers below (which are the worst case, so I put them here). The difference disappears if the real memmove is used in this case, but that seems to come from the real memmove using specialized instructions (which are not atomic), a la https://codebrowser.dev/glibc/glibc/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S.html. The other two simple benches (tdel and tins big) are solid where they are. Test script:

```python
from timeit import timeit
from time import time

tdel = timeit('del dst[:1024]', 'dst = [None] * 4096', number=10000000)
print(f't del: {tdel}')

tins = timeit('dst[:0] = src', 'dst = [None] * 4096; dst.pop(); src = [None]',
              number=100000)  # pop() to make sure no realloc in timing
print(f't ins: {tins}')

dst = [None] * 0x10000000
src = [None]
dst.pop()
t0 = time()
dst[:0] = src
t1 = time()
print(f't ins big: {t1 - t0}')
```

Times, average of 10 runs each:

```
current main:
-------------
t del:     0.2191115931245804
t ins:     0.3050231862498549
t ins big: 0.0799584686756134

fix-129069:
-----------
t del:     0.21582267370067712
t ins:     1.072859044800134
t ins big: 0.08154952526092529
```

pyperformance benchmark. The difference in average performance is essentially nonexistent, though individual tests can vary a bit (AMD 7950X running in VirtualBox):

```
                          current main   fix-129069   norm diff
async_generators          250            254          -0.0157
asyncio_tcp               248            247           0.0040
asyncio_tcp_ssl           1.08           1.06          0.0189
asyncio_websockets        344            342           0.0058
chaos                     42.3           42.4         -0.0024
comprehensions            11.1           11            0.0091
bench_mp_pool             17.5           17.4          0.0057
bench_thread_pool         1.16           1.09          0.0642
coroutines                12.8           12.8          0.0000
coverage                  57             54.5          0.0459
crypto_pyaes              51.9           51.6          0.0058
deepcopy                  182            180           0.0111
deepcopy_reduce           1.91           1.94         -0.0155
deepcopy_memo             20.2           21.7         -0.0691
deltablue                 2.3            2.3           0.0000
django_template           25.2           25            0.0080
docutils                  1.61           1.58          0.0190
dulwich_log               26.3           25.8          0.0194
fannkuch                  260            257           0.0117
float                     42.4           43.5         -0.0253
create_gc_cycles          494            512          -0.0352
gc_traversal              1.17           1.27         -0.0787
generators                20.9           21.4         -0.0234
genshi_text               16.6           16.7         -0.0060
genshi_xml                34.3           34.9         -0.0172
go                        80.3           80.8         -0.0062
hexiom                    4.17           4.22         -0.0118
html5lib                  33.2           34           -0.0235
json_dumps                7.43           7.31          0.0164
json_loads                14.2           13.9          0.0216
logging_format            4.58           4.61         -0.0065
logging_silent            64.8           63.7          0.0173
logging_simple            4.2            4.22         -0.0047
mako                      7.95           7.97         -0.0025
mdp                       1.78           1.98         -0.1010
meteor_contest            68.2           69.7         -0.0215
nbody                     82.6           78.8          0.0482
nqueens                   58.6           61.1         -0.0409
pathlib                   13.8           13.9         -0.0072
pickle                    6.12           6.13         -0.0016
pickle_dict               15.3           15.2          0.0066
pickle_list               2.3            2.18          0.0550
pickle_pure_python        204            203           0.0049
pidigits                  116            117          -0.0085
pprint_safe_repr          499            513          -0.0273
pprint_pformat            1.02           1.05         -0.0286
pyflate                   296            304          -0.0263
python_startup            9.29           9.29          0.0000
python_startup_no_site    7.78           7.62          0.0210
raytrace                  192            193          -0.0052
regex_compile             74.4           74.1          0.0040
regex_dna                 111            110           0.0091
regex_effbot              1.69           1.7          -0.0059
regex_v8                  13.8           13.7          0.0073
richards                  32.1           32.4         -0.0093
richards_super            36.8           37.3         -0.0134
scimark_fft               232            231           0.0043
scimark_lu                81             80.7          0.0037
scimark_monte_carlo       45.9           47.9         -0.0418
scimark_sor               77.9           76.3          0.0210
scimark_sparse_mat_mult   3.83           3.78          0.0132
spectral_norm             68.5           67            0.0224
sqlglot_normalize         197            196           0.0051
sqlglot_optimize          36.8           36.9         -0.0027
sqlglot_parse             887            888          -0.0011
sqlglot_transpile         1.1            1.1           0.0000
sqlite_synth              1.43           1.44         -0.0069
sympy_expand              277            276           0.0036
sympy_integrate           12.8           12.7          0.0079
sympy_sum                 88.1           85.3          0.0328
sympy_str                 164            162           0.0123
telco                     5.47           5.39          0.0148
tomli_loads               1.37           1.37          0.0000
typing_runtime_protocols  112            112           0.0000
unpack_sequence           28.7           28.6          0.0035
unpickle                  8              8.18         -0.0220
unpickle_list             2.87           2.75          0.0436
unpickle_pure_python      144            140           0.0286
xml_etree_parse           74             72.6          0.0193
xml_etree_iterparse       49.6           47.8          0.0377
xml_etree_generate        56.2           56.9         -0.0123
xml_etree_process         40.4           40.3          0.0025
AVERAGE                                               -0.0001
```

@tom-pytel (Contributor, Author) commented:

Ping @colesbury, @Yhg1s. Windows doesn't like size_t as a parameter; I will fix that, but I would like a thumbs up or down on this PR in general.

@tom-pytel changed the title from "gh-129069: make list ass_slice and memory_repeat safe" to "gh-129069: make list ass_slice and memory_repeat safe in free-threading" on Mar 29, 2025.
@python-cla-bot commented:

All commit authors signed the Contributor License Agreement.

@tom-pytel (Contributor, Author) commented:

Thank you @mhsmith for the analysis; it led to the solution.

Specifically:

```c
#include "pyatomic.h"
```

needs to be accommodated on the ARM Android build (but apparently nowhere else, and I swear it compiled fine a couple of weeks ago).

@kumaraditya303 I had to revert #131882 (comment) because _Py_IS_ALIGNED is not available in the above case. In fact, I had to add #include <assert.h> to pyatomic.h in order to accommodate even direct alignment validation. Which leads me to a question: should we even bother with those assertions here (which necessitate that include)?
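For reference, the direct validation amounts to something like this (a sketch; the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Spelled-out equivalent of assert(_Py_IS_ALIGNED(p, sizeof(void *))),
 * usable where pymacro.h (which defines _Py_IS_ALIGNED) is not visible. */
static inline void
assert_ptr_aligned(const void *p)
{
    assert(((uintptr_t)p & (sizeof(void *) - 1)) == 0);
}
```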

@mhsmith (Member) commented:

As I explained in my previous comment, this is a macOS problem, not an Android one. If the PR had been merged in its previous state, it would have failed on the macOS buildbots, unless they use the configure cache in the same unsafe way as GitHub Actions does.

@kumaraditya303 added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Sep 6, 2025.
@bedevere-bot commented:
🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit 9c4debd 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131882%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label on Sep 6, 2025.
@tom-pytel (Contributor, Author) commented:

WHY???

```
extra_compile_args: [..., '-Werror=declaration-after-statement', '-DMODULE_NAME=_test_c99_cext', '-std=c99']
/Users/buildbot/buildarea/pull_request.pablogsal-macos-m1.macos-with-brew/build/Include/cpython/pyatomic.h:590:16: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]
  590 |     void **dest_ = (void **)dest + n;
      |            ^
1 error generated.
```
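The test extension is compiled with -Werror=declaration-after-statement, so header code can't introduce a declaration after a statement; the usual fix is to hoist the declaration to the top of the block. A minimal sketch (the function name and the assert are hypothetical stand-ins for the surrounding code at pyatomic.h:590):

```c
#include <assert.h>
#include <stddef.h>

static inline void
fix_sketch(void *dest, size_t n)
{
    void **dest_;                /* declaration hoisted to the top of the block */
    assert(dest != NULL);        /* stand-in for the statements that came first */
    dest_ = (void **)dest + n;   /* the flagged line, now a plain assignment */
    (void)dest_;                 /* silence unused-variable warnings in the sketch */
}
```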

@kumaraditya303 added the 🔨 test-with-buildbots label again on Sep 7, 2025.
@bedevere-bot commented:
🤖 New build scheduled with the buildbot fleet by @kumaraditya303 for commit 32f9604 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131882%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label on Sep 7, 2025.
@colesbury (Contributor) commented:

Hi @tom-pytel, thanks for your work on this and sorry for letting the PR languish. I've taken a slightly different approach in #142957 for the same issue.

@tom-pytel (Contributor, Author) commented:

> Hi @tom-pytel, thanks for your work on this and sorry for letting the PR languish. I've taken a slightly different approach in #142957 for the same issue.

No worries, I'll close this out. But what should I do with #130771: close it, or is it still viable?
