@colesbury commented Feb 27, 2025

The PyThreadState struct gains a reference count field so that a PyThreadState pointer can no longer dangle after the memory is freed. The refcount starts at two: one reference is owned by the interpreter's linked list of thread states and one reference is owned by the OS thread. The reference count is decremented when the thread state is removed from the interpreter's linked list and again before the OS thread calls PyThread_hang_thread(). Whichever thread decrements it to zero frees the PyThreadState memory.
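To make the ownership rule concrete, here is a minimal sketch of a two-owner reference count using C11 atomics. The names `toy_tstate`, `toy_tstate_new`, `toy_tstate_decref` and `toy_freed_count` are hypothetical and exist only for illustration; this is not the CPython implementation, only the general pattern described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyThreadState; not the real CPython layout. */
typedef struct {
    atomic_int refcount;  /* starts at 2: interpreter list + OS thread */
    int id;
} toy_tstate;

/* Counts how many toy_tstate objects were freed (for the demo only). */
static atomic_int toy_freed_count;

static toy_tstate *toy_tstate_new(int id)
{
    toy_tstate *ts = malloc(sizeof(*ts));
    if (ts == NULL) {
        return NULL;
    }
    atomic_init(&ts->refcount, 2);
    ts->id = id;
    return ts;
}

/* Drop one reference. Returns true if this call released the last
   reference and therefore freed the memory. */
static bool toy_tstate_decref(toy_tstate *ts)
{
    /* fetch_sub returns the previous value, so 1 means we hit zero. */
    if (atomic_fetch_sub_explicit(&ts->refcount, 1,
                                  memory_order_acq_rel) == 1) {
        free(ts);
        atomic_fetch_add(&toy_freed_count, 1);
        return true;
    }
    return false;
}
```

In this sketch the first owner to call `toy_tstate_decref()` leaves the object alive, and the second call frees it, regardless of which owner goes first.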

The holds_gil field is moved out of the _status bit field to avoid a data race in which one thread calls PyThreadState_Clear(), modifying the _status bit field, while the OS thread reads holds_gil when attempting to acquire the GIL.
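The reason the move matters: in C11, adjacent non-zero-width bit-fields share one memory location, so writing one flag while another thread reads a neighbouring flag is a data race. The sketch below contrasts the two layouts. The struct names and the reduced set of flags are hypothetical, and using a plain `int` for the separated field is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Before (simplified): holds_gil is packed with the other status bits,
   so a write to `cleared` touches the same memory location. */
typedef struct {
    struct {
        unsigned int initialized : 1;
        unsigned int cleared : 1;
        unsigned int holds_gil : 1;
    } status;
} toy_tstate_before;

/* After (simplified): holds_gil is its own object, a separate
   memory location from the status bit-fields. */
typedef struct {
    struct {
        unsigned int initialized : 1;
        unsigned int cleared : 1;
    } status;
    int holds_gil;
} toy_tstate_after;

/* Modifies only the bit-field word, as a clear operation might. */
static void toy_clear(toy_tstate_after *ts)
{
    ts->status.cleared = 1;
}

/* Reads only the separate field, as a GIL acquisition check might. */
static int toy_holds_gil(const toy_tstate_after *ts)
{
    return ts->holds_gil;
}
```

With the second layout, `toy_clear()` and `toy_holds_gil()` access disjoint memory locations, so running them on different threads does not race on the bit-field word.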

The PyThreadState.state field now has _Py_THREAD_SHUTTING_DOWN as a possible value. This corresponds to the _PyThreadState_MustExit() check. This avoids race conditions in the free threading build when checking _PyThreadState_MustExit().
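One way to picture this is a single atomic state word where "shutting down" is just another value, so the must-exit check and the attach transition both go through the same atomic variable. The sketch below assumes hypothetical names (`toy_thread`, the `TOY_*` constants, `toy_try_attach`, `toy_must_exit`) and is not the CPython code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values for illustration. */
enum {
    TOY_DETACHED = 0,
    TOY_ATTACHED = 1,
    TOY_SUSPENDED = 2,
    TOY_SHUTTING_DOWN = 3
};

typedef struct {
    atomic_int state;
} toy_thread;

/* Attach only if the thread is currently detached. A thread marked
   as shutting down can never win this compare-exchange. */
static bool toy_try_attach(toy_thread *t)
{
    int expected = TOY_DETACHED;
    return atomic_compare_exchange_strong(&t->state, &expected,
                                          TOY_ATTACHED);
}

/* Called by the finalizing thread to mark another thread. */
static void toy_set_shutting_down(toy_thread *t)
{
    atomic_store(&t->state, TOY_SHUTTING_DOWN);
}

/* The must-exit check becomes a single atomic load of the state. */
static bool toy_must_exit(toy_thread *t)
{
    return atomic_load(&t->state) == TOY_SHUTTING_DOWN;
}
```

Because the check and the transition read the same atomic word, there is no window in which a thread observes "not exiting" from one variable and then attaches based on another.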

@vstinner left a comment

LGTM.

Avoiding dangling pointers closes a whole category of bugs. Using a reference count is a nice solution for that. I also like the fact that Py_Finalize() now sets the thread states to "shutting down": it's more explicit that way.

@vstinner

By the way, this change may also fix the old crash #110052 (test_4_daemon_threads).

@colesbury

Yeah, I think it should fix test_4_daemon_threads as well. I'm using the repro from that issue:

./python -m test test_threading -m test_4_daemon_threads -j50 -F --fail-env-changed

And also a small patch to ceval_gil.c to make the crash more likely to occur on my machine.

On main (with the patch to ceval_gil.c), I see a crash pretty quickly after ~100 iterations on both the default and free threaded builds.

With this PR, I haven't seen any failures in 25,000 iterations (~10 minutes).

Patch to ceval_gil.c
diff --git a/Python/ceval_gil.c b/Python/ceval_gil.c
index 2c1cc17b2ff..e14f1a8afa2 100644
--- a/Python/ceval_gil.c
+++ b/Python/ceval_gil.c
@@ -306,6 +306,8 @@ take_gil(PyThreadState *tstate)
         _PyThreadState_HangThread(tstate);
     }
 
+    usleep(10);
+
     assert(_PyThreadState_CheckConsistency(tstate));
     PyInterpreterState *interp = tstate->interp;
     struct _gil_runtime_state *gil = interp->ceval.gil;

@colesbury merged commit 052cb71 into python:main on Mar 6, 2025
42 checks passed
@colesbury deleted the gh-124878-tstate-refcount branch March 6, 2025 15:38
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL8 3.x (tier-3) has failed when building commit 052cb71.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/509/builds/8563) and take a look at the build logs.
  4. Check if the failure is related to this commit (052cb71) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/509/builds/8563

Failed tests:

  • test.test_multiprocessing_spawn.test_manager

Summary of the results of the build (if available):

==

Traceback logs:

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 266, in serve_client
    raise ke
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 260, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
                              ~~~~~~~~~^^^^^^^
KeyError: '3ff9859f260'
---------------------------------------------------------------------------
Timeout (0:20:00)!
Thread 0x000003ffb74f7270 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/popen_fork.py", line 28 in poll
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/popen_fork.py", line 44 in wait
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 149 in join
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 623 in _callCleanup
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 697 in doCleanups
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 664 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 716 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/runner.py", line 259 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 84 in _run_suite
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 42 in run_unittest
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 162 in test_func
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 118 in regrtest_runner
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 165 in _load_run_test
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 210 in _runtest_env_changed_exc
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 319 in _runtest
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 348 in run_single_test
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 92 in worker_process
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 127 in main
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 131 in <module>
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/runpy.py", line 88 in _run_code
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/runpy.py", line 198 in _run_module_as_main

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 266, in serve_client
    raise ke
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 260, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
                              ~~~~~~~~~^^^^^^^
KeyError: '3ff985fb340'
---------------------------------------------------------------------------
k

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/_test_multiprocessing.py", line 1625, in f
    woken.release()
    ~~~~~~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 1067, in release
    return self._callmethod('release')
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 848, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 264, in serve_client
    self.id_to_local_proxy_obj[ident]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: '3ff89835aa0'

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 266, in serve_client
    raise ke
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 260, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
                              ~~~~~~~~~^^^^^^^
KeyError: '3ff89835aa0'
---------------------------------------------------------------------------
Timeout (0:20:00)!
Thread 0x000003ff9f377270 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/popen_fork.py", line 28 in poll
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/popen_fork.py", line 44 in wait
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 149 in join
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 623 in _callCleanup
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 697 in doCleanups
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 664 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/case.py", line 716 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/unittest/runner.py", line 259 in run
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 84 in _run_suite
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 42 in run_unittest
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 162 in test_func
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 118 in regrtest_runner
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 165 in _load_run_test
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 210 in _runtest_env_changed_exc
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 319 in _runtest
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/single.py", line 348 in run_single_test
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 92 in worker_process
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 127 in main
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/libregrtest/worker.py", line 131 in <module>
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/runpy.py", line 88 in _run_code
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/runpy.py", line 198 in _run_module_as_main

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/_test_multiprocessing.py", line 1625, in f
    woken.release()
    ~~~~~~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 1067, in release
    return self._callmethod('release')
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 848, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 264, in serve_client
    self.id_to_local_proxy_obj[ident]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: '3ff985fb340'

Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 313, in _bootstrap
    self.run()
    ~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/test/_test_multiprocessing.py", line 1625, in f
    woken.release()
    ~~~~~~~~~~~~~^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 1067, in release
    return self._callmethod('release')
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 848, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-rhel8-s390x/build/Lib/multiprocessing/managers.py", line 264, in serve_client
    self.id_to_local_proxy_obj[ident]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: '3ff9859f260'
