Skip to content

Conversation

@mpage
Copy link
Contributor

@mpagempage commented Sep 17, 2024

During finalization, daemon threads are forced to exit immediately:

/* Remaining daemon threads will automatically exit
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
_PyRuntimeState_SetFinalizing(runtime, tstate);

without returning through the call-stack normally upon acquiring the GIL:

if (_PyThreadState_MustExit(tstate)){
/* bpo-39877: If Py_Finalize() has been called and tstate is not the
thread which called Py_Finalize(), exit immediately the thread.
This code path can be reached by a daemon thread after Py_Finalize()
completes. In this case, tstate is a dangling pointer: points to
PyThreadState freed memory. */
PyThread_exit_thread();
}

Finalizers that run after this must be able to join the forcefully terminated threads.

The current implementation notified of thread completion before returning from thread_run:

_PyEvent_Notify(&handle->thread_is_exiting);

Unfortunately, this code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely.

To fix this, keep track of daemon threads and have the runtime notify their events after they are forced to exit.

During finalization, daemon threads are force to exit immediately (without returning through the call-stack normally) upon acquiring the GIL. Finalizers that run after this must be able to join the forcefully terminated threads. The current implementation notified of thread completion before returning from `thread_run`. This code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely. To fix this, use the old approach of notifying of thread completion when the thread state is cleared. This happens both when `thread_run` exits normally and when thread states are destroyed as part of finalization (which happens immediately after forcing daemon threads to exit, before any python code can run).
@mpagempage marked this pull request as ready for review September 17, 2024 03:12
Comment on lines +1184 to +1186
def loop():
while True:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slight variant where loop() calls time.sleep(1) hard crashes in 3.11 and 3.12. I'm not sure what to make of that, other than this sort of behavior wasn't robust previously either. I think it's noteworthy that the change that led to the issue was from just a few days ago -- it doesn't look like it was some longstanding code that just broke now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly #87135 related.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the issue #116514 colesbury linked when I asked (I wasn't able to repro just the time.sleep variant mentioned here)

@mpage
Copy link
ContributorAuthor

Marking this as a draft while I figure out the linker errors on windows.

@mpagempage marked this pull request as draft September 18, 2024 20:44
Include threadmodule when building _freeze_module
@mpagempage marked this pull request as ready for review September 18, 2024 21:29
@mpagempage requested a review from a team as a code ownerSeptember 18, 2024 21:29
@mpage
Copy link
ContributorAuthor

@colesbury@ericsnowcurrently - Would you please have another look at this? I think it makes sense to move ThreadHandle from _threadmodule.c into pycore_pythread.h/Python/thread.c but that's a bigger change, so I went with a smaller fix here. I can do the refactor in a separate PR.

Copy link
Member

@ericsnowcurrentlyericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM

There's just the matter of where the code lives. It should mostly go in Python/thread.c. However, we have a bit of a mess in threadmodule.c already, which we needn't try to fix all in this PR. I might be okay with punting on that in favor of a follow-up PR, but I'd like the opinion of others on that first.

Comment on lines +668 to +669
llist_init(&interp->threads.daemon_handles);
llist_init(&interp->threads.non_daemon_handles);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to have a _PyThread_InitThreadHandles() for this, to match _PyThread_ClearThreadHandles().

@bedevere-app
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@mpage
Copy link
ContributorAuthor

@ericsnowcurrently and I chatted in person and the consensus is that this PR is ok as-is, but we should follow it up with a refactor (only on main) to move functionality out of Modules/_threadmodule.c and into Python/thread.c.

@gpshead
Copy link
Member

Unfortunately, this code will never execute if the thread is forced to exit during finalization. Any code that attempts to join such a thread will block indefinitely.

To fix this, keep track of daemon threads and have the runtime notify their events after they are forced to exit.

FYI - #105805, if merged, is set to change this so that these threads do not exit because our C API has no way for them to do so cleanly to cause an unwind of the call stack. They'll be made to hang instead.

@encukou
Copy link
Member

And now with #130402, you get PythonFinalizationError when joining such a daemon thread (which is either sleeping in PyThread_hang_thread, or will be made to hang when it next tries to acquire the GILattach thread state).

It looks like there's nothing to do here, but, I'm not sure if someone still wants to go back to (and improve) force-killing rather than hanging.

@mpage, do you want to close the PR?

@mpagempage closed this Jun 9, 2025
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

@mpage@gpshead@encukou@colesbury@ericsnowcurrently@hauntsaninja