Skip to content

Deadlock in threaded application when using sys._current_frames#106883

@mmohrhard

Description

@mmohrhard

Bug report

When using sys._current_frames in a threaded application there is a high risk of a deadlock in the interpreter. Below is my analysis from one such hang. We have never seen this problem with Python 3.8 but experience it regularly with Python 3.11. I know that the sys._current_frames documentation warns about the function and one solution to this bug report could be to document that the function is not thread safe.

We have a huge number of threads in this application but I'll limit the output to the two threads that cause the deadlock.

We have a thread calling sys._current_frames with a Python stack trace:

Thread 118868 (idle): "Profile" _gc_callback (distributed/utils_perf.py:184) _watch (distributed/profile.py:261) run (threading.py:975) _bootstrap_inner (threading.py:1038) _bootstrap (threading.py:995) 

and a corresponding C stack trace:

Thread 325 (Thread 0x7f8b992d6700 (LWP 118868) ""): #0 0x00007f8ce83adde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00000000004eae32 in PyCOND_TIMEDWAIT (us=<optimized out>, mut=<optimized out>, cond=0x879128 <_PyRuntime+392>) at /usr/local/src/conda/python-3.11.4/Python/condvar.h:73 #2 take_gil (tstate=0x7f8b992d54e0) at /usr/local/src/conda/python-3.11.4/Python/ceval_gil.h:250 #3 0x00000000005134bc in eval_frame_handle_pending (tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1211 #4 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:1754 #5 0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e248, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73 #6 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8cdc1d2340, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439 #7 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8cdc1d2340>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393 #8 _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8cdc1d2340>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92#9 0x0000000000554172 in method_vectorcall (method=method@entry=<method at remote 0x7f8bce070c00>, args=args@entry=0x7f8b992d57d0, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:89 #10 0x00000000004fe378 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f8b992d57d0, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92 #11 _PyObject_CallFunctionVa (is_size_t=0, va=0x7f8b992d57b0, format=<optimized out>, callable=<method at remote 0x7f8bce070c00>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:536 #12 PyObject_CallFunction (callable=<method at remote 0x7f8bce070c00>, format=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:558 #13 0x00000000005c23d2 in invoke_gc_callback (phase=0x654129 "stop", generation=<optimized out>, collected=<optimized out>, uncollectable=<optimized out>, tstate=<optimized out>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1378 #14 0x00000000005c230a in gc_collect_with_callback (tstate=0x5c60c20, generation=0) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:1401 #15 0x000000000053b418 in _PyObject_GC_Link (op=set()) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2270 #16 gc_alloc (presize=<optimized out>, basicsize=<optimized out>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2290 #17 _PyObject_GC_NewVar (nitems=<optimized out>, tp=0x869fe0 <PyFrame_Type>) at /usr/local/src/conda/python-3.11.4/Modules/gcmodule.c:2318 #18 _PyFrame_New_NoTrack (code=<optimized out>, code=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/frameobject.c:1024 #19 _PyFrame_MakeAndSetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Python/frame.c:33 #20 0x00000000004e19df in _PyFrame_GetFrameObject (frame=0x7f8b90e95a68) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_frame.h:171 #21 _PyThread_CurrentFrames () at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1420 #22 0x00000000005061c1 in cfunction_vectorcall_NOARGS (func=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/cpython/methodobject.h:52 #23 0x000000000051bde1 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92 #24 PyObject_Vectorcall (callable=<built-in method _current_frames of module object at remote 0x7f8ce8746ca0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:299 #25 0x000000000050ed83 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:4774 #26 0x0000000000534f13 in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e188, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73 #27 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7f8bce07e978, locals=0x0, func=<optimized out>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439 #28 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7f8bce07e978, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393 #29 0x000000000053ef0d in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<function at remote 0x7f8cdc121800>, func=0x534da0 <_PyFunction_Vectorcall>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:257 #30 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=<function at remote 0x7f8cdc121800>, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Objects/call.c:328 #31 PyObject_Call (callable=<function at remote 0x7f8cdc121800>, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:355 #32 0x000000000051290a in do_call_core (use_tracing=<optimized out>, kwdict={'thread_id': 140243172636480, 'interval': '10ms', 'cycle': '1000ms', 'log': <collections.deque at remote 0x7f8bc008b1f0>, 'omit': ('profile.py', 'selectors.py'), 'stop': <function at remote 0x7f8bc0058220>}, callargs=(), func=<function at remote 0x7f8cdc121800>, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:7357 #33 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:5381 #34 0x000000000055451f in _PyEval_EvalFrame (throwflag=0, frame=0x7f8bb645e020, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_ceval.h:73 #35 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7f8ce0e41b20, tstate=0x5c60c20) at /usr/local/src/conda/python-3.11.4/Python/ceval.c:6439 #36 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=<function at remote 0x7f8ce0e41b20>) at /usr/local/src/conda/python-3.11.4/Objects/call.c:393 #37 _PyObject_VectorcallTstate (tstate=0x5c60c20, callable=<function at remote 0x7f8ce0e41b20>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.4/Include/internal/pycore_call.h:92 #38 0x00000000005541f8 in method_vectorcall (method=<optimized out>, args=0x8875d0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.11.4/Objects/classobject.c:67 #39 0x0000000000627f6f in thread_run (boot_raw=0x7f8cc9c987b0) at /usr/local/src/conda/python-3.11.4/Modules/_threadmodule.c:1092 #40 0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241 #41 0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0 #42 0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6 

So, this thread does not hold the GIL but seems to hold the runtime->interpreters.mutex

We have another thread that seems to hold the GIL:

Thread 388 (Thread 0x7f8b62f7f700 (LWP 120217) ""): #0 0x00007f8ce83afb3b in do_futex_wait.constprop () from /lib64/libpthread.so.0 #1 0x00007f8ce83afbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0 #2 0x00007f8ce83afc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0 #3 0x00000000004ea969 in PyThread_acquire_lock_timed (intr_flag=<optimized out>, microseconds=<optimized out>, lock=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:497 #4 PyThread_acquire_lock (lock=0x18bd040, waitflag=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:751 #5 0x00000000005f7266 in tstate_delete_common (tstate=0x7f8cc417c250, gilstate=0x8791d8 <_PyRuntime+568>) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1080 #6 0x00000000005f6d93 in _PyThreadState_DeleteCurrent (tstate=0x7f8cc417c250) at /usr/local/src/conda/python-3.11.4/Python/pystate.c:1132 #7 0x00000000005f63d4 in pythread_wrapper (arg=<optimized out>) at /usr/local/src/conda/python-3.11.4/Python/thread_pthread.h:241 #8 0x00007f8ce83a9ea5 in start_thread () from /lib64/libpthread.so.0 #9 0x00007f8ce79c9b0d in clone () from /lib64/libc.so.6 

See the native_thread_id in the following gdb output.

(gdb) p *PyThreadState_Get() $10 ={prev = 0x7f8cc4269680, next = 0x7f8c94483340, interp = 0x8875d8 <_PyRuntime+58936>, _initialized = 1, _static = 0, recursion_remaining = 1000, recursion_limit = 1000, recursion_headroom = 0, tracing = 0, tracing_what = 0, cframe = 0x7f8cc417c3a0, c_profilefunc = 0x0, c_tracefunc = 0x0, c_profileobj = 0x0, c_traceobj = 0x0, curexc_type = 0x0, curexc_value = 0x0, curexc_traceback = 0x0, exc_info = 0x7f8cc417c390, dict = 0x0, gilstate_counter = 1, async_exc = 0x0, thread_id = 140236637599488, native_thread_id = 120217, trash_delete_nesting = 0, trash_delete_later = 0x0, on_delete = 0x628f10 <release_sentinel>, on_delete_data = 0x7f8b974b1ee0, coroutine_origin_tracking_depth = 0, async_gen_firstiter = 0x0, async_gen_finalizer = 0x0, context = 0x0, context_ver = 1, id = 1085, trace_info ={code = 0x0, bounds ={ar_start = 0, ar_end = 0, ar_line = 0, opaque ={computed_line = 0, lo_next = 0x0, limit = 0x0}}}, datastack_chunk = 0x7f8b99522000, datastack_top = 0x7f8b99522020, datastack_limit = 0x7f8b99526000, exc_state ={exc_value = 0x0, previous_item = 0x0}, root_cframe ={use_tracing = 0 '\000', current_frame = 0x0, previous = 0x0}} 

Your environment

Python 3.11.4 on a CentOS 7 machine with conda-forge binaries

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.11only security fixesinterpreter-core(Objects, Python, Grammar, and Parser dirs)type-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions