Bug report
With a `ProcessPoolExecutor`, after submitting and quickly canceling a future, a call to `shutdown(wait=True)` hangs indefinitely.
This happens on pretty much all platforms and all recent Python versions.
Here is a minimal reproduction:
```python
import concurrent.futures

ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
ppe.shutdown(wait=True)
```

The first submission gets the executor going and creates its internal `queue_management_thread`.
The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.
Introducing a tiny sleep between the second `submit` and its `cancel` makes the issue disappear, as shown in the sketch below. From my initial observation it looks like the way the `_queue_management_worker` internal loop is structured doesn't handle this edge case well.
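For reference, a minimal sketch of that timing-dependent workaround, assuming the same one-worker setup; the 0.01 s value is arbitrary, any delay that lets the worker pick the item up before the cancel seems to be enough:

```python
import concurrent.futures
import time

ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()

fut = ppe.submit(int)
time.sleep(0.01)   # tiny sleep between submit and cancel: no hang
fut.cancel()       # likely returns False, the item is already running

ppe.shutdown(wait=True)  # returns normally
```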
Shutting down with `wait=False` would return immediately as expected, but the `queue_management_thread` would then die with an unhandled `OSError: handle is closed` exception.
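For completeness, a sketch of the `wait=False` variant; the `OSError` is not raised in the main thread, it surfaces as an unhandled exception printed by the dying daemon thread:

```python
import concurrent.futures

ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()

ppe.shutdown(wait=False)  # returns immediately, as expected
# Shortly after, stderr shows the queue_management_thread dying with an
# unhandled "OSError: handle is closed".
```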
Environment
- Discovered on macOS-12.2.1 with cpython 3.8.5.
- Reproduced in Ubuntu and Windows (x64) as well, and in cpython versions 3.7 to 3.11.0-beta.3.
- Reproduced in pypy3.8 as well, but not consistently. Seen for example in Ubuntu with Python 3.8.13 (PyPy 7.3.9).
Additional info
When tested with pytest-timeout under Ubuntu and cpython 3.8.13, these are the tracebacks at the moment of timing out:
```
_____________________________________ test _____________________________________

    @pytest.mark.timeout(10)
    def test():
        ppe = concurrent.futures.ProcessPoolExecutor(1)
        ppe.submit(int).result()
        ppe.submit(int).cancel()
>       ppe.shutdown(wait=True)

test_reproduce_python_bug.py:14:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdown
    self._queue_management_thread.join()
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in join
    self._wait_for_tstate_lock()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Thread(QueueManagerThread, started daemon 140003176535808)>
block = True, timeout = -1

    def _wait_for_tstate_lock(self, block=True, timeout=-1):
        # Issue #18808: wait for the thread state to be gone.
        # At the end of the thread's life, after all knowledge of the thread
        # is removed from C data structures, C code releases our _tstate_lock.
        # This method passes its arguments to _tstate_lock.acquire().
        # If the lock is acquired, the C code is done, and self._stop() is
        # called.  That sets ._is_stopped to True, and ._tstate_lock to None.
        lock = self._tstate_lock
        if lock is None:  # already determined that the C code is done
            assert self._is_stopped
>       elif lock.acquire(block, timeout):
E       Failed: Timeout >10.0s

/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed
----------------------------- Captured stderr call -----------------------------
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line 227, in _feed
    nwait()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()

~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line 362, in _queue_management_worker
    ready = mp.connection.wait(readers + worker_sentinels)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
```

Tracebacks in PyPy are similar at the `concurrent.futures.process` level. Tracebacks on Windows differ in the lower-level frames, but are again similar at the `concurrent.futures.process` level.
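The same dumps can be reproduced without pytest by arming the stdlib `faulthandler` module before the hanging call; this is just a convenience sketch, with the 10 s timeout mirroring the `pytest.mark.timeout(10)` marker above:

```python
import concurrent.futures
import faulthandler

# If we are still stuck after 10 s, dump every thread's stack to stderr
# and exit; this yields dumps equivalent to the pytest-timeout output.
faulthandler.dump_traceback_later(10, exit=True)

ppe = concurrent.futures.ProcessPoolExecutor(1)
ppe.submit(int).result()
ppe.submit(int).cancel()
ppe.shutdown(wait=True)  # hangs; stacks are dumped after 10 s

faulthandler.cancel_dump_traceback_later()  # not reached while the bug hangs
```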