Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-38323, asyncio: Fix MultiLoopChildWatcher race condition#26536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
vstinner commented Jun 4, 2021 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
Fix a race condition in asyncio MultiLoopChildWatcher: call the loop add_signal_handler() rather than calling directly signal.signal(), so signal.set_wakeup_fd() is used to wake up the event loop when the SIGCHLD signal is received. Previously, the event loop could block forever if the signal was received while the loop was idle. The signal was only proceed when another event woke up the event loop.
cjerdonek commented Jun 4, 2021
@vstinner This removes the purpose of having a separate |
vstinner commented Jun 4, 2021
Without this PR, test_asyncio is less than 1 minute on my laptop (8 logical CPUs, 4 physical CPUs) with the command: With this PR, the test no longer hangs. I ran it for 30 minutes (8332 runs). I marked this PR as a draft since this PR changes the MultiLoopChildWatcher design. Currently, MultiLoopChildWatcher documentation says:
With this change, MultiLoopChildWatcher looks like a clone of SafeChildWatcher. IMHO MultiLoopChildWatcher is broken by design. You must connect the signal handler to an event loop to be able to wake up the event loop when a signal is received, using signal.set_wakeup_fd(). At least, test_cancel_make_subprocess_transport_exec() is wrong: it expects that the event loop will be awaken when child process completes, wheras SIGCHLD currently cannot wake up the event loop, since nothing links the signal handler and the event loop. Sometimes, the Python signal handler is called by luck only because the signal is received while the event loop has pending events which wake up the event loop which indirectly calls the Python signal handler in Moreover, as I already wrote 2 years ago, MultiLoopChildWatcher is incompatible with the PEP 475 design: https://bugs.python.org/issue38323#msg355159 A signal handler cannot "wake up" Python code if it doesn't raise an exception. In short, I suggest to remove MultiLoopChildWatcher, rather than trying to fix it. @1st1@asvetlov@aeros@cjerdonek: What do you think? |
pablogsal commented Jun 4, 2021
Having a race condition in the test suite is raising so many false positives and is just a pain for automated systems and people watching the buildbots. We must prioritize fixing this issue. |
cjerdonek commented Jun 4, 2021 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I'm okay with removing it, but I think Andrew should weigh in since I think the class was his idea / implementation. One approach would be to deprecate the class and skip all of its tests. That would address the buildbot issue and without changing the behavior of |
vstinner commented Jun 4, 2021
This bug affects 3.9 and 3.10 branches. I don't think that we can remove a class in a stable branche. I propose to merge this fix (or a similar fix) in all branches, deprecate the class, and only remove it later (2 releases later). If we fix the class, it becomes less urgent to remove it. |
pablogsal commented Jun 4, 2021
I am confused, why is this PR backwards compatible if this is now adding the requirement that "A watcher that requires running loop in the main thread" while before that was not a requirement? |
vstinner commented Jun 4, 2021
I'm no longer working on/using asyncio, I cannot say what is the best trade-off:
We can take a different decisions depending on the branch. (C) is acceptable for 3.9 branch. (A) and (B) can be considered for 3.10 since 3.10.0 final is not released yet. I don't recall if it's important that the event loop runs in the main thread, or if it works if the event loop runs in another thread. Does someone know? The answer matters a lot :-) |
cjerdonek commented Jun 4, 2021 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I agree. In my (edited) comment above, I suggested the following approach:
|
pablogsal commented Jun 4, 2021
Well, meanwhile we discuss this issue, I am going to open a PR to just remove the test so we can backport it |
pablogsal commented Jun 4, 2021
Apparently this is not enough, other tests have the same problem: |
pablogsal commented Jun 4, 2021
Seems that we either need to remove all tests or proceed with VIctor's PR |
cjerdonek commented Jun 4, 2021
Correct, you need to skip or remove all the tests of that class. Can you skip them without removing them? |
cjerdonek commented Jun 4, 2021
You can skip classes, FYI: https://docs.python.org/3/library/unittest.html#skipping-tests-and-expected-failures |
pablogsal commented Jun 4, 2021
Right, what I was saying is that I was more comfortable skipping/removing one test, but removing all of them sounds quite suboptinmal. |
pablogsal commented Jun 4, 2021 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Ok, I opened #26542 for the skip of the class. Let's go with that meanwhile we decide what to do in general, we can reactivate the tests if we decide to fix/deprecate the class. |
pablogsal commented Jun 4, 2021
cjerdonek left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest converting this PR into a draft that deprecates MultiLoopChildWatcher as I think that is something more likely to be approved, or else make a separate PR that does that.
shreyanavigyan commented Jun 6, 2021
How about installing the handler for SIGCHLD for all event loops? I've implemented that in my fork and running,
works really well. The patch does the following -
I've committed the changes in my fork - shreyanavigyan@90c0e81. I've prepared the patch by keeping this phrase in mind - "We have to support multiple event loops. That's why MultiLoopChildWatcher is used instead of SafeChildWatcher". I don't know if there's any other thing I might be missing. |
cjerdonek commented Jun 6, 2021
@shreyanavigyan I don't think what you've done will work for a few different reasons. For example, you are changing |
cjerdonek commented Jun 6, 2021
@shreyanavigyan PS - if you have code to suggest, it's better to file a separate PR and discuss it there. |
shreyanavigyan commented Jun 7, 2021
@cjerdonek I've opened a new PR as you suggested - #26574 |
vstinner commented Jun 29, 2021
https://bugs.python.org/issue38323 issue was fixed by skipping the test (GH-26542), I close my PR. |
Fix a race condition in asyncio MultiLoopChildWatcher: call the loop
add_signal_handler() rather than calling directly signal.signal(), so
signal.set_wakeup_fd() is used to wake up the event loop when the
SIGCHLD signal is received. Previously, the event loop could block
forever if the signal was received while the loop was idle. The
signal was only proceed when another event woke up the event loop.
https://bugs.python.org/issue38323