Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-126554: ctypes: Correctly handle NULL dlsym values#126555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
grgalex commented Nov 7, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
ghost commented Nov 7, 2024 • edited by ghost
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by ghost
Uh oh!
There was an error while loading. Please reload this page.
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
b436c5f to bc3ce76CompareFor dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In our case, we also subjectively treat a NULL return value from dlsym() as an error, so we format a special string for that. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]>
bc3ce76 to 3c3c29dCompareefimov-mikhail commented Nov 7, 2024
This is direct quotation from the man page you linked.
|
grgalex commented Nov 7, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Indeed, I failed to mention that at some point the man page contradicts itself, and I believe that the part you quoted is something the maintainer forgot to remove.
I say this because further up, in the DESCRIPTION, the man page states:
Additionally, further down, in the NOTES segment, the man page states: |
efimov-mikhail commented Nov 7, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Understood. IMO, it would be better if we can make an actual reproducible test case, and put this case into regular testing procedure. |
grgalex commented Nov 7, 2024
See reproducer here: #126554 (comment) |
ZeroIntensity left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to see new contributors :)
Unfortunately, this change does need a test. If it's really, really difficult to add one, then I guess we'll figure things out, but in general, it's difficult for us to accept even tiny bugfixes without proper tests.
For something like this, as long as the test works, it's probably OK. You could do something as hacky as hard-coding a byte string into a test file and unpacking that into a .so at runtime, and just disable all the systems that don't work. But it would be preferable if you could find a library that actually triggers this problem in the wild.
Misc/NEWS.d/next/C_API/2024-11-07-20-24-58.gh-issue-126554.ri12eb.rst Outdated Show resolvedHide resolved
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
…2eb.rst Co-authored-by: Peter Bierma <[email protected]>
Co-authored-by: Peter Bierma <[email protected]>
Co-authored-by: Peter Bierma <[email protected]>
grgalex commented Nov 8, 2024
|
grgalex commented Nov 8, 2024
(2) is done, pushing a commit. I also added a comment that explicitly states why we call From a project maintaner's POV, if we have 3 instances of this phenomenon, do we add a comment disclaimer at every instance? (This is the approach I took, but I'm unsure what the best / most common practice is.) |
grgalex commented Nov 8, 2024
Proceeding with the test case (3), will let you know when ready. |
Signed-off-by: Georgios Alexopoulos <[email protected]>
ZeroIntensity commented Nov 8, 2024
Yup. In fact, that's the preferred workflow for us. Force pushing onto one commit is very much discouraged.
That looks like a problem too, yeah. Thanks! I'll create a separate issue if I can come up with a repro for the locale issue. (I'll CC you on it, if you want to author that PR as well.) |
Signed-off-by: Georgios Alexopoulos <[email protected]>
Uh oh!
There was an error while loading. Please reload this page.
grgalex commented Nov 8, 2024
Added a commit with the test case under Running on the PR branch, it passes :) On HEAD of main, it explodes (segfault). Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?), or is it the correct way to go? |
efimov-mikhail commented Nov 8, 2024
Could you please explain, what exactly are you speaking about? |
grgalex commented Nov 8, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
If we want to be pedantic, we must check all 3 code paths that invoke In this case, we only check one ( On one hand, since our amendment is the same across all 3 funcs, we should theoretically be OK. |
grgalex commented Nov 8, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
If I run the test on the main branch, the Python interpreter gets a segmentation fault. What I'm saying is that in a scenario where someone runs all tests, when the Maybe I have misunderstood something regarding the philosophy of testing. |
ZeroIntensity commented Nov 8, 2024
Run the test in a subprocess and check the return code. |
Uh oh!
There was an error while loading. Please reload this page.
Co-authored-by: Petr Viktorin <[email protected]>
Signed-off-by: Georgios Alexopoulos <[email protected]>
ZeroIntensity commented Nov 14, 2024
I'm not sure if the ASan failures are related. |
picnixz left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One PEP-7 nitpick and two de-chop suggestions. I didn't write suggestions for the PyErr_SetString usages since I wasn't sure it fits on 80 chars.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Co-authored-by: Bénédikt Tran <[email protected]>
Signed-off-by: Georgios Alexopoulos <[email protected]>
bedevere-bot commented Nov 14, 2024
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit ab22349 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
ZeroIntensity commented Nov 14, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Let's try the build bots again and see if it holds up now. |
ZeroIntensity commented Nov 14, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Buildbots are all good, the one failure is a problem with the DRC API that got merged the other day, which is unrelated. I'll submit a (separate) patch to fix that. I think we're good to merge! |
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Co-authored-by: George Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
Sorry, @grgalex and @encukou, I could not cleanly backport this to |
GH-126861 is a backport of this pull request to the 3.13 branch. |
efimov-mikhail commented Nov 15, 2024
@grgalex, you've finished it! Congrats with your first contribution to CPython! |
grgalex commented Nov 15, 2024
Thanks for the guidance @efimov-mikhail, @ZeroIntensity, @encukou. Looking forward to collaborating again in the future! |
…) (#126861) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: George Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
picnixz commented Dec 8, 2024
3.13 backports were done but not 3.12; @encukou Do you want the 3.12 backport in the end or can we close the issue with only 3.13 and 3.14 being affected? |
encukou commented Dec 9, 2024
As I said, I'm fine not backporting. If anyone disagrees, open the PR :) |
GH-127764 is a backport of this pull request to the 3.12 branch. |
…) (GH-127764) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: George Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
For dlsym(), a return value of NULL does not necessarily indicate an error [1].
Therefore, to avoid using stale (or NULL) dlerror() values, we must:
If the return value of dlerror() is not NULL, an error occured.
In our case of ctypes, I understand that we subjectively treat a NULL return value from dlsym() as an error, so we must format a special string for that.
[1] https://man7.org/linux/man-pages/man3/dlsym.3.html