Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-135552: Clear weakrefs to types in GC after garbage finalization not before#135728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-135552: Clear weakrefs to types in GC after garbage finalization not before #135728
Uh oh!
There was an error while loading. Please reload this page.
Conversation
sergey-miryanov commented Jun 19, 2025 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
sergey-miryanov commented Jun 19, 2025
@ZeroIntensity@sobolevn@Eclips4@efimov-mikhail@fxeqxmulfx Please take a look. |
sobolevn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a test case and a NEWS entry.
I think that there's a minimal repro available in the issue.
The solution seems reasonable!
Zheaoli commented Jun 20, 2025
Would you mind adding a benchmark for this commit by comment? |
sergey-miryanov commented Jun 20, 2025
ZeroIntensity commented Jun 20, 2025
Yeah, as I said, this seems to be a problem with the GIL-ful garbage collector, not the object model. Something like this would also cause problems on the free-threaded build, but FT is working perfectly. |
sergey-miryanov commented Jun 20, 2025
@ZeroIntensity The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the I have the following code, and I believe the problem is common to both: As you can see, the weakref is None, but self has not been destroyed yet. I believe the garbage collector has a mechanism to handle weakrefs, but it runs too early, before |
ZeroIntensity commented Jun 20, 2025
Ah, and there's no type cache under free-threading. Interesting! |
sergey-miryanov commented Jun 21, 2025
This fails on The call to |
efimov-mikhail commented Jun 22, 2025
In this variant I have crash even with proposed fix: |
sergey-miryanov commented Jun 22, 2025
@efimov-mikhail Yeah, thanks! This is great example that proposed solution is wrong. |
python-cla-botbot commented Jun 22, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
sergey-miryanov commented Jun 23, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
On my local machine I have following results: Geometric mean - 1.01x slowerBenchmark hidden because not significant (37): 2to3, comprehensions, bench_mp_pool, bench_thread_pool, coroutines, crypto_pyaes, dask, dulwich_log, float, create_gc_cycles, gc_traversal, generators, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, mako, mdp, nbody, pathlib, pickle_list, pidigits, python_startup, python_startup_no_site, raytrace, regex_dna, regex_v8, scimark_lu, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_transpile, tomli_loads, unpack_sequence, unpickle_list CPUCPU |
sergey-miryanov commented Jun 23, 2025
@Zheaoli@ZeroIntensity@sobolevn@Eclips4@efimov-mikhail@fxeqxmulfx Please take a look, it is ready to review. |
Misc/NEWS.d/next/Core_and_Builtins/2025-06-23-01-07-09.gh-issue-135552.eG9vzP.rst Outdated Show resolvedHide resolved
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
…e-135552.eG9vzP.rst Co-authored-by: sobolevn <mail@sobolevn.me>
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Misc/NEWS.d/next/Core_and_Builtins/2025-06-23-01-07-09.gh-issue-135552.eG9vzP.rst Outdated Show resolvedHide resolved
Uh oh!
There was an error while loading. Please reload this page.
…e-135552.eG9vzP.rst Co-authored-by: Mikhail Efimov <efimov.mikhail@gmail.com>
…iryanov/cpython into pythongh-135552-fix-gc-segfault
sergey-miryanov commented Jun 23, 2025
Updated results: Geometric mean - 1.00x fasterBenchmark hidden because not significant (56): async_generators, comprehensions, bench_mp_pool, bench_thread_pool, coverage, crypto_pyaes, dask, deepcopy, deepcopy_reduce, deepcopy_memo, deltablue, django_template, docutils, float, create_gc_cycles, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle_list, pickle_pure_python, pidigits, pprint_safe_repr, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, regex_dna, regex_effbot, regex_v8, richards, richards_super, scimark_monte_carlo, scimark_sparse_mat_mult, sqlglot_normalize, sqlglot_optimize, sqlglot_transpile, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tomli_loads, unpack_sequence, xml_etree_generate |
efimov-mikhail left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
But IMO it will be better to provide more changes into comments in gc.c
There're some places when unreachable is used as variable name, and unreachable_types should be properly mentioned there.
| @@ -0,0 +1,4 @@ | |||
| Split the handling of weak references in the GC if both types and instances | |||
| within a generation are unreachable. This prevent the clearing of the | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| within a generation are unreachable. This prevent the clearing of the | |
| within a generation are unreachable. This prevent the clearing of the |
nascheme commented Jun 28, 2025
I'm still investigating this bug. As of now, I think the key problem is that |
sergey-miryanov commented Jun 28, 2025
Thanks!
Yeah, we understand problem the same way. This is a reason why I split handling types and other objects.
I believe you have much more experience with this than I do (one of the articles I read about Python GC was yours from early 00). So, if you have another way to fix this then it will great opportunity to learn something new :) |
sergey-miryanov commented Jul 3, 2025
Closing in favor of #136189 |
While trying to find the root cause of the linked issue, I observed that the type cache was holding an obsolete reference. Because the type cache holds borrowed references, this leads to a use-after-free error.
I'm resetting the cache in this PR. But I suspect that this may affect overall performance.