Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-106931: Intern Statically Allocated Strings Globally#107272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-106931: Intern Statically Allocated Strings Globally #107272
Uh oh!
There was an error while loading. Please reload this page.
Conversation
ericsnowcurrently commented Jul 25, 2023 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
Objects/unicodeobject.c Outdated
| { | ||
| returnPyObject_Length(get_interned_dict(_PyInterpreterState_GET())); | ||
| PyObject*dict=get_interned_dict(_PyInterpreterState_GET()); | ||
| return_Py_hashtable_len(INTERNED_STRINGS) +PyObject_Length(dict); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return_Py_hashtable_len(INTERNED_STRINGS) +PyObject_Length(dict); | |
| return_Py_hashtable_len(INTERNED_STRINGS) +PyDict_GET_SIZE(dict); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Yhg1s commented Jul 27, 2023
FWIW, I wouldn't be too sad if we didn't do this in 3.12 (considering the current state is that strings just aren't interned in isolated subinterpreters), but if you feel confident enough to backport this before next monday, I'm okay with that. |
ericsnowcurrently commented Jul 27, 2023
Honestly, I was a little on the fence myself. My rationale is IMHO reasonable but clearly not the strongest: leaving a performance penalty, however small, because of subinterpreters is frustrating. If the solution were any more complex I'd probably lean the other way.
Regardless of what I said above, I'll stew on it a bit more. Thanks for the clarity. |
miss-islington commented Jul 27, 2023
Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
…gh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry picked from commit b72947a) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
bedevere-bot commented Jul 27, 2023
GH-107358 is a backport of this pull request to the 3.12 branch. |
bedevere-bot commented Jul 27, 2023
|
bedevere-bot commented Jul 27, 2023
|
bedevere-bot commented Jul 27, 2023
|
ericsnowcurrently commented Jul 27, 2023
See gh-107362. |
…ythongh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
…ythongh-107272) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
GH-110713 is a backport of this pull request to the 3.12 branch. |
…7272) (gh-110713) We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic _Py_hashtable_t for that global cache instead of a dict. Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter. (cherry-picked from commit b72947a)
We tried this before with a dict and for all interned strings. That ran into problems due to interpreter isolation. However, exclusively using a per-interpreter cache caused some inconsistency that can eliminate the benefit of interning. Here we circle back to using a global cache, but only for statically allocated strings. We also use a more-basic
_Py_hashtable_tfor that global cache instead of a dict.Ideally we would only have the global cache, but the optional isolation of each interpreter's allocator means that a non-static string object must not outlive its interpreter. Thus we would have to store a copy of each such interned string in the global cache, tied to the main interpreter.