Skip to content

Conversation

@nascheme
Copy link
Member

@naschemenascheme commented Dec 11, 2024

  • Fix thread safety issues with specialized opcodes (STORE_ATTR_INSTANCE_VALUE, STORE_ATTR_SLOT, STORE_ATTR_WITH_HINT). Need a combination of locks and atomics to be safe.
  • Fix thread safety issues with _Py_Specialize_StoreAttr . Avoid using borrowed references. Save and store the tp_version_tag from the beginning of the specialization process since it might change. Use helper functions to update opcode.
  • Add unit tests to ensure specialization is happening in free-threaded builds. Add addtional tests to try to trigger data races.

@naschemenascheme changed the title gh-115999: Enable specialization of STORE_ATTR free-threaded builds.gh-115999: Specialize STORE_ATTR in free-threaded builds.Dec 11, 2024
@naschemenascheme marked this pull request as ready for review December 11, 2024 21:54
@naschemenascheme requested a review from mpageDecember 11, 2024 21:54
Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! This looks like a regression on the default build. I haven't had a chance to dig into it, but I suspect it might either be due to the check that Sam flagged in _STORE_ATTR_INSTANCE_VALUE or the change to when we read the type version in _Py_Specialize_StoreAttr. It looks like the richards benchmark is the most heavily affected, so that might be a good isolated benchmark to use for debugging.

@nascheme
Copy link
MemberAuthor

nascheme commented Dec 13, 2024

Thanks for taking this on! This looks like a regression on the default build. I haven't had a chance to dig into it, but I suspect it might either be due to the check that Sam flagged in _STORE_ATTR_INSTANCE_VALUE or the change to when we read the type version in _Py_Specialize_StoreAttr. It looks like the richards benchmark is the most heavily affected, so that might be a good isolated benchmark to use for debugging.

Based on my benchmarking my second commit, the regression with richards is gone. Likely it was the fact that STORE_ATTR_INSTANCE_VALUE was not actually working.

Regarding the tp_version_tag not getting set due to type_get_version() being hoisted, fixing it as you suggest by using _PyType_LookupRefAndVersion is a bit complex to do. So, I deferred doing that for now. I'll look at it again.

@nascheme
Copy link
MemberAuthor

Regarding the tp_version_tag not getting set due to type_get_version() being hoisted, fixing it as you suggest by using _PyType_LookupRefAndVersion is a bit complex to do. So, I deferred doing that for now. I'll look at it again.

This was actually not hard to fix. I was thinking that _PyType_LookupRefAndVersion would not set the tag in the case the lookup fails. However, that's not the case.

@naschemenaschemeforce-pushed the gh-115999-specialize-store-attr branch from f920dcd to 4c484abCompareDecember 13, 2024 19:17
@nascheme
Copy link
MemberAuthor

I rebased on main to resolve merge conflicts and force pushed.

Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of small comments inline, but LGTM overall.

* Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` so that object stays locked and `tp_version_tag` cannot change. Fix inverted logic bug that caused erroneous deopt. * Fix locking for `_STORE_ATTR_WITH_HINT`. Double check that `_PyObject_GetManagedDict()` hasn't disappeared since we locked the dict. - Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store on the cache is the correct one (in case of it changing during the specalize analysis). - Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function. - In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock.
If the type is new and a version tag hasn't yet been assigned, we would fail to specialize it. Use `_PyType_LookupRefAndVersion()` instead of `type_get_version()`, which will assign a version.
Use provided value of `tp_version` to store in cache.
This also fixes the case if the dict is replaced with a different one.
The function is only used in with-GIL builds.
For `specialize_dict_access_inline()`, we need to lock the keys object. * Add `_PyDictKeys_StringLookupSplit` which does required locking and use in place of `_PyDictKeys_StringLookup`. * Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys. * Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`.
@naschemenaschemeforce-pushed the gh-115999-specialize-store-attr branch from 699f4e9 to 6bf9016CompareDecember 18, 2024 06:16
Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a couple of small comments inline.

Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Benchmark results:

@naschemenascheme merged commit 1b15c89 into python:mainDec 19, 2024
55 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Dec 23, 2024
…thongh-127838) * Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and use in place of `_PyDictKeys_StringLookup`. * Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys. * Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`. * Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and `tp_version_tag` cannot change. * Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store on the cache is the correct one (in case of it changing during the specalize analysis). * Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function. * In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock. * Avoid borrowed references in `_Py_Specialize_StoreAttr()`. * Use `specialize()` and `unspecialize()` helpers. * Add unit tests to ensure specializing happens as expected in FT builds. * Add unit tests to attempt to trigger data races (useful for running under TSAN). * Add `has_split_table` function to `_testinternalcapi`.
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
…thongh-127838) * Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and use in place of `_PyDictKeys_StringLookup`. * Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys. * Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`. * Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and `tp_version_tag` cannot change. * Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store on the cache is the correct one (in case of it changing during the specalize analysis). * Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function. * In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock. * Avoid borrowed references in `_Py_Specialize_StoreAttr()`. * Use `specialize()` and `unspecialize()` helpers. * Add unit tests to ensure specializing happens as expected in FT builds. * Add unit tests to attempt to trigger data races (useful for running under TSAN). * Add `has_split_table` function to `_testinternalcapi`.
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

@nascheme@mpage@colesbury