Skip to content

Conversation

@corona10
Copy link
Member

@corona10corona10 commented Nov 24, 2024

@corona10
Copy link
MemberAuthor

I am going to write test codes at this week.

Comment on lines 1799 to 1800
ht->_spec_cache.getitem=descriptor;
ht->_spec_cache.getitem_version=version;
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should make to be thread-safe by using LOCK_TYPE(), but it will be handled with separated PRs.

@corona10corona10 changed the title gh-115999: Add partial free-thread specialzation for BINARY_SUBSCRgh-115999: Add partial free-thread specialization for BINARY_SUBSCRNov 24, 2024
@corona10corona10force-pushed the gh-115999-binary-subscr branch from 1dacf5d to 1a901a3CompareNovember 27, 2024 15:26
@corona10
Copy link
MemberAuthor

@mpage Would you like to take a look?

Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I left one comment inline, but otherwise looks good to me. I kicked off benchmark runs for default/free-threaded builds and will post the results once they're finished.

@corona10
Copy link
MemberAuthor

@mpage Done thanks!

@corona10
Copy link
MemberAuthor

corona10 commented Nov 29, 2024

FYI, aarch64 linux CI failure is not related to this PR, #127332 will fix the issue.

@mpage
Copy link
Contributor

I should have caught this last time, but I think we want to use the fast-path for list accesses in BINARY_SUBSCR_LIST_INT.

@corona10
Copy link
MemberAuthor

I should have caught this last time, but I think we want to use the fast-path for list accesses in BINARY_SUBSCR_LIST_INT.

You mean that the fast-path that can avoid locking object?

@mpage
Copy link
Contributor

You mean that the fast-path that can avoid locking object?

Yep

@corona10
Copy link
MemberAuthor

corona10 commented Nov 30, 2024

What about checking _Py_IsOwnedByCurrentThread and _PyObject_GC_IS_SHARED?

@mpage
Copy link
Contributor

What about checking _Py_IsOwnedByCurrentThread and _PyObject_GC_IS_SHARED?

Yes, those are needed for correctness. I think for now we probably want to have two slightly different code paths for the free-threaded and default builds. The default build should continue to use the existing logic to fetch the item from the list. The free-threaded builds should use PyList_GetItemRef to take advantage of the fast-path.

@corona10
Copy link
MemberAuthor

Matt, I define a new _PyList_GetItemRef to avoid unnecessary type checking, which is guaranteed when the function is called by a specializer.

@corona10corona10 requested a review from mpageNovember 30, 2024 13:03
Py_ssize_tindex= ((PyLongObject*)sub)->long_value.ob_digit[0];
#ifdefPy_GIL_DISABLED
STAT_INC(BINARY_SUBSCR, hit);
PyObject*res_o=_PyList_GetItemRef(list, index);
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default build of list_get_item_ref is almost same as current implementation,
we might able to this implementation to the default build also.

#else
staticinlinePyObject*
list_get_item_ref(PyListObject*op, Py_ssize_ti)
{
if (!valid_index(i, Py_SIZE(op))){
returnNULL;
}
returnPy_NewRef(PyList_GET_ITEM(op, i));
}
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would certainly be cleaner, but it might come with some performance cost in the default build.

Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I kicked off new benchmark runs and will post the results here once they're finished. They take a few hours to run and there are other runs queued ahead of them, so that will probably be sometime tomorrow.

There are a couple tests that should be enabled using requires_specialization_ft now that we're specializing BINARY_SUBSCR in free-threaded builds:

@corona10corona10 requested a review from mpageDecember 1, 2024 13:06
Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Most recent performance results look good:


PyAPI_FUNC(PyObject*) _PyList_Extend(PyListObject*, PyObject*);
externvoid_PyList_DebugMallocStats(FILE*out);
// _PyList_GetItemRef should be used only when the object is known to be a list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also be worth mentioning that this does not raise errors whereas PyList_GetItemRef does.

@corona10corona10 merged commit e271340 into python:mainDec 2, 2024
49 of 51 checks passed
@corona10corona10 deleted the gh-115999-binary-subscr branch December 2, 2024 01:38
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 2, 2024
@vstinner
Copy link
Member

It seems like the change introduced a regression: #127521.

srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

@corona10@mpage@vstinner