Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpteendebakpt commented Nov 25, 2024

The function operator.methodcaller is not thread-safe since the additional of the vectorcall method in ##107201. In the free-threading build the issue is easy to trigger, for the normal build harder.

We could either remove the vectorcall implementation, or adapt the vectorcall implementation to be thread safe (for both the normal and free-threading build). In this PR we make the methodcaller safe by

  • Replacing the lazy initialiation with initialization in the constructor
  • Using a stack allocated space for the vectorcall arguments, and falling back to tp_call for more than 8 arguments

Benchmark results:

call: Mean +- std dev: [no_vectorcall] 353 ns +- 8 ns -> [pr] 149 ns +- 5 ns: 2.37x faster creation: Mean +- std dev: [no_vectorcall] 317 ns +- 8 ns -> [pr] 439 ns +- 31 ns: 1.39x slower creation+call: Mean +- std dev: [no_vectorcall] 652 ns +- 17 ns -> [pr] 613 ns +- 14 ns: 1.06x faster call kwarg: Mean +- std dev: [no_vectorcall] 599 ns +- 88 ns -> [pr] 199 ns +- 6 ns: 3.01x faster creation kwarg: Mean +- std dev: [no_vectorcall] 451 ns +- 4 ns -> [pr] 653 ns +- 59 ns: 1.45x slower creation+call kwarg: Mean +- std dev: [no_vectorcall] 1.05 us +- 0.02 us -> [pr] 858 ns +- 9 ns: 1.22x faster Geometric mean: 1.29x faster 

(old machine, no PGO, comparing this PR with the same PR but the vectorcall disabled)

Benchmark script
import pyperf setup = """ from operator import methodcaller as mc arr = [] call = mc('sort') call_kwarg = mc('sort', reverse=True) """ runner = pyperf.Runner() runner.timeit(name="call", stmt="call(arr)", setup=setup) runner.timeit(name="creation", stmt="call = mc('sort')", setup=setup) runner.timeit(name="creation+call", stmt="call = mc('sort'); call(arr)", setup=setup) runner.timeit(name="call kwarg", stmt="call_kwarg(arr)", setup=setup) runner.timeit(name="creation kwarg", stmt="call = mc('sort', reverse=True)", setup=setup) runner.timeit(name="creation+call kwarg", stmt="call = mc('sort', reverse=True); call(arr)", setup=setup) 

Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments:

  • I don't think _methodcaller_initialize_vectorcall is thread safe without the GIL. Lazily initializing arguments complicates the code and thread-safety -- I don't think it's worth it.
  • Generally, I think it's better to support some fixed max number of arguments (like 8), and stack allocate the temporary array. (i.e., PyObject **tmp_args[MAX_ARGS];). If the methodcaller needs more arguments than that, just don't set the vectorcall field and let it fall back to the tp_call.

(PyTuple_GET_SIZE(mc->xargs)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
mc->vectorcall_kwnames);

PyMem_Free(tmp_args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_Free is after the return statement

@eendebakpteendebakpt changed the title Draft: gh-127065: Make methodcaller thread-safe and re-entrantgh-127065: Make methodcaller thread-safe and re-entrantDec 1, 2024
Comment on lines 1600 to 1601
PyObject**vectorcall_args; /* Borrowed references */
PyObject*vectorcall_kwnames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ownership here seems a bit complicated and I think it can be simplified. As I understand it, vectorcall_kwnames only exists to ensure that some entries in vectorcall_args stay alive.

Instead, I'd suggest:

  • Make PyObject *vectorcall_args a tuple (that holds strong references to its contents as usual)
  • Get rid of vectorcall_kwnames
  • Use _PyTuple_ITEMS for fast access to the contents of the tuple (for memcpy())
  • Visit vectorcall_args in methodcaller_traverse

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vectorcall_kwnames is needed as an argument for PyObject_VectorcallMethod in methodcaller_vectorcall (https://github.com/python/cpython/blob/main/Modules/_operator.c#L1666), so we cannot get rid of it.

The ownership is not too hard I think: the objects in vectorcall_args have references borrowed from either mc->args or (the keys from) mc->kwds. I added a comment to clarify this.

Making the vectorcall_args a tuple is still an option though. It requires a bit more memory and a tiny bit of computation in the initialization. It would be the C equivalent of vectorcall_args = args + tuple(kwds). I'll work it out in a branch to see how it compares

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in that case don't worry about it unless you prefer it as a tuple.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff between the two approaches is this:

eendebakpt/cpython@methodcaller_ft...eendebakpt:cpython:methodcaller_ft_v2

What is nice about making vectorcall_args a tuple is that if there are no keyword arguments, we can reuse mc->args. It does require more operations in the construction though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either approach is fine! My guess is that the vast majority of uses of methodcaller() do not use keyword arguments.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running benchmarks shows what is to be expected from the implementations: using a tuple for vectorcall_args is a bit slower in initializing, except when there are no keyword arguments (since then we reuse the arg tuple). Differences are small though.

Since using a tuple leads to cleaner code and the majority of uses is without keywords I slightly prefer the tuple approach. I will open a new PR for it.

Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. A few minor formatting suggestions below.

Comment on lines 1609 to 1610
methodcaller_vectorcall(
methodcallerobject*mc, PyObject*const*args, size_tnargsf, PyObject*kwnames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more common formatting looks like:

methodcaller_vectorcall(methodcallerobject*mc, PyObject*const*args, size_tnargsf, PyObject*kwnames)

assert(1+number_of_arguments <= _METHODCALLER_MAX_ARGS);
memcpy(tmp_args+1, mc->vectorcall_args, sizeof(PyObject*) *number_of_arguments);

PyObject*result=PyObject_VectorcallMethod(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, but I'd write this without the temporary result variable like:

return PyObject_VectorcallMethod(...); 

returnresult;
}

staticint_methodcaller_initialize_vectorcall(methodcallerobject*mc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
staticint_methodcaller_initialize_vectorcall(methodcallerobject*mc)
staticint
_methodcaller_initialize_vectorcall(methodcallerobject*mc)

eendebakptand others added 4 commits December 6, 2024 21:53
Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Sam Gross <colesbury@gmail.com>
@eendebakpt
Copy link
ContributorAuthor

Closing in favor of #127746

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@eendebakpt@colesbury