bpo-45367: Specialize BINARY_MULTIPLY #28727
Conversation
sweeneyde commented Oct 5, 2021 • edited by bedevere-bot
bedevere-bot commented Oct 5, 2021
🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit 300a0ca 🤖
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
sweeneyde commented Oct 5, 2021 • edited by bedevere-bot
The failing
sweeneyde commented Oct 5, 2021
On and on
markshannon commented Oct 5, 2021
Looks promising. Apologies for the merge conflicts from #28723.
Fidget-Spinner left a comment
Thanks Dennis, this looks good and I think it's almost ready to merge :). Just needs some stable benchmarks on non-Windows platforms.
```c
        goto success;
    }
    else {
        SPECIALIZATION_FAIL(BINARY_MULTIPLY, SPEC_FAIL_OTHER);
```
Maybe this should be refined further. In another PR, though.
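One possible refinement could classify the common miss shapes separately. The SPEC_FAIL_* kind names and the lhs/rhs operand names below are hypothetical, not from this PR:

```c
/* Hypothetical refinement of SPEC_FAIL_OTHER for BINARY_MULTIPLY;
   kind names are illustrative only. */
else if (PyUnicode_CheckExact(lhs) || PyUnicode_CheckExact(rhs)) {
    /* string repetition, e.g. "ab" * 3 */
    SPECIALIZATION_FAIL(BINARY_MULTIPLY, SPEC_FAIL_SEQUENCE_REPEAT);
}
else if ((PyLong_CheckExact(lhs) && PyFloat_CheckExact(rhs)) ||
         (PyFloat_CheckExact(lhs) && PyLong_CheckExact(rhs))) {
    /* mixed int * float, e.g. 2 * 3.0 */
    SPECIALIZATION_FAIL(BINARY_MULTIPLY, SPEC_FAIL_MIXED_INT_FLOAT);
}
else {
    SPECIALIZATION_FAIL(BINARY_MULTIPLY, SPEC_FAIL_OTHER);
}
```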
Fidget-Spinner Oct 5, 2021 • edited
I agree. I don't think it will touch pyperformance, but it seems that in our own test suite almost 10% aren't hits #28727 (comment).
This probably varies wildly; I'd imagine test_string has lots of str * num too.
Oh gosh I just realized I'd read the entire PR as BINARY_ADD not BINARY_MULTIPLY. Please ignore my delusional ramblings.
Most of the non-hits are deferred, not specialization failures; I think that's because a lot of test code by its nature only runs once, without many tight loops.
Fidget-Spinner Oct 5, 2021 • edited
Thanks for pointing that out Dennis. I can't believe I missed that :). That means nearly 100% actual specialization on hot code. Hooray!
Off-topic: I've recently wondered if pyperformance is somewhat out of touch (no offence intended to its contributors). Many of its benchmarks (nbody, nqueens, etc.) have been found by the JS folks to not be realistic. The Pyston benchmark suite seems way more comprehensive.
markshannon Oct 5, 2021 • edited
"deferred" means that the ADAPTIVE instruction is counting down until its next specialization attempt.
This only happens after a specialization failure.
Look at the numbers. deferred ≈ sum(specialization-failures) * ADAPTIVE_CACHE_BACKOFF
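Roughly, the countdown works like the other adaptive instructions of that era. A simplified sketch follows; the cache-based form shown here is illustrative only (this PR appears to keep its counter inline, cf. record_hit_inline), and _Py_Specialize_BinaryMultiply is an assumed name:

```c
/* Simplified sketch of an ADAPTIVE instruction's countdown, in the style
   of the other *_ADAPTIVE handlers; details differ from the actual PR. */
TARGET(BINARY_MULTIPLY_ADAPTIVE) {
    SpecializedCacheEntry *cache = GET_CACHE();
    if (cache->adaptive.counter == 0) {
        /* Counter expired: try to (re)specialize this instruction. */
        PyObject *left = SECOND();
        PyObject *right = TOP();
        next_instr--;
        if (_Py_Specialize_BinaryMultiply(left, right, next_instr, cache) < 0) {
            goto error;
        }
        DISPATCH();
    }
    else {
        /* "deferred": count down and run the generic instruction.
           A failed specialization attempt resets the counter to
           ADAPTIVE_CACHE_BACKOFF, hence
           deferred ~= failures * ADAPTIVE_CACHE_BACKOFF. */
        STAT_INC(BINARY_MULTIPLY, deferred);
        cache->adaptive.counter--;
        JUMP_TO_INSTRUCTION(BINARY_MULTIPLY);
    }
}
```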
markshannon commented Oct 5, 2021
It does show speedups for the benchmarks we would expect, but there are significant slowdowns for some other benchmarks, notably the pickle benchmarks.
Fidget-Spinner commented Oct 5, 2021
I've personally disregarded anything from pyperformance in the range of ±1.05x. That said, I wholly believe nbody sped up; I just don't trust pickle slowing down. There's no
sweeneyde commented Oct 5, 2021
I ran
I'm guessing specializing int/float mixtures would be beneficial. I wonder if it's better to have a single slightly-more-complex opcode or to add more opcodes. I was thinking something like:

```c
TARGET(BINARY_MULTIPLY_FLOAT) {
    PyObject *left = SECOND();
    PyObject *right = TOP();
    double dleft, dright;
    if (PyFloat_CheckExact(left)) {
        dleft = ((PyFloatObject *)left)->ob_fval;
        if (PyFloat_CheckExact(right)) {
            dright = ((PyFloatObject *)right)->ob_fval;
        }
        else if (PyLong_CheckExact(right) && IS_MEDIUM_VALUE(right)) {
            dright = (double)medium_value(right);
        }
        else {
            DEOPT_IF(1, BINARY_MULTIPLY);
        }
    }
    else if (PyLong_CheckExact(left) && IS_MEDIUM_VALUE(left)) {
        DEOPT_IF(!PyFloat_CheckExact(right), BINARY_MULTIPLY);
        dleft = (double)medium_value(left);
        dright = ((PyFloatObject *)right)->ob_fval;
    }
    else {
        DEOPT_IF(1, BINARY_MULTIPLY);
    }
    STAT_INC(BINARY_MULTIPLY, hit);
    record_hit_inline(next_instr, oparg);
    PyObject *prod = PyFloat_FromDouble(dleft * dright);
    SET_SECOND(prod);
    Py_DECREF(right);
    Py_DECREF(left);
    STACK_SHRINK(1);
    if (prod == NULL) {
        goto error;
    }
    DISPATCH();
}
```
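For contrast, the per-type-pair alternative keeps each opcode's guards trivial. A sketch of what an int-only variant might look like (the helper _PyLong_Multiply is an assumption, not quoted from this PR):

```c
/* Sketch of a per-type-pair specialization: guard both operands,
   deoptimize otherwise.  Helper names are assumed. */
TARGET(BINARY_MULTIPLY_INT) {
    PyObject *left = SECOND();
    PyObject *right = TOP();
    DEOPT_IF(!PyLong_CheckExact(left), BINARY_MULTIPLY);
    DEOPT_IF(!PyLong_CheckExact(right), BINARY_MULTIPLY);
    STAT_INC(BINARY_MULTIPLY, hit);
    record_hit_inline(next_instr, oparg);
    PyObject *prod = _PyLong_Multiply((PyLongObject *)left,
                                      (PyLongObject *)right);
    SET_SECOND(prod);
    Py_DECREF(right);
    Py_DECREF(left);
    STACK_SHRINK(1);
    if (prod == NULL) {
        goto error;
    }
    DISPATCH();
}
```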
sweeneyde commented Oct 5, 2021
The most recent change gets the results of
markshannon commented Oct 6, 2021
What types are you specializing for, and why? Please read https://github.com/python/cpython/blob/main/Python/adaptive.md, specifically https://github.com/python/cpython/blob/main/Python/adaptive.md#gathering-data
sweeneyde commented Oct 6, 2021
My thinking was that two

To clarify, are you advising that the increased complexity of the opcode is not worth the increased percentage of instructions specialized? Is the solution to

I am struggling to run the whole pyperformance suite at once, but as an example, bm_spectral_norm goes from

to

when incorporating

I suppose we could gather specialization stats for what happens if
Fidget-Spinner commented Oct 6, 2021 • edited
nbody sped up; everything else looks in the realm of noise for pyperformance: https://gist.github.com/Fidget-Spinner/6fd3149fc82497d028b02046765ba8d8
markshannon commented Oct 7, 2021
Yes.
I don't think there is sufficient evidence that these are useful. At least, not yet.
The deferred operations only represent about 1%. Not worth worrying about.
markshannon commented Oct 7, 2021
In general, it is not the number of branches that matters so much as their predictability.
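As a generic illustration (plain C, not CPython code): two loops can execute the same number of branches per element yet perform very differently depending on how predictable those branches are.

```c
#include <stddef.h>

/* One branch per element in both functions.  For typical inputs the first
   condition almost always goes the same way, so the hardware predicts it
   well; the second flips with the data and can mispredict often. */
long count_nonnegative(const double *a, size_t n) {
    long hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] >= 0.0) {
            hits++;
        }
    }
    return hits;
}

long count_odd_integer_part(const double *a, size_t n) {
    long hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (((long)a[i]) & 1) {
            hits++;
        }
    }
    return hits;
}
```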
sweeneyde commented Oct 10, 2021
I am beginning to think this PR may not be worth it at all: even on microbenchmarks, after PGO, there's not that much benefit.

Program:

```python
from pyperf import Runner

runner = Runner()
runner.timeit(
    "int: x*x",
    setup="from itertools import repeat; it = repeat(1, 1_000_000)",
    stmt="for x in it: x*x")
runner.timeit(
    "float: x*x",
    setup="from itertools import repeat; it = repeat(1.0, 1_000_000)",
    stmt="for x in it: x*x")
runner.timeit(
    "int: x*...*x",
    setup="from itertools import repeat; it = repeat(1, 1_000_000)",
    stmt="for x in it: x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x")
runner.timeit(
    "float: x*...*x",
    setup="from itertools import repeat; it = repeat(1.0, 1_000_000)",
    stmt="for x in it: x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x")
```

Results from PGO on WSL:
Results from PGO using build.bat (MSVC):
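For reference, a script like the one above writes its results to a pyperf JSON file when run with -o, and two builds can then be compared with python -m pyperf compare_to base.json patched.json (the file names here are only examples); that is one way to judge whether a difference is outside the noise.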
markshannon commented Oct 11, 2021
@sweeneyde The speedups are worthwhile for less than 100 additional lines of code, IMO. The speedups on the microbenchmarks show that it works. Don't forget that
sweeneyde commented Oct 11, 2021
Okay, that makes sense, thanks. Re-opened.
bedevere-bot commented Oct 13, 2021
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 31c3bb9 🤖
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
markshannon commented Oct 14, 2021
Buildbot failures seem to be the usual suspects.
bedevere-bot commented Oct 14, 2021
https://bugs.python.org/issue45367