Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-45885: Specialize COMPARE_OP#29734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
sweeneyde commented Nov 23, 2021 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
markshannon commented Nov 24, 2021
You microbenchmark is only testing |
Python/ceval.c Outdated
| PyObject*left=SECOND(); | ||
| DEOPT_IF(!PyUnicode_CheckExact(left), COMPARE_OP); | ||
| DEOPT_IF(!PyUnicode_CheckExact(right), COMPARE_OP); | ||
| DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? If left == right then they are equal regardless of whether the string is ready or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that almost all strings are ready. I suppose the alternative is
DEOPT_IF(!PyUnicode_CheckExact(left), COMPARE_OP); DEOPT_IF(!PyUnicode_CheckExact(right), COMPARE_OP); int res = 1; if (left != right){DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP); DEOPT_IF(!PyUnicode_IS_READY(left), COMPARE_OP); res = comare_the_strings_assuming_theyre_ready(left, right)} is that what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Assuming that the identity shortcut is worthwhile, then it makes sense to shortcut as much work as possible.
You are also missing out of the quick inequality test using lengths len(a) != len(b) implies that a != b.
Maybe wrap unicode_compare_eq in an equality function that readies the strings (I'm surprised that no such thing exists) and use that?
int res = 1; if (left != right){res = _PyUnicode_Equal(left, right)} Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
markshannon commented Nov 24, 2021
There are a lot of branches in the int and float specializations, which is not good for performance. Because neither comparison can fail (ints have a total ordering, and floats have a total ordering plus a special not-equal-if-either-is-a-nan result) we can avoid the branching by masking the result of a comparison with a code for the operator. Also see my comment on the issue about avoiding the need to push and pop objects. |
This comment has been minimized.
This comment has been minimized.
sweeneyde commented Nov 25, 2021
Here's one idea for ints: Then the mask can be chosen depending on whether the following instruction is pop_jump_if_true/false. There would still be the branch depending on whether Py_SIZE(left) - Py_SIZE(right) == 0 and a loop over digits and a branch to check for Py_SIZE(left) < 0, but at least it would save the switch over oparg, the push/pop/incref/decref, and one DISPATCH(). Maybe the float equivalent would deopt if either arg is NaN and then do |
sweeneyde commented Nov 28, 2021
These three combined opcodes have slightly worse hit percentages on some benchmarks. |
sweeneyde commented Nov 28, 2021 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Newer Pyperformance numbers (Note that my setup is not very stable): Slower (20):
Faster (17):
Benchmark hidden because not significant (18): chaos, crypto_pyaes, dulwich_log, json_dumps, json_loads, logging_format, raytrace, regex_compile, regex_v8, richards, sympy_expand, sympy_integrate, sympy_sum, sympy_str, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process Geometric mean: 1.00x slower |
sweeneyde commented Nov 28, 2021
Newest microbenchmarks: pretty good, even including more comparisons frompyperfimportRunnerrunner=Runner() runner.timeit( "float_loop", setup="arr = [float(x) for x in range(5_000_000)]", stmt="for x in arr:\n"" if x > 1e6 or x == 1e6 or x < 0.0:\n"" break\n", ) runner.timeit( "int_loop", setup="arr = list(range(5_000_000))", stmt="for i in arr:\n"" if i > 6_000_000 or i == 6_000_000 or i < 0:\n"" break\n", ) runner.timeit( "str_loop", setup="arr = ['Py'] * 5_000_000", stmt="for word in arr:\n"" if word == 'Python' or word != 'Py':\n"" break\n", ) """float_loop: Mean +- std dev: [micro_main] 41.5 ms +- 0.4 ms -> [micro_combined3] 23.7 ms +- 0.8 ms: 1.75x fasterint_loop: Mean +- std dev: [micro_main] 202 ms +- 2 ms -> [micro_combined3] 133 ms +- 1 ms: 1.52x fasterstr_loop: Mean +- std dev: [micro_main] 140 ms +- 7 ms -> [micro_combined3] 88.1 ms +- 3.9 ms: 1.59x fasterGeometric mean: 1.62x faster""" |
This comment has been minimized.
This comment has been minimized.
sweeneyde commented Dec 2, 2021
Slower (19):
Faster (20):
Benchmark hidden because not significant (16): chaos, float, logging_format, pickle, pidigits, python_startup_no_site, regex_compile, regex_dna, richards, sympy_integrate, telco, tornado_http, unpickle, xml_etree_parse, xml_etree_generate, xml_etree_process Geometric mean: 1.00x slower |
markshannon commented Dec 2, 2021
My results are less noisy and show a ~1% speedup. Given that some of the largest improvements are from notoriously noisy benchmarks, 1% may be overstating the speedup but it does seem to be real. |
Uh oh!
There was an error while loading. Please reload this page.
| doubledleft=PyFloat_AS_DOUBLE(left); | ||
| doubledright=PyFloat_AS_DOUBLE(right); | ||
| intsign= (dleft>dright) - (dleft<dright); | ||
| DEOPT_IF(isnan(dleft), COMPARE_OP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially we could use the fact that nans are not equal to everything including themselves to add a fourth bit to the mask. Probably leave that for another PR as the maths starts getting a bit convoluted.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
| goto success; | ||
| } | ||
| } | ||
| SPECIALIZATION_FAIL(COMPARE_OP, SPEC_FAIL_OTHER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like more information here.
Is this a builtin class, or a Python class?
Does it override the comparison, or (in the case of ==/!==) does it rely on the default identity comparison?
Doesn't need to be in this PR, though.
markshannon commented Dec 2, 2021
Looks good in general. The approach seems sound and gives a speedup. There a couple of things that need to be tightened up, and a couple of possible enhancements (for future PRs). |
markshannon commented Dec 3, 2021
Updated results are good. |
markshannon commented Dec 3, 2021
👍 |
bedevere-bot commented Dec 3, 2021 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
|
joaoe commented Feb 27, 2022 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@sweeneyde |
https://bugs.python.org/issue45885