gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder#119398
vstinner commented May 22, 2024 • edited by bedevere-app bot
vstinner commented May 22, 2024 • edited
Benchmark:

    diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
    index f99ebf0dde..0752b2b1d2 100644
    --- a/Modules/_testcapimodule.c
    +++ b/Modules/_testcapimodule.c
    @@ -3312,6 +3312,14 @@ function_set_warning(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
         Py_RETURN_NONE;
     }
     
    +static PyObject *
    +bench(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
    +{
    +    return PyUnicode_FromFormat(
    +        "%s %s %s %s %s.",
    +        "format", "multiple", "utf8", "short", "strings");
    +}
    +
     static PyMethodDef TestMethods[] = {
         {"set_errno", set_errno, METH_VARARGS},
         {"test_config", test_config, METH_NOARGS},
    @@ -3454,6 +3462,7 @@ static PyMethodDef TestMethods[] = {
         {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},
         {"test_weakref_capi", test_weakref_capi, METH_NOARGS},
         {"function_set_warning", function_set_warning, METH_NOARGS},
    +    {"bench", bench, METH_NOARGS},
         {NULL, NULL} /* sentinel */
     };

Command:

    ./python -m venv env
    env/bin/python -m pip install pyperf
    env/bin/python -m pyperf timeit -s 'import _testcapi; func=_testcapi.bench' 'func()' -v -o ref.json

Result, Python built with
vstinner commented May 22, 2024
Oh, there was a performance regression on

Benchmark:

    import pyperf
    import _testcapi

    runner = pyperf.Runner()

    utf8 = b'abc'
    runner.bench_func('abc', utf8.decode)

    utf8 = 'abcé'.encode()
    runner.bench_func('abc + UTF-8', utf8.decode)

    utf8 = 'éabc'.encode()
    runner.bench_func('UTF-8 + abc', utf8.decode)

    utf8 = b'x' * (1024 * 1024)
    runner.bench_func('ASCII 1 MiB', utf8.decode)

    utf8 = ('x' * (1024 * 1024) + 'é').encode()
    runner.bench_func('ASCII 1 MiB + UTF-8', utf8.decode)

    utf8 = ('é' + 'x' * (1024 * 1024)).encode()
    runner.bench_func('UTF-8 + ASCII 1 MiB', utf8.decode)

    utf8 = ('€' + 'x' * (1024 * 1024)).encode()
    runner.bench_func('UTF-8 euro + ASCII 1 MiB', utf8.decode)

Results, Python built with => There is no significant impact on
vstinner commented May 22, 2024
serhiy-storchaka left a comment
LGTM.
Add unicode_decode_utf8_writer() to write characters directly into a _PyUnicodeWriter writer, which avoids creating a temporary string. Optimize PyUnicode_FromFormat() by using the new unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
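As a quick sanity check on the reported result, the "1.62x faster" figure is the ratio of the two mean timings. The short sketch below recomputes it from the numbers quoted above; the variable names are illustrative only, and the standard deviations (+- 8 ns, +- 2 ns) are ignored.

```python
# Mean timings reported in the commit message, in nanoseconds.
before_ns = 620
after_ns = 382

# Ratio of old to new mean time (the "N.NNx faster" figure).
speedup = before_ns / after_ns

# Fraction of the original run time that the change removes.
time_saved = 1 - after_ns / before_ns

print(f"{speedup:.2f}x faster")
print(f"{time_saved:.0%} less time per call")
```

In other words, each call to the benchmarked function takes a bit more than a third less time after the change.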
vstinner commented May 22, 2024
I enabled automerge. Thanks for the review @serhiy-storchaka.