vstinner (Member) commented May 22, 2024

Add unicode_decode_utf8_writer() to write characters directly into a _PyUnicodeWriter writer, which avoids creating a temporary string. Optimize PyUnicode_FromFormat() by using the new unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
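
To see what the microbenchmark call produces without patching _testcapi, the same C function can be reached from Python through ctypes. This is only a sketch: the `bench()` wrapper below is a hypothetical Python stand-in for the C helper, and it assumes a platform where ctypes can call variadic C functions once the fixed argument is declared via `argtypes`. The ctypes call overhead is far larger than the function itself, so this checks behavior, not the timings quoted above.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running interpreter.
_from_format = ctypes.pythonapi.PyUnicode_FromFormat
_from_format.restype = ctypes.py_object    # the C function returns a new str object
_from_format.argtypes = [ctypes.c_char_p]  # only the fixed format argument; the rest are variadic


def bench():
    """Hypothetical Python-level stand-in for the C bench() helper."""
    return _from_format(
        b"%s %s %s %s %s.",
        ctypes.c_char_p(b"format"),
        ctypes.c_char_p(b"multiple"),
        ctypes.c_char_p(b"utf8"),
        ctypes.c_char_p(b"short"),
        ctypes.c_char_p(b"strings"),
    )


if __name__ == "__main__":
    print(bench())
```

Each `%s` argument is a UTF-8 encoded C string, which is the path this change speeds up.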

vstinner (Member, Author) commented May 22, 2024

Benchmark:

    diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
    index f99ebf0dde..0752b2b1d2 100644
    --- a/Modules/_testcapimodule.c
    +++ b/Modules/_testcapimodule.c
    @@ -3312,6 +3312,14 @@ function_set_warning(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
         Py_RETURN_NONE;
     }
     
    +static PyObject *
    +bench(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
    +{
    +    return PyUnicode_FromFormat(
    +        "%s %s %s %s %s.",
    +        "format", "multiple", "utf8", "short", "strings");
    +}
    +
     static PyMethodDef TestMethods[] = {
         {"set_errno", set_errno, METH_VARARGS},
         {"test_config", test_config, METH_NOARGS},
    @@ -3454,6 +3462,7 @@ static PyMethodDef TestMethods[] = {
         {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},
         {"test_weakref_capi", test_weakref_capi, METH_NOARGS},
         {"function_set_warning", function_set_warning, METH_NOARGS},
    +    {"bench", bench, METH_NOARGS},
         {NULL, NULL} /* sentinel */
     };

Command:

    ./python -m venv env
    env/bin/python -m pip install pyperf
    env/bin/python -m pyperf timeit -s 'import _testcapi; func=_testcapi.bench' 'func()' -v -o ref.json

Result, Python built with gcc -O3:

620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster
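
For reference, the "1.62x faster" figure is the ratio of the two mean timings. A small worked check of that arithmetic, using only the numbers reported above (the variable names are illustrative):

```python
ref_ns = 620.0   # mean time before the change
new_ns = 382.0   # mean time after the change

speedup = ref_ns / new_ns   # how many times faster the new code is
saved_ns = ref_ns - new_ns  # absolute time saved per call

print(f"{speedup:.2f}x faster, {saved_ns:.0f} ns saved per call "
      f"({saved_ns / ref_ns:.0%} less time)")
# prints: 1.62x faster, 238 ns saved per call (38% less time)
```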

vstinner (Member, Author) commented

Oh, there was a performance regression on b"abc".decode(): I fixed it.

Benchmark:

    import pyperf
    import _testcapi

    runner = pyperf.Runner()

    utf8 = b'abc'
    runner.bench_func('abc', utf8.decode)

    utf8 = 'abcé'.encode()
    runner.bench_func('abc + UTF-8', utf8.decode)

    utf8 = 'éabc'.encode()
    runner.bench_func('UTF-8 + abc', utf8.decode)

    utf8 = b'x' * (1024 * 1024)
    runner.bench_func('ASCII 1 MiB', utf8.decode)

    utf8 = ('x' * (1024 * 1024) + 'é').encode()
    runner.bench_func('ASCII 1 MiB + UTF-8', utf8.decode)

    utf8 = ('é' + 'x' * (1024 * 1024)).encode()
    runner.bench_func('UTF-8 + ASCII 1 MiB', utf8.decode)

    utf8 = ('€' + 'x' * (1024 * 1024)).encode()
    runner.bench_func('UTF-8 euro + ASCII 1 MiB', utf8.decode)
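
The script above needs pyperf. For a quick sanity check without it, the same inputs can be timed with the standard-library timeit module. This is a rough sketch, not a replacement for the pyperf run: the loop counts are arbitrary assumptions, there is no CPU isolation or calibration, and the `best_time()` helper is hypothetical.

```python
import timeit

MIB = 1024 * 1024

# Same inputs and labels as the pyperf script.
CASES = {
    "abc": b"abc",
    "abc + UTF-8": "abcé".encode(),
    "UTF-8 + abc": "éabc".encode(),
    "ASCII 1 MiB": b"x" * MIB,
    "ASCII 1 MiB + UTF-8": ("x" * MIB + "é").encode(),
    "UTF-8 + ASCII 1 MiB": ("é" + "x" * MIB).encode(),
    "UTF-8 euro + ASCII 1 MiB": ("€" + "x" * MIB).encode(),
}


def best_time(data, repeat=5):
    """Return the best observed per-call time, in seconds, of data.decode()."""
    # Assumed loop counts: many iterations for tiny inputs, few for 1 MiB inputs.
    number = 100_000 if len(data) < 1024 else 20
    timer = timeit.Timer(data.decode)
    return min(timer.repeat(repeat=repeat, number=number)) / number


results = {name: best_time(data) for name, data in CASES.items()}

if __name__ == "__main__":
    for name, seconds in results.items():
        print(f"{name:26} {seconds * 1e9:14.1f} ns")
```

Running it once on the reference build and once on the patched build gives two columns to compare by hand; the absolute numbers will differ from the pyperf results.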

Results, Python built with gcc -O3, CPU isolation.

+---------------------+---------+-----------------------+
| Benchmark           | ref     | change                |
+=====================+=========+=======================+
| abc                 | 73.7 ns | 74.7 ns: 1.01x slower |
+---------------------+---------+-----------------------+
| abc + UTF-8         | 167 ns  | 172 ns: 1.03x slower  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB         | 118 us  | 118 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB + UTF-8 | 1.08 ms | 1.07 ms: 1.00x faster |
+---------------------+---------+-----------------------+
| UTF-8 + ASCII 1 MiB | 572 us  | 570 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| Geometric mean      | (ref)   | 1.00x slower          |
+---------------------+---------+-----------------------+

Benchmark hidden because not significant (2): UTF-8 + abc, UTF-8 euro + ASCII 1 MiB

=> There is no significant impact on bytes.decode() performance (no slowdown).

vstinner (Member, Author) commented

cc @serhiy-storchaka

serhiy-storchaka (Member) left a comment

LGTM.

vstinner enabled auto-merge (squash) May 22, 2024 19:20

vstinner (Member, Author) commented

I enabled automerge. Thanks for the review @serhiy-storchaka.

vstinner disabled auto-merge May 22, 2024 20:45
vstinner enabled auto-merge (squash) May 22, 2024 20:45
vstinner changed the title from "gh-119182: Optimize PyUnicode_FromFormat() UTF-8 decoder" to "gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder" May 22, 2024
vstinner changed the title from "gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder" to "gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder" May 22, 2024
vstinner merged commit 9b422fc into python:main May 22, 2024
vstinner deleted the utf8_writer branch May 22, 2024 21:05
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024