gh-133968: Use private unicode writer for json #133832
Conversation
nineteendo commented May 10, 2025 • edited
ZeroIntensity commented May 10, 2025
What's the point? This just adds more maintenance if we make changes to how `PyUnicodeWriter` works.
nineteendo commented May 10, 2025
They might have caused a performance regression compared to 3.13: faster-cpython/ideas#726
nineteendo commented May 10, 2025
ZeroIntensity left a comment
I don't think this is a good idea.
- `json` optimizations have been rejected in the past -- use `ujson` or something like that if performance is critical.
- If we change how `PyUnicodeWriter` works, it adds more maintenance, especially if we use this as precedent for doing this elsewhere.
- We should aim for optimizing `PyUnicodeWriter` as a broader change, not speed up each individual case by using the private API.
vstinner left a comment
Would it be possible to only replace PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString()? Do you get similar performance in this case?
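For context, a call-site sketch of the substitution being discussed (illustrative, not the actual patch): `writer` is assumed to have been created elsewhere, the literal is pure ASCII, and the cast between the public and private writer types is an internal implementation detail.

```c
/* Before: PyUnicodeWriter_WriteUTF8() decodes and validates its input
 * as UTF-8 on every call. */
PyUnicodeWriter_WriteUTF8(writer, "null", 4);

/* After: the private function trusts the caller that the bytes are
 * ASCII (checked only in debug builds), so it can skip validation and
 * copy the bytes directly. */
_PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter *)writer, "null", 4);
```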
vstinner commented May 11, 2025
See also #133186.
ZeroIntensity commented May 11, 2025
If `_PyUnicodeWriter_WriteASCIIString()` gives that much of a speedup, why not expose it publicly?
vstinner commented May 11, 2025
I chose not to expose it since it generates an invalid string if the input string contains non-ASCII characters. But yeah, maybe we should expose it. The function only validates the input string in debug mode, for best performance.
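A sketch of that "validate only in debug mode" pattern; `writer_t` and `write_bytes()` are hypothetical stand-ins, not CPython's actual code:

```c
#include <assert.h>
#include <stddef.h>

typedef struct writer writer_t;                         /* hypothetical type */
int write_bytes(writer_t *w, const char *s, size_t n);  /* hypothetical */

static int
write_ascii(writer_t *writer, const char *str, size_t len)
{
#ifndef NDEBUG
    /* Debug builds catch misuse: every byte must be ASCII (< 128). */
    for (size_t i = 0; i < len; i++) {
        assert((unsigned char)str[i] < 128);
    }
#endif
    /* Release builds (NDEBUG) skip the check and copy the bytes as-is;
     * non-ASCII input would silently produce an invalid string, which is
     * why the function stayed private. */
    return write_bytes(writer, str, len);
}
```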
nineteendo commented May 11, 2025 • edited
Maybe, but my current benchmark has too much overhead to measure this accurately. I'll have to rewrite it first. I hope we can figure out how to get the performance of the public API very close to the private one, such that everyone feels comfortable using it.
nineteendo commented May 11, 2025
I updated the benchmark, but I don't understand why. Does this have something to do with the overallocation?
vstinner commented May 11, 2025
The private API doesn't enable overallocation by default.
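Concretely, with the internal API the caller opts in by setting a flag on the writer (these are the real `_PyUnicodeWriter` fields; per the comment above, the public writer enables this by default):

```c
_PyUnicodeWriter writer;
_PyUnicodeWriter_Init(&writer);
/* Off by default for the private writer: overallocation lets the buffer
 * grow geometrically while many small chunks are appended, with a final
 * shrink-to-fit when the string is finished. */
writer.overallocate = 1;
```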
vstinner commented May 12, 2025
I replaced PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString() in Modules/_json.c and ran a benchmark:
Benchmark hidden because not significant (15): encode 100 integers, encode 100 floats, encode 100 "ascii" strings, encode ascii string len=100, encode escaped string len=128, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode Unicode string len=1000, encode 10000 "ascii" strings, encode ascii string len=10000, encode escaped string len=9984, encode Unicode string len=10000

I built Python with `--enable-optimizations` and `--with-lto`. Benchmark code:

```python
import json
import pyperf

runner = pyperf.Runner()
for count in (100, 1_000, 10_000):
    runner.bench_func(f'encode {count} booleans', json.dumps,
                      [True, False] * (count // 2))
    runner.bench_func(f'encode {count} integers', json.dumps,
                      list(range(count)))
    runner.bench_func(f'encode {count} floats', json.dumps, [1.0] * count)
    runner.bench_func(f'encode {count} "ascii" strings', json.dumps,
                      ['ascii'] * count)

    text = 'ascii'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode ascii string len={len(text)}', json.dumps, text)

    text = ''.join(chr(ch) for ch in range(128))
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode escaped string len={len(text)}', json.dumps, text)

    text = 'abcd€'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode Unicode string len={len(text)}', json.dumps, text)
```
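(For reference, the usual pyperf workflow: run the script once per build with `-o results.json`, then compare the two result files with `python -m pyperf compare_to ref.json pr.json`.)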
vstinner commented May 12, 2025
I also ran my benchmark on this PR:
Benchmark hidden because not significant (1): encode Unicode string len=1000
nineteendo commented May 12, 2025
Yeah, but both the old code and the public API do, so it's not that. (And you really don't want to turn it off.)
nineteendo commented May 12, 2025 • edited
This line is inefficient for exact string instances (cpython/Objects/unicodeobject.c, line 13936 at commit 86c1d43).
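Presumably this refers to the WriteStr path calling `PyObject_Str()` even when the argument is already an exact `str`; a sketch of the kind of fast path that would address it (illustrative, not the actual change):

```c
/* Hypothetical fast path inside a WriteStr-style function: for exact str
 * instances, PyObject_Str() would just return a new reference to the same
 * object, so write it directly instead. */
if (PyUnicode_CheckExact(obj)) {
    return _PyUnicodeWriter_WriteStr(writer, obj);
}
/* Other types (and str subclasses) still go through PyObject_Str(). */
PyObject *str = PyObject_Str(obj);
if (str == NULL) {
    return -1;
}
int res = _PyUnicodeWriter_WriteStr(writer, str);
Py_DECREF(str);
return res;
```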
nineteendo commented May 13, 2025 • edited
Here's the comparison with a minimal PR.
vstinner commented May 13, 2025
Merged. I confirmed with two benchmarks that this small optimization makes a big difference on some use cases such as encoding short strings in JSON.
vstinner commented May 13, 2025
Ok, I created #133973 to add PyUnicodeWriter_WriteASCII().
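The proposed function mirrors `PyUnicodeWriter_WriteUTF8()`, with the caller responsible for guaranteeing ASCII input:

```c
/* Proposed public API: write a known-ASCII byte string into the writer.
 * Unlike PyUnicodeWriter_WriteUTF8(), the input is not validated in
 * release builds, so passing non-ASCII bytes is a caller error. */
int PyUnicodeWriter_WriteASCII(PyUnicodeWriter *writer,
                               const char *str,
                               Py_ssize_t size);
```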
nineteendo commented May 13, 2025
If that's merged, we would use this approach in 3.14, right?
nineteendo commented May 15, 2025
@vstinner, it looks like the regression in `json_loads` is caused by the heap allocation of the writer.
vstinner commented May 16, 2025
Are you sure about that? The implementation uses a freelist which avoids the heap allocation in most cases.
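For illustration, a one-entry freelist of the kind described works roughly like this (hypothetical stand-in type and names, not CPython's actual code):

```c
#include <stdlib.h>

/* Hypothetical writer type standing in for CPython's internal one. */
typedef struct { char *buf; size_t size; } writer_t;

static writer_t *freelist = NULL;    /* at most one cached writer */

static writer_t *
writer_alloc(void)
{
    if (freelist != NULL) {          /* fast path: reuse the cached entry */
        writer_t *w = freelist;
        freelist = NULL;
        return w;
    }
    return malloc(sizeof(writer_t)); /* slow path: heap allocation */
}

static void
writer_dealloc(writer_t *w)
{
    if (freelist == NULL) {
        freelist = w;                /* refill the single slot */
    }
    else {
        free(w);
    }
}
```

With a single slot, two writers alive at the same time means the second allocation always hits `malloc()`, which is the starvation scenario raised below.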
vstinner commented May 16, 2025
Would you mind creating a separate PR just for that?
ZeroIntensity commented May 16, 2025
Are the benchmarks creating an unrealistic number of concurrent writers? That would starve the freelist and create some allocation overhead, but only in the benchmarks.
nineteendo commented May 16, 2025 • edited
You're right, it does seem to be using the only entry of the freelist. (I disabled the malloc to check.)
vstinner commented May 30, 2025
I suggest closing this PR. It's not worth it anymore (according to the benchmark below) and I prefer to stick to the public C API. I made two small optimizations in the public `PyUnicodeWriter` API.
With these optimizations, it seems like this PR is less appealing. I ran a benchmark to compare this PR to the current main branch:
The best speedup is 1.07x faster for "encode 10000 integers". The worst slowdown is 1.10x slower for "encode Unicode string len=10000". Overall, the impact is "1.00x faster", which is not impressive.
vstinner commented May 30, 2025
Hum. I would be interested in a change which would just remove _PyUnicodeWriter_IsEmpty(), without touching the WriteUTF8/WriteASCII calls.
nineteendo commented May 30, 2025
According to the pyperformance benchmark, json_loads is still 10% slower because of the freelist. And after I've updated the PR, it will only be using the public API.
nineteendo commented May 30, 2025
Done. Decoding empty strings is now 17% faster. Annoyingly, decoding strings with escapes is 7% slower.
vstinner merged commit c81446a into python:main
vstinner commented May 31, 2025
I merged your change, thanks.
nineteendo commented May 31, 2025
This still needs to be backported.
ZeroIntensity commented May 31, 2025
Hm, do we backport performance improvements?
nineteendo commented May 31, 2025
We backported the other fixes for the performance regression. See the issue.
ZeroIntensity commented May 31, 2025
I thought the introduction of …
vstinner commented May 31, 2025
I would prefer to not backport this change.
pyperformance (with `--enable-optimizations` and `--with-lto`)
jsonyx-performance-tests (with `--enable-optimizations` and `--with-lto`)