Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-125196: Use PyUnicodeWriter in error handlers#125262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
vstinner commented Oct 10, 2024 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
PyCodec_ReplaceErrors() and PyCodec_BackslashReplaceErrors() now use the public PyUnicodeWriter API.
| } | ||
| staticinlineint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to move this inline function into the header file? That way it can omit the declaration in the header file and integrate well. It looks like a public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I implemented #125201 which should replace this static inline function.
vstinner commented Oct 10, 2024
Benchmark: importcodecsimportpyperfrunner=pyperf.Runner() replace=codecs.lookup_error('replace') backslashreplace=codecs.lookup_error('backslashreplace') forLENin (1, 1000): text="\xff"*LENexc=UnicodeEncodeError("ascii", text, 0, len(text), "reason") runner.bench_func(f"replace encode len={LEN}", replace, exc) text="\xff"*LENexc=UnicodeTranslateError(text, 0, len(text), "reason") runner.bench_func(f"replace translate len={LEN}", replace, exc) data=b"\x20\x80\xff"*LENexc=UnicodeDecodeError("ascii", data, 0, len(data), "reason") runner.bench_func(f"backslashreplace decode len={LEN}", backslashreplace, exc) text="\x20\xff\u20ac\U0010ffff"*LENexc=UnicodeEncodeError("ascii", text, 0, len(text), "reason") runner.bench_func(f"backslashreplace encode len={LEN}", backslashreplace, exc) text="\x20\xff\u20ac\U0010ffff"*LENexc=UnicodeTranslateError(text, 0, len(text), "reason") runner.bench_func(f"backslashreplace translate len={LEN}", backslashreplace, exc)Results, Python built with It's way slower :-( This PR has a naive PyUnicodeWriter_Fill() implementation calling PyUnicodeWriter_WriteChar() in a loop. I wrote PR gh-125201 which is way more efficient: |
vstinner commented Oct 10, 2024
For the benchmark, I had to call directly the error handler functions. Using ASCII doesn't work (it doesn't measure my change), since |
vstinner commented Oct 15, 2024
Let's keep the private API for now, since it's faster. |
PyCodec_ReplaceErrors() and PyCodec_BackslashReplaceErrors() now use the public PyUnicodeWriter API.