Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-137855: email.quoprimime removing re import#132046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
Marius-Juston commented Apr 3, 2025 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
re importemail.quoprimime removing re importMarius-Juston commented Apr 3, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
The PR: will probably drastically improve the speed as well as once |
Marius-Juston commented Apr 3, 2025
I did not notice that the warmup needed for regex: 153.9974 ± 35.97 (103 to 1778; n=10000) non_regex: 148.4565 ± 25.48 (125 to 991; n=10000) |
Marius-Juston commented Apr 3, 2025
( the new _HEX_TO_CHAR cache could also be used for the # Decode if in form =ABelifi+2<nandline[i+1] inhexdigitsandline[i+2] inhexdigits: decoded+=unquote(line[i:i+3]) |
Marius-Juston commented Apr 3, 2025
Slightly faster |
Marius-Juston commented Apr 3, 2025
Adding the '=' check now speeds things up:
|
Marius-Juston commented Apr 3, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
As a comparison (if you compile the regex for the function + add early exit) c=re.compile("=[a-fA-F0-9]{2}", flags=re.ASCII) defheader_decode_re(s): """Decode a string using regex."""s=s.replace('_', ' ') # Replace underscores with spacesif'='ins: returnc.sub(_unquote_match, s) returns
|
Uh oh!
There was an error while loading. Please reload this page.
Co-authored-by: Adam Turner <[email protected]>
Marius-Juston commented Apr 4, 2025
@AA-Turner, what's your opinion on replacing this regex expression (even though it sometimes makes the algorithm slower)? |
Marius-Juston commented Apr 4, 2025
Very slight improvement (mainly on the
|
hauntsaninja left a comment • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slower and harder to maintain, so I'm -1 on this PR
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
hauntsaninja commented Apr 6, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Actually I don't understand this PR. It looks like we still import |
AA-Turner commented Apr 6, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I'm a bit lost on the current benchmarks, but the most recent comment (with
Through A |
AA-Turner commented Apr 6, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.
|
email.quoprimime removing re importemail.quoprimime removing re import| if '=' not in s: | ||
| return s | ||
| result = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeatedly appending to a string in a loop is O(n**2). The standard idiom is to make a list of pieces (result=[]) and join after the loop. I suspect that re.sub does the C equivalent.
In any case, I agree that replacing an re call with this much code seems dubious (a bad tradeoff), so closing this might be best.
This pull request removes the
remodule from theemail.quoprimime, thus increasing the import speed from5676us to3669us (a 60% import speed increase );From
To
however, the new implementation does increase the compute time
So it is very possible that this is not worth it.
Issues:
reuses #130167