
Conversation

@Marius-Juston (Contributor) commented Apr 3, 2025

This pull request removes the `re` module from `email.quoprimime`, reducing the import time from 5676 us to 3669 us (roughly a 1.55x speedup, i.e. about 35% less import time).

From

```
marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:   88 |   88 | _io
import time:   19 |   19 | marshal
import time:  143 |  143 | posix
import time:  332 |  580 | _frozen_importlib_external
import time:   42 |   42 | time
import time:  125 |  166 | zipimport
import time:   25 |   25 | _codecs
import time:  290 |  315 | codecs
import time:  190 |  190 | encodings.aliases
import time:  417 |  921 | encodings
import time:   90 |   90 | encodings.utf_8
import time:   44 |   44 | _signal
import time:   22 |   22 | _abc
import time:   93 |  114 | abc
import time:  484 |  484 | _collections_abc
import time:  136 |  733 | io
import time:   22 |   22 | _stat
import time:   59 |   80 | stat
import time:   31 |   31 | errno
import time:   43 |   43 | genericpath
import time:   87 |  160 | posixpath
import time:  283 |  523 | os
import time:   50 |   50 | _sitebuiltins
import time:   86 |   86 | sitecustomize
import time:   30 |   30 | usercustomize
import time:  216 |  902 | site
import time:  114 |  114 | linecache
import time:  203 |  203 | email
import time:   22 |   22 | _string
import time:  140 |  140 | types
import time:  795 |  935 | enum
import time:   34 |   34 | _sre
import time:  130 |  130 | re._constants
import time:  181 |  311 | re._parser
import time:   49 |   49 | re._casefix
import time:  198 |  591 | re._compiler
import time:   57 |   57 | itertools
import time:   75 |   75 | keyword
import time:   41 |   41 | _operator
import time:  153 |  194 | operator
import time:   98 |   98 | reprlib
import time:   32 |   32 | _collections
import time:  553 | 1006 | collections
import time:   30 |   30 | _functools
import time:  346 | 1381 | functools
import time:   95 |   95 | copyreg
import time:  311 | 3311 | re
import time:  365 | 3697 | string
import time: 1777 | 5676 | email.quoprimime
```

To

```
marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:   89 |   89 | _io
import time:   18 |   18 | marshal
import time:  130 |  130 | posix
import time:  305 |  541 | _frozen_importlib_external
import time:   37 |   37 | time
import time:  115 |  152 | zipimport
import time:   24 |   24 | _codecs
import time:  273 |  296 | codecs
import time:  175 |  175 | encodings.aliases
import time:  387 |  857 | encodings
import time:   83 |   83 | encodings.utf_8
import time:   40 |   40 | _signal
import time:   16 |   16 | _abc
import time:   88 |  103 | abc
import time:  422 |  422 | _collections_abc
import time:  125 |  649 | io
import time:   20 |   20 | _stat
import time:   54 |   73 | stat
import time:   29 |   29 | errno
import time:   39 |   39 | genericpath
import time:   81 |  148 | posixpath
import time:  269 |  490 | os
import time:   48 |   48 | _sitebuiltins
import time:   80 |   80 | sitecustomize
import time:   28 |   28 | usercustomize
import time:  199 |  842 | site
import time:  106 |  106 | linecache
import time:  189 |  189 | email
import time:   18 |   18 | _string
import time:  124 |  124 | types
import time:  667 |  791 | enum
import time:   33 |   33 | _sre
import time:  130 |  130 | re._constants
import time:  177 |  307 | re._parser
import time:   49 |   49 | re._casefix
import time:  179 |  566 | re._compiler
import time:   58 |   58 | itertools
import time:   74 |   74 | keyword
import time:   36 |   36 | _operator
import time:  148 |  183 | operator
import time:   96 |   96 | reprlib
import time:   31 |   31 | _collections
import time:  494 |  934 | collections
import time:   27 |   27 | _functools
import time:  315 | 1274 | functools
import time:   87 |   87 | copyreg
import time:  284 | 3000 | re
import time:  297 | 3314 | string
import time:  168 | 3669 | email.quoprimime
```

However, the new implementation does increase the compute time:

```python
TEST_CASES = {
    "empty": "Dracula",
    "empty_medium": "Dracula" * 10,
    "empty_long": "Dracula" * 100,
    "short": "Hello=20World=21",
    "medium": "This_is_a_test=3F=3D=2E" * 10,
    "long": "Some_long_text_with_encoding=20" * 100,
    "mixed": "A=2Equick=20brown=5Ffox=21=3F" * 50,
    "edge_case_short": "=20=21=3F=2E=5F",
    "edge_case_long": "=20=21=3F=2E=5F" * 200,
}
```
| Benchmark | regex | non_regex |
|---|---|---|
| empty | 284 ns | 382 ns: 1.34x slower |
| empty_medium | 302 ns | 2.99 us: 9.91x slower |
| empty_long | 371 ns | 28.6 us: 77.20x slower |
| short | 731 ns | 902 ns: 1.23x slower |
| medium | 6.24 us | 11.8 us: 1.89x slower |
| long | 25.5 us | 137 us: 5.37x slower |
| mixed | 57.0 us | 71.5 us: 1.25x slower |
| edge_case_short | 1.36 us | 916 ns: 1.48x faster |
| edge_case_long | 178 us | 160 us: 1.11x faster |
| Geometric mean | (ref) | 2.78x slower |

So it is very possible that this is not worth it.
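For reference, the regex-based path being replaced is roughly the following (a sketch of the stdlib code with its helpers inlined so the snippet runs standalone, not an exact copy):

```python
import re

def unquote(s):
    """Turn a string of the form =AB into the character with value 0xAB."""
    return chr(int(s[1:3], 16))

def _unquote_match(match):
    """Turn a regex match of =AB into the corresponding character."""
    return unquote(match.group(0))

def header_decode(s):
    """Decode a string encoded with RFC 2045 'Q' (header) encoding."""
    s = s.replace('_', ' ')  # RFC 2047 maps '_' to space in headers
    return re.sub(r'=[a-fA-F0-9]{2}', _unquote_match, s, flags=re.ASCII)

print(header_decode("Hello=20World=21"))  # Hello World!
```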

Issues:

@Marius-Juston requested a review from a team as a code owner — Apr 3, 2025 10:22
@Marius-Juston changed the title from *gh-118761: Quoprimime removing re import* to *gh-118761: email.quoprimime removing re import* — Apr 3, 2025
@Marius-Juston (Contributor, Author) commented Apr 3, 2025

The PR:

will probably improve the speed drastically as well: once `string` lazily imports `re`, the `string` import itself will speed up significantly, and this module only uses `string` to import constants (`from string import ascii_letters, digits, hexdigits`).
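One common way for a module like `string` to become lazy about `re` is to defer the import into the code path that actually needs it. A rough illustration (hypothetical helper, not the actual approach in the linked PR):

```python
_WHITESPACE_RE = None

def capwords_like(s):
    """Capitalize words, compiling the regex only on first call, so that
    merely importing this module never pays for `import re`."""
    global _WHITESPACE_RE
    if _WHITESPACE_RE is None:
        import re  # deferred import: only runs when the function is used
        _WHITESPACE_RE = re.compile(r'\s+')
    return ' '.join(w.capitalize() for w in _WHITESPACE_RE.split(s.strip()))
```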

@Marius-Juston (Contributor, Author)

I did not account for the warmup needed by `./python -X importtime -c 'import email.quoprimime'`, so the more accurate timings are actually:

```
regex:     153.9974 ± 35.97 (103 to 1778; n=10000)
non_regex: 148.4565 ± 25.48 (125 to 991; n=10000)
```

@Marius-Juston (Contributor, Author)

(The new `_HEX_TO_CHAR` cache could also be used for the `decode` function afterwards, since it checks for more or less the same thing.)

```python
# Decode if in form =AB
elif i + 2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
    decoded += unquote(line[i:i+3])
```
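A minimal sketch of how such a cache could serve the decode path too (the `_HEX_TO_CHAR` table and the `decode_escape` helper below are hypothetical illustrations, not the PR's actual code):

```python
from string import hexdigits

# Hypothetical precomputed table: every two-hex-digit pair -> decoded char.
_HEX_TO_CHAR = {a + b: chr(int(a + b, 16)) for a in hexdigits for b in hexdigits}

def decode_escape(line, i):
    """Return the decoded character for a valid =AB escape at position i,
    or None if line[i:i+3] is not a valid escape."""
    return _HEX_TO_CHAR.get(line[i+1:i+3]) if line[i] == '=' else None
```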

@Marius-Juston (Contributor, Author)

| Benchmark | regex | non_regex_2 |
|---|---|---|
| empty | 288 ns | 259 ns: 1.11x faster |
| empty_medium | 299 ns | 1.74 us: 5.81x slower |
| empty_long | 375 ns | 16.3 us: 43.61x slower |
| short | 725 ns | 714 ns: 1.01x faster |
| medium | 6.22 us | 7.97 us: 1.28x slower |
| long | 22.0 us | 85.9 us: 3.91x slower |
| mixed | 49.5 us | 56.3 us: 1.14x slower |
| edge_case_short | 1.26 us | 744 ns: 1.69x faster |
| edge_case_long | 177 us | 125 us: 1.41x faster |
| Geometric mean | (ref) | 2.01x slower |

Slightly faster

@Marius-Juston (Contributor, Author)

Adding the '=' check now speeds things up:

| Benchmark | regex | non_regex |
|---|---|---|
| empty | 288 ns | 53.6 ns: 5.37x faster |
| empty_medium | 299 ns | 54.1 ns: 5.53x faster |
| empty_long | 375 ns | 62.5 ns: 6.00x faster |
| short | 725 ns | 722 ns: 1.00x faster |
| medium | 6.22 us | 8.09 us: 1.30x slower |
| long | 22.0 us | 86.7 us: 3.94x slower |
| mixed | 49.5 us | 58.6 us: 1.18x slower |
| edge_case_short | 1.26 us | 767 ns: 1.64x faster |
| edge_case_long | 177 us | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.60x faster |
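The `'='` check amounts to a fast path in front of the decode loop, roughly like this (illustrative sketch; `slow_decode` stands in for the full non-regex loop, which is not shown here):

```python
def header_decode_fast(s, slow_decode):
    """Early-exit variant: when the input contains no '=' there is nothing
    to unquote, so only the underscore-to-space substitution is needed."""
    s = s.replace('_', ' ')
    if '=' not in s:
        return s  # fast path taken by the "empty*" benchmark inputs
    return slow_decode(s)
```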

@Marius-Juston (Contributor, Author) commented Apr 3, 2025

As a comparison, here is the result if you compile the regex for the function and add an early exit:

```python
c = re.compile("=[a-fA-F0-9]{2}", flags=re.ASCII)

def header_decode_re(s):
    """Decode a string using regex."""
    s = s.replace('_', ' ')  # Replace underscores with spaces
    if '=' in s:
        return c.sub(_unquote_match, s)
    return s
```
| Benchmark | regex | regex2 | non_regex |
|---|---|---|---|
| empty | 288 ns | 51.4 ns: 5.60x faster | 53.6 ns: 5.37x faster |
| empty_medium | 299 ns | 52.0 ns: 5.76x faster | 54.1 ns: 5.53x faster |
| empty_long | 375 ns | 61.0 ns: 6.15x faster | 62.5 ns: 6.00x faster |
| short | 725 ns | 560 ns: 1.29x faster | 722 ns: 1.00x faster |
| medium | 6.22 us | 6.44 us: 1.04x slower | 8.09 us: 1.30x slower |
| long | 22.0 us | 22.9 us: 1.04x slower | 86.7 us: 3.94x slower |
| mixed | 49.5 us | 52.6 us: 1.06x slower | 58.6 us: 1.18x slower |
| edge_case_short | 1.26 us | 1.12 us: 1.13x faster | 767 ns: 1.64x faster |
| edge_case_long | 177 us | 189 us: 1.07x slower | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.83x faster | 1.60x faster |

@Marius-Juston (Contributor, Author)

@AA-Turner, what's your opinion on replacing this regex expression (even though it sometimes makes the algorithm slower)?

@Marius-Juston (Contributor, Author)

Very slight improvement (mainly on edge_case_short and short, where string concatenation is faster than using `"".join()`):

| Benchmark | regex | regex2 | non_regex | non_regex_add |
|---|---|---|---|---|
| empty | 288 ns | 51.4 ns: 5.60x faster | 53.6 ns: 5.37x faster | 51.6 ns: 5.58x faster |
| empty_medium | 299 ns | 52.0 ns: 5.76x faster | 54.1 ns: 5.53x faster | 51.6 ns: 5.80x faster |
| empty_long | 375 ns | 61.0 ns: 6.15x faster | 62.5 ns: 6.00x faster | 59.9 ns: 6.26x faster |
| short | 725 ns | 560 ns: 1.29x faster | 722 ns: 1.00x faster | 674 ns: 1.08x faster |
| medium | 6.22 us | 6.44 us: 1.04x slower | 8.09 us: 1.30x slower | 7.82 us: 1.26x slower |
| long | 22.0 us | 22.9 us: 1.04x slower | 86.7 us: 3.94x slower | 83.5 us: 3.79x slower |
| mixed | 49.5 us | 52.6 us: 1.06x slower | 58.6 us: 1.18x slower | 60.6 us: 1.22x slower |
| edge_case_short | 1.26 us | 1.12 us: 1.13x faster | 767 ns: 1.64x faster | 699 ns: 1.80x faster |
| edge_case_long | 177 us | 189 us: 1.07x slower | 127 us: 1.39x faster | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.83x faster | 1.60x faster | 1.66x faster |
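The `+=` vs `"".join()` difference for short inputs can be probed with a micro-benchmark along these lines (function names are made up for this sketch; CPython can resize a uniquely referenced str in place on `+=`, which helps when there are only a few short pieces):

```python
import timeit

pieces = ["Hello", " ", "World", "!"]

def concat_add():
    # Repeated += : O(n**2) in general, but fast for few short pieces
    # thanks to CPython's in-place str resize optimization.
    result = ''
    for p in pieces:
        result += p
    return result

def concat_join():
    # The general-purpose idiom: one allocation, one pass.
    return ''.join(pieces)

print(timeit.timeit(concat_add, number=100_000))
print(timeit.timeit(concat_join, number=100_000))
```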

@hauntsaninja (Contributor) left a comment:


This is slower and harder to maintain, so I'm -1 on this PR

@bedevere-app

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@hauntsaninja (Contributor) commented Apr 6, 2025

Actually I don't understand this PR.

It looks like we still import `re` transitively? Also, I don't understand why the "self" time reported by `-X importtime` in your PR body goes from 1777 to 168; if anything, it looks like quoprimime.py does more work at import time now.

@AA-Turner (Member) commented Apr 6, 2025

I'm a bit lost on the current benchmarks, but the most recent comment (with non_regex_add) appears to indicate this is slightly faster. That said, I agree with @hauntsaninja that the algorithm in the PR is too complicated and will be difficult to maintain, in contrast to the one-liner regular expression.

> It looks like we still import re transitively?

Through string, see #132037 to help there.

A

@AA-Turner (Member) commented Apr 6, 2025

> Also I don't understand why the "self" time reported by -X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.

bench.py
```python
import subprocess, sys
import statistics

BASE_CMD = (sys.executable, '-Ximporttime', '-S', '-c',)

def run_importtime(mod: str) -> str:
    return subprocess.run(
        BASE_CMD + (f'import {mod}',),
        check=True, capture_output=True, encoding='utf-8',
    ).stderr

for mod in sys.argv[1:]:
    for _ in range(5):  # warmup
        lines = run_importtime(mod)
    print(lines.partition('\n')[0])
    own_times = []
    cum_times = []
    for _ in range(50):
        lines = run_importtime(mod)
        final_line = lines.rstrip().rpartition('\n')[-1]
        # print(final_line)
        # import time: {own} | {cum} | {mod}
        own, cum = map(int, final_line.split()[2:5:2])
        own_times.append(own)
        cum_times.append(cum)
    own_times.sort()
    cum_times.sort()
    own_times[:] = own_times[10:-10]
    cum_times[:] = cum_times[10:-10]
    for label, times in [('own', own_times), ('cumulative', cum_times)]:
        print()
        print(f'import {mod}: {label} time')
        print(f'mean: {statistics.mean(times):.3f} µs')
        print(f'median: {statistics.median(times):.3f} µs')
        print(f'stdev: {statistics.stdev(times):.3f}')
        print('min:', min(times))
        print('max:', max(times))
```

@python-cla-bot

All commit authors signed the Contributor License Agreement.

CLA signed

@hugovk changed the title from *gh-118761: email.quoprimime removing re import* to *gh-137855: email.quoprimime removing re import* — Aug 16, 2025
```python
if '=' not in s:
    return s

result = ''
```
A Member commented on this snippet:

Repeatedly appending to a string in a loop is O(n**2). The standard idiom is to build a list of pieces (`result = []`) and join after the loop. I suspect that `re.sub` does the C equivalent.

In any case, I agree that replacing an re call with this much code seems dubious (a bad tradeoff), so closing this might be best.
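The idiom that review recommends, applied to this kind of decoder, looks roughly like the following (an illustrative sketch, not the PR's code):

```python
from string import hexdigits

def decode(line):
    """Quoted-printable-style decode using the list-and-join idiom:
    collect pieces in a list, then join once at the end."""
    pieces = []
    i, n = 0, len(line)
    while i < n:
        if (line[i] == '=' and i + 2 < n
                and line[i+1] in hexdigits and line[i+2] in hexdigits):
            pieces.append(chr(int(line[i+1:i+3], 16)))  # decode =AB
            i += 3
        else:
            pieces.append(line[i])  # ordinary character, copied as-is
            i += 1
    return ''.join(pieces)
```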


5 participants: @Marius-Juston, @hauntsaninja, @AA-Turner, @terryjreedy, @ZeroIntensity