
Conversation

@Marius-Juston (Contributor) commented Apr 3, 2025

This pull request removes the `re` module from `email.quoprimime`, reducing the import time from 5676 us to 3669 us (roughly a 1.55x speedup, i.e. about 35% less import time).

From

```
marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:   88 |   88 | _io
import time:   19 |   19 | marshal
import time:  143 |  143 | posix
import time:  332 |  580 | _frozen_importlib_external
import time:   42 |   42 | time
import time:  125 |  166 | zipimport
import time:   25 |   25 | _codecs
import time:  290 |  315 | codecs
import time:  190 |  190 | encodings.aliases
import time:  417 |  921 | encodings
import time:   90 |   90 | encodings.utf_8
import time:   44 |   44 | _signal
import time:   22 |   22 | _abc
import time:   93 |  114 | abc
import time:  484 |  484 | _collections_abc
import time:  136 |  733 | io
import time:   22 |   22 | _stat
import time:   59 |   80 | stat
import time:   31 |   31 | errno
import time:   43 |   43 | genericpath
import time:   87 |  160 | posixpath
import time:  283 |  523 | os
import time:   50 |   50 | _sitebuiltins
import time:   86 |   86 | sitecustomize
import time:   30 |   30 | usercustomize
import time:  216 |  902 | site
import time:  114 |  114 | linecache
import time:  203 |  203 | email
import time:   22 |   22 | _string
import time:  140 |  140 | types
import time:  795 |  935 | enum
import time:   34 |   34 | _sre
import time:  130 |  130 | re._constants
import time:  181 |  311 | re._parser
import time:   49 |   49 | re._casefix
import time:  198 |  591 | re._compiler
import time:   57 |   57 | itertools
import time:   75 |   75 | keyword
import time:   41 |   41 | _operator
import time:  153 |  194 | operator
import time:   98 |   98 | reprlib
import time:   32 |   32 | _collections
import time:  553 | 1006 | collections
import time:   30 |   30 | _functools
import time:  346 | 1381 | functools
import time:   95 |   95 | copyreg
import time:  311 | 3311 | re
import time:  365 | 3697 | string
import time: 1777 | 5676 | email.quoprimime
```

To

```
marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c "import email.quoprimime"
import time: self [us] | cumulative | imported package
import time:   89 |   89 | _io
import time:   18 |   18 | marshal
import time:  130 |  130 | posix
import time:  305 |  541 | _frozen_importlib_external
import time:   37 |   37 | time
import time:  115 |  152 | zipimport
import time:   24 |   24 | _codecs
import time:  273 |  296 | codecs
import time:  175 |  175 | encodings.aliases
import time:  387 |  857 | encodings
import time:   83 |   83 | encodings.utf_8
import time:   40 |   40 | _signal
import time:   16 |   16 | _abc
import time:   88 |  103 | abc
import time:  422 |  422 | _collections_abc
import time:  125 |  649 | io
import time:   20 |   20 | _stat
import time:   54 |   73 | stat
import time:   29 |   29 | errno
import time:   39 |   39 | genericpath
import time:   81 |  148 | posixpath
import time:  269 |  490 | os
import time:   48 |   48 | _sitebuiltins
import time:   80 |   80 | sitecustomize
import time:   28 |   28 | usercustomize
import time:  199 |  842 | site
import time:  106 |  106 | linecache
import time:  189 |  189 | email
import time:   18 |   18 | _string
import time:  124 |  124 | types
import time:  667 |  791 | enum
import time:   33 |   33 | _sre
import time:  130 |  130 | re._constants
import time:  177 |  307 | re._parser
import time:   49 |   49 | re._casefix
import time:  179 |  566 | re._compiler
import time:   58 |   58 | itertools
import time:   74 |   74 | keyword
import time:   36 |   36 | _operator
import time:  148 |  183 | operator
import time:   96 |   96 | reprlib
import time:   31 |   31 | _collections
import time:  494 |  934 | collections
import time:   27 |   27 | _functools
import time:  315 | 1274 | functools
import time:   87 |   87 | copyreg
import time:  284 | 3000 | re
import time:  297 | 3314 | string
import time:  168 | 3669 | email.quoprimime
```

However, the new implementation does increase the compute time:

```python
TEST_CASES = {
    "empty": "Dracula",
    "empty_medium": "Dracula" * 10,
    "empty_long": "Dracula" * 100,
    "short": "Hello=20World=21",
    "medium": "This_is_a_test=3F=3D=2E" * 10,
    "long": "Some_long_text_with_encoding=20" * 100,
    "mixed": "A=2Equick=20brown=5Ffox=21=3F" * 50,
    "edge_case_short": "=20=21=3F=2E=5F",
    "edge_case_long": "=20=21=3F=2E=5F" * 200,
}
```
| Benchmark | regex | non_regex |
|---|---|---|
| empty | 284 ns | 382 ns: 1.34x slower |
| empty_medium | 302 ns | 2.99 us: 9.91x slower |
| empty_long | 371 ns | 28.6 us: 77.20x slower |
| short | 731 ns | 902 ns: 1.23x slower |
| medium | 6.24 us | 11.8 us: 1.89x slower |
| long | 25.5 us | 137 us: 5.37x slower |
| mixed | 57.0 us | 71.5 us: 1.25x slower |
| edge_case_short | 1.36 us | 916 ns: 1.48x faster |
| edge_case_long | 178 us | 160 us: 1.11x faster |
| Geometric mean | (ref) | 2.78x slower |

So it is very possible that this is not worth it.
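For reference, the regex-based path being replaced is roughly the following (a sketch of the stdlib code with its helpers inlined so the snippet runs standalone, not an exact copy):

```python
import re

def unquote(s):
    """Turn a string of the form =AB into the character with value 0xAB."""
    return chr(int(s[1:3], 16))

def _unquote_match(match):
    """Turn a regex match of =AB into the corresponding character."""
    return unquote(match.group(0))

def header_decode(s):
    """Decode a string encoded with RFC 2045 'Q' (header) encoding."""
    s = s.replace('_', ' ')  # RFC 2047 maps '_' to space in headers
    return re.sub(r'=[a-fA-F0-9]{2}', _unquote_match, s, flags=re.ASCII)

print(header_decode("Hello=20World=21"))  # Hello World!
```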

Issues:

@Marius-Juston requested a review from a team as a code owner — Apr 3, 2025 10:22
@Marius-Juston changed the title from *gh-118761: Quoprimime removing re import* to *gh-118761: email.quoprimime removing re import* — Apr 3, 2025
@Marius-Juston (Contributor, Author) commented Apr 3, 2025

The PR:

will probably improve the speed drastically as well: once `string` lazily imports `re`, the `string` import itself will speed up significantly, and this module only uses `string` to import constants (`from string import ascii_letters, digits, hexdigits`).
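One common way for a module like `string` to become lazy about `re` is to defer the import into the code path that actually needs it. A rough illustration (hypothetical helper, not the actual approach in the linked PR):

```python
_WHITESPACE_RE = None

def capwords_like(s):
    """Capitalize words, compiling the regex only on first call, so that
    merely importing this module never pays for `import re`."""
    global _WHITESPACE_RE
    if _WHITESPACE_RE is None:
        import re  # deferred import: only runs when the function is used
        _WHITESPACE_RE = re.compile(r'\s+')
    return ' '.join(w.capitalize() for w in _WHITESPACE_RE.split(s.strip()))
```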

@Marius-Juston (Contributor, Author)

I did not account for the warmup needed by `./python -X importtime -c 'import email.quoprimime'`, so the more accurate timings are actually:

```
regex:     153.9974 ± 35.97 (103 to 1778; n=10000)
non_regex: 148.4565 ± 25.48 (125 to 991; n=10000)
```

@Marius-Juston (Contributor, Author)

(The new `_HEX_TO_CHAR` cache could also be used for the `decode` function afterwards, since it checks for more or less the same thing.)

```python
# Decode if in form =AB
elif i + 2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
    decoded += unquote(line[i:i+3])
```
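A minimal sketch of how such a cache could serve the decode path too (the `_HEX_TO_CHAR` table and the `decode_escape` helper below are hypothetical illustrations, not the PR's actual code):

```python
from string import hexdigits

# Hypothetical precomputed table: every two-hex-digit pair -> decoded char.
_HEX_TO_CHAR = {a + b: chr(int(a + b, 16)) for a in hexdigits for b in hexdigits}

def decode_escape(line, i):
    """Return the decoded character for a valid =AB escape at position i,
    or None if line[i:i+3] is not a valid escape."""
    return _HEX_TO_CHAR.get(line[i+1:i+3]) if line[i] == '=' else None
```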

@Marius-Juston (Contributor, Author)

| Benchmark | regex | non_regex_2 |
|---|---|---|
| empty | 288 ns | 259 ns: 1.11x faster |
| empty_medium | 299 ns | 1.74 us: 5.81x slower |
| empty_long | 375 ns | 16.3 us: 43.61x slower |
| short | 725 ns | 714 ns: 1.01x faster |
| medium | 6.22 us | 7.97 us: 1.28x slower |
| long | 22.0 us | 85.9 us: 3.91x slower |
| mixed | 49.5 us | 56.3 us: 1.14x slower |
| edge_case_short | 1.26 us | 744 ns: 1.69x faster |
| edge_case_long | 177 us | 125 us: 1.41x faster |
| Geometric mean | (ref) | 2.01x slower |

Slightly faster

@Marius-Juston (Contributor, Author)

Adding the '=' check now speeds things up:

| Benchmark | regex | non_regex |
|---|---|---|
| empty | 288 ns | 53.6 ns: 5.37x faster |
| empty_medium | 299 ns | 54.1 ns: 5.53x faster |
| empty_long | 375 ns | 62.5 ns: 6.00x faster |
| short | 725 ns | 722 ns: 1.00x faster |
| medium | 6.22 us | 8.09 us: 1.30x slower |
| long | 22.0 us | 86.7 us: 3.94x slower |
| mixed | 49.5 us | 58.6 us: 1.18x slower |
| edge_case_short | 1.26 us | 767 ns: 1.64x faster |
| edge_case_long | 177 us | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.60x faster |
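The `'='` check amounts to a fast path in front of the decode loop, roughly like this (illustrative sketch; `slow_decode` stands in for the full non-regex loop, which is not shown here):

```python
def header_decode_fast(s, slow_decode):
    """Early-exit variant: when the input contains no '=' there is nothing
    to unquote, so only the underscore-to-space substitution is needed."""
    s = s.replace('_', ' ')
    if '=' not in s:
        return s  # fast path taken by the "empty*" benchmark inputs
    return slow_decode(s)
```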

@Marius-Juston (Contributor, Author) commented Apr 3, 2025

As a comparison, here is the result if you compile the regex for the function and add an early exit:

```python
c = re.compile("=[a-fA-F0-9]{2}", flags=re.ASCII)

def header_decode_re(s):
    """Decode a string using regex."""
    s = s.replace('_', ' ')  # Replace underscores with spaces
    if '=' in s:
        return c.sub(_unquote_match, s)
    return s
```
| Benchmark | regex | regex2 | non_regex |
|---|---|---|---|
| empty | 288 ns | 51.4 ns: 5.60x faster | 53.6 ns: 5.37x faster |
| empty_medium | 299 ns | 52.0 ns: 5.76x faster | 54.1 ns: 5.53x faster |
| empty_long | 375 ns | 61.0 ns: 6.15x faster | 62.5 ns: 6.00x faster |
| short | 725 ns | 560 ns: 1.29x faster | 722 ns: 1.00x faster |
| medium | 6.22 us | 6.44 us: 1.04x slower | 8.09 us: 1.30x slower |
| long | 22.0 us | 22.9 us: 1.04x slower | 86.7 us: 3.94x slower |
| mixed | 49.5 us | 52.6 us: 1.06x slower | 58.6 us: 1.18x slower |
| edge_case_short | 1.26 us | 1.12 us: 1.13x faster | 767 ns: 1.64x faster |
| edge_case_long | 177 us | 189 us: 1.07x slower | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.83x faster | 1.60x faster |

@Marius-Juston (Contributor, Author)

@AA-Turner, what's your opinion on replacing this regex expression (even though it sometimes makes the algorithm slower)?

@Marius-Juston (Contributor, Author)

Very slight improvement (mainly on edge_case_short and short, where string concatenation is faster than using `"".join()`):

| Benchmark | regex | regex2 | non_regex | non_regex_add |
|---|---|---|---|---|
| empty | 288 ns | 51.4 ns: 5.60x faster | 53.6 ns: 5.37x faster | 51.6 ns: 5.58x faster |
| empty_medium | 299 ns | 52.0 ns: 5.76x faster | 54.1 ns: 5.53x faster | 51.6 ns: 5.80x faster |
| empty_long | 375 ns | 61.0 ns: 6.15x faster | 62.5 ns: 6.00x faster | 59.9 ns: 6.26x faster |
| short | 725 ns | 560 ns: 1.29x faster | 722 ns: 1.00x faster | 674 ns: 1.08x faster |
| medium | 6.22 us | 6.44 us: 1.04x slower | 8.09 us: 1.30x slower | 7.82 us: 1.26x slower |
| long | 22.0 us | 22.9 us: 1.04x slower | 86.7 us: 3.94x slower | 83.5 us: 3.79x slower |
| mixed | 49.5 us | 52.6 us: 1.06x slower | 58.6 us: 1.18x slower | 60.6 us: 1.22x slower |
| edge_case_short | 1.26 us | 1.12 us: 1.13x faster | 767 ns: 1.64x faster | 699 ns: 1.80x faster |
| edge_case_long | 177 us | 189 us: 1.07x slower | 127 us: 1.39x faster | 127 us: 1.39x faster |
| Geometric mean | (ref) | 1.83x faster | 1.60x faster | 1.66x faster |
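The `+=` vs `"".join()` difference for short inputs can be probed with a micro-benchmark along these lines (function names are made up for this sketch; CPython can resize a uniquely referenced str in place on `+=`, which helps when there are only a few short pieces):

```python
import timeit

pieces = ["Hello", " ", "World", "!"]

def concat_add():
    # Repeated += : O(n**2) in general, but fast for few short pieces
    # thanks to CPython's in-place str resize optimization.
    result = ''
    for p in pieces:
        result += p
    return result

def concat_join():
    # The general-purpose idiom: one allocation, one pass.
    return ''.join(pieces)

print(timeit.timeit(concat_add, number=100_000))
print(timeit.timeit(concat_join, number=100_000))
```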

@hauntsaninja (Contributor) left a comment:


This is slower and harder to maintain, so I'm -1 on this PR

@bedevere-app

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@hauntsaninja (Contributor) commented Apr 6, 2025

Actually I don't understand this PR.

It looks like we still import `re` transitively? Also, I don't understand why the "self" time reported by `-X importtime` in your PR body goes from 1777 to 168; if anything, it looks like quoprimime.py does more work at import time now.

@AA-Turner (Member) commented Apr 6, 2025

I'm a bit lost on the current benchmarks, but the most recent comment (with non_regex_add) appears to indicate this is slightly faster. That said, I agree with @hauntsaninja that the algorithm in the PR is too complicated and will be difficult to maintain, in contrast to the one-liner regular expression.

> It looks like we still import re transitively?

Through string, see #132037 to help there.

A

@AA-Turner (Member) commented Apr 6, 2025

> Also I don't understand why the "self" time reported by -X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.

bench.py
```python
import subprocess, sys
import statistics

BASE_CMD = (sys.executable, '-Ximporttime', '-S', '-c',)

def run_importtime(mod: str) -> str:
    return subprocess.run(
        BASE_CMD + (f'import {mod}',),
        check=True, capture_output=True, encoding='utf-8',
    ).stderr

for mod in sys.argv[1:]:
    for _ in range(5):  # warmup
        lines = run_importtime(mod)
    print(lines.partition('\n')[0])
    own_times = []
    cum_times = []
    for _ in range(50):
        lines = run_importtime(mod)
        final_line = lines.rstrip().rpartition('\n')[-1]
        # print(final_line)
        # import time: {own} | {cum} | {mod}
        own, cum = map(int, final_line.split()[2:5:2])
        own_times.append(own)
        cum_times.append(cum)
    own_times.sort()
    cum_times.sort()
    own_times[:] = own_times[10:-10]
    cum_times[:] = cum_times[10:-10]
    for label, times in [('own', own_times), ('cumulative', cum_times)]:
        print()
        print(f'import {mod}: {label} time')
        print(f'mean: {statistics.mean(times):.3f} µs')
        print(f'median: {statistics.median(times):.3f} µs')
        print(f'stdev: {statistics.stdev(times):.3f}')
        print('min:', min(times))
        print('max:', max(times))
```

@python-cla-bot

All commit authors signed the Contributor License Agreement.

CLA signed

@hugovk changed the title from *gh-118761: email.quoprimime removing re import* to *gh-137855: email.quoprimime removing re import* — Aug 16, 2025
```python
if '=' not in s:
    return s

result = ''
```
A Member commented on this snippet:

Repeatedly appending to a string in a loop is O(n**2). The standard idiom is to build a list of pieces (`result = []`) and join after the loop. I suspect that `re.sub` does the C equivalent.

In any case, I agree that replacing an re call with this much code seems dubious (a bad tradeoff), so closing this might be best.
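The idiom that review recommends, applied to this kind of decoder, looks roughly like the following (an illustrative sketch, not the PR's code):

```python
from string import hexdigits

def decode(line):
    """Quoted-printable-style decode using the list-and-join idiom:
    collect pieces in a list, then join once at the end."""
    pieces = []
    i, n = 0, len(line)
    while i < n:
        if (line[i] == '=' and i + 2 < n
                and line[i+1] in hexdigits and line[i+2] in hexdigits):
            pieces.append(chr(int(line[i+1:i+3], 16)))  # decode =AB
            i += 3
        else:
            pieces.append(line[i])  # ordinary character, copied as-is
            i += 1
    return ''.join(pieces)
```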


5 participants: @Marius-Juston, @hauntsaninja, @AA-Turner, @terryjreedy, @ZeroIntensity