gh-139871: Optimize bytearray construction with encoding #142243

cmaloney · 2025-12-03T22:52:01Z

When a `str` is encoded in `bytearray.__init__`, the encoder tends to create a new, unique bytes object. Rather than allocating new memory and copying the bytes, use the already created bytes object as the bytearray's backing storage. The bigger the `str`, the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```

Issue: Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871
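For readers without `pyperf` installed, here is a minimal sketch that uses only the standard library. It checks that the optimized construction form gives the same result as encoding first and then wrapping the bytes, and that the result is still an independent, mutable bytearray. The helper names `build_with_encoding` and `build_via_bytes` are hypothetical and exist only for this illustration. The timing is a rough indication, not a substitute for the `pyperf` numbers above.

```python
import timeit


def build_with_encoding(text: str) -> bytearray:
    # The construction form this PR targets: a str plus an explicit encoding.
    return bytearray(text, encoding="utf8")


def build_via_bytes(text: str) -> bytearray:
    # Two-step form for comparison: encode to bytes, then build a bytearray.
    return bytearray(text.encode("utf8"))


text = "a" * 1_000_000
direct = build_with_encoding(text)
two_step = build_via_bytes(text)

# Both forms must produce identical contents.
assert direct == two_step

# The result is an ordinary mutable bytearray; changing one object
# does not affect the other.
direct[0] = ord("b")
assert direct[0] == ord("b")
assert two_step[0] == ord("a")

# Rough timing only; absolute numbers depend on the machine and the build.
elapsed = timeit.timeit(lambda: build_with_encoding(text), number=20)
print(f"bytearray(text, encoding='utf8'): {elapsed / 20 * 1e6:.1f} us per call")
```

The observable behavior should be the same before and after the change. Only the time and memory spent on the intermediate copy are expected to differ.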

cmaloney · 2025-12-11T06:20:41Z

cc: @vstinner this construction form doesn't appear a lot in the CPython codebase but does exist in other codebases.

I think this one is a safe subset of #141862; I hope to revisit that PR eventually, but it will definitely take many steps to get working just right.

vstinner

LGTM. It seems to be safe to pick the bytes object in this case.

vstinner · 2025-12-15T12:10:49Z

Merged, thanks.

bedevere-appbot added the awaiting review label Dec 3, 2025

bedevere-appbot mentioned this pull request Dec 3, 2025
Add .take_bytes([n]) a zero-copy path from bytearray to bytes #139871
Closed

cmaloney added the skip news label Dec 3, 2025

cmaloney mentioned this pull request Dec 3, 2025
gh-139871: Optimize bytearray unique bytes iconcat #141862
Open

Merge branch 'main' into ba_tb_encoding
663ed88

vstinner approved these changes Dec 11, 2025

bedevere-appbot added awaiting merge and removed awaiting review labels Dec 11, 2025

vstinner merged commit 14e6052 into python:main Dec 15, 2025
83 of 85 checks passed

bedevere-appbot removed the awaiting merge label Dec 15, 2025

cmaloney deleted the ba_tb_encoding branch December 15, 2025 17:01
