@cmaloney (Contributor) commented Dec 3, 2025

When a `str` is encoded in `bytearray.__init__`, the encoder tends to create a new, unique `bytes` object. Rather than allocating new memory and copying the bytes, use the already-created `bytes` object as the `bytearray`'s backing storage. The bigger the `str`, the bigger the saving.

Mean +- std dev: [main_encoding] 497 us +- 9 us -> [encoding] 14.2 us +- 0.3 us: 34.97x faster

```python
import pyperf

runner = pyperf.Runner()
runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```
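For anyone who wants a quick sanity check without installing pyperf, here is a minimal sketch using only the standard library. It compares the constructor form this change targets against the two-step "encode, then wrap" form, and checks that the result is still an ordinary, independently mutable `bytearray`. The helper names `build_direct` and `build_two_step` are made up for illustration and are not part of the patch. The timings it prints depend on the interpreter build and are not a substitute for the pyperf numbers above.

```python
import timeit


def build_direct(text: str) -> bytearray:
    """Constructor form targeted by this change: encoding happens in bytearray.__init__."""
    return bytearray(text, encoding="utf8")


def build_two_step(text: str) -> bytearray:
    """Encode to bytes first, then construct a bytearray from the bytes object."""
    encoded = text.encode("utf8")
    return bytearray(encoded)


text = "a" * 1_000_000
direct = build_direct(text)
two_step = build_two_step(text)

# Both forms must produce identical contents.
same_contents = direct == two_step

# The result must behave like any other bytearray: mutating it
# must not affect the source str or a separately built bytearray.
direct[0] = ord("b")
still_independent = two_step[0] == ord("a") and text[0] == "a"

if __name__ == "__main__":
    runs = 20
    t_direct = timeit.timeit(lambda: build_direct(text), number=runs) / runs
    t_two_step = timeit.timeit(lambda: build_two_step(text), number=runs) / runs
    print(f"same contents:     {same_contents}")
    print(f"still independent: {still_independent}")
    print(f"direct:   {t_direct * 1e6:.1f} us per call")
    print(f"two-step: {t_two_step * 1e6:.1f} us per call")
```

Running this on a build with and without the change gives a rough before/after comparison for the `direct` line; the equality and independence checks should hold on both.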
@cmaloney (Contributor, Author) commented

cc: @vstinner this construction form doesn't appear a lot in the CPython codebase but does exist in other codebases.

I think this one is a safe subset of #141862. I hope to revisit that eventually, but it will definitely take many steps to get working just right.

@vstinner (Member) left a comment

LGTM. It seems to be safe to pick the bytes object in this case.

@vstinner merged commit 14e6052 into python:main Dec 15, 2025
83 of 85 checks passed
@vstinner (Member) commented

Merged, thanks.

@cmaloney deleted the ba_tb_encoding branch December 15, 2025 17:01
fatelei pushed a commit to fatelei/cpython that referenced this pull request Dec 16, 2025
…n#142243)