Skip to content

Conversation

@cmaloney
Copy link
Contributor

@cmaloneycmaloney commented Feb 1, 2025

Add bytearray.resize() which wraps PyByteArray_Resize.

Make negative size passed to resize exception/error rather than crash in optimized builds.


📚 Documentation preview 📚: https://cpython-previews--129560.org.readthedocs.build/

Add `bytearray.resize()` which wraps `PyByteArray_Resize`
@cmaloney
Copy link
ContributorAuthor

@vstinner I think this is ready for review if bytearray.resize seems like a reasonable feature add.

@vstinner
Copy link
Member

Please initialize new bytes to zero. We don't accept undefined behavior in Python.

cmaloneyand others added 2 commits February 2, 2025 09:55
Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
@vstinner
Copy link
Member

I'm not sure that adding this API is needed since there is already a way to truncate a bytearray and to extend a bytearray using the existing API. You should run a benchmark to show that it's worth it, especially to extend a bytearray.

@cmaloney
Copy link
ContributorAuthor

What is the way to extend the bytearray without copying byte by byte or having the length of the extension extra memory allocated?

@vstinner
Copy link
Member

What is the way to extend the bytearray without copying byte by byte or having the length of the extension extra memory allocated?

What you say. In short, ba[len(ba):] = b'\0' * extend.

Please run a benchmark to measure this compared to ba.extend().

Copy link
Member

@ZeroIntensityZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation mostly looks fine, but my primary concern is that we are indeed exposing an implementation detail on our end. Alternate Python implementations that either don't use an internal buffer, or can't control the size of it, will have a hard time effectively providing resize.

For example, Brython implements bytearray using JavaScript arrays, which don't have an explicit way to resize. I don't think they'll like this method.

I'd go for one of two options:

  • Mark resize as a CPython implementation detail.
  • Make resize a hint, allowing the runtime to ignore the request for resizing if it wants to.

I'd personally choose the latter, but I'd also like to hear Victor's thoughts.

@vstinner
Copy link
Member

@ZeroIntensity: It's already possible to implement extend() with the current bytearray API, it's just a convenient helper for: ba[len(ba):] = b'\0' * extend. So I don't see why other Python implementation couldn't implement it. They should already be able to resize a bytearray.

There are many ways to "extend" a bytearray. Another example:

>>>ba=bytearray(b'abc') >>>ba+=b'def'>>>ba, len(ba) (bytearray(b'abcdef'), 6)

@ZeroIntensity
Copy link
Member

Yeah, but I think that's a bit different than an explicit "resize." You can emulate it with that kind of thing, but it feels different than an actual method saying it will reallocate.

@vstinner
Copy link
Member

The method documentation should not promise any performance guarantee in general, it can be implemented in different ways (the implementation doesn't matter).

@ZeroIntensity
Copy link
Member

Can we clarify that with a CPython implementation detail note?

@cmaloney
Copy link
ContributorAuthor

cmaloney commented Feb 4, 2025

@ZeroIntensity can definitely add a note of "This is equivalent to bytearray += b'\0' * size". For more efficiency in JavaScript in particular I think this could map to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/fill -- nevermind, just sets elements to 0, can't append.

(Planning to do next iteration + perf testing follow up today, yesterday got sucked into a BytesIO.readfrom(file, *, limit: int | None = None, expected: int | None = None) experiment)

cmaloneyand others added 2 commits February 4, 2025 10:23
Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member

test_bytes and test_capi are failing, you need to update the exception.

@cmaloney
Copy link
ContributorAuthor

Working on them; Unfortunate part of "add suggestions", it adds "this person contributed" but doesn't let you do that + local changes before making push visible / alerting reviewers... Should have soon

.. method:: resize(size)

Resize the :class:`bytearray` to contain *size* bytes.
If :class:`bytearray` needs to grow, all new bytes will be set to null bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also describe the "obvious-to-us" behavior of what happens when it shrinks: the data at the end is truncated, the remaining data should be equivalent to a [:size] slice.

... and consider if the Python version should also support negative size to mean the same thing it would in a slice notation.
if (size < 0): size = max(0, len(self) + size) ?

Copy link
ContributorAuthor

@cmaloneycmaloneyFeb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for the truncation.

I don't like negative, because the function operates in absolute buffer size, and requesting a negative buffer size sounds like a bug I'd write and I'd prefer an exception / exit rather than debug the symptom "my buffer shrank".

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my original equivalent to is wrong, which make it look like a delta, current doc one I'm validating:

iflen(self) >size: delself[size:] else: self+=b'\0'* (size-len(self))

@cmaloney
Copy link
ContributorAuthor

cmaloney commented Feb 5, 2025

I think I've addressed all review comments.

Tested performance on an optimized build (--with-lto --enable-optimizations)

$ python bm_bytearray_resize.py ..................... extend_bytes: Mean +- std dev: 142 ms +- 1 ms ..................... extend_range: Mean +- std dev: 1.89 sec +- 0.07 sec ..................... iadd: Mean +- std dev: 143 ms +- 1 ms ..................... resize: Mean +- std dev: 48.7 ms +- 0.5 ms 
importitertoolsimportpyperfrunner=pyperf.Runner() # Aiming for what a file readall does in terms of resizing as get more data.# Non-linear resizing up to a reaonably large read size.blocksize=1024sizes= [ blocksize, blocksize*4, blocksize*16, 1_000_000, 100_000_000, 1_000_000_000, ] defextend_bytes(): """Extend by appending a temporary byte string"""ba=bytearray() forsizeinsizes: ba.extend(b'\0'*size) defextend_range(): """Resize by an iterator with length_hint"""ba=bytearray() forsizeinsizes: ba.extend(itertools.repeat(0, size)) defiadd(): """Resize by using += """ba=bytearray() forsizeinsizes: ba+=b'\0'*sizedefresize(): """Use resize"""ba=bytearray() forsizeinsizes: ba.resize(size) runner.bench_func("extend_bytes", extend_bytes) runner.bench_func("extend_range", extend_range) runner.bench_func("iadd", iadd) runner.bench_func("resize", resize)

Comment on lines 2863 to 2870
>>> shrink =bytearray(5)
>>> resize(shrink, 0)
>>> shrink
bytearray(b'')
>>> grow =bytearray(2)
>>> resize(grow, 7)
>>> grow
bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use resize in examples:

Suggested change
>>> shrink =bytearray(5)
>>> resize(shrink, 0)
>>> shrink
bytearray(b'')
>>> grow =bytearray(2)
>>> resize(grow, 7)
>>> grow
bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
Examples:
>>> shrink =bytearray(b'abc')
>>> shrink.resize(1)
>>> (shrink, len(shrink))
(bytearray(b'a'), 1)
>>> grow =bytearray(b'abc')
>>> grow.resize(5)
>>> (grow, len(grow))
(bytearray(b'abc\x00\x00'), 5)

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Making locally so I can get the doctest passing)

@vstinner
Copy link
Member

The change mostly LGTM. I just made a new review for some details.

extend_bytes: Mean +- std dev: 142 ms +- 1 ms
extend_range: Mean +- std dev: 1.89 sec +- 0.07 sec
iadd: Mean +- std dev: 143 ms +- 1 ms
resize: Mean +- std dev: 48.7 ms +- 0.5 ms

Thanks for the benchmark. So resize is 2.9x faster than the existing fastest method to grow a bytearray (extend_bytes). So it's worth it to add a dedicated method.

@gpshead
Copy link
Member

Nice, I'm not surprised it is faster. Though what I care more about is that code using this is more readable than the other hacks. :)

@gpsheadgpshead merged commit 5fb019f into python:mainFeb 5, 2025
46 checks passed
@cmaloneycmaloney deleted the resize_pr branch February 5, 2025 20:03
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
Add bytearray.resize() which wraps PyByteArray_Resize. Make negative size passed to resize exception/error rather than crash in optimized builds.
cmaloney added a commit to cmaloney/cpython that referenced this pull request Feb 8, 2025
Add bytearray.resize() which wraps PyByteArray_Resize. Make negative size passed to resize exception/error rather than crash in optimized builds.
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@cmaloney@vstinner@ZeroIntensity@gpshead@Viicos