Skip to content

Conversation

@cmaloney
Copy link
Contributor

@cmaloneycmaloney commented Feb 5, 2025

Utilize bytearray.resize() and os.readinto() to reduce copies and match behavior of _io.FileIO.readall().

There is still an extra copy which means twice the memory required compared to FileIO because there isn't a zero-copy path from bytearray -> bytes currently.

On my system reading a 2GB file
./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v

Goes from ~2.7 seconds -> ~2.2 seconds. The C _io implementation is ~1.2 seconds, so still some performance gap, but less.

Utilize `bytearray.resize()` and `os.readinto()` to reduce copies and match behavior of `_io.FileIO.readall()`. There is still an extra copy which means twice the memory required compared to FileIO because there isn't a zero-copy path from `bytearray` -> `bytes` currently. On my system reading a 2GB file `./python -m test -M8g -uall test_largefile -m test.test_largefile.PyLargeFileTest.test_large_read -v` Goes from ~2.7 seconds -> ~2.2 seconds
@cmaloneycmaloney changed the title gh-12005: Align FileIO.readall between _pyio and _iogh-129005: Align FileIO.readall between _pyio and _ioFeb 5, 2025
result+=chunk

bytes_read+=n
result.resize(bytes_read)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result = memoryview(result)[bytes_read:] would avoid a truncation which can imply a memory copy in the worst case, no?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the resize "shrink" in bytearray doesn't actually resize unless the buffer's "capacity" is 2x the requested size (https://github.com/python/cpython/blob/main/Objects/bytearrayobject.c#L201-L214). Just updates its internal "this is how long the bytes is" counter (which for things like full-file readall with known size, this should already be just one byte over the right size).

My plan currently is to make it so bytes(bytearray(10)) and bytearray(b'\0' * 10) both don't copy (Ongoing discussion in https://discuss.python.org/t/add-zero-copy-conversion-of-bytearray-to-bytes-by-providing-bytes/79164). Having a memoryview would mean there's more than one reference to the bytearray, and I couldn't do / use that optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with using result.resize() here.

Co-authored-by: Victor Stinner <vstinner@python.org>
@cmaloney
Copy link
ContributorAuthor

Hypothesis test failure in binascii / pretty sure unrelated

Lib/_pyio.py Outdated
bufsize+=max(bufsize, DEFAULT_BUFFER_SIZE)
n=bufsize-len(result)
ifbytes_read>=bufsize:
# Parallels _io/fileio.c new_buffersize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the C code, new_buffersize() argument is bytes_read, not bufsize. You may keep new_buffersize() as a private module-level function.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the loop to no longer use bufsize at all, this is the only line that used it, and it feels more Pythonic to me to just use len(result).

That enables rewriting to:

try: # Read until EOF (n == 0)whilen:=os.readinto(self._fd, memoryview(result)[bytes_read:]): bytes_read+=nifbytes_read>=len(result): result.resize(_new_buffersize(bytes_read)) exceptBlockingIOError: ifnotbytes_read: returnNoneassertlen(result) -bytes_read>=1, \ "os.readinto buffer size 0 will result in erroneous EOF / returns 0"result.resize(bytes_read) returnbytes(result)

which feels cleaner, but also starts changing structure relative to _io version.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decided to refactor to this. Control flow feels a lot simpler to me and a lot more readable than the branches and breaks.

@cmaloney
Copy link
ContributorAuthor

Tests / Windows / build and test (Win32) failure is a urllib.error.HTTPError : HTTP error 504: Gateway Timeout [D:\a\cpython\cpython\PCbuild\pythoncore.vcxproj], believe unrelated

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinnervstinner merged commit a3d5aab into python:mainFeb 7, 2025
43 checks passed
@vstinner
Copy link
Member

Merged, thank you.

@cmaloneycmaloney deleted the fileio_readall branch February 7, 2025 18:46
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@cmaloney@vstinner