
Conversation

@cmaloney
Contributor

@cmaloney cmaloney commented Jun 19, 2024

This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS).

This reduces the number of `fstat()` calls always and seek calls most of the time. Stat was always called twice: once at open (to error early on directories), and a second time to get the size of the file, to be able to read the whole file in one read. Now the size is cached with the first call.

The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive, so it is only done on larger files; otherwise just try to read the extra bytes and resize the PyBytes as needed.

I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

This does make some tradeoffs:

1. If the file size changes between open() and readall(), this will still get all the data but might make more read calls.
2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1] Before, small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

After, small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

Before, large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```

After, large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```
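The post-change behavior can be sketched at the `os` level in Python (a simplified illustration of the strategy, not CPython's C implementation; `read_whole_file` is a hypothetical helper):

```python
import os

def read_whole_file(path):
    """Sketch of the PR's strategy: one fstat() at open time, size cached,
    then read st_size + 1 bytes so EOF is confirmed by a single extra
    empty read instead of a second fstat()/lseek() pair."""
    fd = os.open(path, os.O_RDONLY)
    try:
        estimated = os.fstat(fd).st_size  # size cached from the open-time stat
        chunks = []
        want = estimated + 1              # +1: an empty read signals EOF
        while True:
            chunk = os.read(fd, want)
            if not chunk:                 # empty read: end of file reached
                break
            chunks.append(chunk)
            # If the file grew between fstat() and read(), keep reading.
            want = max(want - len(chunk), 1)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

For a file whose size does not change, this makes exactly two `read()` calls: one that returns the whole file, and one empty read that confirms EOF.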
@ghost

ghost commented Jun 19, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@nineteendo
Contributor

nineteendo commented Jun 19, 2024

Could you add some tests? And share benchmark results compared against the main branch?

@cmaloney
Contributor (Author)

Is there a standard way to add tests for "this set of system calls is made" or "this many system calls are made"? I tried hunting through the existing tests but couldn't find anything like that, or a good way to do that for underlying C code. It would definitely be nice to have a test that open().read() doesn't get more system calls added unintentionally.

re: Benchmarking, I did some with a test program and included details in the initial commit: 78c4de0. Wall-clock changes on my dev machine were generally in the noise. Happy to work on running a more general suite.

@nineteendo
Contributor

nineteendo commented Jun 19, 2024

I simply meant to test that the code still works correctly with the changes you made.

Set up a git worktree, build both the main branch and readall_faster, and then run the benchmark for both builds.

@cmaloney
Contributor (Author)

For testing, the existing test_fileio checks basic behavior of .read() (https://github.com/python/cpython/blob/main/Lib/test/test_fileio.py#L133-L139). As an additional check I ran the test program from the first commit under strace and diffed the call log, validating in the diff that all the read() calls were the same and that the changes to fstat() and lseek() calls were as expected.
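That diff-and-count validation can be sketched like this (assuming strace's default output format; `count_syscalls` is an illustrative helper, not part of test.support):

```python
import re
from collections import Counter

SYSCALL_RE = re.compile(r"^(\w+)\(")

def count_syscalls(strace_text):
    """Tally strace log lines like 'read(3, ..., 344) = 343' by syscall name."""
    return Counter(
        m.group(1)
        for line in strace_text.splitlines()
        if (m := SYSCALL_RE.match(line.strip()))
    )

# Abbreviated version of the "after, small file" log from the PR description.
after = """\
openat(AT_FDCWD, "clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_size=343, ...}) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
"""
counts = count_syscalls(after)
# Comparing two such Counters (before vs. after) shows the dropped
# fstat()/lseek() pair while confirming the read() calls are unchanged.
```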

@cmaloney
Contributor (Author)

cmaloney commented Jun 20, 2024

I ran the pyperformance benchmark suite and didn't get any big swings, just noise. Writing a little pyperf benchmark around "read whole file":

```python3
import pyperf
from pathlib import Path

def read_file(path_obj):
    path_obj.read_text()

runner = pyperf.Runner()
runner.bench_func('read_file_small', read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func('read_file_large', read_file, Path("Doc/c-api/typeobj.rst"))
```

`cmaloney/readall_faster`:

```
.....................
read_file_small: Mean +- std dev: 7.92 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 21.2 us +- 0.6 us
```

`main`:

```
python ../benchmark.py
.....................
read_file_small: Mean +- std dev: 8.43 us +- 0.12 us
.....................
read_file_large: Mean +- std dev: 24.0 us +- 0.4 us
```

for my particular Mac.
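As a quick sanity check on those means (not part of the PR), the relative improvement works out to roughly 6% for the small file and 12% for the large one:

```python
def speedup(before_us, after_us):
    """Percent improvement from a before/after pair of mean timings."""
    return (before_us - after_us) / before_us * 100

print(f"small file: {speedup(8.43, 7.92):.1f}% faster")
print(f"large file: {speedup(24.0, 21.2):.1f}% faster")
```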

@hauntsaninja
Contributor

Thanks, this is excellent.
Regarding writing a test, not sure there's really a standard thing, but you could pattern match

```python3
@unittest.skipIf(sys.platform != "linux", "Linux only, requires strace.")
```

@cmaloney
Contributor (Author)

@hauntsaninja I'm planning to make a separate PR for that (one list item per what I think will be separate commits):

  1. Pull out the "skip if not Linux / is strace available" checks, running under strace, and parsing strace results into test.support
  2. Change the existing test over to it
  3. Add a new test for file reading (small_file, big_file x binary, text). Extend the strace helper pieces to have "marker" support so I can separate out the "read file" I want from interpreter startup (which reads lots of imports using the same code)

Then use that infrastructure here (so this PR will get a merge commit + a new commit which updates the test for the reduced system calls). I don't think that needs a separate GH issue to track; if it does, let me know.

Member

@picnixz picnixz left a comment

Is there a standard way to add tests for "this set of system calls is made" or "this many system calls are made"?

The tests for IO are spread across multiple files, but I think test_fileio is the best one for that. If you want to emulate the number of calls being made, you could try to align the Python implementation with the C implementation (which is usually what we try to achieve). Note that the Python implementation calls read/lseek/fstat directly for FileIO, so you may also try to mock them. For the C implementation, yes, the strace alternative is probably the best, but I think it's a nice idea to see whether you could also improve the Python implementation itself.
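The mocking idea can be sketched like this (a hedged illustration; `os_reads_during_readall` is a made-up helper, and it assumes `_pyio.FileIO` reaches the OS through the module-level `os.read`):

```python
import os
import tempfile
import unittest.mock

import _pyio  # the pure-Python implementation of the io module

def os_reads_during_readall(path):
    """Count os.read() calls made while _pyio.FileIO reads a whole file."""
    real_read = os.read
    calls = []

    def counting_read(fd, n):
        calls.append(n)          # record each request size
        return real_read(fd, n)  # delegate to the real syscall wrapper

    with unittest.mock.patch("os.read", counting_read):
        with _pyio.FileIO(path, "r") as f:
            data = f.readall()
    return data, len(calls)
```

For a regular file this should be one (or a few) data-carrying reads plus the empty read that confirms EOF, so asserting an upper bound on the count is one way to catch accidentally added system calls in the Python implementation.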

@cmaloney
Contributor (Author)

re: _pyio, I'll look at how far its behavior currently is from _io when I add the system call test. I would like to keep getting them to match out of the scope of this PR. Longer term I would really like to make os.read and all the I/O layers on top of it fast and more Python-native, as I think that could enable some really cool optimizations (like constant-folding away compiler-checked redundant checks) while also making the code more legible and debuggable. Currently, at least in the code I read, things like buffer resizing are fairly different between _io and _pyio.

Contributor

@hauntsaninja hauntsaninja left a comment

Thanks, this looks good to me!
It might make sense to just use DEFAULT_BUFFER_SIZE for your threshold, especially so if #118144 is merged
I agree that pyio and a strace test can be done in another PR
I requested review from Serhiy in case he has time to take a look, if not I'll merge soon

@cmaloney
Contributor (Author)

Re: DEFAULT_BUFFER_SIZE, I actually experimented with "just allocate and try to read DEFAULT_BUFFER_SIZE always", and found that for both small and large files it was slower. I'm not entirely sure what the slowdown was, but it led me to the "cache the size" approach, which is uniformly faster. Definitely an interesting constant to raise, and I think fairly important on the write side. Would be curious to see numbers for read.

@cmaloney
Contributor (Author)

cmaloney commented Jun 29, 2024

Updated with changes to make the _pyio.FileIO system calls match, tested locally with the added strace syscall test #121143 (diff on top of that PR to get to passing with these changes: 0606677).

size_t is too small (and read would cap it anyways) to read the whole file
Contributor

@erlend-aasland erlend-aasland left a comment

Looks good!

A minor nit regarding the comments: I'm going to align them to the existing style used in this file; hope you don't mind :)

```
unsigned int closefd : 1;
char finalizing;
unsigned int blksize;
Py_off_t size_estimated;
```
Member

I would prefer to use the same name in the C and Python implementations; I suggest renaming this member to estimated_size.

```
    bufsize = _PY_READ_MAX;
}
else {
    bufsize = Py_SAFE_DOWNCAST(end, Py_off_t, size_t) + 1;
```
Member

@vstinner vstinner commented Jul 2, 2024

I don't think this cast is safe; Py_off_t can be bigger than size_t. You should do something like:

```
bufsize = (size_t)Py_MIN(end, SIZE_MAX);
bufsize++;
```

Contributor (Author)

I ran into issues in test_largefile on Windows x86 that caused me to add this. Py_off_t is long long there, while size_t is only 32 bits:

```
#ifdef MS_WINDOWS
/* Windows uses long long for offsets */
typedef long long Py_off_t;
# define PyLong_AsOff_t     PyLong_AsLongLong
# define PyLong_FromOff_t   PyLong_FromLongLong
# define PY_OFF_T_MAX       LLONG_MAX
# define PY_OFF_T_MIN       LLONG_MIN
# define PY_OFF_T_COMPAT    long long  /* type compatible with off_t */
# define PY_PRIdOFF         "lld"      /* format to use for that type */
#else
```

Contributor (Author)

Oop, misread this. The `if (end >= _PY_READ_MAX)` just before should catch this (`_PY_READ_MAX <= SIZE_MAX`).

https://github.com/python/cpython/blob/main/Include/internal/pycore_fileutils.h#L65-L76

Member

Sorry, in fact the maximum is PY_SSIZE_T_MAX:

```
bufsize = (size_t)Py_MIN(end, PY_SSIZE_T_MAX);
if (bufsize < PY_SSIZE_T_MAX) {
    bufsize++;
}
```

Member

In this case, replace `bufsize = Py_SAFE_DOWNCAST(end, Py_off_t, size_t) + 1;` with just `bufsize = (size_t)end + 1;`. I just dislike the Py_SAFE_DOWNCAST() macro; it's not safe, and the name is misleading.

```python3
if self._estimated_size <= 0:
    bufsize = DEFAULT_BUFFER_SIZE
else:
    bufsize = self._estimated_size + 1
```
Member

What is the purpose of the "+1"? It may overallocate 1 byte, which is inefficient.

Contributor (Author)

The read loop currently needs to do an os.read() / _Py_read of a single byte that returns size 0 to find the end of the file and exit the loop. The very beginning of that loop checks "if buffer is full, grow buffer", so not over-allocating by one byte results in a much bigger allocation there. In the _io case the buffer is then shrunk back down at the end, whereas in the _pyio case the EOF read is never appended.

Could avoid the extra byte by writing a specialized "read known size" (w/ fallback to "read until EOF"), but was trying to avoid making more variants of the read loop and limit risk a bit.

As an aside: the _pyio implementation seems to have a lot of extra memory allocation and copy in the default case because os.read() internally allocates a buffer which it then copies into its bytearray...
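Why the extra byte matters can be shown with a toy model of the grow rule (a simplified sketch, not the actual `_io`/`_pyio` loop; the constant mirrors `io.DEFAULT_BUFFER_SIZE`):

```python
DEFAULT_BUFFER_SIZE = 8192  # io.DEFAULT_BUFFER_SIZE on most builds

def final_bufsize(file_size, initial_bufsize):
    """Toy readall loop: returns (number of reads, final buffer size)."""
    bufsize = initial_bufsize
    received = 0
    reads = 0
    while True:
        if received >= bufsize:
            # Buffer full: grow by max(bufsize, DEFAULT_BUFFER_SIZE).
            bufsize += max(bufsize, DEFAULT_BUFFER_SIZE)
        reads += 1
        # Stand-in for os.read(): bytes actually returned this iteration.
        chunk = min(bufsize - received, file_size - received)
        if chunk == 0:           # empty read: EOF, loop exits
            return reads, bufsize
        received += chunk

# With size + 1, the EOF read fits in the existing buffer and no growth
# happens; with an exact-size buffer, the loop grows it substantially
# just to perform the final empty read.
```

For a 343-byte file, an initial buffer of 344 bytes stays at 344 bytes, while an exact 343-byte buffer grows past 8 KB before the EOF read.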

Contributor (Author)

@cmaloney cmaloney left a comment

I'll work on renaming the members to be consistent tomorrow


Contributor (Author)

@cmaloney cmaloney left a comment

Per review, update range checks to be more clear and accurate

Co-authored-by: Victor Stinner <vstinner@python.org>
Member

@vstinner vstinner left a comment

LGTM

@vstinner vstinner merged commit 2f5f19e into python:main on Jul 4, 2024
@vstinner
Member

Merged, thank you. It's a nice optimization.

@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 Ubuntu Shared 3.x has failed when building commit 2f5f19e.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/506/builds/8282) and take a look at the build logs.
  4. Check if the failure is related to this commit (2f5f19e) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/506/builds/8282

Failed tests:

  • test_largefile

Failed subtests:

  • test_truncate - test.test_largefile.CLargeFileTest.test_truncate


Traceback logs:

```
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_largefile.py", line 144, in test_truncate
    self.assertEqual(len(f.read()), 1)  # else wasn't truncated
    ~~~~~~^^
MemoryError
```

@vstinner
Member

@cmaloney: Oh, test_largefile failed. Can you investigate?

@cmaloney
Contributor (Author)

```
[1/1/1] test_largefile failed (1 error)
Re-running test_largefile in verbose mode (matching: test_truncate)
test_truncate (test.test_largefile.CLargeFileTest.test_truncate) ... ERROR
test_truncate (test.test_largefile.PyLargeFileTest.test_truncate) ... ok
======================================================================
ERROR: test_truncate (test.test_largefile.CLargeFileTest.test_truncate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/buildbot/buildarea/3.x.bolen-ubuntu/build/Lib/test/test_largefile.py", line 144, in test_truncate
    self.assertEqual(len(f.read()), 1)  # else wasn't truncated
    ~~~~~~^^
MemoryError
----------------------------------------------------------------------
Ran 2 tests in 0.005s

FAILED (errors=1)
test test_largefile failed
1 test failed again:
    test_largefile
```

Looks like just the C implementation (CLargeFileTest) failed after truncate + seek on a very large file (https://github.com/python/cpython/blob/main/Lib/test/test_largefile.py#L144). My guess would be an underflow/overflow. In this PR I updated _pyio to set a new estimated size on truncate. Making FileIO match that, or updating both _pyio and _iomodule so that .truncate sets estimated_size to -1 (falling back to the default buffer size + size-increasing logic), will likely fix it. Going to experiment / try to reproduce locally on my AMD64 Arch.
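The suspected failure mode can be illustrated with a toy model (illustrative only; `CachedSizeFile` is made up, not CPython code): the size cached at open() goes stale after truncate(), so readall() sizes its buffer from a value far larger than the file.

```python
class CachedSizeFile:
    """Toy model of a FileIO that caches st_size at open time."""

    def __init__(self, size_on_disk):
        self.size_on_disk = size_on_disk
        self.estimated_size = size_on_disk  # cached by the open-time fstat()

    def truncate(self, new_size):
        self.size_on_disk = new_size
        # The fix: refresh (or invalidate) the estimate here, so readall()
        # doesn't allocate based on the stale pre-truncate size.
        self.estimated_size = new_size

    def readall_alloc(self):
        # Buffer sized from the cached estimate (+1 for the EOF read).
        return self.estimated_size + 1

f = CachedSizeFile(2_500_000_000)  # large-file test scale, ~2.5 GB
f.truncate(1)
# With the refresh in truncate(), the readall() allocation is tiny; without
# it, the stale multi-gigabyte estimate could exhaust a constrained machine.
```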

@cmaloney cmaloney deleted the cmaloney/readall_faster branch on July 4, 2024 08:53
@cmaloney
Contributor (Author)

@vstinner I think #121357 will fix the failure, although I'm unable to reproduce it locally so far. estimated_size in this case is definitely significantly larger than the actual file size, and that results in a much bigger than necessary allocation, which on a memory-constrained machine could lead to an OOM / MemoryError. #121357 reduces maximum resident set size from 2464692 kbytes to 24532 kbytes.

cmaloney added a commit to cmaloney/cpython that referenced this pull request Jul 4, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024


Development

Successfully merging this pull request may close these issues.

Speed up open().read() pattern by reducing the number of system calls

7 participants

@cmaloney @nineteendo @hauntsaninja @vstinner @bedevere-bot @picnixz @erlend-aasland