gh-120754: Reduce system calls in full-file readall case #120755
Conversation
cmaloney commented Jun 19, 2024 • edited by bedevere-app bot
This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS). It always reduces the number of `fstat()` calls and usually the number of seek calls. Stat was previously called twice: once at open (to error early on directories), and a second time to get the size of the file so the whole file could be read in one read. Now the size is cached with the first call.

The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive, so it is only done on larger files; otherwise the code just tries to read the extra bytes and resizes the PyBytes as needed.

I built a little test program to validate the behavior + assumptions around relative costs and then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

This does make some tradeoffs:
1. If the file size changes between open() and readall(), this will still get all the data but might need more read calls.
2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat().
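As an illustration of the cached-size strategy, here is a minimal Python sketch of my own (not the actual C implementation in Modules/_io/fileio.c): stat once, then ask for size + 1 bytes so a short follow-up read confirms EOF without another fstat or lseek.

```python
import os

def read_whole_file(path):
    """Read a file with a single fstat and (typically) two reads."""
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        # One fstat at open time; its st_size doubles as the size estimate.
        estimated = os.fstat(fd).st_size
        chunks = []
        # Ask for one extra byte: if the size didn't change, the next
        # read() returns b"" and confirms EOF with no extra system calls.
        remaining = estimated + 1
        while True:
            chunk = os.read(fd, remaining)
            if not chunk:  # 0 bytes returned => EOF reached
                return b"".join(chunks)
            chunks.append(chunk)
            # If the file grew between fstat and read, keep reading.
            remaining = max(remaining - len(chunk), 1)
    finally:
        os.close(fd)
```

This mirrors the tradeoff noted above: a file that grows after open still gets read fully, just with extra read calls.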
[0]
```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1] Before, small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

After, small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

Before, large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```

After, large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```

ghost commented Jun 19, 2024 • edited by ghost
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the …
nineteendo commented Jun 19, 2024 • edited
Could you add some tests? And share benchmark results compared against the main branch?
cmaloney commented Jun 19, 2024
Is there a standard way to add tests for "this set of system calls is made" or "this many system calls is made"? I tried hunting through the existing tests but couldn't find anything like that, or a good way to do that for the underlying C code. It would definitely be nice to have a test around this.

re: Benchmarking, I did some with a test program and included details in the initial commit: 78c4de0. Wall clock changes on my dev machine were generally in the noise. Happy to work on running a more general suite.
nineteendo commented Jun 19, 2024 • edited
I simply meant to test that the code still works correctly with the changes you made. Set up a git worktree, build the main branch and …
cmaloney commented Jun 20, 2024
For testing, the existing test_fileio checks basic behavior of …
cmaloney commented Jun 20, 2024 • edited
I ran the pyperformance benchmark suite and didn't get any big swings, just noise. Writing a little pyperf benchmark around "read whole file":

```python3
import pyperf
from pathlib import Path

def read_file(path_obj):
    path_obj.read_text()

runner = pyperf.Runner()
runner.bench_func('read_file_small', read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func('read_file_large', read_file, Path("Doc/c-api/typeobj.rst"))
```

Comparing `cmaloney/readall_faster` against `main` for my particular Mac …
hauntsaninja commented Jun 27, 2024
Thanks, this is excellent. (See cpython/Lib/test/test_subprocess.py, line 3437 in 0152dc4.)
cmaloney commented Jun 27, 2024
@hauntsaninja planning to try and make a separate PR for that (one list item per what I think will be separate commits).
Then use that infrastructure here (so this PR will get a merge commit + a new commit which updates the test for the reduced system call count). I don't think that needs a separate GH issue to track; if it does, let me know.
picnixz left a comment
> Is there a standard way to add tests for "this set of system calls is made" or "this many system calls is made"?
The tests for IO are spread around multiple files, but I think test_fileio is the best one for that. If you want to check the number of calls being made, you could try to align the Python implementation with the C implementation (which is usually what we try to achieve). Note that the Python implementation calls read/lseek/fstat directly for FileIO, so you may also try to mock them. For the C implementation, yes, the strace alternative is probably the best, but I think it's a nice idea to see whether you could also improve the Python implementation itself.
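The mocking idea for the Python implementation can be sketched like this; `count_reads` is a hypothetical helper of my own, not an existing test utility, and the exact call count depends on the interpreter version:

```python
import os
import _pyio
from unittest import mock

def count_reads(path):
    """Run _pyio.FileIO.readall() while counting os.read() calls."""
    real_read = os.read
    calls = []

    def counting_read(fd, n):
        calls.append(n)          # record each requested read size
        return real_read(fd, n)  # then delegate to the real syscall wrapper

    # _pyio looks up os.read at call time, so patching the module
    # attribute intercepts every read the pure-Python FileIO makes.
    with mock.patch("os.read", new=counting_read):
        with _pyio.FileIO(path, "r") as f:
            data = f.readall()
    return data, len(calls)
```

A test could then assert both the data and an upper bound on the number of reads for a file of known size.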
cmaloney commented Jun 27, 2024
re: _pyio I'll look at how far its behavior is currently from _io when I add the system call test. I would like not to pull getting them to match into the scope of work for this PR. Longer term I would really like to make …
hauntsaninja left a comment • edited
Thanks, this looks good to me!
It might make sense to just use DEFAULT_BUFFER_SIZE for your threshold, especially so if #118144 is merged
I agree that pyio and a strace test can be done in another PR
I requested review from Serhiy in case he has time to take a look, if not I'll merge soon
Misc/NEWS.d/next/Core and Builtins/2024-06-19-19-54-35.gh-issue-120754.uF29sj.rst (outdated; resolved)
cmaloney commented Jun 28, 2024
Re: DEFAULT_BUFFER_SIZE, I actually experimented with "just allocate and try to read DEFAULT_BUFFER_SIZE always", and found that for both small and large files it was slower. Not entirely sure what the slowdown was, but it led me to the "cache the size" approach, which is uniformly faster. Definitely an interesting constant to raise, and I think fairly important on the write side. Would be curious to see numbers for read.
cmaloney commented Jun 29, 2024 • edited
size_t is too small (and read would cap it anyways) to read the whole file
erlend-aasland left a comment
Looks good!
A minor nit regarding the comments: I'm going to align them to the existing style used in this file; hope you don't mind :)
Modules/_io/fileio.c Outdated
```c
unsigned int closefd : 1;
char finalizing;
unsigned int blksize;
Py_off_t size_estimated;
```
I would prefer to use the same name in the C and Python implementations; I suggest renaming this member to estimated_size.
Modules/_io/fileio.c Outdated
```c
    bufsize = _PY_READ_MAX;
}
else {
    bufsize = Py_SAFE_DOWNCAST(end, Py_off_t, size_t) + 1;
```
vstinner Jul 2, 2024 • edited
I don't think that this cast is safe, Py_off_t can be bigger than size_t. You should do something like:
```c
bufsize = (size_t)Py_MIN(end, SIZE_MAX);
bufsize++;
```
I ran into issues in test_largefile on Windows x86 which caused me to add this. Py_off_t is long long there, while size_t is only 32 bits:
cpython/Modules/_io/_iomodule.h, lines 95 to 106 in 4f1e1df:

```c
#ifdef MS_WINDOWS
/* Windows uses long long for offsets */
typedef long long Py_off_t;
# define PyLong_AsOff_t     PyLong_AsLongLong
# define PyLong_FromOff_t   PyLong_FromLongLong
# define PY_OFF_T_MAX       LLONG_MAX
# define PY_OFF_T_MIN       LLONG_MIN
# define PY_OFF_T_COMPAT    long long  /* type compatible with off_t */
# define PY_PRIdOFF         "lld"      /* format to use for that type */
#else
```
Oop, misread this. The `if end >= _PY_READ_MAX` just before should catch this (`_PY_READ_MAX <= SIZE_MAX`).
https://github.com/python/cpython/blob/main/Include/internal/pycore_fileutils.h#L65-L76
Sorry, in fact the maximum is PY_SSIZE_T_MAX:
```c
bufsize = (size_t)Py_MIN(end, PY_SSIZE_T_MAX);
if (bufsize < PY_SSIZE_T_MAX) {
    bufsize++;
}
```
In this case, replace `bufsize = Py_SAFE_DOWNCAST(end, Py_off_t, size_t) + 1;` with just `bufsize = (size_t)end + 1;`. I just dislike the Py_SAFE_DOWNCAST() macro; it's not safe, and the name is misleading.
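The overflow concern behind this exchange can be illustrated by simulating a 32-bit size_t (as on Windows x86) in Python; the names `SIZE32_MAX`, `naive_downcast`, and `clamped` are mine, for illustration only:

```python
SIZE32_MAX = 2**32 - 1  # stand-in for SIZE_MAX when size_t is 32 bits

def naive_downcast(end):
    # What an unchecked downcast of `end + 1` does: high bits are dropped,
    # so a large 64-bit offset can wrap to a tiny buffer size.
    return (end + 1) & SIZE32_MAX

def clamped(end):
    # The reviewed approach: clamp the offset to the size type first,
    # then add the one-byte EOF probe only when it still fits.
    bufsize = min(end, SIZE32_MAX)
    if bufsize < SIZE32_MAX:
        bufsize += 1
    return bufsize
```

For a 1 TiB offset, the naive downcast wraps to 1 while the clamped version saturates at the maximum, which is why the earlier `end >= _PY_READ_MAX` guard matters.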
```python3
if self._estimated_size <= 0:
    bufsize = DEFAULT_BUFFER_SIZE
else:
    bufsize = self._estimated_size + 1
```
What is the purpose of the "+1"? It may overallocate 1 byte which is inefficient.
The read loop currently needs to do a single-byte os.read() / _Py_read() which returns 0 size to find the end of the file and exit the loop. The very beginning of that loop checks "if buffer is full, grow buffer", so not over-allocating by one byte triggers a much bigger allocation there. In the _io case the buffer is then shrunk back down at the end, whereas in the _pyio case the EOF read is never appended.
Could avoid the extra byte by writing a specialized "read known size" (w/ fallback to "read until EOF"), but was trying to avoid making more variants of the read loop and limit risk a bit.
As an aside: the _pyio implementation seems to have a lot of extra memory allocation and copy in the default case because os.read() internally allocates a buffer which it then copies into its bytearray...
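The interaction described above, where the grow check fires when the buffer is exactly full, can be sketched like this; `readall_sketch` and its growth policy are my own simplifications, not the _pyio code:

```python
import os

DEFAULT_BUFFER_SIZE = 8192

def readall_sketch(fd, estimated_size):
    """Read until EOF, pre-sizing the buffer to estimated_size + 1."""
    # The +1 leaves room for the single-byte EOF-detecting read, so the
    # "buffer full -> grow" branch below is never hit for a correct estimate.
    bufsize = estimated_size + 1 if estimated_size > 0 else DEFAULT_BUFFER_SIZE
    result = bytearray()
    while True:
        if len(result) >= bufsize:
            # Without the +1 this fires after reading exactly estimated_size
            # bytes, forcing a large re-allocation just to discover EOF.
            bufsize += bufsize // 2  # hypothetical growth policy
        chunk = os.read(fd, bufsize - len(result))
        if not chunk:  # EOF: the zero-length read exits the loop
            return bytes(result)
        result += chunk
```

With a correct estimate this performs one full-size read plus one 1-byte EOF read, matching the strace output earlier in the thread.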
cmaloney left a comment
I'll work on renaming the members to be consistent tomorrow
cmaloney left a comment
Per review, update range checks to be more clear and accurate
Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner left a comment
LGTM
vstinner commented Jul 4, 2024
Merged, thank you. It's a nice optimization.
bedevere-bot commented Jul 4, 2024
vstinner commented Jul 4, 2024
@cmaloney: Oh, test_largefile failed. Can you investigate?
cmaloney commented Jul 4, 2024
Looks like just the C implementation (…
cmaloney commented Jul 4, 2024
@vstinner I think #121357 will fix the failure, although I'm unable to reproduce locally so far.