gh-119451: Fix a potential denial of service in http.client #119454
Conversation
serhiy-storchaka commented May 23, 2024 • edited by bedevere-app bot
Reading the whole body of the HTTP response could cause OOM if the Content-Length value is too large, even if the server does not send a large amount of data. Now the HTTP client reads large data in chunks, so the amount of memory consumed is proportional to the amount of data actually sent.
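To make the approach concrete, here is a rough sketch of the idea (an illustration only, not the actual patch; the function name, the 64 KiB starting chunk size, and the loop shape are assumptions):

```python
def read_body(fp, content_length):
    """Read up to content_length bytes without preallocating that much memory."""
    data = b""
    chunk = 64 * 1024                              # assumed initial chunk size
    while len(data) < content_length:
        # Grow the chunk geometrically, but never request more than remains.
        chunk = min(chunk * 2, content_length - len(data))
        piece = fp.read(chunk)
        if not piece:                              # server sent less than promised
            break
        data += piece
    return data
```

If a hostile server advertises a Content-Length of, say, 10**12 but sends only a few bytes, the loop stops at the short read instead of allocating the advertised size up front (the real patch raises IncompleteRead in that case, as the review snippet below shows).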
gpshead commented May 24, 2024
I've marked this as a draft for now, as discussion on the security response team list is not yet complete. (We'll summarize that in a public issue once it has settled.)
encukou commented Jan 27, 2025
See #119514 (comment) for the results of the PSRT discussion.
Lib/http/client.py Outdated
```python
if len(data) < cursize:
    raise IncompleteRead(data, amt - len(data))
delta = min(cursize, amt - cursize)
data += self.fp.read(cursize)
```
When multiple 1 MB chunks have to be read and joined, io.BytesIO would consume less memory and do the job significantly faster.
Here's a result of a simple benchmark on Python 3.14.0:
```
Benchmarking with 2 iterations and chunk size 1048576 bytes...
Concatenation (+=): Peak Memory = 4.00 MB, Time = 0.0026 seconds
BytesIO: Peak Memory = 3.00 MB, Time = 0.0011 seconds

Benchmarking with 10 iterations and chunk size 1048576 bytes...
Concatenation (+=): Peak Memory = 20.00 MB, Time = 0.0316 seconds
BytesIO: Peak Memory = 11.13 MB, Time = 0.0040 seconds
```

The benchmarking script:
```python
import io
import time
import tracemalloc


def benchmark_concatenation(n, chunk_size):
    tracemalloc.start()
    start_time = time.time()
    data = b""
    for i in range(n):
        data += bytes([i % 256]) * chunk_size
    end_time = time.time()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, end_time - start_time


def benchmark_bytesio(n, chunk_size):
    tracemalloc.start()
    start_time = time.time()
    buffer = io.BytesIO()
    for i in range(n):
        buffer.write(bytes([i % 256]) * chunk_size)
    # getvalue() creates a copy of the buffer content
    result = buffer.getvalue()
    end_time = time.time()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, end_time - start_time


def main(n):
    chunk_size = 1024 * 1024  # 1 MB
    print(f"Benchmarking with {n} iterations and chunk size {chunk_size} bytes...")
    peak_concat, time_concat = benchmark_concatenation(n, chunk_size)
    print(
        f"Concatenation (+=): Peak Memory = {peak_concat / 1024 / 1024:.2f} MB, "
        f"Time = {time_concat:.4f} seconds"
    )
    peak_bio, time_bio = benchmark_bytesio(n, chunk_size)
    print(
        f"BytesIO: Peak Memory = {peak_bio / 1024 / 1024:.2f} MB, "
        f"Time = {time_bio:.4f} seconds"
    )


if __name__ == "__main__":
    while True:
        main(int(input("Enter n: ")))
```
Your benchmark reads data in chunks of constant size. This PR reads it in chunks of geometrically increasing size.
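To make the difference concrete, a small illustrative calculation (assuming a doubling scheme starting at 64 KiB with no upper cap; the actual constants in the patch may differ):

```python
# Number of read-and-concatenate steps needed to receive 1 GiB.
amt = 1 << 30

# Constant 1 MiB chunks, as in the benchmark above.
constant_steps = amt // (1 << 20)      # 1024 steps

# Geometrically growing chunks: 128 KiB, 256 KiB, 512 KiB, ...
steps, chunk, total = 0, 1 << 16, 0
while total < amt:
    chunk = min(chunk * 2, amt - total)
    total += chunk
    steps += 1

print(constant_steps, steps)           # 1024 vs 14
```

With geometric growth there are only a handful of concatenations, so a constant-chunk benchmark is not representative of what the patch does.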
That's true. Here's the changed script with its results:
```
Benchmarking with 2 iterations and initial chunk size 1048576 bytes...
Concatenation (+=): Peak Memory = 6.00 MB, Time = 0.0044 seconds
BytesIO: Peak Memory = 5.00 MB, Time = 0.0041 seconds

Benchmarking with 10 iterations and initial chunk size 1048576 bytes...
Concatenation (+=): Peak Memory = 2046.00 MB, Time = 4.5761 seconds
BytesIO: Peak Memory = 1535.00 MB, Time = 1.7045 seconds
```

The updated benchmarking script:
```python
import io
import time
import tracemalloc


def benchmark_concatenation(n, initial_chunk_size):
    tracemalloc.start()
    start_time = time.time()
    data = b""
    for i in range(n):
        data += bytes([i % 256]) * (initial_chunk_size * (2 ** i))
    end_time = time.time()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, end_time - start_time


def benchmark_bytesio(n, initial_chunk_size):
    tracemalloc.start()
    start_time = time.time()
    buffer = io.BytesIO()
    for i in range(n):
        buffer.write(bytes([i % 256]) * (initial_chunk_size * (2 ** i)))
    # getvalue() creates a copy of the buffer content
    result = buffer.getvalue()
    end_time = time.time()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, end_time - start_time


def main(n):
    chunk_size = 1024 * 1024  # 1 MB
    print(f"Benchmarking with {n} iterations and initial chunk size {chunk_size} bytes...")
    peak_concat, time_concat = benchmark_concatenation(n, chunk_size)
    print(
        f"Concatenation (+=): Peak Memory = {peak_concat / 1024 / 1024:.2f} MB, "
        f"Time = {time_concat:.4f} seconds"
    )
    peak_bio, time_bio = benchmark_bytesio(n, chunk_size)
    print(
        f"BytesIO: Peak Memory = {peak_bio / 1024 / 1024:.2f} MB, "
        f"Time = {time_bio:.4f} seconds"
    )


if __name__ == "__main__":
    while True:
        main(int(input("Enter n: ")))
```
I see now why BytesIO consumes 25% less memory. If you add n new bytes to a buffer of size n, bytes concatenation has to allocate a new buffer of size 2*n -- 4*n bytes in total -- whereas BytesIO can reallocate the original buffer -- 3*n bytes in total. It also benefits from a CPython-specific optimization: BytesIO.getvalue() just returns the internal bytes buffer without allocating a new bytes object for the result.
The time difference probably follows from this, although it is not important when reading from the network. The peak memory consumption, however, is important.
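Spelling that accounting out as a rough model (ignoring allocator overhead and BytesIO over-allocation; purely illustrative):

```python
n = 1 << 20  # grow an n-byte buffer by another n bytes

# bytes concatenation: the old n-byte object, the new n-byte chunk and the
# freshly built 2*n-byte result all exist before the old objects are freed.
peak_concat = n + n + 2 * n        # 4*n

# BytesIO: the n-byte chunk plus an internal buffer that can be resized
# in place to 2*n in the best case, so no separate 2*n copy is needed.
peak_bytesio = n + 2 * n           # 3*n

print(1 - peak_bytesio / peak_concat)  # 0.25, i.e. about 25% less memory
```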
Thanks @serhiy-storchaka for the PR 🌮🎉. I'm working now to backport this PR to: 3.10, 3.11, 3.12, 3.13, 3.14.
GH-142138 is a backport of this pull request to the 3.14 branch.
GH-142139 is a backport of this pull request to the 3.13 branch.
GH-142140 is a backport of this pull request to the 3.12 branch.
GH-142141 is a backport of this pull request to the 3.11 branch.
GH-142142 is a backport of this pull request to the 3.10 branch.
bedevere-bot commented Dec 1, 2025
illia-v commented Dec 1, 2025
I see, thanks for explaining and fixing the current issue!