Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-109534: fix reference leak when SSL handshake fails#114074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-109534: fix reference leak when SSL handshake fails #114074
Uh oh!
There was an error while loading. Please reload this page.
Conversation
ordinary-jamie commented Jan 15, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Lib/asyncio/sslproto.py Outdated
| if self._state == SSLProtocolState.DO_HANDSHAKE: | ||
| self._on_handshake_complete(ConnectionResetError) | ||
| self._on_handshake_complete(ConnectionResetError()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mjpieters; This fix caused the ssl-over-ssl test from GH-113214 (PR #113334) to fail and want to make sure I did not break anything on your end.
The test will raise ConnectionResetError (via SSLProtocol.eof_received) and normally this won't
be caught by the exception handler in SSLProtocol._fatal_error because it is of instance OSError. But because we no longer raise the exception in SSLProtocol._on_handshake_complete(exc), the exception object is never initialised and the isinstance doesn't work as intended. Changed SSLProtocol.eof_received to pass an exception object and not class to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in this area, but passing in an instance instead of the class looks fine by me. I did leave a comment on the removal of the try...except block however as this did more than just handle raise handshake_exc.
ordinary-jamie commented Jan 15, 2024
Was not sure how to add tests for this, the current replication from the issue is as follows importasyncioimportsslimporttracemallocasyncdefmain(certfile, keyfile): ssl_context=ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) ssl_context.load_cert_chain(certfile, keyfile) awaitasyncio.start_server( lambda_, w: w.write(b"\0"), "127.0.0.1", "4443", ssl=ssl_context ) current_1, _=tracemalloc.get_traced_memory() proc=awaitasyncio.create_subprocess_shell( f"curl https://127.0.0.1:4443 2> /dev/null" ) awaitproc.communicate() current_2, _=tracemalloc.get_traced_memory() print(f"{(current_2-current_1)/1000:.2f}KB") if__name__=="__main__": tracemalloc.start() asyncio.run(main(certfile="server.crt", keyfile="server.key"))The delta in |
| if handshake_exc is None: | ||
| self._set_state(SSLProtocolState.WRAPPED) | ||
| peercert = sslobj.getpeercert() | ||
| except Exception as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler here is intended to (also) catch exceptions raised by sslobj.getpeercert(). That is probably going to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words: you can't avoid using a try...except block here. Is removing this really necessary to break the cycle?
ordinary-jamieJan 15, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! This was my misunderstanding of the problem. It turns out that setting the original handshake_exc to None also fixed our problem -- which my understanding is that there are two separate instances of the exception, each with a traceback and only one was being handled.
try: ifhandshake_excisNone: self._set_state(SSLProtocolState.WRAPPED) else: raisehandshake_excpeercert=sslobj.getpeercert() exceptExceptionasexc: handshake_exc=None# <--- fixes the problemordinary-jamie commented Jan 15, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
This code snippet should replicate our issue, which I'm currently trying to convert into a unittest for this PR importasyncioimportsslimportweakrefimporttracemallocclassSSLProtocol: def__init__(self, future) ->None: self.buffer=bytearray(256*1024) self.future=futuredef_do_handshake(self): try: raisessl.SSLCertVerificationError() exceptssl.SSLErrorasexc: self._on_handshake_complete(exc) def_on_handshake_complete(self, handshake_exc): try: ifhandshake_excisNone: ... else: raisehandshake_excexceptExceptionasexc: # handshake_exc = None # <-------- Uncomment to break assertionself.future.set_exception(exc) self.future=Noneasyncdefaccept_connection(): """Mimicking selector_events.BaseSelectorEventLoop._accept_connection2"""loop=asyncio.get_running_loop() waiter=loop.create_future() ssl_proto=SSLProtocol(waiter) # selector_events._SelectorSocketTransport would schedule the protocols# connection_made methodloop.call_soon(ssl_proto._do_handshake) try: awaitwaiterexceptBaseException: waiter=None# This is needed with `handshake_exc = None`ref=weakref.ref(ssl_proto) # For testing# The protocol **should** fall out of scope here...# In _accept_connection2 the app & socket transports will release their# references to the protocol via their close methods and there will be no# references to it in this task (it falls out of scope of _make_ssl_transport)ssl_proto=Nonereturnrefif__name__=="__main__": tracemalloc.start() m_start=tracemalloc.get_traced_memory()[0] ssl_proto_ref=asyncio.run(accept_connection()) m_delta=tracemalloc.get_traced_memory()[0] -m_start# Uncomment `handshake_exc = None` to break assertionsassertssl_proto_ref() isnotNoneassertm_delta>256_000# bytes |
mjpieters commented Jan 15, 2024
Does uvloop show the same issue? The implementation here is a fairly direct port from their Cython implementation, it'd be interesting to see what is different if uvloop doesn't show the same leak. |
ordinary-jamie commented Jan 15, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Hm. The initial issue ticket says My second snippet (with the mimicked classes) will still show the same issue with And briefly looking over the uvloop implementation of SSLProtocol, I can't see any obvious differences! Let me look into this! |
ordinary-jamie commented Jan 21, 2024
Sorry for the slow response @mjpieters, it's been a busy week at the office!
uvloop doesn't appear to have the same issue. Despite the (Note, I am quite out-of-depth here, but tried my best navigating through uvloop's code base...) To me, the main difference seems to be that in uvloop the waiter (future) exception is set through a done callback allowing the waiter to fall out of scope and break the reference cycle between Exception (and traceback) and Future. In
|
hauntsaninja left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work debugging this! This change looks good to me. It sounds like we should backport this change to 3.11 as well?
ordinary-jamie commented Jan 30, 2024
Would you like me to create the PRs for the backports or is this something best left for the core team? |
hauntsaninja commented Jan 30, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I'll handle backports! Just wanted to mention and to leave open for a little bit in case other folks have opinions. |
ordinary-jamie commented Jan 30, 2024
Thanks @hauntsaninja! Just want to mention that the issue has some open threads RE: |
gvanrossum left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say I haven't been able to follow all this (the issue is very noisy) but these changes look good regardless.
Thanks @ordinary-jamie for the PR, and @hauntsaninja for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12. |
…H-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
GH-114829 is a backport of this pull request to the 3.12 branch. |
…H-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
GH-114830 is a backport of this pull request to the 3.11 branch. |
) (#114830) gh-109534: fix reference leak when SSL handshake fails (GH-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
) (#114829) gh-109534: fix reference leak when SSL handshake fails (GH-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
gvanrossum commented Feb 1, 2024
Thanks @ordinary-jamie for researching this and coming up with the fix! Thanks @hauntsaninja for taking over the review. Really appreciate it! |
hauntsaninja commented Feb 1, 2024
Thank you mjpieters as well! :-) |
ordinary-jamie commented Feb 1, 2024
Thanks everyone for the support and guidance - I've learnt a lot from you all! I will continue to monitor the issue thread to see if other users are observing other related bugs. |
Fixes a reference cycle issue between the
Futureobject used inselector_events.BaseSelectorEventLoop._accept_connection2andSSLProtocolwhich prevented theSSLProtocolfrom deallocating and causing spikes in memory usage.The root cause of the issue is that, on SSL handshake failure, the
SSLProtocolsets theFuture(waiter) exception, and the traceback frames in that exception will hold references to the protocol object and its buffers.RefactoredEdit: see comments belowSSLProtocol._on_handshake_completeto not usetry/exceptas this created a frame that kept the waiter object alive.