Skip to content

Conversation

@vstinner
Copy link
Member

@vstinnervstinner commented Oct 20, 2023

Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer.


📚 Documentation preview 📚: https://cpython-previews--111116.org.readthedocs.build/

@vstinner
Copy link
MemberAuthor

@gpshead@serhiy-storchaka@bitdancer@warsaw: Would you mind to review this security fix?

See issue gh-102988 for the context.

This PR is a copy of PR #108250 but I added strict=True parameter, so it's possible to get the old behavior. I added tests on both modes, strict=True and strict=False.

@vstinner
Copy link
MemberAuthor

This PR is a copy of PR #108250 but I added strict=True parameter

My colleague Lumir Balhar @frenzymadness ran an impact check of PR #108250 on Fedora: in short, there is no impact, the test suite of all Python packages (in Fedora) pass with the change. While there were some build errors, they were unrelated to the email issue. For details, see https://copr.fedorainfracloud.org/coprs/lbalhar/email-CVE/builds/ COPR which as more than 4300 builds.

Now with an additional strict parameter, if there is any impacted project, at least there is a way to "opt out".

@vstinner
Copy link
MemberAuthor

@tdwyer: Would you mind to review my change, to see if I preserved your work correctly? (code and tests)

@vstinner
Copy link
MemberAuthor

I think that we should backport the change to all branches accepting security fixes. Problem: the change refer to version numbers, which as .. versionchanged:: 3.13. I suppose that if the change is backported, we should compute the next version of each branch, so backport manually.

@vstinner
Copy link
MemberAuthor

@ambv@SethMichaelLarson: Would you mind to review this PR?

@ambvambv changed the title gh-102988: email parseaddr() now rejects malformed addressgh-102988: Reject malformed addresses in email.parseaddr()Oct 27, 2023
@ambv
Copy link
Contributor

ambv commented Oct 27, 2023

Why is this a separate PR from #108250?

parameter to these two functions: use ``strict=False`` to get the old
behavior, accept malformed inputs.
(Contributed by Thomas Dwyer for :gh:`102988` to ameliorate CVE-2023-27043
(Contributed by Thomas Dwyer for :gh:`102988` to improve the CVE-2023-27043

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL a new word.


specialsre = re.compile(r'[][\\()<>@,:;".]')
escapesre = re.compile(r'[\\"]')
realname_comma_re = re.compile(r'"[^"]*,[^"]*"')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
realname_comma_re=re.compile(r'"[^"]*,[^"]*"')
realname_comma_re=re.compile(r'"[^",]*+,[^"]*+"')

It is faster. But I am not sure that the use of such regex is correct.

def _pre_parse_validation(email_header_fields):
accepted_values = []
for v in email_header_fields:
s = v.replace('\\(', '').replace('\\)', '')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if that backslash was already escaped with a backslash? For example \\) or \\\\).

@vstinner
Copy link
MemberAuthor

Why is this a separate PR from #108250?

I'm not the author of the other PR. I copied the other PR and added strict parameter.

@ambv
Copy link
Contributor

ambv commented Oct 27, 2023

I'm not the author of this PR and I was able to make commits to it.

@vstinner
Copy link
MemberAuthor

I'm not the author of this PR and I was able to make commits to it.

I don't feel comfortable to make significant change of a PR without asking the author. I prefer to create a separated PR and ask for review.

@vstinner
Copy link
MemberAuthor

vstinner commented Oct 30, 2023

Is this behavior a bug or a feature? I don't know how ; is supposed to behave.

Details
$ python Python 3.11.6 (main, Oct 3 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from email.utils import getaddresses >>> from pprint import pprint >>> pprint(getaddresses('<[email protected]> <[email protected]>')) [('', ''), ('', 'b'), ('', 'o'), ('', 'b'), ('', ''), ('', 'e'), ('', 'x'), ('', 'a'), ('', 'm'), ('', 'p'), ('', 'l'), ('', 'e'), ('', '.'), ('', 'o'), ('', 'r'), ('', 'g'), ('', ''), ('', ''), ('', ''), ('', ''), ('', 'a'), ('', 'l'), ('', 'i'), ('', 'c'), ('', 'e'), ('', ''), ('', 'e'), ('', 'x'), ('', 'a'), ('', 'm'), ('', 'p'), ('', 'l'), ('', 'e'), ('', '.'), ('', 'o'), ('', 'r'), ('', 'g'), ('', '')] 

@vstinner
Copy link
MemberAuthor

Is this behavior a bug or a feature? I don't know how ; is supposed to behave.

Oh. getaddresses() expects a sequence, not a string :-)

@vstinner
Copy link
MemberAuthor

Except of parsedate_tz() function, Lib/email/_parseaddr.py file didn't evolve much it was added in 2002 by commit 030ddf7. The file was created from Lib/rfc822.py which was added in 1992 (commit 01ca336).

The latest major change was done in... 1997 with commit be7c45e

New address parser by Ben Escoto replaces Sjoerd Mullender's parseaddr()

The latest minor change was done in 2019 to fix CVE-2019-16056: commit 8cb65d1 of issue #78336.

@vstinner
Copy link
MemberAuthor

What if the input is '"Jane Doe" [email protected], "John Doe" [email protected]'?

Oh, realname_comma_re replaces "Jane Doe" <[email protected]>, "John Doe" <[email protected]> with "Jane Doe <[email protected]> which is invalid...

@vstinner
Copy link
MemberAuthor

@vstinnervstinner marked this pull request as draft October 30, 2023 14:20
@miss-islington-app
Copy link

Sorry, @vstinner, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 4a153a1d3b18803a684cd1bcc2cdf3ede3dbae19 3.9 

@miss-islington-app
Copy link

Sorry, @vstinner, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 4a153a1d3b18803a684cd1bcc2cdf3ede3dbae19 3.8 

encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
…n email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]> (cherry picked from commit 4a153a1)
encukou added a commit to encukou/cpython that referenced this pull request Sep 6, 2024
…n email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
…n email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]> (cherry picked from commit 4a153a1)
@bedevere-app
Copy link

GH-123766 is a backport of this pull request to the 3.12 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.12 only security fixes label Sep 6, 2024
encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
…n email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
@bedevere-app
Copy link

GH-123767 is a backport of this pull request to the 3.11 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.11 only security fixes label Sep 6, 2024
encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
…n email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
@bedevere-app
Copy link

GH-123768 is a backport of this pull request to the 3.10 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.10 only security fixes label Sep 6, 2024
ambv pushed a commit that referenced this pull request Sep 6, 2024
…l.parseaddr() (GH-111116) (#123766) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]> (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]>
ambv pushed a commit that referenced this pull request Sep 6, 2024
…l.parseaddr() (GH-111116) (#123767) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Thomas Dwyer <[email protected]>
encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
… email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
@bedevere-app
Copy link

GH-123769 is a backport of this pull request to the 3.9 branch.

encukou pushed a commit to encukou/cpython that referenced this pull request Sep 6, 2024
… email.parseaddr() (pythonGH-111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
@bedevere-app
Copy link

GH-123770 is a backport of this pull request to the 3.8 branch.

ambv pushed a commit that referenced this pull request Sep 6, 2024
….parseaddr() (GH-111116) (#123769) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
ambv pushed a commit that referenced this pull request Sep 6, 2024
….parseaddr() (GH-111116) (#123770) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
ambv pushed a commit that referenced this pull request Sep 6, 2024
…l.parseaddr() (GH-111116) (#123768) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. (cherry picked from commit 4a153a1) Co-authored-by: Victor Stinner <[email protected]> Co-Authored-By: Thomas Dwyer <[email protected]>
hrnciar pushed a commit to fedora-python/cpython that referenced this pull request Apr 23, 2025
…n email.parseaddr() (python#111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]>
@GPHemsley
Copy link
Contributor

supports_strict_parsing is not mentioned in the email.utils docs.

@gpshead
Copy link
Member

please open a new issue if there's a lingering docs problem.

hroncok pushed a commit to fedora-python/cpython that referenced this pull request Jul 4, 2025
…n email.parseaddr() (python#111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]>
frenzymadness pushed a commit to fedora-python/cpython that referenced this pull request Aug 12, 2025
…n email.parseaddr() (python#111116) Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer. Co-Authored-By: Thomas Dwyer <[email protected]>
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type-bugAn unexpected behavior, bug, or errortype-securityA security issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

11 participants

@vstinner@ambv@serhiy-storchaka@gpshead@bitdancer@encukou@mcepl@tdwyer@stratakis@GPHemsley@theta682