
@encukou (Member) commented Jul 24, 2024

Re: #121812

Hello @basbloemsaat,

I've spent the day reading through the email module and the RFCs, and I believe I've found a better place to fix the issue.
This involved lots of experimentation, so I'm sending an alternative PR rather than a review on yours.

  • The generator (writer) verifies that the representation of each header is sound (a parser won't treat it as multiple headers, as the start of the body, or as part of another header). That should cover custom fold() implementations and Header subclasses.

    • However, some users out there are probably relying on this kind of header injection in working code, so I added a policy attribute to turn the check off.
  • Newlines are encoded in fold(), just like undecodable bytes and other special characters.

Overall, this means that we treat newlines as valid content of headers, but “escape” them when such a header is serialized to text.
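A rough sketch of the generator-side check from the first bullet, assuming "sound" means: the folded text ends with the line separator, and every other line break is followed by folding whitespace (a space or tab). The names `is_sound_folded_header`, `write_header` and `UnsafeHeaderError` are hypothetical, and the `verify` flag only stands in for the opt-out policy attribute; this is not the code in this PR. (In released CPython versions the corresponding pieces are `email.errors.HeaderWriteError` and the `verify_generated_headers` policy attribute.)

```python
import io
import re

# A line break that is NOT followed by folding whitespace (space or tab).
# A parser would read what comes after it as a new header line or, if the
# line is empty, as the start of the body.
_BARE_LINE_BREAK = re.compile(r"\r\n(?![ \t])|\r(?![\n \t])|\n(?![ \t])")


class UnsafeHeaderError(ValueError):
    """Hypothetical error for a header that would not parse back as one header."""


def is_sound_folded_header(folded: str, linesep: str = "\n") -> bool:
    """Return True if `folded` would be parsed back as exactly one header."""
    if not folded.endswith(linesep):
        # Without a final line break, the next header (or the body)
        # would be glued onto this one.
        return False
    content = folded[: -len(linesep)]
    return _BARE_LINE_BREAK.search(content) is None


def write_header(out: io.TextIOBase, folded: str, linesep: str = "\n",
                 verify: bool = True) -> None:
    """Write one folded header; refuse unsound output unless verify=False."""
    if verify and not is_sound_folded_header(folded, linesep):
        raise UnsafeHeaderError(f"refusing to write header: {folded!r}")
    out.write(folded)


buf = io.StringIO()
write_header(buf, "Subject: hello\n world\n")  # continuation line: accepted
try:
    write_header(buf, "Subject: foo\nBcc: bcc@example.com\n")  # injected header
except UnsafeHeaderError as exc:
    print(exc)
print(repr(buf.getvalue()))  # 'Subject: hello\n world\n'
```

Passing `verify=False` brings back the old, unsafe pass-through behaviour, which is the escape hatch the policy attribute is meant to provide.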

This PR is a proof of concept. It needs tests and documentation, but I'm out of time for today, and I wanted to share what I have.

Does this look reasonable to you?

encukou and others added 2 commits July 24, 2024 15:30
This should fail for custom fold() implementations that aren't careful about newlines.
Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though.
Credit for an earlier attempt: Co-Authored-By: Bas Bloemsaat <[email protected]>
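The RFC 2047 argument can be illustrated with a hand-rolled "Q" encoder (RFC 2047 calls the result an encoded-word). This is a minimal sketch: `q_encode_word` is a hypothetical helper that escapes everything except letters, digits and `!*+-/`, and it ignores line length and the rules about where encoded-words may appear, which real folding code has to handle. The standard library's `email.header.decode_header()` is used to show that the newline survives the round trip.

```python
import string
from email.header import decode_header

# Characters that may appear literally inside a "Q"-encoded word in any
# header context (RFC 2047, section 5, rule 3). Everything else is escaped.
_Q_SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_word(text: str, charset: str = "utf-8") -> str:
    """Encode `text` as a single RFC 2047 "Q" encoded-word."""
    parts = []
    for byte in text.encode(charset):
        char = chr(byte)
        if char == " ":
            parts.append("_")  # "_" stands for a space (0x20) in Q encoding
        elif char in _Q_SAFE:
            parts.append(char)
        else:
            parts.append(f"={byte:02X}")  # e.g. a newline becomes "=0A"
    return f"=?{charset}?q?{''.join(parts)}?="


encoded = q_encode_word("foo\nBcc: bcc@example.com")
print(encoded)  # =?utf-8?q?foo=0ABcc=3A_bcc=40example=2Ecom?=

# The encoded form contains no line break, so it cannot start a new header,
# yet a decoder recovers the original bytes, newline included.
print(decode_header(encoded))
```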
@basbloemsaat (Contributor) commented

That sounds entirely reasonable, and conforms to the RFCs.

Two points:

  1. What would be the use of keeping this check in email.policy.header_store_parse?
  2. I found one case that is not covered by this (contrived, I admit):

from email import message_from_string

email_in = """Subject: foo <bar>\nBCC: [email protected]
To: [email protected]
From: External Sender <[email protected]>

message body"""

msg = message_from_string(email_in)
print(msg)

@encukou (Member, Author) commented

> What would be the use of keeping this check in email.policy.header_store_parse?

This is in the branch that handles strings (rather than custom Header objects). I'm not clear on what format that string is supposed to be in.
Keeping the check means that anyone who relied on it will get the same error as before. Also, the error comes earlier: when the header is set, rather than when the message is serialized.
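To illustrate the "error comes earlier" point, a minimal sketch assuming current CPython behaviour (the address is a placeholder): `EmailMessage`, which uses `email.policy.default`, rejects a multi-line string in `header_store_parse()` as soon as the header is assigned. The legacy `compat32` policy behind `email.message.Message` stores the same value unchanged, so there the problem only surfaces when the message is serialized, which is where the new generator check sits.

```python
from email.message import EmailMessage, Message

injected = "foo\nBcc: bcc@example.com"  # placeholder address

# EmailMessage uses email.policy.default, whose header_store_parse()
# rejects string values that contain line breaks.
modern = EmailMessage()
try:
    modern["Subject"] = injected
except ValueError as exc:
    print("EmailMessage rejected it:", exc)

# Message uses the legacy compat32 policy, which stores the value as-is;
# nothing complains until the message is serialized.
legacy = Message()
legacy["Subject"] = injected
print("Message stored:", repr(legacy["Subject"]))
```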

> I found one case that is not covered by this (contrived, I admit)

That \n there is Python syntax: by the time the string reaches message_from_string, it is already a “real” newline.
If you use a raw string (r""") or read the message from a file, the header stays on a single line.
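The difference between the two kinds of string literal is easy to check. A minimal sketch using a shortened version of the message above (the address is a placeholder): with a normal string the parser sees a separate `BCC` header, while with a raw string the backslash and `n` stay inside the `Subject` value.

```python
from email import message_from_string

# Normal string literal: "\n" is a real newline, so "BCC:" starts a new
# header line as far as the parser is concerned.
cooked = "Subject: foo <bar>\nBCC: bcc@example.com\n\nmessage body\n"

# Raw string literal: r"\n" is two characters (a backslash and "n"), so the
# header stays on one line and all of it belongs to Subject.
raw = r"Subject: foo <bar>\nBCC: bcc@example.com" + "\n\nmessage body\n"

cooked_msg = message_from_string(cooked)
raw_msg = message_from_string(raw)

print(repr(cooked_msg["BCC"]))   # 'bcc@example.com'
print(repr(raw_msg["BCC"]))      # None
print(repr(raw_msg["Subject"]))  # 'foo <bar>\\nBCC: bcc@example.com'
```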

@encukou added the needs backport to 3.12 (only security fixes) and needs backport to 3.13 (bugs and security fixes) labels Jul 27, 2024
I'm not touching other instances in this file, since this PR might be backported to very old versions.
@encukou marked this pull request as ready for review July 29, 2024 13:18
@encukou requested a review from a team as a code owner July 29, 2024 13:18
@encukou (Member, Author) commented

@serhiy-storchaka, would you like to review this?
@warsaw, @bitdancer, @maxking, as email experts, do you have any comments?

@encukou added the needs backport to 3.8, needs backport to 3.10 (only security fixes), needs backport to 3.11 (only security fixes), and 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) labels Jul 29, 2024
@bedevere-bot
🤖 New build scheduled with the buildbot fleet by @encukou for commit af41733 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Jul 29, 2024
@serhiy-storchaka (Member) left a comment


LGTM.

Co-authored-by: Serhiy Storchaka <[email protected]>
ambv pushed a commit to ambv/cpython that referenced this pull request Aug 2, 2024
… are sound (pythonGH-122233) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
@bedevere-app

GH-122611 is a backport of this pull request to the 3.8 branch.

Yhg1s pushed a commit that referenced this pull request Aug 6, 2024
…sound (GH-122233) (#122484) gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) GH-GH- Encode header parts that contain newlines Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. GH-GH- Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
Yhg1s pushed a commit that referenced this pull request Aug 6, 2024
…sound (GH-122233) (#122599) * gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) - Encode header parts that contain newlines Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. - Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 0976339) * Document changes as made in 3.12.5
hroncok pushed a commit to fedora-python/cpython that referenced this pull request Aug 6, 2024
…s are sound pythongh-121650: Encode newlines in headers, and verify headers are sound (pythonGH-122233) Encode header parts that contain newlines. Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Verify that email headers are well-formed. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Aug 13, 2024
…s are sound (pythonGH-122233) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
frenzymadness pushed a commit to fedora-python/cpython that referenced this pull request Aug 15, 2024
stratakis pushed a commit to stratakis/cpython that referenced this pull request Aug 15, 2024
hrnciar added a commit to hrnciar/cpython that referenced this pull request Aug 16, 2024
headers are sound (pythonGH-122233) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> This patch also contains modified commit cherry picked from c5bba85. This commit was backported to simplify the backport of the other commit fixing CVE. The only modification is a removal of one test case which tests multiple changes in Python 3.7 and it wasn't working properly with Python 3.6 where we backported only one change. Co-authored-by: bsiem <[email protected]>
hrnciar added a commit to fedora-python/cpython that referenced this pull request Aug 16, 2024
headers are sound (pythonGH-122233) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) This patch also contains modified commit cherry picked from c5bba85. This commit was backported to simplify the backport of the other commit fixing CVE. The only modification is a removal of one test case which tests multiple changes in Python 3.7 and it wasn't working properly with Python 3.6 where we backported only one change. Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: bsiem <[email protected]>
hrnciar added a commit to fedora-python/cpython that referenced this pull request Aug 20, 2024
blhsing pushed a commit to blhsing/cpython that referenced this pull request Aug 22, 2024
…ound (pythonGH-122233) ## Encode header parts that contain newlines Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. ## Verify that email headers are well-formed This should fail for custom fold() implementations that aren't careful about newlines. Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…ound (GH-122233) (#122611) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…sound (GH-122233) (#122608) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. Verify that email headers are well-formed. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…sound (GH-122233) (#122609) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
ambv added a commit that referenced this pull request Sep 4, 2024
…ound (GH-122233) (#122610) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
brainhoard-github pushed a commit to distro-core-curated-mirrors/poky-contrib that referenced this pull request Sep 16, 2024
Changelog: https://docs.python.org/release/3.12.5/whatsnew/changelog.html Include security fix CVE-2024-6923 Reference: https://nvd.nist.gov/vuln/detail/CVE-2024-6923 python/cpython#122233 (From OE-Core rev: 777cad793a5b07d392b1d9875530fb5480e75863) Signed-off-by: Vijay Anusuri <[email protected]> Signed-off-by: Steve Sakoman <[email protected]>
hrnciar added a commit to fedora-python/cpython that referenced this pull request Apr 23, 2025
…s are sound (pythonGH-122233) Per RFC 2047: "[...] these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects". It seems that the "quoted-word" scheme is a valid way to include a newline character in a header value, just like we already allow undecodable bytes or control characters. They do need to be properly quoted when serialized to text, though. This should fail for custom fold() implementations that aren't careful about newlines. (cherry picked from commit 0976339) This patch also contains modified commit cherry picked from c5bba85. This commit was backported to simplify the backport of the other commit fixing CVE. The only modification is a removal of one test case which tests multiple changes in Python 3.7 and it wasn't working properly with Python 3.6 where we backported only one change. Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: bsiem <[email protected]>
hroncok pushed a commit to fedora-python/cpython that referenced this pull request Jul 4, 2025
frenzymadness pushed a commit to fedora-python/cpython that referenced this pull request Aug 12, 2025
sethmlarson pushed a commit to sethmlarson/cpython that referenced this pull request Jan 21, 2026
pythonGH-122233 added an implementation to `Generator` to refuse to serialize (write) headers that are unsafely folded or delimited. This revision adds the same implementation to `BytesGenerator`, so it gets the same safety protections for unsafely folded or delimited headers Co-authored-by: Denis Ledoux <[email protected]> Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Bas Bloemsaat <[email protected]>
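`BytesGenerator` writes bytes (it takes each header from the policy's `fold_binary()`), so the check that GH-122233 added to `Generator` has to be expressed over `bytes` there. A minimal sketch of that, assuming "sound" means the folded header ends with the line separator and every other line break is followed by a space or tab; `is_sound_folded_header_bytes` is a hypothetical name, not the function from the referenced revision.

```python
import re

# A line break must either end the header or be followed by folding
# whitespace (space or tab); anything else would start a new header line.
_BARE_LINE_BREAK_BYTES = re.compile(rb"\r\n(?![ \t])|\r(?![\n \t])|\n(?![ \t])")


def is_sound_folded_header_bytes(folded: bytes, linesep: bytes = b"\n") -> bool:
    """Return True if `folded` would be parsed back as exactly one header."""
    if not folded.endswith(linesep):
        return False
    content = folded[: -len(linesep)]
    return _BARE_LINE_BREAK_BYTES.search(content) is None


# 8-bit data (here the UTF-8 encoding of "é") is fine; only line structure matters.
print(is_sound_folded_header_bytes("Subject: caf\u00e9\n".encode("utf-8")))  # True
print(is_sound_folded_header_bytes(b"Subject: foo\nBcc: bcc@example.com\n"))  # False
```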


5 participants: @encukou, @basbloemsaat, @bedevere-bot, @ambv, @serhiy-storchaka