Skip to content

Conversation

@matthieucan
Copy link
Contributor

@matthieucanmatthieucan commented Jul 20, 2025

With those changes, the MIME parameter parser discards parameters with an invalid section number that uses a digit not convertible to integer such as super-script "²" or "𐩃" (Kharosthi number).

For backwards compatibility, keep accepting non-ASCII digits that can be converted to integers, such as NKO digits.

Before:

>>>importemail.headerregistry>>>reg=email.headerregistry.HeaderRegistry() >>>reg('Content-Disposition', 'inline; foo*0=bar') 'inline; foo="bar"'>>>reg('Content-Disposition', 'inline; foo*X=bar') 'inline;'>>>reg('Content-Disposition', 'inline; foo*0=bar; foo*߁=baz') 'inline; foo="barbaz"'>>>reg('Content-Disposition', 'inline; foo*²=bar') Traceback (mostrecentcalllast): File"<python-input-2>", line1, in<module>reg('Content-Disposition', 'inline; foo*²=bar') ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line604, in__call__returnself[name](name, value) ~~~~~~~~~~^^^^^^^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line192, in__new__cls.parse(value, kwds) ~~~~~~~~~^^^^^^^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/headerregistry.py", line448, inparsekwds['parse_tree'] =parse_tree=cls.value_parser(value) ~~~~~~~~~~~~~~~~^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line2736, inparse_content_disposition_headerdisp_header.append(parse_mime_parameters(value[1:])) ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line2601, inparse_mime_parameterstoken, value=get_parameter(value) ~~~~~~~~~~~~~^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line2464, inget_parametertoken, value=get_section(value) ~~~~~~~~~~~^^^^^^^File"/Users/matthieu/git/cpython/Lib/email/_header_value_parser.py", line2417, inget_sectionsection.number=int(digits) ~~~^^^^^^^^ValueError: invalidliteralforint() withbase10: '²'

After:

>>>importemail.headerregistry>>>reg=email.headerregistry.HeaderRegistry() >>>reg('Content-Disposition', 'inline; foo*0=bar') 'inline; foo="bar"'>>>reg('Content-Disposition', 'inline; foo*X=bar') 'inline;'>>>reg('Content-Disposition', 'inline; foo*0=bar; foo*߁=baz') 'inline; foo="barbaz"'>>>reg('Content-Disposition', 'inline; foo*²=bar') 'inline;'

@picnixzpicnixz changed the title gh-87112: Do not fail when non 0-9 digit is used as section number in MIME header parametergh-87112: Ensure that only ASCII digits are accepted as section number in MIME header parameterJul 20, 2025
"found{}".format(value))
digits=''
whilevalueandvalue[0].isdigit():
whilevalueand'0'<=value[0]<='9':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
whilevalueand'0'<=value[0] <='9':
whilevalueand('0'<=value[0] <='9'):

It will a bit clearer. Or you can still use a separate function to make it even cleareer. The bottleneck won't be the function call IMO.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that, but not the separate function. It was my understanding that @StanFromIreland was leaning towards not having an inner function

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, I was against the function to check if it is in a dictionary.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just moved it to a separate function for extra-clarity

@encukou
Copy link
Member

Could you also test with, for example, ߅ (NKO DIGIT FIVE), which intaccepts?
Given that some values worked before, IMO the HeaderParseError should only be raised if int() fails; for other non-ASCII numbers this should append an InvalidHeaderDefect.

@matthieucan
Copy link
ContributorAuthor

Could you also test with, for example, ߅ (NKO DIGIT FIVE), which intaccepts? Given that some values worked before, IMO the HeaderParseError should only be raised if int() fails; for other non-ASCII numbers this should append an InvalidHeaderDefect.

Thank you for looking into this.

In my understanding, those are the possible scenarios:

  • The section number is made of ASCII digits ('0' <= d <= '9')
    • Valid, remains unchanged with this PR
  • The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE)
    • This was accepted before, without a defect
    • This PR, in its current state, rejects those (they are ignored and trigger a defect)
    • ➡️ ❓ @encukou Do I understand correctly that they should be accepted, and not raise a defect (even if they are not RFC-valid)? Or should they be accepted and raise a defect?
  • The section number is made of digits (isdigit(d)) which cannot be converted to integers (not int(d)), e.g. ²
    • This used to fail with a raised exception
    • This PR, in its current state, rejects those (they are ignored and trigger a defect)
  • The section number is not made of digits (not isdigit(d))
    • Invalid, remains unchanged with this PR

@encukou
Copy link
Member

The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE). This was accepted before, without a defect

IMO, they should be accepted and raise a defect.

@matthieucanmatthieucan changed the title gh-87112: Ensure that only ASCII digits are accepted as section number in MIME header parametergh-87112: Ensure that only digits convertible to integers are accepted as section number in MIME header parameterJul 27, 2025
@matthieucan
Copy link
ContributorAuthor

The section number is made of digits (isdigit(d)) which can be converted to integers (int(d)), e.g.߅ (NKO DIGIT FIVE). This was accepted before, without a defect

IMO, they should be accepted and raise a defect.

Thank you, I agree. This is now implemented.

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@matthieucan@encukou@picnixz@StanFromIreland@AA-Turner