Closed
Labels
- stdlib: Standard Library Python modules in the Lib/ directory
- type-bug: An unexpected behavior, bug, or error
Description
Background
RFC 3986 defines a scheme like this:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
RFC 2234 defines an ALPHA like this:
ALPHA = %x41-5A / %x61-7A
The WHATWG URL spec defines a scheme like this:
- "A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.)."
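As a rough illustration of what the two definitions above require, here is a minimal sketch that transcribes the RFC 3986 rule into a regular expression. The name `is_valid_scheme` is hypothetical and is not part of `urllib.parse`; it only shows that the first character is restricted to ASCII letters while later characters may also be digits, `+`, `-`, or `.`.

```python
import re

# Transcription of the RFC 3986 rule:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA is limited to ASCII letters (RFC 2234: %x41-5A / %x61-7A).
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_valid_scheme(candidate: str) -> bool:
    """Return True if `candidate` matches the scheme grammar in its entirety.

    Hypothetical helper for illustration; not a urllib.parse API.
    """
    return _SCHEME_RE.fullmatch(candidate) is not None
```

Under this sketch, `is_valid_scheme("http")` and `is_valid_scheme("a+b-c.d")` hold, while `"."`, `"1http"`, and the empty string are rejected.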
The bug
This is the scheme string parsing code from Lib/urllib/parse.py:462-468:
    i = url.find(':')
    if i > 0:
        for c in url[:i]:
            if c not in scheme_chars:
                break
        else:
            scheme, url = url[:i].lower(), url[i+1:]

This is the definition of scheme_chars from Lib/urllib/parse.py:77-80:

    scheme_chars = ('abcdefghijklmnopqrstuvwxyz'
                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                    '0123456789'
                    '+-.')

This will erroneously validate schemes that begin with any of ('.', '-', '+', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'). This behavior is in violation of both specifications.
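One possible way to tighten the check, sketched as a standalone function rather than as a patch to `Lib/urllib/parse.py`: require the first character to be an ASCII letter before testing the remaining characters against the full set. The names `split_scheme` and `SCHEME_CHARS` are assumptions made for this example, and returning an empty scheme on failure (instead of raising) is also an assumption, not necessarily how CPython should resolve the issue.

```python
import string

# Same character set as scheme_chars in the issue, built from the string module.
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme(url: str) -> tuple[str, str]:
    """Split `url` into (scheme, rest).

    Hypothetical helper: the scheme is '' and `url` is returned unchanged
    when the text before the first ':' is not a valid scheme.
    """
    i = url.find(":")
    # Extra condition missing from the quoted loop: the first character
    # must be an ASCII letter, not merely a member of SCHEME_CHARS.
    if i > 0 and url[0] in string.ascii_letters:
        if all(c in SCHEME_CHARS for c in url[:i]):
            return url[:i].lower(), url[i + 1:]
    return "", url
```

With this sketch, `split_scheme(".://")` yields `("", ".://")`, while `split_scheme("HTTP://example.com")` still yields `("http", "//example.com")`.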
This bug is reproducible with the following snippet:
    >>> from urllib.parse import urlparse
    >>> urlparse(".://")  # Should error, but doesn't
    ParseResult(scheme='.', netloc='', path='', params='', query='', fragment='')

My environment
- CPython versions tested on:
- 3.12.0a1+ (fb844e1)
- 3.10.8
- Operating system and architecture:
- Arch Linux x86_64