gh-135676: Add a summary of source characters #138194
encukou commented Aug 27, 2025 • edited by github-actions bot
AA-Turner left a comment
I think this is a useful addition!
Doc/reference/lexical_analysis.rst Outdated
.. note::
   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
   (not necessarily a Python :term:`sequence object <sequence>`).
I'm not sure this note is needed?
I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.
OK; I've removed it
.. list-table::
   :header-rows: 1
In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.
All my list-tables will do that from now on :)
* * :ref:`String literal <strings>`
* * * ASCII letter (``a``-``z``, ``A``-``Z``)
* non-ASCII character
Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!
It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)
If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.
¹ Maybe I don't, but it certainly could do that :)
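A minimal sketch of the behavior described in this reply, using the standard-library `tokenize` module; it observes results only and does not inspect the C tokenizer's internals. The helper name `first_token_kind` is hypothetical. Depending on the Python version, a rejected character may surface as an `ERRORTOKEN` or as a raised exception, so the sketch treats both as "error".

```python
import io
import tokenize


def first_token_kind(source):
    """Return the name of the first token type in `source`, or "error"."""
    readline = io.StringIO(source).readline
    try:
        first = next(tokenize.generate_tokens(readline))
    except (tokenize.TokenError, SyntaxError):
        # Some versions raise when the character cannot start an identifier.
        return "error"
    if first.type == tokenize.ERRORTOKEN:
        # Other versions yield an ERRORTOKEN instead of raising.
        return "error"
    return tokenize.tok_name[first.type]


# Each line starts with a non-ASCII character; only some are valid
# identifier starts, which str.isidentifier() reports independently.
for line in ["é = 1\n", "日本語 = 1\n", "€ = 1\n"]:
    print(repr(line), first_token_kind(line), line[0].isidentifier())
```

In this sketch a line starting with a non-ASCII character yields either a NAME or an error, never another token type, which is the property the table entry relies on.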
willingc left a comment
A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
59a6f9d into python:main
Thanks @encukou for the PR 🌮🎉. I'm working now to backport this PR to: 3.14.
(cherry picked from commit 59a6f9d)
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
GH-139781 is a backport of this pull request to the 3.14 branch.
encukou commented Oct 8, 2025
Thank you for the reviews!
The lexical analysis docs have notes like this at the end:
The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:
' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:
$ ? `
The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.
This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".
The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
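The quoted note about `$`, `?` and the backtick can be checked with a short sketch that uses only the built-in `compile()`. The helper name `compiles` and the file name `"<sketch>"` are hypothetical, chosen for illustration.

```python
def compiles(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# Outside string literals and comments, each character is rejected.
for ch in "$?`":
    print(repr(ch), compiles(f"x {ch} y\n"))

# Inside a string literal or a comment, the same characters are accepted.
print(compiles("s = '$ ? `'  # $ ? `\n"))
```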
📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/