gh-135676: Add a summary of source characters #138194

encukou · 2025-08-27T15:46:57Z

The lexical analysis docs have notes like this at the end:

The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? `
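The last of these notes can be checked with a short sketch. The helper name `compiles` is made up for this illustration; it relies only on the built-in `compile()`, and the sample sources are assumptions chosen to put each character once in code position and once inside a string literal and a comment.

```python
def compiles(source: str) -> bool:
    """Return True if *source* compiles as a module (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# The three printing ASCII characters the note lists as unused.
UNUSED_CHARS = "$?`"

# Map each character to (accepted as code?, accepted in string/comment?).
results = {
    ch: (compiles(f"x = 1 {ch} 2"), compiles(f"x = '{ch}'  # {ch}"))
    for ch in UNUSED_CHARS
}

for ch, (as_code, in_literal) in results.items():
    print(f"{ch!r}: as code={as_code}, in string/comment={in_literal}")
```

Each character should be rejected in code position and accepted inside the string literal and comment, which matches the wording of the note.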

The intent behind these notes seems to be to provide a "map" of what each ASCII character does in Python, but the map is incomplete as it stands and isn't really kept up to date.

This PR instead provides a summary of source characters -- nominally the ones that start tokens, with notes on other notable cases.
The table can also serve as an alternate "table of contents".

The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
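To illustrate the idea of a character "starting" a token, here is a minimal sketch using the standard-library `tokenize` module. The helper `first_token_type` and the sample sources are assumptions made for this example, not part of the PR, and `tokenize` is not guaranteed to match the interpreter's own tokenizer in every detail.

```python
import io
import tokenize


def first_token_type(source: str) -> str:
    """Name of the first token type reported for *source* (hypothetical helper)."""
    first = next(tokenize.generate_tokens(io.StringIO(source).readline))
    return tokenize.tok_name[first.type]


# Illustrative samples; each begins with a different kind of character.
samples = [
    "spam\n",      # ASCII letter
    "42\n",        # digit
    ".5\n",        # period starting a floating-point literal
    "'text'\n",    # quote
    "# note\n",    # hash
    "+ 1\n",       # operator character
]

token_types = {source: first_token_type(source) for source in samples}

for source, type_name in token_types.items():
    print(f"{source.strip()!r:10} starts a {type_name} token")
```

The period case corresponds to the quoted note that the period can also occur in floating-point literals: the same character starts a NUMBER token here rather than an operator.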

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/

Doc/reference/lexical_analysis.rst

AA-Turner

I think this is a useful addition!


AA-Turner · 2025-10-08T06:14:22Z

Doc/reference/lexical_analysis.rst

+.. note::
+
+ A ":dfn:`stream`" is a *sequence*, in the general sense of the word
+ (not necessarily a Python :term:`sequence object <sequence>`).


I'm not sure this note is needed?

I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.

OK; I've removed it

AA-Turner · 2025-10-08T06:15:34Z

Doc/reference/lexical_analysis.rst

+.. list-table::
+:header-rows: 1


In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.

All my list-tables will do that from now on :)

AA-Turner · 2025-10-08T06:19:27Z

Doc/reference/lexical_analysis.rst

+ * * :ref:`String literal <strings>`
+
+ * * * ASCII letter (``a``-``z``, ``A``-``Z``)
+ * non-ASCII character


Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!

It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)
If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.
¹ Maybe I don't, but it certainly could do that :)
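The observable side of this can be sketched as follows. It only shows what the standard-library `tokenize` module and the built-in `compile()` report; it says nothing about how the C tokenizer is implemented internally. The helper names and sample sources are assumptions made for the example.

```python
import io
import tokenize


def first_token(source: str) -> tokenize.TokenInfo:
    """Return the first token reported for *source* (hypothetical helper)."""
    return next(tokenize.generate_tokens(io.StringIO(source).readline))


def compiles(source: str) -> bool:
    """Return True if *source* compiles as a module (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# A non-ASCII character that is valid in an identifier yields a NAME token.
tok = first_token("naïve = 1\n")
print(tokenize.tok_name[tok.type], repr(tok.string))

# A non-ASCII character that cannot appear in an identifier is an error
# outside strings and comments, but fine inside them.
euro_as_code = compiles("€ = 1")
euro_in_literal = compiles("price = '€'  # €")
print(euro_as_code, euro_in_literal)
```

In both cases the non-ASCII character either becomes part of a NAME or leads to an error, which is the behavior the table row is meant to summarize.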

willingc

A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!

willingc · 2025-10-08T09:23:52Z


Doc/reference/lexical_analysis.rst

Co-authored-by: Carol Willing <carolcode@willingconsulting.com>

miss-islington-app · 2025-10-08T14:34:25Z

Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

(cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Carol Willing <carolcode@willingconsulting.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Micha Albert <info@micha.zone> Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

bedevere-app · 2025-10-08T14:34:41Z

GH-139781 is a backport of this pull request to the 3.14 branch.

encukou · 2025-10-08T14:34:50Z

Thank you for the reviews!

…139781) (cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Carol Willing <carolcode@willingconsulting.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Micha Albert <info@micha.zone> Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

pythongh-135676: Add a summary of source characters
4f2b85b

bedevere-app bot added the docs (Documentation in the Doc dir) and skip news labels Aug 27, 2025

github-project-automation bot added this to Docs PRs Aug 27, 2025

github-project-automation bot moved this to Todo in Docs PRs Aug 27, 2025

bedevere-app bot mentioned this pull request Aug 27, 2025
Reword the Lexical Analysis chapter of the docs #135676
Open

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst (resolved)

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst (outdated, resolved)

serhiy-storchaka reviewed Aug 28, 2025

Doc/reference/lexical_analysis.rst (outdated, resolved)

encukou marked this pull request as ready for review September 3, 2025 14:28

encukou requested review from AA-Turner and willingc as code owners September 3, 2025 14:28

bedevere-app bot added the awaiting core review label Sep 3, 2025

Use zero-width space instead of joiner
d9157bb

encukou mentioned this pull request Sep 3, 2025
gh-135676: Reword the Operators & Delimiters section(s) #137713
Merged

AA-Turner approved these changes Oct 8, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 8, 2025

willingc approved these changes Oct 8, 2025

encukou and others added 3 commits October 8, 2025 16:05

Update Doc/reference/lexical_analysis.rst
f085358
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>

Remove note explaining *stream*
a30747f

Alternate list markers in list-table
300cc8c

encukou added the needs backport to 3.14 bugs and security fixes label Oct 8, 2025

encukou merged commit 59a6f9d into python:main Oct 8, 2025
29 checks passed

github-project-automation bot moved this from Todo to Done in Docs PRs Oct 8, 2025

bedevere-app bot removed the awaiting merge label Oct 8, 2025

encukou deleted the lex-analysis-highlevel branch October 8, 2025 14:34

bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 8, 2025
