gh-135676: Simplify docs on lexing names #140464

encukou · 2025-10-22T15:53:31Z

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the id_start/id_continue sets (referred to in previous sections as “letter-like” and “number-like” characters, with a link to the details)

This also means we don't need xid_start/xid_continue to define the behaviour :)
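
For illustration only, the three steps above can be approximated in Python. This is a rough sketch, not the tokenizer's actual code: `lex_name` is a hypothetical helper, NFKC is the normalization form Python applies to identifiers, and `str.isidentifier()` is used here only as a stand-in for the validation step.

```python
import unicodedata


def lex_name(candidate: str) -> str:
    """Hypothetical helper mimicking the normalize-then-validate model."""
    # Step 1 (assumed to be done by the caller): `candidate` is the run of
    # ASCII name characters and non-ASCII characters taken from the source.
    # Step 2: normalize the name.
    normalized = unicodedata.normalize("NFKC", candidate)
    # Step 3: validate the normalized name; isidentifier() is a stand-in.
    if not normalized.isidentifier():
        raise SyntaxError(f"invalid name: {candidate!r}")
    return normalized


print(lex_name("nᵘₘᵇₑʳ"))  # number
```

A character such as `€` survives normalization unchanged and then fails validation, so `lex_name("x€")` raises `SyntaxError` in this sketch.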

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--140464.org.readthedocs.build/

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

willingc

Outstanding document @encukou. I had one small suggestion to be a bit more explicit on the normalization example with number.

willingc · 2025-10-22T18:24:46Z

Doc/reference/lexical_analysis.rst

+This means that, for example, some typographic variants of characters are
+converted to their "basic" form, for example::
+
+ >>> nᵘₘᵇₑʳ = 3


It would be helpful to add an explicit comment that the normalized form of nᵘₘᵇₑʳ is number.
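
One way to check this is to execute the assignment and inspect the resulting namespace. A small sketch, assuming only that the compiler applies NFKC normalization to names; the `namespace` dict exists purely for the demonstration:

```python
import unicodedata

# Run the assignment from the docs example in a throwaway namespace.
namespace = {}
exec("nᵘₘᵇₑʳ = 3", namespace)

# The value is stored under the normalized name, not the spelling used.
assert namespace["number"] == 3
assert "nᵘₘᵇₑʳ" not in namespace

# The same normalized form can be computed directly.
assert unicodedata.normalize("NFKC", "nᵘₘᵇₑʳ") == "number"
```

String keys are not normalized on lookup, so code that indexes a namespace dict by name would need to normalize the key itself.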

encukou · 2025-10-29

Does this look good?

encukou · 2025-11-05T10:47:49Z

There was an insightful conversation in #140269. I'll update this PR to make things even clearer.

willingc

Thanks @encukou

willingc

Nice work @encukou!

encukou · 2025-11-20T15:22:42Z

Thank you for the review!

@malemburg, do you also want to take a look?

miss-islington-app · 2025-11-26T17:19:30Z

Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington-app · 2025-11-26T17:19:32Z

Sorry, @encukou, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 2ff8608b4da33f667960e5099a1a442197acaea4 3.14

bedevere-app · 2025-11-27T12:31:04Z

GH-142015 is a backport of this pull request to the 3.14 branch.

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the xid_start/xid_continue sets

(cherry picked from commit 2ff8608)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

encukou and others added 4 commits October 8, 2025 17:58

Simplify Names section
4606120
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Micha Albert <info@micha.zone> Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

Casing; 3 dots for character ranges
6163c24

Clean-ups
de6d1af

Mention Unicode's *ID_Start* and *ID_Continue*
152e7aa

encukou requested review from AA-Turner and willingc as code owners October 22, 2025 15:53

bedevere-app bot added the docs (Documentation in the Doc dir) and skip news labels Oct 22, 2025

github-project-automation bot added this to Docs PRs Oct 22, 2025

github-project-automation bot moved this to Todo in Docs PRs Oct 22, 2025

bedevere-app bot mentioned this pull request Oct 22, 2025
Reword the Lexical Analysis chapter of the docs #135676
Open

bedevere-app bot added the awaiting core review label Oct 22, 2025

StanFromIreland linked an issue Oct 22, 2025 that may be closed by this pull request
Docs: note requirement to normalise unicode identifiers passed to globals() and locals() #86846
Closed

willingc approved these changes Oct 22, 2025
View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 22, 2025

Make it clear that nᵘₘᵇₑʳ normalizes to number
fce5e98

encukou mentioned this pull request Nov 4, 2025
gh-129117: Expose _PyUnicode_IsXidContinue/Start in unicodedata#140269
Merged

encukou marked this pull request as draft November 5, 2025 10:45

bedevere-app bot removed the awaiting merge label Nov 5, 2025

willingc approved these changes Nov 10, 2025
View reviewed changes

bedevere-app bot added the awaiting merge label Nov 10, 2025

encukou added 3 commits November 12, 2025 18:06

WIP
b9fdcf0

Merge in the main branch
2e7f7c0

Reword to use XID_Start and XID_Continue
43f6091

encukou marked this pull request as ready for review November 19, 2025 16:08

bedevere-app bot added awaiting core review and removed awaiting merge labels Nov 19, 2025

willingc approved these changes Nov 19, 2025
View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 19, 2025

encukou merged commit 2ff8608 into python:main Nov 26, 2025
36 checks passed

encukou deleted the lex-analysis-names-simpler branch November 26, 2025 15:10

bedevere-app bot removed the awaiting merge label Nov 26, 2025

github-project-automation bot moved this from Todo to Done in Docs PRs Nov 26, 2025

encukou added the needs backport to 3.14 (bugs and security fixes) label Nov 26, 2025

miss-islington-app bot assigned encukou Nov 26, 2025

bedevere-app bot removed the needs backport to 3.14 (bugs and security fixes) label Nov 27, 2025
