@lysnikolaou
Member

lysnikolaou commented Jun 15, 2023

@pablogsal
Member

@lysnikolaou Hummm, can you quickly cover why moving this chunk solves the problem? I think I am missing some key insight here

@lysnikolaou
Member, Author

lysnikolaou commented Jun 15, 2023

Sure. There are two different cases to consider here:

  1. When we're calling the tokenizer from the tokenize module, it uses the readline tokenizer, which you added in 9216e69. In that branch, update_fstring_expr wasn't being called at all from tok_underflow_readline, which resulted in last_expr_size not being updated. That's why we were passing a negative count to PyUnicode_DecodeUTF8.

        cpython on main via C v14.0.3-clang via 🐍 pyenv 3.11.3
        ❯ cat tmp/t.py
        print(f'''{3=}''')%
        cpython on main via C v14.0.3-clang via 🐍 pyenv 3.11.3
        ❯ ./python.exe -m tokenize tmp/t.py
        unexpected error: Negative size passed to PyUnicode_New
        Traceback (most recent call last):
          File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 198, in _run_module_as_main
            return _run_code(code, main_globals, None,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/Users/lysnikolaou/repos/python/cpython/Lib/runpy.py", line 88, in _run_code
            exec(code, run_globals)
          File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 547, in <module>
            main()
          File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 489, in main
            tokens = list(tokenize(f.readline))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 447, in tokenize
            yield from _generate_tokens_from_c_tokenizer(rl_gen.__next__, encoding, extra_tokens=True)
          File "/Users/lysnikolaou/repos/python/cpython/Lib/tokenize.py", line 537, in _generate_tokens_from_c_tokenizer
            for info in it:
        SystemError: Negative size passed to PyUnicode_New

  2. There was a second bug with an f-string being tokenized from a file, which only shows up when the closing brace of the f-string expression part is on the last line of the file and there's no newline at the end. In that branch we use tok->cur to calculate the size of the buffer before adding an implicit newline, and then tok->start after adding the newline. That resulted in the end buffer being one character too short.

        cpython on main via C v14.0.3-clang via 🐍 pyenv 3.11.3 took 15s
        ❯ cat tmp/t.py
        print(f'''{3=}''')%
        cpython on main via C v14.0.3-clang via 🐍 pyenv 3.11.3
        ❯ ./python.exe tmp/t.py
        33
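
Both cases above can be checked against a local build with a small script. The following is only a minimal sketch, not part of the PR: the helper names `tokenize_via_readline` and `run_as_file` are made up for illustration, and it assumes the interpreter running the script (`sys.executable`) is the build under test. It feeds the same one-line source, without a trailing newline, first through the `tokenize` module's readline path and then through a real file on disk.

```python
import io
import subprocess
import sys
import tempfile
import tokenize
from pathlib import Path

# Source from the report above; note that there is no trailing newline.
SOURCE = "print(f'''{3=}''')"


def tokenize_via_readline(source: str) -> list[str]:
    """Case 1: hand the source to the tokenize module via a readline callable."""
    readline = io.BytesIO(source.encode("utf-8")).readline
    return [tok.string for tok in tokenize.tokenize(readline)]


def run_as_file(source: str) -> str:
    """Case 2: write the source to a file with no final newline and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "t.py"
        # Write bytes so that nothing appends or translates a newline.
        path.write_bytes(source.encode("utf-8"))
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(tokenize_via_readline(SOURCE))
    print(run_as_file(SOURCE))
```

On an interpreter that includes the fix, the first call should return without raising SystemError, and the second should give `3=3` rather than the `33` shown in the session above.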

@lysnikolaou
Member, Author

Do you think I should split this PR in two and open a second issue for the second case above?

lysnikolaou force-pushed the fix-expr-buffer-file-readline branch from 2efb0a4 to a83de94 on June 15, 2023 13:46
@pablogsal
Member

> Do you think I should split this PR in two and open a second issue for the second case above?

Nah, I think it's fine, but we may want to add two NEWS entries.

Member

pablogsal left a comment

LGTM

Thanks for the explanation @lysnikolaou. Fantastic work as always! 💪

@pablogsal
Member

@lysnikolaou If you want, add the second NEWS entry and then land it.

lysnikolaou enabled auto-merge (squash) June 15, 2023 15:56
lysnikolaou merged commit d382ad4 into python:main Jun 15, 2023
@miss-islington
Contributor

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2023
…kenizer (pythonGH-105828) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
@bedevere-bot

GH-105832 is a backport of this pull request to the 3.12 branch.

bedevere-bot removed the needs backport to 3.12 only security fixes label Jun 15, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this pull request Jun 15, 2023
lysnikolaou added a commit that referenced this pull request Jun 15, 2023
…okenizer (GH-105828) (#105832) (cherry picked from commit d382ad4) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
carljm added a commit to carljm/cpython that referenced this pull request Jun 15, 2023
* main: (57 commits)
  pythongh-105831: Fix NEWS blurb from pythongh-105828 (python#105833)
  pythongh-105820: Fix tok_mode expression buffer in file & readline tokenizer (python#105828)
  pythongh-105751, test_ctypes: Remove disabled tests (python#105826)
  pythongh-105821: Use a raw f-string in test_httpservers.py (python#105822)
  pythongh-105751: Remove platform usage in test_ctypes (python#105819)
  pythongh-105751: Reenable disable test_ctypes tests (python#105818)
  pythongh-105751: Remove dead code in test_ctypes (python#105817)
  More reorganisation of the typing docs (python#105787)
  Improve docs for `typing.dataclass_transform` (python#105792)
  pythonGH-89812: Churn `pathlib.Path` test methods (python#105807)
  pythongh-105800: Issue SyntaxWarning in f-strings for invalid escape sequences (python#105801)
  pythongh-105751: Cleanup test_ctypes imports (python#105803)
  pythongh-105481: add HAS_JUMP flag to opcode metadata (python#105791)
  pythongh-105751: test_ctypes avoids the operator module (pythonGH-105797)
  pythongh-105751: test_ctypes: Remove @need_symbol decorator (pythonGH-105798)
  pythongh-104909: Implement conditional stack effects for macros (python#105748)
  pythongh-75905: Remove test_xmlrpc_net: skipped since 2017 (python#105796)
  pythongh-105481: Fix types and a bug for pseudos (python#105788)
  Update DSL docs for cases generator (python#105753)
  pythonGH-77273: Better bytecodes for f-strings (pythonGH-6132)
  ...
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023
gvanrossum pushed a commit to gvanrossum/cpython that referenced this pull request Jun 18, 2023