Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-125553: Fix backslash continuation in untokenize#126010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
tomasr8 commented Oct 26, 2024 • edited by bedevere-app bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-app bot
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
bedevere-bot commented Oct 27, 2024
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit bf56e7d 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Uh oh!
There was an error while loading. Please reload this page.
pablogsal commented Oct 27, 2024
This fails tokenization of the stdlib: Normally |
pablogsal left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my last comment
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
tomasr8 commented Oct 27, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Right, I actually explicitly excluded cpython/Lib/test/test_traceback.py Lines 856 to 867 in dad3453
Line 864 has a backslash on a separate line. This is problematic because the tokenizer does not generate any tokens for that line so there is no way to know the indent of the backslash and the untokenization is ambiguous. We could perhaps skips this file in the test? |
pablogsal commented Oct 28, 2024
But currently is un-tokenizing correctly no? I am a bit concerned that this is going to introduce some changes to roundtrip behavior that people are relying on. We cannot just skip the file or the test as well, that would be concerning because we will be missing coverage. |
tomasr8 commented Oct 28, 2024
It's not - both main and this pr insert less whitespace than they should. The behaviour is incorrect but does not change. The reason the As a side note, with the current untokenizer, when I compare the diff of |
pablogsal commented Oct 28, 2024
Ah thanks for explaining it in detail. Ok that makes sense and looks much better 🙂 Now we just need a slightly elegant way to use token base comparison only on that particular file. Maybe we can identify if the line falls into that case and then skip the source comparison? Otherwise maybe we can just use tokens for that file and keep a list of "known less strict files" |
tomasr8 commented Oct 29, 2024
No problem 🙂
I actually added a new boolean parameter Though, detecting this automatically rather than having a list of files might be relatively simple. We'd just need to check whether the source code contains a line with just a backslash and potentially some whitespace. I'll look into that later today :) |
tomasr8 commented Oct 29, 2024
I tried adding an automated check for the standalone backslash but it actually produces false positives for multiline strings. For example: s=r"""\pass \pass"""This is not great because we would mistakenly turn off strict checking for this file. I would say keeping a list of "known less strict files" is preferable to this? FWIW, |
pablogsal commented Oct 29, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
It may, if someone suddenly adds backlashes to some random file, tests will start to fail for no reason and that will be very frustrating to core devs and contributors so I do think that is a big concern. In fact, is a deal breaker to check for this in |
tomasr8 commented Oct 30, 2024
I totally agree that this would not be ideal. In that case, we can use the automated check I suggested before. The upside is that this will never randomly fail if someone adds a backslash to a file. The downside is that the check has false positives so we might accidentally disable the more strict check for a file (there are currently two files like that). Though I think this is still overall an improvement since the vast majority of files will be compared exactly. If you think it's acceptable I'll revert the last change which added the explicit list. |
pablogsal commented Oct 30, 2024
Yeah, unfortunately I don't think we have a choice since having files randomly fail if someone changes valid code is a no-go. So any improvement even if suboptimal that doesn't have this problem is our only option |
tomasr8 commented Oct 30, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Changes made, if you'd like to have another look :) the magic phrase: |
tomasr8 commented Oct 30, 2024
Ok one more time, since the last one didn't work: I have made the requested changes; please review again |
Thanks for making the requested changes! @pablogsal: please review the changes made to this pull request. |
pablogsal commented Jan 21, 2025
Closing and opening to retrigger CI |
This reverts commit eb2e6f2.
3a25da8 to e2c9bb7Compare7ad793e into python:mainUh oh!
There was an error while loading. Please reload this page.
Thanks @tomasr8 for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
Thanks @tomasr8 for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12. |
Sorry, @tomasr8 and @pablogsal, I could not cleanly backport this to |
GH-129153 is a backport of this pull request to the 3.13 branch. |
…ythonGH-126010) (cherry picked from commit 7ad793e) Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
GH-130579 is a backport of this pull request to the 3.12 branch. |
This change correctly inserts whitespace before backslash + newline in most cases.
The exception is cases where the backslash is on its own line which makes the untokenization ambiguous. For example:
However, these should be pretty rare. In fact, I ran the tokenize -> untokenize round-trip check on the whole repo (excluding files which fail to tokenize in the first place) and all of the files produce the same output.
\+\n) #125553