Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-42123: Run the parser two times and only enable invalid rules on the second run#22111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
lysnikolaou commented Sep 5, 2020 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.
lysnikolaou commented Sep 5, 2020
Two more things:
|
pablogsal commented Sep 5, 2020
I think one of the requirements for Guido's approach to work is that the memo cache is preserved between restarts of the parser.
I can benchmark this tomorrow as I am currently repairing my main PC (watercooling leak 😱 ) and is not operational. |
lysnikolaou commented Sep 5, 2020
Oooh, that's correct. Ok, let me have another look at this then. |
lysnikolaou commented Oct 22, 2020
@pablogsal Do you have the bandwidth for another review on this? If you feel it's okay, I'm going to add an issue and a NEWS blurb. |
pablogsal commented Oct 22, 2020
Will check this if I can today or tomorrow |
pablogsal commented Oct 22, 2020
After some benchmarking, it results that the parser with this PR is up to 10% (in xxl benchmark) and ~3% faster in parsing the stdlib. Let's merge it :) |
lysnikolaou commented Oct 22, 2020
@ambv Do we backport this to 3.9? It's very low-risk and gives a sizable performance boost. |
terryjreedy commented Oct 24, 2020
"Do we backport this to 3.9?" We almost never backport performance patches. Low risk is not no risk. With the yearly releases, there is less than a year to wait. With the early and repeated alphas, bleeding edge fans can get this soon enough. |
gvanrossum commented Oct 24, 2020
OTOH, for the PEG parser we've been backporting everything relentlessly, otherwise when we fix a bug in 3.10 the backport is a pain in the neck. |
lysnikolaou commented Oct 26, 2020
Ok, since there's no consensus, let's not backport this. |
terryjreedy commented Oct 27, 2020
I intentionally added 'almost' to my previous comment to suggest that carefully considered exceptions are possible. I forgot about the 3.9 escape option to run the old parser. Guido's comment, when merge conflicts really happen, is real to me from my experience with IDLE. We (I) 'relentlessly' backport everything, but have not always. Manual backports are an extra risk. So is not backporting a fix because it is too much trouble. I believe PP was once tested on a large number of other python codebases other than the stdlib. Is that done routinely, and in particular for this patch (before and after)? That would make a difference to me. |
lysnikolaou commented Oct 27, 2020
I think @pablogsal benchmarked this against something like the 10 most downloaded PyPI packages, although I'm not sure. |
pablogsal commented Oct 27, 2020
Yup, I did the benchmark again the 15 most downloaded packages. |
lysnikolaou commented Oct 27, 2020
FWIW I think it's worth it to backport, especially now that #23006 was merged into 3.9. |
pablogsal commented Oct 27, 2020
I would be supportive of the backport for the reasons previously mentioned |
gvanrossum commented Oct 27, 2020
Agreed. |
…es on the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)
…es on the second run (GH-22111) (GH-23011) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)
…lots1 * origin/master: (365 commits) bpo-42029: Remove IRIX code (pythonGH-23023) bpo-42143: Ensure PyFunction_NewWithQualName() can't fail after creating the func object (pythonGH-22953) bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (pythonGH-19639) bpo-41805: Documentation for PEP 585 (pythonGH-22615) bpo-42161: Micro-optimize _collections._count_elements() (pythonGH-23008) bpo-42161: Remove private _PyLong_Zero and _PyLong_One (pythonGH-23003) bpo-42099: Fix reference to ob_type in unionobject.c and ceval (pythonGH-22829) bpo-41659: Disallow curly brace directly after primary (pythonGH-22996) bpo-6761: Enhance __call__ documentation (pythonGH-7987) bpo-42161: Modules/ uses _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22998) bpo-41474, Makefile: Add dependency on cpython/frameobject.h (pythonGH-22999) bpo-42157: Rename unicodedata.ucnhash_CAPI (pythonGH-22994) bpo-42161: Use _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22995) bpo-30681: Support invalid date format or value in email Date header (pythonGH-22090) bpo-42161: Add _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22993) bpo-42123: Run the parser two times and only enable invalid rules on the second run (pythonGH-22111) bpo-42157: Convert unicodedata.UCD to heap type (pythonGH-22991) bpo-42157: unicodedata avoids references to UCD_Type (pythonGH-22990) bpo-39101: Fixes BaseException hang in IsolatedAsyncioTestCase. (pythonGH-22654) bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (pythonGH-22713) ...
…the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.
Here's the easiest way I could think of to do this. Are we okay with something
as simple as this or would we like to investigate further whether we could
actually use the memo cache wherever possible?
The first parser run is only responsible for detecting whether
there is a
SyntaxErroror not. If there isn't, the AST gets returned.Otherwise, the parser is run a second time with all the
invalid_*rules enabled so that all the customized error messages get produced.
Regarding bpo-41659, we can now implement a very simple fix:
With this patch, I'm now getting:
Thoughts?
https://bugs.python.org/issue42123