bpo-42123: Run the parser two times and only enable invalid rules on the second run#22111

lysnikolaou · 2020-09-05T13:42:31Z

Here's the easiest way I could think of to do this. Are we okay with something
as simple as this or would we like to investigate further whether we could
actually use the memo cache wherever possible?

The first parser run is only responsible for detecting whether
there is a SyntaxError or not. If there isn't, the AST gets returned.
Otherwise, the parser is run a second time with all the invalid_*
rules enabled so that all the customized error messages get produced.

Regarding bpo-41659, we can now implement a very simple fix:

--- a/Grammar/python.gram +++ b/Grammar/python.gram @@ -460,6 +460,7 @@ await_primary[expr_ty] (memo): | AWAIT a=primary{CHECK_VERSION(5, "Await expressions are", _Py_Await(a, EXTRA)) } | primary primary[expr_ty]: + | primary b='{'{RAISE_SYNTAX_ERROR_KNOWN_LOCATION(b, "invalid syntax") } | a=primary '.' b=NAME{_Py_Attribute(a, b->v.Name.id, Load, EXTRA) } | a=primary b=genexp{_Py_Call(a, CHECK(_PyPegen_singleton_seq(p, b)), NULL, EXTRA) } | a=primary '(' b=[arguments] ')'{

With this patch, I'm now getting:

➜ cpygen git:(parser-second-run) ✗ ./python Python 3.10.0a0 (heads/parser-second-run-dirty:72fe7254b9, Sep 5 2020, 16:40:28) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> a{File "<stdin>", line 1 a{^ SyntaxError: invalid syntax >>> a.b.c{File "<stdin>", line 1 a.b.c{^ SyntaxError: invalid syntax

Thoughts?

https://bugs.python.org/issue42123

The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.

lysnikolaou · 2020-09-05T13:46:26Z

Two more things:

I actually wanted to open the PR on the we-like-parsers repo, but I messed that up. I don't think there's a problem with it being here, so I'm keeping this open.
Could someone actually benchmark this, since the benchmarks on my personal PC are not at all reliable?

pablogsal · 2020-09-05T15:41:21Z

Are we okay with something
as simple as this or would we like to investigate further whether we could
actually use the memo cache wherever possible?

I think one of the requirements for Guido's approach to work is that the memo cache is preserved between restarts of the parser.

2. Could someone actually benchmark this, since the benchmarks on my personal PC are not at all reliable?

I can benchmark this tomorrow as I am currently repairing my main PC (watercooling leak 😱 ) and is not operational.

lysnikolaou · 2020-09-05T15:55:39Z

Oooh, that's correct. Ok, let me have another look at this then.

lysnikolaou · 2020-10-22T18:21:09Z

@pablogsal Do you have the bandwidth for another review on this? If you feel it's okay, I'm going to add an issue and a NEWS blurb.

pablogsal · 2020-10-22T18:22:31Z

@pablogsal Do you have the bandwidth for another review on this? If you feel it's okay, I'm going to add an issue and a NEWS blurb.

Will check this if I can today or tomorrow

pablogsal · 2020-10-22T23:27:35Z

After some benchmarking, it results that the parser with this PR is up to 10% (in xxl benchmark) and ~3% faster in parsing the stdlib. Let's merge it :)

lysnikolaou · 2020-10-22T23:59:34Z

@ambv Do we backport this to 3.9? It's very low-risk and gives a sizable performance boost.

terryjreedy · 2020-10-24T01:22:21Z

"Do we backport this to 3.9?" We almost never backport performance patches. Low risk is not no risk. With the yearly releases, there is less than a year to wait. With the early and repeated alphas, bleeding edge fans can get this soon enough.

gvanrossum · 2020-10-24T04:12:29Z

OTOH, for the PEG parser we've been backporting everything relentlessly, otherwise when we fix a bug in 3.10 the backport is a pain in the neck.

lysnikolaou · 2020-10-26T22:37:45Z

Ok, since there's no consensus, let's not backport this.

terryjreedy · 2020-10-27T00:19:41Z

I intentionally added 'almost' to my previous comment to suggest that carefully considered exceptions are possible. I forgot about the 3.9 escape option to run the old parser. Guido's comment, when merge conflicts really happen, is real to me from my experience with IDLE. We (I) 'relentlessly' backport everything, but have not always. Manual backports are an extra risk. So is not backporting a fix because it is too much trouble.

I believe PP was once tested on a large number of other python codebases other than the stdlib. Is that done routinely, and in particular for this patch (before and after)? That would make a difference to me.

lysnikolaou · 2020-10-27T18:34:45Z

I think @pablogsal benchmarked this against something like the 10 most downloaded PyPI packages, although I'm not sure.

pablogsal · 2020-10-27T19:22:45Z

I think @pablogsal benchmarked this against something like the 10 most downloaded PyPI packages, although I'm not sure.

Yup, I did the benchmark again the 15 most downloaded packages.

lysnikolaou · 2020-10-27T22:40:54Z

FWIW I think it's worth it to backport, especially now that #23006 was merged into 3.9.

pablogsal · 2020-10-27T23:01:37Z

FWIW I think it's worth it to backport, especially now that #23006 was merged into 3.9.

I would be supportive of the backport for the reasons previously mentioned

gvanrossum · 2020-10-27T23:01:39Z

Agreed.

…es on the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)

…es on the second run (GH-22111) (GH-23011) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)

…lots1 * origin/master: (365 commits) bpo-42029: Remove IRIX code (pythonGH-23023) bpo-42143: Ensure PyFunction_NewWithQualName() can't fail after creating the func object (pythonGH-22953) bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (pythonGH-19639) bpo-41805: Documentation for PEP 585 (pythonGH-22615) bpo-42161: Micro-optimize _collections._count_elements() (pythonGH-23008) bpo-42161: Remove private _PyLong_Zero and _PyLong_One (pythonGH-23003) bpo-42099: Fix reference to ob_type in unionobject.c and ceval (pythonGH-22829) bpo-41659: Disallow curly brace directly after primary (pythonGH-22996) bpo-6761: Enhance __call__ documentation (pythonGH-7987) bpo-42161: Modules/ uses _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22998) bpo-41474, Makefile: Add dependency on cpython/frameobject.h (pythonGH-22999) bpo-42157: Rename unicodedata.ucnhash_CAPI (pythonGH-22994) bpo-42161: Use _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22995) bpo-30681: Support invalid date format or value in email Date header (pythonGH-22090) bpo-42161: Add _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22993) bpo-42123: Run the parser two times and only enable invalid rules on the second run (pythonGH-22111) bpo-42157: Convert unicodedata.UCD to heap type (pythonGH-22991) bpo-42157: unicodedata avoids references to UCD_Type (pythonGH-22990) bpo-39101: Fixes BaseException hang in IsolatedAsyncioTestCase. (pythonGH-22654) bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (pythonGH-22713) ...

…the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.

lysnikolaou requested a review from pablogsal as a code owner September 5, 2020 13:42

the-knights-who-say-ni added the CLA signed label Sep 5, 2020

bedevere-bot added the awaiting core review label Sep 5, 2020

lysnikolaou changed the title ~~Implement running the parser a second time for the errors messages~~Implement running the parser a second time for the error messagesSep 5, 2020

lysnikolaou requested a review from gvanrossum September 5, 2020 13:46

Merge branch 'master' into parser-second-run
b524df8

pablogsal approved these changes Oct 22, 2020
View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Oct 22, 2020

lysnikolaou changed the title ~~Implement running the parser a second time for the error messages~~bpo-42123: Implement running the parser two times and only enable invalid rules on the second runOct 22, 2020

lysnikolaou changed the title ~~bpo-42123: Implement running the parser two times and only enable invalid rules on the second run~~bpo-42123: Run the parser two times and only enable invalid rules on the second runOct 22, 2020

Add NEWS blurb
88ab59f

lysnikolaou merged commit bca7014 into python:masterOct 26, 2020

lysnikolaou deleted the parser-second-run branch October 26, 2020 22:42

bedevere-bot removed the awaiting merge label Oct 26, 2020

lysnikolaou mentioned this pull request Oct 27, 2020
[3.9] bpo-41659: Disallow curly brace directly after primary (GH-22996) #23006
Merged

Uh oh!

bpo-42123: Run the parser two times and only enable invalid rules on the second run#22111

bpo-42123: Run the parser two times and only enable invalid rules on the second run #22111

Uh oh!

Conversation

lysnikolaou commented Sep 5, 2020• edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lysnikolaou commented Sep 5, 2020

Uh oh!

pablogsal commented Sep 5, 2020

Uh oh!

lysnikolaou commented Sep 5, 2020

Uh oh!

lysnikolaou commented Oct 22, 2020

Uh oh!

pablogsal commented Oct 22, 2020

Uh oh!

pablogsal commented Oct 22, 2020

Uh oh!

lysnikolaou commented Oct 22, 2020

Uh oh!

terryjreedy commented Oct 24, 2020

Uh oh!

gvanrossum commented Oct 24, 2020

Uh oh!

lysnikolaou commented Oct 26, 2020

Uh oh!

terryjreedy commented Oct 27, 2020

Uh oh!

lysnikolaou commented Oct 27, 2020

Uh oh!

pablogsal commented Oct 27, 2020

Uh oh!

lysnikolaou commented Oct 27, 2020

Uh oh!

pablogsal commented Oct 27, 2020

Uh oh!

gvanrossum commented Oct 27, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

lysnikolaou commented Sep 5, 2020•
edited by bedevere-bot
Loading