gh-115077: Argument Clinic: generate better error messages when parsing function declaration#115555

erlend-aasland · 2024-02-16T09:56:20Z

Issue: Argument Clinic: make error messages more helpful to developers #115077

erlend-aasland · 2024-02-16T10:01:20Z

With this experiment, we could later use shlex's character position to report where in the line parsing failed, for example with error messages that look more like the familiar Python tracebacks.
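Here is a minimal sketch of the traceback-style message idea. It assumes the parser already knows the 0-based column where parsing failed; how that column is obtained (shlex or something else) is left open. `format_parse_error` is a hypothetical name and is not part of Argument Clinic.

```python
def format_parse_error(line: str, col: int, message: str) -> str:
    """Render a parse error with a caret under the offending column.

    Hypothetical helper: 'col' is a 0-based offset into 'line'.
    """
    # Mimic a Python traceback: indented source line, caret line, message.
    caret = " " * col + "^"
    return f"    {line}\n    {caret}\nError: {message}"


# Example: point at 'baz', the token that makes the declaration invalid.
decl = "foo = bar baz"
print(format_parse_error(decl, decl.index("baz"), "Invalid syntax"))
```

The caret only lines up if the source line contains no tabs. A real implementation would need to expand tabs or count display columns.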

erlend-aasland · 2024-02-16T10:17:33Z

Some examples of improved cases:

foo = bar ->
- main: Illegal function name foo = bar ->
- This PR: No return annotation provided ...
foo as
- main: Illegal function name foo as
- This PR: No C base name provided ...
a b c d:
- main: Illegal function name a b c d
- This PR: Invalid syntax ...
foo = bar baz:
- main: Illegal function name foo = bar baz
- This PR: Invalid syntax ...

UPDATE: after 9b93771, the latter two cases are no longer improved.

erlend-aasland · 2024-02-16T10:25:12Z

Another positive side effect: previously, the parsing fail()s (for function declarations) were scattered around in various places; now they are collected in one place. IMO, that helps readability and maintainability.

serhiy-storchaka

Why use the shell tokenizer to parse Argument Clinic syntax? Isn't it closer to Python syntax?

erlend-aasland · 2024-02-16T11:21:11Z

Why use the shell tokenizer to parse Argument Clinic syntax? Isn't it closer to Python syntax?

Because it was the shortest route to a proof-of-concept PR. We can of course rewrite it to use the Python tokeniser instead.

erlend-aasland · 2024-02-16T11:23:37Z

We can of course rewrite it to use the Python tokeniser instead.

Possible gotcha: the Python tokeniser will probably split up the full name (e.g. mod.cls.fn will be returned as ["mod", ".", "cls", ".", "fn"], IIRC). Currently, the shell tokeniser is easily configured to give us a single token for the full name: ["mod.cls.fn"]. This means we'd have to do extra post-processing for full names.
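The sketch below compares the two stdlib tokenizers on a dotted name. The shlex configuration (POSIX mode, with "." added to `wordchars`) is an assumption and may not match the one used in this PR. With this configuration, shlex keeps `mod.cls.fn` whole but splits `->` into two tokens. `tokenize` does the opposite.

```python
import io
import shlex
import tokenize


def shlex_tokens(line: str) -> list:
    # POSIX-mode shlex; adding "." to wordchars keeps dotted names together.
    lexer = shlex.shlex(line, posix=True)
    lexer.wordchars += "."
    return list(lexer)


def python_tokens(line: str) -> list:
    # stdlib tokenize; keep only NAME and OP tokens (drop NEWLINE, ENDMARKER).
    toks = tokenize.generate_tokens(io.StringIO(line).readline)
    return [t.string for t in toks if t.type in (tokenize.NAME, tokenize.OP)]


decl = "mod.cls.fn as mod_cls_fn -> int"
print(shlex_tokens(decl))   # ['mod.cls.fn', 'as', 'mod_cls_fn', '-', '>', 'int']
print(python_tokens(decl))  # ['mod', '.', 'cls', '.', 'fn', 'as', 'mod_cls_fn', '->', 'int']
```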

serhiy-storchaka · 2024-02-16T11:42:06Z

The shell tokenizer has many more gotchas.

erlend-aasland · 2024-02-16T12:03:16Z

The shell tokenizer has many more gotchas.

We already use the shell tokenizer for parsing the checksum line. Should we also stop using it there?

Let's rewrite it using the Python tokenizer then. If it introduces too much complexity, let's just forget about this experiment and leave the error messages like they are today.

erlend-aasland · 2024-02-16T12:15:35Z

The shell tokenizer has many more gotchas.

Could you point to some, so I can add tests for those?

serhiy-storchaka · 2024-02-16T12:31:49Z

I expect some surprises in handling quotes and escapes.
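This sketch shows a few quote, escape, and comment behaviours of shlex in POSIX mode, using the same assumed configuration ("." added to `wordchars`). These behaviours follow from how shlex is documented to work. Whether Argument Clinic's actual configuration exhibits all of them is an assumption.

```python
import shlex


def shlex_tokens(line: str) -> list:
    # Assumed configuration: POSIX mode, dots allowed inside words.
    lexer = shlex.shlex(line, posix=True)
    lexer.wordchars += "."
    return list(lexer)


# Quotes are silently stripped, so a "name" can contain a space.
print(shlex_tokens('foo as "my name"'))  # ['foo', 'as', 'my name']
# A backslash escapes the next character, including whitespace.
print(shlex_tokens(r"foo\ bar"))         # ['foo bar']
# '#' starts a comment, so the rest of the line disappears.
print(shlex_tokens("foo # -> int"))      # ['foo']
# An unbalanced quote raises ValueError instead of a clinic-style error.
try:
    shlex_tokens("foo as 'bar")
except ValueError as exc:
    print(exc)                           # No closing quotation
```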

But for such a simple case, both look like overkill to me. It can be done with regexps or string methods. What are the problems with the current code?

serhiy-storchaka · 2024-02-16T13:02:03Z

For example:

    m = re.match(r'\s*([\w.]+)\s*', line)
    assert m
    full_name = m[1]
    if not libclinic.is_legal_py_identifier(full_name):
        fail(f"Illegal function name: {full_name!r}")
    pos = m.end()

    m = re.compile(r'\bas\b\s*(?:([^-=\s]+)\s*)?').match(line, pos)
    if m:
        if not m[1]:
            fail(f"No C basename provided for {full_name!r} after 'as' keyword")
        c_basename = m[1]
        if not libclinic.is_legal_c_identifier(c_basename):
            fail(f"Illegal C basename: {c_basename!r}")
        pos = m.end()
    else:
        c_basename = self.generate_c_basename(full_name)

    m = re.compile(r'=\s*(?:([^-=\s]+)\s*)?').match(line, pos)
    if m:
        if not m[1]:
            fail(f"No source function provided for {full_name!r} after '=' keyword")
        cloned = m[1]
        if not libclinic.is_legal_py_identifier(cloned):
            fail(f"Illegal source function name: {cloned!r}")
        pos = m.end()

    m = re.compile(r'->\s*(.*)').match(line, pos)
    if m:
        if not m[1]:
            fail(f"No return annotation provided for {full_name!r} after '->' keyword")
        returns = m[1].strip()
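To experiment with the suggestion outside clinic.py, it can be wrapped in a self-contained function. This sketch makes several assumptions:

- `fail()` is replaced by raising `ValueError`.
- The two `libclinic` identifier checks and `self.generate_c_basename()` are replaced by simplified, hypothetical stand-ins.
- A final leftover-input check is added that the original suggestion does not have.

```python
import re

# Patterns from the suggestion; the first one is also precompiled here.
RE_FULL_NAME = re.compile(r"\s*([\w.]+)\s*")
RE_C_BASENAME = re.compile(r"\bas\b\s*(?:([^-=\s]+)\s*)?")
RE_CLONE = re.compile(r"=\s*(?:([^-=\s]+)\s*)?")
RE_RETURNS = re.compile(r"->\s*(.*)")


def is_legal_py_identifier(name: str) -> bool:
    # Hypothetical stand-in for libclinic.is_legal_py_identifier.
    return all(part.isidentifier() for part in name.split("."))


def is_legal_c_identifier(name: str) -> bool:
    # Hypothetical stand-in for libclinic.is_legal_c_identifier.
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None


def parse_declaration(line: str) -> tuple:
    """Split 'full.name [as c_name] [= source] [-> returns]'."""
    m = RE_FULL_NAME.match(line)
    if not m:
        raise ValueError(f"Illegal function name: {line!r}")
    full_name = m[1]
    if not is_legal_py_identifier(full_name):
        raise ValueError(f"Illegal function name: {full_name!r}")
    pos = m.end()

    m = RE_C_BASENAME.match(line, pos)
    if m:
        if not m[1]:
            raise ValueError(
                f"No C basename provided for {full_name!r} after 'as' keyword")
        c_basename = m[1]
        if not is_legal_c_identifier(c_basename):
            raise ValueError(f"Illegal C basename: {c_basename!r}")
        pos = m.end()
    else:
        # Simplified stand-in for self.generate_c_basename() (assumption).
        c_basename = full_name.replace(".", "_")

    cloned = None
    m = RE_CLONE.match(line, pos)
    if m:
        if not m[1]:
            raise ValueError(
                f"No source function provided for {full_name!r} after '=' keyword")
        cloned = m[1]
        if not is_legal_py_identifier(cloned):
            raise ValueError(f"Illegal source function name: {cloned!r}")
        pos = m.end()

    returns = None
    m = RE_RETURNS.match(line, pos)
    if m:
        if not m[1]:
            raise ValueError(
                f"No return annotation provided for {full_name!r} after '->' keyword")
        returns = m[1].strip()
        pos = len(line)  # '(.*)' consumed the rest of the line

    # Extra check, not in the original suggestion: reject leftover input.
    if pos != len(line):
        raise ValueError(f"Invalid syntax: unexpected {line[pos:]!r}")
    return full_name, c_basename, cloned, returns


print(parse_declaration("mod.cls.fn as mod_cls_fn -> int"))
# ('mod.cls.fn', 'mod_cls_fn', None, 'int')
```

In this sketch, the trailing check is what turns inputs like "a b c d:" and "foo = bar baz" into "Invalid syntax" errors. The "foo = bar ->" and "foo as" cases hit the specific messages from the snippet.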

erlend-aasland · 2024-02-16T13:08:42Z

I expect some surprises in handling quotes and escapes.

That should be easy to check. I don't expect it to be a problem with our simple syntax: the test suite passes, and all clinic code in our repo is parsed without problems. No surprises (yet).

But for such a simple case, both look like overkill to me. It can be done with regexps or string methods. What are the problems with the current code?

It generates very bad error messages in many cases (reflected in the PR title). Also, the parsing failures are scattered around the code instead of being collected in one place, as in this PR. See my earlier comments.

IMO, it is worth it to generate better error messages.

erlend-aasland · 2024-02-16T13:40:12Z

For example:

It misses some corner cases, but it is a good alternative; thanks.

erlend-aasland · 2024-02-16T14:05:23Z

@serhiy-storchaka, I adapted it in commits 9b93771 and 1cc7248. I dropped detection of some edge cases¹; perhaps it is excessive to check for such cases of invalid syntax anyway 🤷 The result is a handful of lines shorter, which is nice. IMO, the shlex approach is more readable, but we don't have to weigh that too heavily.

What do you think?

¹ See https://github.com/python/cpython/pull/115555#issuecomment-1948104973

Tools/clinic/clinic.py

serhiy-storchaka · 2024-02-16T14:42:07Z

Tools/clinic/libclinic/parser.py

+RE_C_BASENAME = re.compile(r"\bas\b\s*(?:([^-=\s]+)\s*)?")
+RE_CLONE = re.compile(r"=\s*(?:([^-=\s]+)\s*)?")


I wrote it to pass most of your tests, but perhaps \w+ or [\w.]+ is better than [^-=\s]+. It will produce a different error message for foo.bar as '', but that may be for the better.
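To see the difference described here, run both variants of the `as` pattern on `foo.bar as ''`. `STRICT` is one reading of the suggested `\w+` variant. The variable names are illustrative only.

```python
import re

# Pattern from the PR vs. the stricter \w+ alternative.
LOOSE = re.compile(r"\bas\b\s*(?:([^-=\s]+)\s*)?")
STRICT = re.compile(r"\bas\b\s*(?:(\w+)\s*)?")

line = "foo.bar as ''"
pos = line.index("as")  # where parsing resumes after the full name

# LOOSE captures the two quote characters as the C basename,
# so the snippet would report "Illegal C basename".
print(repr(LOOSE.match(line, pos)[1]))   # "''"
# STRICT leaves the group empty, so it would report "No C basename provided".
print(STRICT.match(line, pos)[1])        # None
```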

Well, my test case might also be too contrived.

erlend-aasland · 2024-03-27T23:40:55Z

I don't have the bandwidth to follow this up now; closing the PR but keeping the local branch. Feel free to pick it up.

erlend-aasland added 2 commits February 16, 2024 10:55

Use a lexer to generate better error messages for invalid syntax
fe3b8ca

Extend test suite
c926abc

erlend-aasland requested review from serhiy-storchaka and sobolevn February 16, 2024 09:56

bedevere-appbot mentioned this pull request Feb 16, 2024
Argument Clinic: make error messages more helpful to developers #115077
Open

erlend-aasland added the skip news label Feb 16, 2024

erlend-aasland changed the title from "gh-115077: Argument Clinic: use a lexer to generate better error message" to "gh-115077: Argument Clinic: use a lexer to generate better error messages" Feb 16, 2024

erlend-aasland added 2 commits February 16, 2024 11:32

Validate cloned name post parsing
33c21e7

Remove now obsoleted comment
60553b7

serhiy-storchaka reviewed Feb 16, 2024

Use regex instead; compromise by not detecting some edge cases
9b93771

erlend-aasland changed the title from "gh-115077: Argument Clinic: use a lexer to generate better error messages" to "gh-115077: Argument Clinic: generate better error messages when parsing function declaration" Feb 16, 2024

Add parser.py
1cc7248

serhiy-storchaka reviewed Feb 16, 2024

Tools/clinic/clinic.py

serhiy-storchaka reviewed Feb 16, 2024

erlend-aasland added 2 commits February 16, 2024 23:52

Pull in main
e8c1de3

Detect more cases of invalid syntax
3c83c5d

erlend-aasland closed this Mar 27, 2024

erlend-aasland deleted the clinic/tokenizer branch March 27, 2024 23:41

