Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-115077: Argument Clinic: generate better error messages when parsing function declaration#115555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
erlend-aasland commented Feb 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
With this experiment, we can in the future make use of shlex's character position, and thus easily provide the position in the line where parsing failed. For example, by providing error messages that look more similar to the familiar Python tracebacks. |
erlend-aasland commented Feb 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Some examples of improved cases:
UPDATE: after 9b93771, the latter two cases are no longer improved. |
erlend-aasland commented Feb 16, 2024
Another positive side effect: previously, the parsing |
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use the shell tokenizer to parse Argument Clinic syntax? Isn't it closer to Python syntax?
erlend-aasland commented Feb 16, 2024
Because it was the short route to a proof-of-concept PR. We can of course rewrite it to use the Python tokeniser instead. |
erlend-aasland commented Feb 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Possible gotcha: the Python tokeniser will probably split up the full name (e.g. |
serhiy-storchaka commented Feb 16, 2024
The shell tokenizer has much more gotchas. |
erlend-aasland commented Feb 16, 2024
We already use the shell tokenizer for parsing the checksum line. Should we also stop using it there? Let's rewrite it using the Python tokenizer then. If it introduces too much complexity, let's just forget about this experiment and leave the error messages like they are today. |
erlend-aasland commented Feb 16, 2024
Could you point to some, so I can add tests for those? |
serhiy-storchaka commented Feb 16, 2024
I expect some surprises in handling quotes and escapes. But for such simple case both look overkill to me. It can be done with regexpes or string methods. What are the problems in the current code? |
serhiy-storchaka commented Feb 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
For example: m=re.match(r'\s*([\w.]+)\s*', line) assertmfull_name=m[1] ifnotlibclinic.is_legal_py_identifier(full_name): fail(f"Illegal function name: {full_name!r}") pos=m.end() m=re.compile(r'\bas\b\s*(?:([^-=\s]+)\s*)?').match(line, pos) ifm: ifnotm[1]: fail(f"No C basename provided for {full_name!r} after 'as' keyword") c_basename=m[1] ifnotlibclinic.is_legal_c_identifier(c_basename): fail(f"Illegal C basename: {c_basename!r}") pos=m.end() else: c_basename=self.generate_c_basename(full_name) m=re.compile(r'=\s*(?:([^-=\s]+)\s*)?').match(line, pos) ifm: ifnotm[1]: fail(f"No source function provided for {full_name!r} after '=' keyword") cloned=m[1] ifnotlibclinic.is_legal_py_identifier(cloned): fail(f"Illegal source function name: {cloned!r}") pos=m.end() m=re.compile(r'->\s*(.*)').match(line, pos) ifm: ifnotm[1]: fail(f"No return annotation provided for {full_name!r} after '->' keyword") returns=m[1].strip() |
erlend-aasland commented Feb 16, 2024
That should be easy to check; I don't expect it to be a problem with our simple syntax; as you can see, the test suite completes without error, and all clinic code in our repo is parsed without problems. No surprises (yet).
It generates very bad error messages in many cases (reflected in the PR title). Also, the parsing failures are scattered around the code, instead of collected in one place as in this PR. See my earlier comments:
IMO, it is worth it to generate better error messages. |
erlend-aasland commented Feb 16, 2024
It misses some corner cases, but it is a good alternative; thanks. |
erlend-aasland commented Feb 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@serhiy-storchaka, I adapted it to fit in commits 9b93771 and 1cc7248. I removed some edge cases1; perhaps it is extreme to check for such cases of invalid syntax anyway 🤷 It is a handful of lines shorter, which is nice. IMO, the shlex approach is more readable, but we don't have to weight that too heavy. What do you think? Footnotes |
Uh oh!
There was an error while loading. Please reload this page.
| RE_C_BASENAME = re.compile(r"\bas\b\s*(?:([^-=\s]+)\s*)?") | ||
| RE_CLONE = re.compile(r"=\s*(?:([^-=\s]+)\s*)?") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote it pass most of your tests, but perhaps \w+ or [\w.]+ is better than [^-=\s]+. It will produce different error message for foo.bar as '', but it may be for good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my test case might also be too contrived.
erlend-aasland commented Mar 27, 2024
I don't have the bandwidth to follow this up now; closing the PR but keeping the local branch. Feel free to pick it up. |
Uh oh!
There was an error while loading. Please reload this page.