gh-119118: Fix performance regression in tokenize module #119615
Conversation
lysnikolaou commented May 27, 2024 • edited by bedevere-app bot
- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.
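A minimal Python sketch of both ideas, with invented helper names; the real change is in CPython's C tokenizer, so this only illustrates the logic:

```python
def byte_offset_to_character_offset(line: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset within `line` to a character offset."""
    # A character offset can never exceed the byte offset (every character
    # is at least one byte in UTF-8), so encoding only the first
    # `byte_offset` characters keeps the intermediate buffer as small
    # as possible.
    prefix = line[:byte_offset]
    encoded = prefix.encode("utf-8", errors="replace")
    return len(encoded[:byte_offset].decode("utf-8", errors="replace"))


class LineCache:
    """Share one str object between all tokens on the same source line."""

    def __init__(self) -> None:
        self._line_start = -1
        self._line = ""

    def get(self, source: bytes, start: int, end: int) -> str:
        # Decode only when we move to a new line; subsequent tokens on
        # the same line reuse the cached str instead of allocating a
        # fresh Unicode object per token.
        if start != self._line_start:
            self._line = source[start:end].decode("utf-8")
            self._line_start = start
        return self._line
```

For example, `byte_offset_to_character_offset("héllo", 3)` returns 2, since the two-byte `é` counts as one character.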
pablogsal commented May 28, 2024
Hummm, seems also that this solution fails with …
pablogsal commented May 28, 2024 • edited
Another idea: we do have the token already as unicode (…
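The comment is truncated, but one possible reading, sketched here as an assumption rather than the approach actually taken: since the token text is already decoded, its character length can stand in for part of the conversion.

```python
def end_column(start_col: int, token_text: str) -> int:
    # Hypothetical helper: for a single-line token whose text is already
    # a str, the end column is the start column plus the character
    # length, so no extra byte-to-character conversion is needed.
    return start_col + len(token_text)
```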
pablogsal commented May 28, 2024
I'm discussing another fix with @lysnikolaou offline.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
pablogsal commented May 28, 2024
The docs failure seems unrelated. @hugovk do we know what may be happening here?
lysnikolaou commented May 28, 2024
The latest results are:

The test failures appear to be unrelated. We can probably merge this.
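The benchmark output itself did not survive in this page; a rough, illustrative way to time `tokenize.generate_tokens()` (not the benchmark quoted here) would be:

```python
import io
import time
import tokenize

# Illustrative micro-benchmark, not the numbers discussed in the thread:
# time tokenize.generate_tokens() over a reasonably large stdlib module.
with open(tokenize.__file__, encoding="utf-8") as f:
    source = f.read()

start = time.perf_counter()
for _ in range(100):
    for _token in tokenize.generate_tokens(io.StringIO(source).readline):
        pass
print(f"100 runs: {time.perf_counter() - start:.3f}s")
```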
hugovk commented May 28, 2024
Thanks @lysnikolaou for the PR, and @pablogsal for merging it 🌮🎉. I'm working now to backport this PR to: 3.12, 3.13.
…nGH-119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
GH-119682 is a backport of this pull request to the 3.13 branch.
…nGH-119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
GH-119683 is a backport of this pull request to the 3.12 branch.
…19615) (#119682)

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
…19615) (#119683)

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
…n#119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
tokenize.generate_tokens() performance regression in 3.12 #119118