Skip to content

Conversation

@lysnikolaou
Copy link
Member

@lysnikolaoulysnikolaou commented Jun 10, 2024

@lysnikolaoulysnikolaouforce-pushed the tokenize-thread-safety branch from 78df829 to abf568aCompareJune 10, 2024 13:50
@Fidget-Spinner
Copy link
Member

This LGTM. Can you add the reproducer to test.test_free_threading please? I'll approve after!

@lysnikolaou
Copy link
MemberAuthor

@Fidget-Spinner I added more locking around global state. Reviewing each commit seprately might make it easier.

Copy link
Member

@Fidget-SpinnerFidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a refactor, so good so far.

@lysnikolaou
Copy link
MemberAuthor

Feedback addressed.

@Fidget-Spinner I used _Py_atomic_store_int instead of the macro. Is that okay?

Copy link
Member

@Fidget-SpinnerFidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to ignore the following comments if this part of the tokenizer is not perf sensitive.

@lysnikolaou
Copy link
MemberAuthor

All feedback addressed. Thanks a lot for the reviews and the help @Fidget-Spinner and @erlend-aasland!

@lysnikolaoulysnikolaou changed the title gh-120317: Lock around tokenizer calls in the tokenizer modulegh-120317: Lock around global state in the tokenize moduleJun 10, 2024
Copy link
Member

@pablogsalpablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM



@threading_helper.requires_working_threading()
classTestTokenize(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this locally: it crashes spectacularly without the fix :)

}
goto exit;
}
if (it->done||type==ERRORTOKEN){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to protect the read? Otherwise we may get weird partial reads no?

@pablogsal
Copy link
Member

Actually, one question here: shouldn't we be protecting all reads of the it structure that are not under critical sections? Otherwise we are getting some partial reads no? I am missing something?

@pablogsal
Copy link
Member

For example, this read is not protected, no?:

 Py_ssize_t lineno = ISSTRINGLIT(type) ? it->tok->first_lineno : it->tok->lineno; Py_ssize_t end_lineno = it->tok->lineno; 

@lysnikolaoulysnikolaouforce-pushed the tokenize-thread-safety branch 2 times, most recently from 20c7cb2 to 1c51dafCompareJune 11, 2024 20:34
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
@lysnikolaoulysnikolaouforce-pushed the tokenize-thread-safety branch from 1c51daf to ca466faCompareJune 11, 2024 20:34
@lysnikolaou
Copy link
MemberAuthor

After some offline discussion with @pablogsal, we decided it makes more sense to just lock around all of tokenizeriter_next.

@lysnikolaoulysnikolaou added the needs backport to 3.13 bugs and security fixes label Jun 11, 2024
try:
r=next(it)
tokens.append(tokenize.TokenInfo._make(r))
time.sleep(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this test faster? It's going to run on every CI job. I think we should aim for 0.1 seconds or less for these sorts of tests. (It's currently ~3 seconds).

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the amount of time each thread sleeps to 0.03 secs, which means that the whole test now takes about 0.1 seconds. Also verified that the test still fails without the fix and succeeds with it.

@lysnikolaou
Copy link
MemberAuthor

lysnikolaou commented Jul 1, 2024

@pablogsal Can you do a final review here when you have some spare cycles?

Copy link
Member

@pablogsalpablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the PR and the comments and it LGTM

I also ran the tests locally in free threaded mode to confirm

@lysnikolaoulysnikolaou merged commit 8549559 into python:mainJul 16, 2024
@miss-islington-app
Copy link

Thanks @lysnikolaou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@lysnikolaoulysnikolaou deleted the tokenize-thread-safety branch July 16, 2024 09:36
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 16, 2024
…honGH-120318) (cherry picked from commit 8549559) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
@bedevere-app
Copy link

GH-121841 is a backport of this pull request to the 3.13 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.13 bugs and security fixes label Jul 16, 2024
lysnikolaou added a commit that referenced this pull request Jul 16, 2024
…-120318) (#121841) (cherry picked from commit 8549559) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@lysnikolaou@Fidget-Spinner@pablogsal@colesbury@erlend-aasland