Closed as not planned
Labels: `stdlib` (Standard Library Python modules in the Lib/ directory), `type-bug` (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
The `tokenize` module creates `TokenInfo` objects with a `.line` attribute. In Python 3.11, every token on a given line shared the same string object for `.line`. In 3.12, each token gets its own copy of that string.
This is part of a memory issue reported against coverage.py: coveragepy/coveragepy#1791
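To put a rough number on the duplication, here is a small sketch (not part of the original report) that tokenizes a made-up multi-line source and compares the number of distinct `.line` values with the number of distinct string objects the tokens actually hold. Given the behavior described above, you would expect the two counts to match on 3.11 and the object count to be noticeably higher on 3.12+. `sys.getsizeof` only approximates per-object memory.

```python
import io
import sys
import tokenize

# Hypothetical sample source; any Python text works here.
source = "x = 1 + 2\ny = x * 3\nprint(x, y)\n"

# Keep every token alive so that id() values stay unique for the whole check.
toks = list(tokenize.generate_tokens(io.StringIO(source).readline))

distinct_values = {tok.line for tok in toks}
# Map each distinct .line object (by identity) to the string itself.
unique_by_id = {id(tok.line): tok.line for tok in toks}

# Approximate bytes held by .line strings, counting each distinct object once.
total_bytes = sum(sys.getsizeof(s) for s in unique_by_id.values())

print(f"{len(toks)} tokens")
print(f"{len(distinct_values)} distinct .line values")
print(f"{len(unique_by_id)} distinct .line objects")
print(f"~{total_bytes} bytes in .line strings")
```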
```python
# tok.py
import io
import sys
import tokenize

print(f"{sys.version = }")

text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
readline = io.StringIO(text).readline
toks = list(tokenize.generate_tokens(readline))

print(f"{toks[0].line = }")
print(f"{(toks[0].line == toks[1].line) = }")
print(f"{(toks[0].line is toks[1].line) = }")
```

3.11 reuses string objects:
```
% python3.11 /tmp/tok.py
sys.version = '3.11.9 (main, Apr 8 2024, 14:01:56) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = True
```

3.12 (and above) creates new string objects:
```
% python3.12 /tmp/tok.py
sys.version = '3.12.3 (main, Apr 9 2024, 15:45:14) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = False
```

CPython versions tested on:
3.11, 3.12, 3.13, CPython main branch
Operating systems tested on:
macOS
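A caller that holds on to many tokens could restore the 3.11-style sharing on its own side. The sketch below is a hypothetical helper (`shared_line_tokens` is an illustrative name, not a stdlib API) that wraps `tokenize.generate_tokens` and, whenever a token's `.line` equals the previous one, swaps in the previously seen string object via the namedtuple `_replace` method. It assumes tokens from the same physical line arrive consecutively, which is how the tokenizer emits them; it trades one string comparison per token for fewer live string copies.

```python
import io
import tokenize
from typing import Callable, Iterator


def shared_line_tokens(readline: Callable[[], str]) -> Iterator[tokenize.TokenInfo]:
    """Hypothetical helper: yield tokens whose equal .line strings share one object."""
    last_line = None
    for tok in tokenize.generate_tokens(readline):
        if tok.line == last_line:
            # Same contents as the previous token's line: reuse that object
            # so the fresh copy made by the tokenizer can be freed.
            tok = tok._replace(line=last_line)
        else:
            # New physical line: remember this string for the following tokens.
            last_line = tok.line
        yield tok


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
toks = list(shared_line_tokens(io.StringIO(text).readline))
print(f"{(toks[0].line is toks[1].line) = }")
```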