Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
gh-107369: optimize textwrap.indent()#107374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
methane commented Jul 28, 2023 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
eendebakpt left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Using str.split for the predicate instead of line.strip might change something for input that is not str, but I think this is ok.
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lstrip is faster for non-indented lines.
I wonder whether the following variants can be faster for some input and for how wide category of input.
defpredicate(line): returnlineand (notline[0].isspace() orline.lstrip())or
predicate=re.compile(r'\S').searchmethane commented Jul 28, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Since we use In case of unicodeobject.c, rstrip is bit faster. But it may be because most lines are indented already. So I chose str.lstrip here, as Serhiy suggested. |
serhiy-storchaka commented Jul 28, 2023
Now that you mention it, I can see that using We want to test whether the line has any non-space character. Algorithmically, |
Uh oh!
There was an error while loading. Please reload this page.
methane commented Jul 29, 2023
Because
lstrip() is slow when every line has long indent. But With 4c6a46a and https://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb If I add |
methane commented Jul 29, 2023
To maximize performance, we can stop using lambda by...: |
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your research Inada-san. Which to use here, lstrip or isspace, I leave up to you. It does not really matter in most cases.
picnixz commented Jul 29, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
For very long texts, I think changing prefixed_lines= [] forlineintext.splitlines(True): ifnotline.isspace(): prefixed_lines.append(prefix) prefixed_lines.append(line)into the following may improve the overall performances prefixed_lines= [] append_line=prefixed_lines.appendforlineintext.splitlines(True): ifnotline.isspace(): append_line(prefix) append_line(line)EDIT: After a more careful benchmarking, this does not seem to bring more improvements. However, not using a lambda function seems to be better. |
indent()-ing
Object/unicodeobject.c(15332 lines) about 25% faster.