Skip to content

Conversation

@barneygale
Copy link
Contributor

@barneygalebarneygale commented Feb 6, 2024

When expanding and filtering paths for a ** wildcard segment, build an re.Pattern object from the subsequent pattern parts, rather than the entire pattern, and match against the os.DirEntry object prior to instantiating a path object.

Also skip compiling a pattern when expanding a * wildcard segment.

… regex matching When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern. Also skip compiling a pattern when expanding a `*` wildcard segment.
@barneygale
Copy link
ContributorAuthor

barneygale commented Feb 6, 2024

Notable improvements:

$ ./python -m timeit -s "from pathlib import Path""list(Path.cwd().glob('*', follow_symlinks=False))" 2000 loops, best of 5: 180 usec per loop # before 2000 loops, best of 5: 159 usec per loop # after# --> 1.13x faster $ ./python -m timeit -s "from pathlib import Path""list(Path.cwd().glob('**/*.py', follow_symlinks=False))" 5 loops, best of 5: 54 msec per loop # before 5 loops, best of 5: 40.9 msec per loop # after# --> 1.32x faster

Everything else is about the same.

@zooba
Copy link
Member

zooba commented Feb 8, 2024

For whatever reason, every time I try to review this, I struggle to figure out what the change is doing :D

Since it doesn't require changing any test cases, and I know the tests cases are pretty thorough for this area, I don't think there's any reason to not sign off. Maybe trigger a buildbot run with the tag to make sure it doesn't behave strangely on any of those setups - they can occasionally be a bit unusual and find some edge cases.

@barneygale
Copy link
ContributorAuthor

barneygale commented Feb 8, 2024

Thanks Steve.

For whatever reason, every time I try to review this, I struggle to figure out what the change is doing :D

The algorithm might be worthy of a blog post at this point!

The main change is that we now filter partial paths through a regex corresponding to a partial pattern in _select_recursive, rather than complete paths through a regex corresponding to a complete pattern in PathBase.glob(). We can do this because previous parts have already been filtered by _select_children(), and so there's no need to re-filter them.

The secondary change (which includes the addition of _entry_str()) is to match against os.DirEntry.path directly, which allows us to skip construction of path objects for files that don't match.

@zooba
Copy link
Member

zooba commented Feb 8, 2024

Okay, today it made sense :) Guess I'm more awake right now. Reading the changes from the bottom up might have helped as well.

Personally, I don't think you can have too many comments in an algorithm like this, particularly when it's recursive and split between a couple of functions. I'll suggest a few comments that would've helped me, but I don't think there are any code changes needed.

Copy link
Member

@zoobazooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just comments that may help make it more understandable. No changes required

@barneygalebarneygale merged commit 6f93b4d into python:mainFeb 10, 2024
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this pull request Feb 14, 2024
… regex matching (python#115061) When expanding and filtering paths for a `**` wildcard segment, build an `re.Pattern` object from the subsequent pattern parts, rather than the entire pattern, and match against the `os.DirEntry` object prior to instantiating a path object. Also skip compiling a pattern when expanding a `*` wildcard segment.
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performancePerformance or resource usagetopic-pathlib

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@barneygale@zooba