Skip to content

Conversation

@barneygale
Copy link
Contributor

@barneygalebarneygale commented Feb 29, 2024

For ordinary literal pattern segments (e.g. foo/bar in foo/bar/../**), skip calling scandir() on each segment, and instead call exists() or is_dir() as necessary to exclude missing paths. This only applies when case_sensitive is None (the default); otherwise we can't guarantee case sensitivity or realness with this approach. If follow_symlinks is False we also need to exclude symlinks from intermediate segments.

This restores an optimization that was removed in da1980a by some eejit. It's actually even faster because we don't stat() intermediate directories, and in some cases we can skip all filesystem access when expanding a literal part (e.g. when it's followed by a non-recursive wildcard segment).

… scanning. For ordinary literal pattern segments (e.g. `foo/bar` in `foo/bar/../**`), skip calling `_scandir()` on each segment, and instead call `exists()` or `is_dir()` as necessary to exclude missing paths. This only applies when *case_sensitive* is `None` (the default); otherwise we can't guarantee case sensitivity or realness with this approach. If *follow_symlinks* is `False` we also need to exclude symlinks from intermediate segments. This restores an optimization that was removed in da1980a by some eejit. It's actually even faster because we don't `stat()` intermediate directories, and in some cases we can skip all filesystem access when expanding a literal part (e.g. when it's followed by a non-recursive wildcard segment).
@barneygale
Copy link
ContributorAuthor

barneygale commented Feb 29, 2024

Quite a lot faster:

$ ./python -m timeit -s "from pathlib import Path""list(Path().glob('Lib/pathlib/__init__.py'))" 2000 loops, best of 5: 195 usec per loop # before 20000 loops, best of 5: 17 usec per loop # after $ ./python -m timeit -s "from pathlib import Path""list(Path().glob('Lib/pathlib/*'))" 2000 loops, best of 5: 197 usec per loop # before 10000 loops, best of 5: 28.8 usec per loop # after $ ./python -m timeit -s "from pathlib import Path""list(Path().glob('Lib/*/__init__.py'))" 200 loops, best of 5: 1.24 msec per loop # before 1000 loops, best of 5: 307 usec per loop # after $ ./python -m timeit -s "from pathlib import Path""list(Path().glob('*/pathlib/__init__.py'))" 200 loops, best of 5: 1.22 msec per loop # before 1000 loops, best of 5: 261 usec per loop # after

@barneygalebarneygale marked this pull request as draft March 4, 2024 18:19
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performancePerformance or resource usagetopic-pathlib

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

@barneygale