barneygale commented May 6, 2023

Stop de-duplicating results in _RecursiveWildcardSelector. A new _DoubleRecursiveWildcardSelector class is introduced which performs de-duplication, but this is used only for patterns with multiple non-adjacent ** segments, such as path.glob('**/foo/**'). By avoiding the use of a set in most cases, PurePath.__hash__() is not called, and so paths do not need to be parsed and (case-) normalised.
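To illustrate why de-duplication only matters when a pattern has more than one non-adjacent `**` segment, here is a minimal sketch on a toy in-memory directory tree. It is not the pathlib implementation: the `Tree` type and the helpers `recursive`, `select_single`, and `select_double` are hypothetical names made up for this example.

```python
from typing import Dict, Iterator, Tuple

# Toy model (an assumption for illustration): a directory tree is a nested
# dict mapping a child directory name to its own subtree.
Tree = Dict[str, "Tree"]
Parts = Tuple[str, ...]


def subtree(tree: Tree, path: Parts) -> Tree:
    """Return the subtree found by following `path` from the root."""
    for name in path:
        tree = tree[name]
    return tree


def recursive(tree: Tree, start: Parts) -> Iterator[Parts]:
    """Yield `start` and every directory below it, like a single '**'."""
    yield start
    for name in subtree(tree, start):
        yield from recursive(tree, start + (name,))


def select_single(tree: Tree, name: str) -> Iterator[Parts]:
    """Toy equivalent of '**/name': each match is reached exactly once."""
    for parent in recursive(tree, ()):
        if name in subtree(tree, parent):
            yield parent + (name,)


def select_double(tree: Tree, name: str, dedup: bool) -> Iterator[Parts]:
    """Toy equivalent of '**/name/**': a path can be reached more than once."""
    seen = set()
    for parent in recursive(tree, ()):
        if name not in subtree(tree, parent):
            continue
        for match in recursive(tree, parent + (name,)):
            if dedup:
                if match in seen:
                    continue
                seen.add(match)
            yield match


tree: Tree = {"foo": {"foo": {"x": {}}}}
single = list(select_single(tree, "foo"))
raw = list(select_double(tree, "foo", dedup=False))
unique = list(select_double(tree, "foo", dedup=True))
```

In this toy tree, `foo/foo` and `foo/foo/x` are reachable through both the outer and the inner `foo`, so the second `**` visits them twice. A pattern with a single `**` has only one route to each result, which is why the set (and the hashing it requires) can be skipped in that case.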

Also merge adjacent ** segments in patterns.
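The merging step can be sketched roughly as follows. This is an illustrative guess at the idea, not the code from the commit: `merge_recursive_segments` and `needs_dedup` are hypothetical helper names, and the pattern is assumed to be already split into its segments.

```python
from typing import List


def merge_recursive_segments(pattern_parts: List[str]) -> List[str]:
    """Collapse runs of adjacent '**' segments into a single '**'."""
    merged: List[str] = []
    for part in pattern_parts:
        if part == "**" and merged and merged[-1] == "**":
            # '**/**' matches the same directories as '**', so drop the repeat.
            continue
        merged.append(part)
    return merged


def needs_dedup(pattern_parts: List[str]) -> bool:
    """True if more than one '**' remains after merging adjacent ones."""
    return merge_recursive_segments(pattern_parts).count("**") > 1


print(merge_recursive_segments(["**", "**", "*"]))
print(needs_dedup(["**", "**", "*"]))
print(needs_dedup(["**", "foo", "**"]))
```

Under these assumptions, `**/**/*` reduces to `**/*` and takes the path without de-duplication, while `**/foo/**` still has two `**` segments after merging and keeps it.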

Timings:

$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*"))'
1 loop, best of 5: 197 msec per loop  # before
2 loops, best of 5: 146 msec per loop  # after --> 35% faster

$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/**/*"))'
1 loop, best of 5: 1.77 sec per loop  # before
2 loops, best of 5: 146 msec per loop  # after --> 12x faster

$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*/**"))'
1 loop, best of 5: 738 msec per loop  # before
1 loop, best of 5: 731 msec per loop  # after --> about the same
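For anyone who wants to repeat a comparison like this on a controlled tree rather than the current directory, a small script along these lines may help. It is only a sketch: `make_tree`, `time_pattern`, and `compare` are hypothetical helpers, the tree shape is arbitrary, and the numbers it produces will not match the timings above.

```python
import tempfile
import timeit
from pathlib import Path
from typing import Dict


def make_tree(root: Path, depth: int = 3, width: int = 3) -> None:
    """Create `width` subdirectories per level, each holding one file."""
    if depth == 0:
        return
    for i in range(width):
        child = root / f"d{i}"
        child.mkdir()
        (child / "file.txt").touch()
        make_tree(child, depth - 1, width)


def time_pattern(root: Path, pattern: str, number: int = 5) -> float:
    """Average seconds per full traversal of root.glob(pattern)."""
    total = timeit.timeit(lambda: list(root.glob(pattern)), number=number)
    return total / number


def compare(depth: int = 3, width: int = 3) -> Dict[str, float]:
    """Time the three patterns from the PR description on a temporary tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_tree(root, depth, width)
        return {
            pattern: time_pattern(root, pattern)
            for pattern in ("**/*", "**/**/*", "**/*/**")
        }


if __name__ == "__main__":
    for pattern, seconds in compare().items():
        print(f"{pattern:10} {seconds * 1000:.2f} msec per loop")
```

Running the same script under an interpreter built before and after the change gives a before/after pair for each pattern. A deeper or wider tree makes the difference easier to see, at the cost of a longer setup.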

barneygale merged commit c0ece3d into python:main May 7, 2023
jbower-fb pushed a commit to jbower-fb/cpython that referenced this pull request May 8, 2023
…nGH-104244) Stop de-duplicating results in `_RecursiveWildcardSelector`. A new `_DoubleRecursiveWildcardSelector` class is introduced which performs de-duplication, but this is used _only_ for patterns with multiple non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding the use of a set, `PurePath.__hash__()` is not called, and so paths do not need to be stringified and case-normalised. Also merge adjacent '**' segments in patterns.
carljm added a commit to carljm/cpython that referenced this pull request May 9, 2023
* main: (47 commits)
  pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)
  pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)
  pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)
  pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)
  pythonGH-104284: Fix documentation gettext build (python#104296)
  pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)
  pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)
  pythongh-99108: fix typo in Modules/Setup (python#104293)
  pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)
  pythongh-103193: Improve `getattr_static` test coverage (python#104286)
  Trim trailing whitespace and test on CI (python#104275)
  pythongh-102500: Remove mention of bytes shorthand (python#104281)
  pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)
  pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)
  pythongh-104273: Remove redundant len() calls in argparse function (python#104274)
  pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200)
  pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266)
  pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244)
  pythongh-103650: Fix perf maps address format (python#103651)
  pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243)
  ...
Labels

performance (Performance or resource usage), topic-pathlib

3 participants

@barneygale, @JelleZijlstra, @bedevere-bot