GH-101362: Optimise pathlib by deferring path normalisation #101560
barneygale commented Feb 4, 2023 • edited
`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.

`PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`.

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an `__init__()` method.

barneygale commented Feb 4, 2023 • edited
Constructing path objects is up to 4x faster with one argument:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo/bar")'
    1000000 loops, best of 5: 2.01 usec per loop  # before
    1000000 loops, best of 5: 495 nsec per loop  # after

More than 2x faster with two arguments:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo", "bar")'
    1000000 loops, best of 5: 2.28 usec per loop  # before
    1000000 loops, best of 5: 1.02 usec per loop  # after

~~And ~25% faster when joining arguments:~~ [edit: no longer true!]

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath; p = PurePath("foo")' 'p.joinpath("bar")'
    1000000 loops, best of 5: 1.66 usec per loop  # before
    1000000 loops, best of 5: 1.3 usec per loop  # after

But it's 12% slower when the path needs normalization, as with:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'str(PurePath("foo/bar"))'
    1000000 loops, best of 5: 2.96 usec per loop  # before
    1000000 loops, best of 5: 3.31 usec per loop  # after

[edit: resolved! see comment]

    $ ./python -m timeit -n 20 -s 'from pathlib import Path' 'list(Path().rglob("*"))'
    20 loops, best of 5: 53.4 msec per loop  # before
    20 loops, best of 5: 66.5 msec per loop  # after

[edit: no longer true! this can't be properly fixed until other stuff lands]

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").read_text()'
    100000 loops, best of 5: 26.1 usec per loop  # before
    100000 loops, best of 5: 21.2 usec per loop  # after

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").exists()'
    100000 loops, best of 5: 5.45 usec per loop  # before
    100000 loops, best of 5: 2.97 usec per loop  # after
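For anyone who wants to repeat the pure-path measurements without the shell, here is a rough sketch using the standard `timeit` module. The helper name `best_usec` and the reduced loop count are my own assumptions for illustration, not something from this PR; the statements are the ones benchmarked above.

```python
import timeit


def best_usec(stmt, setup="pass", number=10_000, repeat=5):
    """Best per-loop time in microseconds (hypothetical helper).

    Mirrors the "best of 5" figure printed by `python -m timeit`.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number * 1e6


# Statements taken from the benchmarks above. The loop count is much
# lower than the 1000000 used there, so the numbers will be noisier.
cases = [
    ("one argument", "from pathlib import PurePath", 'PurePath("foo/bar")'),
    ("two arguments", "from pathlib import PurePath", 'PurePath("foo", "bar")'),
    ("str()", "from pathlib import PurePath", 'str(PurePath("foo/bar"))'),
]

for label, setup, stmt in cases:
    print(f"{label}: {best_usec(stmt, setup=setup):.3f} usec per loop")
```

Run it once against a build of `main` and once against a build of the branch to get a before/after pair; absolute numbers will vary by machine.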
barneygale commented Feb 7, 2023
I've found a couple other small optimizations which are best tackled in other PRs, so I'm marking this PR as a 'draft' for now.
barneygale commented Mar 6, 2023
I've undone the change to Still a tiny bit slower than pre-PR. The rest of the speedups/slowdowns mentioned in my previous comment are still there.
barneygale commented Mar 11, 2023
The change to I think I need to solve that issue first, so I'm going to mark this PR as a draft (again!)
barneygale commented Mar 17, 2023
This PR has strayed too far from the original implementation. I'm going to abandon it. New PR here:
`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.

`PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up `p.joinpath('bar')` and `p / 'bar'`. [edit: will fix separately.]

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an `__init__()` method. [edit: will fix separately.]
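To make the idea concrete, here is a minimal sketch of deferred normalisation with caching. It is not the code in this PR or in pathlib: the class name `LazyPurePath`, the attributes `_raw` and `_parts`, and the POSIX-only splitting are all assumptions made for illustration.

```python
import os
import posixpath


class LazyPurePath:
    """Hypothetical stand-in for PurePath; not pathlib's implementation."""

    __slots__ = ("_raw", "_parts")

    def __init__(self, *args):
        # Construction only joins the raw arguments; nothing is parsed.
        self._raw = posixpath.join(*args) if args else ""
        self._parts = None  # filled in on first access

    def _load_parts(self):
        # Normalise and split once, then reuse the cached tuple.
        if self._parts is None:
            parts = [p for p in self._raw.split("/") if p and p != "."]
            if self._raw.startswith("/"):
                parts.insert(0, "/")
            self._parts = tuple(parts)
        return self._parts

    @property
    def parts(self):
        return self._load_parts()

    @property
    def name(self):
        parts = self._load_parts()
        if not parts or parts == ("/",):
            return ""
        return parts[-1]

    def joinpath(self, *args):
        # Joining works on the raw string, so it also skips normalisation.
        return type(self)(self._raw, *args)

    __truediv__ = joinpath

    def __fspath__(self):
        # Returns the unnormalised text; the OS resolves "//" and "./".
        return self._raw


p = LazyPurePath("foo//bar/./baz.txt")
print(p._parts)      # still None: nothing has been parsed yet
print(p.name)        # first access triggers the split
print(p.parts)       # served from the cache
print(os.fspath(p))  # raw, unnormalised string
```

The trade-off is the one seen in the benchmarks: construction and joining get cheaper because they never parse, while anything that needs the normalised form pays the parsing cost on first use.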