Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 33.9k
gh-101196: Make isdir/isfile/exists faster on Windows#101324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
mdboom commented Jan 25, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
fb3c6ac to a07f1e7CompareUh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
eryksun commented Jan 25, 2023
A builtin |
Co-authored-by: Eryk Sun <eryksun@gmail.com>
eryksun commented Jan 25, 2023
A few cases that are handled by
|
mdboom commented Jan 25, 2023
I guess this means we don't have test coverage for them...? In any event, I agree we should probably try to cover these cases. I'll see what can be done.
My initial implementation from this morning did that, which I like because it doesn't repeat what's clearly battle-tested code, but it made things slower in the event of symlinks. I was almost ok with that given that that isn't the common case, but your solution seemed to make everything a little faster (ignoring these new corner cases). |
eryksun commented Jan 25, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There shouldn't be an issue with speed for common cases as long as you only fall back on
|
Lib/test/test_ntpath.py Outdated
| @unittest.skipIf(sys.platform!='win32', "drive letters are a windows concept") | ||
| deftest_isfile_driveletter(self): | ||
| current_drive=os.path.splitdrive(os.path.abspath(__file__))[0] +"\\" | ||
| self.assertFalse(os.path.isfile(current_drive)) |
eryksunJan 26, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this is false is because a relative drive path like "C:" resolves to the working directory on the drive. Did you want to test a volume device path like r'\\.\C:' instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. And this is exactly the thing that (erroneously) returns True without checking the error code from GetFileInformationByHandleEx.
mdboom commented Jan 26, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@eryksun: Thanks for all of your help and pointers. I've implemented support for these corner cases by calling I'm a little stuck on writing unit tests for them. I was able to cover I'm not sure how to recreate the case you describe in For |
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
zooba left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me. Any other POVs?
erlend-aasland commented Feb 6, 2023
Looks good to me, but I'll add that there's a lot of code duplication in here; it might be worth it to refactor the code shortly after landing this PR, so the added code is more maintainable and readable. |
mdboom commented Feb 6, 2023
I'll take a first pass at reducing some of this duplication. |
zooba commented Feb 6, 2023
My read of the duplication is that there's just enough difference that it's probably not worth it. The main change I'd like to see (but wasn't requiring) is to add an option to If all the additional functions were using the same OS call, it would make sense to merge them all. But they're subtly different each time, and since this is for perf, we don't want to introduce more switches. |
erlend-aasland commented Feb 6, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
+1 (I suggest adding that in a follow-up PR; let's land this.)
True that. |
eryksun commented Feb 7, 2023
Michael, are you still working on refactoring the code? Steve and Erlend seem to be in agreement that it's ready to merge as is. I think it's ready. |
zooba commented Feb 8, 2023
Just heard that it's good to go as it is. |
miss-islington commented Feb 8, 2023
miss-islington commented Feb 8, 2023
miss-islington commented Feb 8, 2023
Sorry, @mdboom and @zooba, I could not cleanly backport this to |
miss-islington commented Feb 8, 2023
Sorry @mdboom and @zooba, I had trouble checking out the |
zooba commented Feb 8, 2023
This ought to be safe to backport from a public API perspective, but I'll let the bot figure out how easy it'll be. |
miss-islington commented Feb 8, 2023
miss-islington commented Feb 8, 2023
Sorry, @mdboom and @zooba, I could not cleanly backport this to |
zooba commented Feb 8, 2023
Okay, it's not going to backport trivially :) If anyone's up for it, go ahead, otherwise I'll take a look when I get time. |
hauntsaninja commented Feb 9, 2023
(It's relatively rare that we backport performance improvements, so if it ends up being complicated maybe we shouldn't) |
* main: (82 commits) pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723) pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736) no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731) pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730) pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679) pythongh-85984: Remove legacy Lib/pty.py code. (python#92365) pythongh-98831: Use opcode metadata for stack_effect() (python#101704) pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719) pythongh-101283: Fix use of unbound variable (pythonGH-101712) pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) pythongh-101277: Port more itertools static types to heap types (python#101304) pythongh-98831: Modernize CALL and family (python#101508) pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697) pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329) pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672) pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324) pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615) pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934) pythonGH-101578: Normalize the current exception (pythonGH-101607) pythongh-47937: Note that Popen attributes are read-only (python#93070) ...
This makes
isdir,isfileandexistsfaster on Windows, by making fewer API calls than callingos.statand then looking at that result.On the following microbenchmarks:
Microbenchmarks
I get the following results:
So therefore, these operations are around 13%-28% faster, depending on whether the file exists, is a symlink etc.
NOTE: This does not improve the performance of
pathlib.Path.is_dirand friends. The behavior there is slightly different in terms of error handling, but I think a similar approach could also be applied there. If this PR is acceptable, I plan to do that as follow-up work.