Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 33.9k
gh-102765: Updated isdir/isfile/islink/exists to use Py_GetFileInformationByName in ntpath when available#103485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
finnagin commented Apr 12, 2023 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
This comment was marked as outdated.
This comment was marked as outdated.
ghost commented Apr 12, 2023 • edited by ghost
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by ghost
Uh oh!
There was an error while loading. Please reload this page.
zooba commented Apr 12, 2023
There's an approximately equivalent NEWS entry for this, so I don't think we need another one. Shame that we have a whole lot more duplicated code now, but we don't really have any good alternatives in C without losing the straight-line execution. (In C++ we could probably use a lambda and trust the compiler to optimise appropriately.) |
Modules/posixmodule.c Outdated
| caseERROR_FILE_NOT_FOUND: | ||
| caseERROR_PATH_NOT_FOUND: | ||
| caseERROR_NOT_READY: | ||
| caseERROR_BAD_NET_NAME: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Four other errors can be excluded from the slow fallback path:
ERROR_BAD_NETPATH- The network path was not found.ERROR_BAD_PATHNAME- The specified path is invalid.ERROR_INVALID_NAME- The filename, directory name, or volume label syntax is incorrect.ERROR_FILENAME_EXCED_RANGE- The filename or extension is too long.
| caseERROR_FILE_NOT_FOUND: | |
| caseERROR_PATH_NOT_FOUND: | |
| caseERROR_NOT_READY: | |
| caseERROR_BAD_NET_NAME: | |
| caseERROR_FILE_NOT_FOUND: | |
| caseERROR_PATH_NOT_FOUND: | |
| caseERROR_NOT_READY: | |
| caseERROR_BAD_NET_NAME: | |
| caseERROR_BAD_NETPATH: | |
| caseERROR_BAD_PATHNAME: | |
| caseERROR_INVALID_NAME: | |
| caseERROR_FILENAME_EXCED_RANGE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've gone ahead and added those cases for isdir, isfile, exists, & islink.
Modules/posixmodule.c Outdated
| caseERROR_NOT_SUPPORTED: | ||
| /* indicates the API couldn't be loaded */ | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this case highlighted?
Note that ERROR_NOT_SUPPORTED could also be returned if the device doesn't support the by-name query. Though in that case the error code is more likely to be ERROR_INVALID_FUNCTION or ERROR_INVALID_PARAMETER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I did not realize that. Thanks!
I highlighted this case since it was also highlighted this way were Py_GetFileInformationByName was called in the win32 xstat implementation earlier in the file. Would it be better to alter this comment to say something like "indicates the API couldn't be loaded or the device doesn't support the by-name query" or would it be better just to remove the comment entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, it's probably better to pick a different error code for when the API is not supported. We don't really care, but it's worth recognising when it happened.
Maybe let's return ERROR_NOT_SUPPORTED | (1 << 29) (see SetLastError docs for the bit 29 reference). Then just leave out the cases that capture and ignore it.
Uh oh!
There was an error while loading. Please reload this page.
| if (_path.fd!=-1){ | ||
| hfile=_Py_get_osfhandle_noraise(_path.fd); | ||
| close_file= FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of bugs slipped by when the new accelerator functions were added. These bugs should probably be resolved in a separate issue before this pull request gets merged.
- If
_Py_get_osfhandle_noraise()fails (e.g. a bad file descriptor), the result isINVALID_HANDLE_VALUE, but the thread's last error isn't set and is thus a random error code. It could be one of the errors that's handled by callingSTAT(), but in this case there is no_path.widevalue to check. It happens thatCreateFileW()doesn't raise an OS exception that crashes the process when passed a null pointer forlpFileName, but that's not documented and shouldn't be relied on. - If
GetFileType(hfile)isn'tFILE_TYPE_DISK, we have to return a false result. Since we didn't open the file, we don't know whether or not it has a pending synchronous operation that's blocked indefinitely. For example, a pipe or character file could have a pending synchronous read that may never complete. In this case,GetFileInformationByHandleEx()would block.
…e result Co-authored-by: Eryk Sun <eryksun@gmail.com>
eryksun commented Apr 15, 2023
Here's a staticDWORDfileTypeFromDeviceType(DWORDdeviceType){switch (deviceType){caseFILE_DEVICE_DISK: caseFILE_DEVICE_VIRTUAL_DISK: caseFILE_DEVICE_DFS: caseFILE_DEVICE_CD_ROM: caseFILE_DEVICE_CONTROLLER: caseFILE_DEVICE_DATALINK: caseFILE_DEVICE_DISK_FILE_SYSTEM: caseFILE_DEVICE_CD_ROM_FILE_SYSTEM: returnFILE_TYPE_DISK; caseFILE_DEVICE_CONSOLE: caseFILE_DEVICE_NULL: caseFILE_DEVICE_KEYBOARD: caseFILE_DEVICE_MODEM: caseFILE_DEVICE_MOUSE: caseFILE_DEVICE_PARALLEL_PORT: caseFILE_DEVICE_PRINTER: caseFILE_DEVICE_SCREEN: caseFILE_DEVICE_SERIAL_PORT: caseFILE_DEVICE_SOUND: returnFILE_TYPE_CHAR; caseFILE_DEVICE_NAMED_PIPE: returnFILE_TYPE_PIPE; default: returnFILE_TYPE_UNKNOWN} } staticunsigned intposixFileTypeFromFileInfo(DWORDfileType, DWORDfileAttributes, DWORDreparseTag){if ((fileAttributes&FILE_ATTRIBUTE_REPARSE_POINT) &&reparseTag==IO_REPARSE_TAG_SYMLINK){returnS_IFLNK} if (fileAttributes&FILE_ATTRIBUTE_DIRECTORY){returnS_IFDIR} switch (fileType){caseFILE_TYPE_DISK: // BUGBUG: define S_IFBLK as 0x6000 in PC/pyconfig.hreturnfileAttributes ? S_IFREG : 0x6000; caseFILE_TYPE_CHAR: returnS_IFCHR; caseFILE_TYPE_PIPE: // BUGBUG: define S_IFIFO as _S_IFIFO in PC/pyconfig.h. UCRT neglects// to define it for _CRT_INTERNAL_NONSTDC_NAMES in sys/stat.h.return_S_IFIFO} return0} staticBOOLgetPosixFileType(path_t*path, int*pPosixFileType, DWORD*pReparseTag, BOOLreparseDirectory, BOOLreparseFile){HANDLEhfile; DWORDfileType=0; DWORDfileAttributes=0; DWORDreparseTag=0; intposixFileType=0; BOOLreparse=reparseDirectory||reparseFile; BOOLqueryByHandle= TRUE; BOOLcloseFile= TRUE; BOOLresult= FALSE; if (path->wide){FILE_STAT_BASIC_INFORMATIONstatInfo; if (_Py_GetFileInformationByName(path->wide, FileStatBasicByNameInfo, &statInfo, sizeof(statInfo))){if (statInfo.FileAttributes&FILE_ATTRIBUTE_REPARSE_POINT){if (statInfo.FileAttributes&FILE_ATTRIBUTE_DIRECTORY){if (!reparseDirectory){queryByHandle= FALSE} } elseif (!reparseFile){queryByHandle= FALSE} } else{queryByHandle= FALSE} if (!queryByHandle){result= TRUE; fileType=fileTypeFromDeviceType(statInfo.DeviceType); fileAttributes=statInfo.FileAttributes; if (fileAttributes&FILE_ATTRIBUTE_REPARSE_POINT){reparseTag=statInfo.ReparseTag} } } else{switch(GetLastError()){caseERROR_FILE_NOT_FOUND: caseERROR_PATH_NOT_FOUND: caseERROR_NOT_READY: caseERROR_BAD_NET_NAME: caseERROR_BAD_NETPATH: caseERROR_BAD_PATHNAME: caseERROR_INVALID_NAME: caseERROR_FILENAME_EXCED_RANGE: queryByHandle= FALSE; break} } } if (queryByHandle){if (path->fd!=-1){closeFile= FALSE; hfile=_Py_get_osfhandle_noraise(path->fd)} else{DWORDflags=FILE_FLAG_BACKUP_SEMANTICS; if (!reparse){flags |= FILE_FLAG_OPEN_REPARSE_POINT} hfile=CreateFileW(path->wide, FILE_READ_ATTRIBUTES, 0, NULL, OPEN_EXISTING, flags, NULL)} if (hfile!=INVALID_HANDLE_VALUE){fileType=GetFileType(hfile); if (fileType!=FILE_TYPE_UNKNOWN||GetLastError() ==NO_ERROR){result= TRUE} if (closeFile||fileType==FILE_TYPE_DISK){FILE_ATTRIBUTE_TAG_INFOfati; FILE_BASIC_INFOfbi; if (GetFileInformationByHandleEx(hfile, FileAttributeTagInfo, &fati, sizeof(fati))){fileAttributes=fati.FileAttributes; if (fileAttributes&FILE_ATTRIBUTE_REPARSE_POINT){reparseTag=fati.ReparseTag} } elseif (GetFileInformationByHandleEx(hfile, FileBasicInfo, &fbi, sizeof(fbi))){fileAttributes=fbi.FileAttributes} } if (closeFile){CloseHandle(hfile)} } elseif (path->wide){intstatus; STRUCT_STATst; switch (GetLastError()){caseERROR_ACCESS_DENIED: caseERROR_SHARING_VIOLATION: caseERROR_CANT_ACCESS_FILE: caseERROR_INVALID_PARAMETER: if (reparse){status=STAT(path->wide, &st)} else{status=LSTAT(path->wide, &st)} if (status==0){result= TRUE; posixFileType=st.st_mode&S_IFMT; reparseTag=st.st_reparse_tag} } } } if (result){if (posixFileType){*pPosixFileType=posixFileType} else{*pPosixFileType=posixFileTypeFromFileInfo( fileType, fileAttributes, reparseTag)} if (pReparseTag){*pReparseTag=reparseTag} } returnresult}Example usage: staticPyObject*os__path_isdir_impl(PyObject*module, PyObject*path){BOOLsuccess; intfileType; path_t_path=PATH_T_INITIALIZE("isdir", "path", 0, 1); if (!path_converter(path, &_path)){path_cleanup(&_path); if (PyErr_ExceptionMatches(PyExc_ValueError)){PyErr_Clear(); Py_RETURN_FALSE} returnNULL} Py_BEGIN_ALLOW_THREADSsuccess=getPosixFileType(&_path, &fileType, NULL, TRUE, FALSE); Py_END_ALLOW_THREADSpath_cleanup(&_path); if (success&&fileType==S_IFDIR){Py_RETURN_TRUE} Py_RETURN_FALSE} staticPyObject*os__path_isfile_impl(PyObject*module, PyObject*path){BOOLsuccess; intfileType; path_t_path=PATH_T_INITIALIZE("isfile", "path", 0, 1); if (!path_converter(path, &_path)){path_cleanup(&_path); if (PyErr_ExceptionMatches(PyExc_ValueError)){PyErr_Clear(); Py_RETURN_FALSE} returnNULL} Py_BEGIN_ALLOW_THREADSsuccess=getPosixFileType(&_path, &fileType, NULL, FALSE, TRUE); Py_END_ALLOW_THREADSpath_cleanup(&_path); if (success&&fileType==S_IFREG){Py_RETURN_TRUE} Py_RETURN_FALSE} staticPyObject*os__path_exists_impl(PyObject*module, PyObject*path){BOOLsuccess; intfileType; path_t_path=PATH_T_INITIALIZE("exists", "path", 0, 1); if (!path_converter(path, &_path)){path_cleanup(&_path); if (PyErr_ExceptionMatches(PyExc_ValueError)){PyErr_Clear(); Py_RETURN_FALSE} returnNULL} Py_BEGIN_ALLOW_THREADSsuccess=getPosixFileType(&_path, &fileType, NULL, TRUE, TRUE); Py_END_ALLOW_THREADSpath_cleanup(&_path); if (success){Py_RETURN_TRUE} Py_RETURN_FALSE} staticPyObject*os__path_islink_impl(PyObject*module, PyObject*path){BOOLsuccess; intfileType; path_t_path=PATH_T_INITIALIZE("islink", "path", 0, 1); if (!path_converter(path, &_path)){path_cleanup(&_path); if (PyErr_ExceptionMatches(PyExc_ValueError)){PyErr_Clear(); Py_RETURN_FALSE} returnNULL} Py_BEGIN_ALLOW_THREADSsuccess=getPosixFileType(&_path, &fileType, NULL, FALSE, FALSE); Py_END_ALLOW_THREADSpath_cleanup(&_path); if (success&&fileType==S_IFLNK){Py_RETURN_TRUE} Py_RETURN_FALSE} staticPyObject*os__path_isjunction_impl(PyObject*module, PyObject*path){BOOLsuccess; intfileType; DWORDreparseTag; path_t_path=PATH_T_INITIALIZE("islink", "path", 0, 1); if (!path_converter(path, &_path)){path_cleanup(&_path); if (PyErr_ExceptionMatches(PyExc_ValueError)){PyErr_Clear(); Py_RETURN_FALSE} returnNULL} Py_BEGIN_ALLOW_THREADSsuccess=getPosixFileType(&_path, &fileType, &reparseTag, FALSE, FALSE); Py_END_ALLOW_THREADSpath_cleanup(&_path); if (success&&fileType==S_IFDIR&&reparseTag==IO_REPARSE_TAG_MOUNT_POINT){Py_RETURN_TRUE} Py_RETURN_FALSE} |
zooba commented Apr 17, 2023
I think we can probably refactor a few bits into The "should we use the slow path" switch can probably be moved out into the header file: There might be a similar refactoring that would only apply for these And I think we probably do want Eryk's |
zooba commented Apr 27, 2023
I'm happy with this, so let's take it. I'll create a new issue for the bug mentioned above - this PR doesn't actually impact it, and it doesn't matter whether it's fixed before or after. |
I have been able to run the regression tests on a Windows x64 machine with access to the api needed for
Py_GetFileInformationByNameand was able to verify that it did use the fast path in isdir, islink, isfile, & exists as well as pass the tests ran by rt.bat withrt.bat -p x64 -d -q -uall -u-cpu -rwW --slowest --timeout=1200 -j0.