Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-30891: importlib _find_and_load() try/except#2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
mention-bot commented Jul 11, 2017
@Haypo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @brettcannon, @ericsnowcurrently and @pitrou to be potential reviewers. |
vstinner commented Jul 11, 2017
I already proposed this change in my previous PR #2646, but @serhiy-storchaka complained that it would make import() slower. But @ncoghlan also noticed that previously, the whole function was protected by the global imp lock, whereas now "module = sys.modules[name]" is not protected by any lock and so there is a risk of a race condition. I agree, that's why I wrote this PR. See also discussion at PR #2646. |
vstinner commented Jul 11, 2017
If @serhiy-storchaka still considers that performance is more important that atomicity in this function, we can just move "module = sys.modules[name]" into the "with _ModuleLockManager(name):" block. |
serhiy-storchaka commented Jul 11, 2017
In common case You can use |
ncoghlan left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything this will make the "already loaded" case a tiny bit faster (since it skips the double name lookup), while making initial loads marginally slower (but likely imperceptibly slower relative to the overheads of actually reading files and directories from disk).
So +1 from me for going with the conventional solution of switching to relying on exception handling to help ensure consistency of state.
serhiy-storchaka commented Jul 11, 2017
The "already loaded" case already is made significantly faster by handling in C code. This code is executed in "still not loaded" case. |
ncoghlan commented Jul 11, 2017
@serhiy-storchaka The module lock means that in the presence of concurrent attempts at loading the module, the 2nd and subsequent threads will follow the "no KeyError" path. It occurs to me that this could use a comment after the with statement explaining the need for that path, though. There's also the micro-optimisation option you suggested on the original PR: creating a |
serhiy-storchaka commented Jul 11, 2017
A race condition is exceptional. It happens only when two or more threads simultaneously import the same module first time. Once the module is imported, all subsequent imports go by fast path in C code and don't have a race condition. This PR fixes even more rare case. When at the same time one thread unloads a module while other thread imports it (this case is similar to bpo-30876). I agree that this case is worth to be fixed, but not at the cost of more common cases. |
vstinner commented Jul 11, 2017
I don't want to add a sentinel object. I just moved "module = sys.modules[name]" in the "with _ModuleLockManager(name):" block. |
Lib/importlib/_bootstrap.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the branch that I think could use a comment to say "Another thread updated sys.modules while this one was waiting for the import lock"
ncoghlan left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note suggesting a clarifying comment around when the fallback path executes, but either way I'm fine with merging this approach.
vstinner commented Jul 11, 2017
Sorry, I don't understand which solution should be implemented:
|
serhiy-storchaka commented Jul 11, 2017
Both (1) and (3) are safe. (1) and (3) are equally safe if ignore some corner cases (non-string keys in sys.modules which raise KeyError on comparison). In normal case (3) is slightly faster than (1). |
ncoghlan commented Jul 11, 2017
The "name in" + "sys.modules[name]" case is fine if you're holding the module lock the entire time (hence why I switched my review to |
vstinner commented Jul 11, 2017
Ok, let's try sys.modules.get(). |
Lib/importlib/_bootstrap.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nick suggested the name _NEEDS_LOADING and it looks better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition. Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead.
serhiy-storchaka commented Jul 21, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Since this fixes an observable behavior, I think it needs a NEWS entry. |
vstinner commented Jul 21, 2017
Can you please elaborate? How replacing "name not in sys.modules" with "sys.modules.get(name, _NEEDS_LOADING)" changes the behaviour? sys.modules is a dummy dict, so there is no behaviour change, no? The main change is that it fixes a race condition, but this commit is the 3rd one in http://bugs.python.org/issue30891 and the first commit was made by you and already contains a NEWS item: It's enough to summarize the 3 commits, no? |
ncoghlan commented Jul 21, 2017
Right, I think the original NEWS entry is sufficient to cover all the changes - we're just iterating on the exact details of how that fix works. |
bedevere-bot commented Jul 21, 2017
GH-2801 is a backport of this pull request to the 3.6 branch. |
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition. Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead. (cherry picked from commit e72b135)
serhiy-storchaka commented Jul 21, 2017
Agree. |
importlib _find_and_load() now uses a "try/except KeyError" block to
check if the imported module is not sys.modules, rather than checking
for "name is sys.modules" and then get it with sys.modules[name].
This change should prevent a race condition.