Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34k
bpo-1635741: Port _datetime module to multiphase initialization (PEP 489)#19122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
phsilva commented Mar 23, 2020 • edited by bedevere-bot
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by bedevere-bot
Uh oh!
There was an error while loading. Please reload this page.
d45b8bd to 4b620ecComparepganssle commented Mar 24, 2020
I'm not sure I totally understand what this multi-phase initialization is about, so it's hard for me to judge this change. PEP 489 gives a few reasons to adopt multi-phase initialization: subinterpreter support, making it easier to reload the module and storing arbitrary module-level resources in a garbage-collected manner. It also, curiously, says that in Py3 extension modules are not being added to >>>importsys>>>importdatetime>>>"datetime"insys.modulesTrue>>>"_datetime"insys.modulesTrueFor the other reasons, I'm not sure if this PR achieves them by simply mechanically separating the exec and init steps, because we still have things like the population of the global C API capsule. Do we have to do anything special for that? Is there any way to write a test that would check that we're doing this right? I'm fine with migrating it over, but it's hard to tell if it's being done correctly if I don't have a clear idea of what we're looking to achieve here and what the consequences could be. Interestingly, I was just reading through PEP 489 the other day and considering porting over |
vstinner commented Mar 24, 2020
I would prefer to first land PR #19119 and then rebase this one on top of it. |
4b620ec to 0ac3031Comparephsilva commented Mar 25, 2020
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the two issues that I spotted.
cc @corona10: that's a non-trivial module ;-)
Modules/_datetimemodule.c Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is wrong: if datetimemodule_exec() is executed more than once, the constant is initialized multiple times. I suggest to fix this code step by step. First step: only set a variable if it's NULL.
Something like:
if (!us_per_ms){us_per_ms = PyLong_FromLong(1000)} (...) if (us_per_ms == NULL || us_per_second == NULL || us_per_minute == NULL || seconds_per_day == NULL){return -1} Later we may put these constants in a new module state, so it would become possible to clear them at exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Modules/_datetimemodule.c Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code path leaks x reference if PyDict_SetItemString() fails. I propose to not fix this bug in this PR. Let's move step by step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, not fixing for now. Will add to the summary of future refactors.
Modules/_datetimemodule.c Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static types are not really compatible with having two instances of the same C extension module.
Moreover, this function sets attributes of the type dictionary each time datetimemodule_exec() is called. Problem: type attributes are cached. If you run test_datetime twice, each run creates a fresh _datetime module: ./python -m test -v test_datetime test_datetime. But many tests fail at the second run. A workaround is to call explicitly PyType_ClearCache(); at the function exit wih a comment explaining that it's a workaround for static types.
Note: when running test_datetime, test_divide_and_round() fails: see https://bugs.python.org/issue40058 (it's not a regression of this PR).
Another way to see the bug:
import sys for i in range(1, 3): print(f"== run #{i} ==") import _datetime print("resolution", _datetime.timedelta.resolution) print("min", _datetime.timedelta.min) print("max", _datetime.timedelta.max) _datetime = None del sys.modules['_datetime'] Output:
== run #1 == resolution 0:00:00.000001 min -999999999 days, 0:00:00 max 999999999 days, 23:59:59.999999 == run #2 == resolution -999999999 days, 0:00:00 min 999999999 days, 23:59:59.999999 max 0001-01-01 When a second instance of _datetime is created, _datetime.timedelta.max becomes a date object instead of a timedelta object. It's because the type cache isn't updated when the dict is modified directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added PyType_ClearCache workaround, will add the conversion to use heap-allocated PyTypeObjects to future PR.
bedevere-bot commented Mar 25, 2020
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
vstinner commented Mar 25, 2020
@pganssle: With this change, when a second _datetime instance is created: the exec method is executed again. Right now, Modules/_datetimemodule.c still uses static types, so the two instances still use the same types. Example with this change applied: This PR prepares the code to be able to switch to types allocated on the heap. One problem of the datetime module is that it exposes its internals through a PyCapsule object. If it becomes possible to unload the C extension module and destroy its types, the PyCapsule may be made of dangling pointers... To switch to types allocated on the heap, the PyCapsule object should use strong references and use a destructor. |
Modules/_datetimemodule.c Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it's correct to override the capsule if PyInit__datetime() is called twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CAPI is not fully static, there is a heap-allocated type that is created on exec. Not sure we can change this right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we change it?
If it's important that we have two separate PyCapsule_New objects, doesn't that mean that if you have a C extension running in two subinterpreters that references this capsule, they'll get a different value for PyDateTimeAPI depending on when they look up the reference rather than depending on which subinterpreter they are running from?
Not sure I see a way around this without a breaking change to the API.
0ac3031 to 9a42428Comparephsilva commented Mar 26, 2020
SummaryFixed:
For future PR:
cc: @vstinner |
phsilva commented Mar 26, 2020
I have made the requested changes; please review again. |
bedevere-bot commented Mar 26, 2020
Thanks for making the requested changes! @vstinner: please review the changes made to this pull request. |
phsilva commented Apr 1, 2020
@vstinner do you think we can proceed with the changes so far? In the previous comment I detailed the current status. |
pganssle left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this @phsilva.
Are there any tests you can add to ensure that this is working correctly?
I am not sure I can recommend putting them in datetimetester.py, because it does some funky stuff with imports, but a temporary test file (until we can sort out the datetimetester imports issue), or possibly some existing test file intended to test modules for PEP 489 compliance would be good.
I note that @vstinner already has at least one example of a test you can perform: induce two fresh imports of datetime and check that the constants are all right.
| returnm; | ||
| return-1; | ||
| // Workaround to allow static PyTypeObjects to have they dict redefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Workaround to allow static PyTypeObjects to have they dict redefined. | |
| // Workaround to allow static PyTypeObjects to have their dict redefined. |
| PyModuleDef_HEAD_INIT, | ||
| "_datetime", | ||
| "Fast implementation of the datetime type.", | ||
| 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this implies that the module is safe for subinterpreters, but this module still has several process-wide globals. Is it actually appropriate to set m_size to 0 in this case?
(Yes, I realize that it cannot be -1 if you want multi-phase initialization).
| return-1; | ||
| // Workaround to allow static PyTypeObjects to have they dict redefined. | ||
| // XXX: to be removed once _datetime moves to heap allocated PyTypeObjects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there negative consequences to this if we don't move to heap allocated PyTypeObjects?
Are there any downsides to using heap allocated types?
I don't want us to be in a situation where we have some workaround with negative consequences that we end up not being able to remove because we can't convert to heap allocated types.
Modules/_datetimemodule.c Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we change it?
If it's important that we have two separate PyCapsule_New objects, doesn't that mean that if you have a C extension running in two subinterpreters that references this capsule, they'll get a different value for PyDateTimeAPI depending on when they look up the reference rather than depending on which subinterpreter they are running from?
Not sure I see a way around this without a breaking change to the API.
bedevere-bot commented Apr 1, 2020
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
phsilva commented Apr 1, 2020
@pganssle Thanks for the extensive review. I will address each issue and let you know. |
vstinner commented Sep 8, 2020
The PyCapsule C API must be enhanced first, before we can fix this issue. The API should be modified to pass the module instance. I close the PR. If someone wants to work on that, please open a new issue about the PyCapsule C API. |
yol commented Nov 9, 2020
I can definitely confirm that this PR fixes bugs with the _datetime module and subinterpreters, so it would be awesome to have the module ported 👍 |
https://bugs.python.org/issue1635741