- Notifications
You must be signed in to change notification settings - Fork 78
Add timedelta, timedelta64 and datetime64 plus respective conversions#509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading. Please reload this page.
Conversation
hhaensel commented Jun 16, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
hhaensel commented Jun 27, 2024
@cjdoris what do you think about this PR? |
hhaensel commented Aug 15, 2024
@cjdoris still hoping to get this integrated ... |
MilesCranmer commented Aug 23, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Just chiming in with some comments: 1: Could you add some tests for this? This adds a lot of new features so I think should have testing to cover everything. Ideally the test coverage of the diff should be 100%. It should also cover your intended usecase with
as this seems subtle. 2: Is the 3-arg version of the |
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
src/Convert/numpy.jl Outdated
| year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, | ||
| millisecond::Int=_millisecond, microsecond::Int=_microsecond, nanosecond::Int=_nanosecond | ||
| ) | ||
| pyimport("numpy").datetime64("$(DateTime(year, month, day, hour, minute, second))") + pytimedelta64(;millisecond, microsecond, nanosecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pyimport("numpy") the correct API call, or is that just to be used in user packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw similar calls at different places in the package, so I took this approach. But I also wouldn't know how to code a timedelta64 without calling pyimport.
Please let me know if there's a better solution.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
src/Convert/numpy.jl Outdated
| T = types[findfirst(==(unit), units)] | ||
| pyconvert_return(CompoundPeriod(T(value * count)) |> canonicalize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proper way to do this would be to use Base.Cartesian.@nif. That way you could write this code to avoid dynamic dispatch on types (which will be very slow).
| units = ("Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns") | ||
| types = (Year, Month, Week, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond) | ||
| T = types[findfirst(==(unit), units)] | ||
| pyconvert_return(DateTime(_base_datetime) + T(value * count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to other comment – you should write this using Base.Cartesian.@nif over the types tuple to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, tested with a julia function calls and found that the julia part is around 25ns, whereas the python call is around 1.5microsecond
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
src/Convert/numpy.jl Outdated
| _year::Int=0, _month::Int=0, _day::Int=0, _hour::Int=0, _minute::Int=0, _second::Int=0, _millisecond::Int=0, _microsecond::Int=0, _nanosecond::Int=0, _week::Int=0; | ||
| year::Int=_year, month::Int=_month, day::Int=_day, hour::Int=_hour, minute::Int=_minute, second::Int=_second, microsecond::Int=_microsecond, millisecond::Int=_millisecond, nanosecond::Int=_nanosecond, week::Int=_week) | ||
| pytimedelta64(sum(( | ||
| Year(year), Month(month), # you cannot mix year or month with any of the below units in python, the error will be thrown by `pytimedelta64(::CompoundPeriod)` |
MilesCranmerAug 23, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment
you cannot mix year or month with any of the below units in python, the error will be thrown by
pytimedelta64(::CompoundPeriod)
Should be presented to the user as a descriptive error message rather than a comment in the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the comment isn't clear enough.
Python throws a well understandable descriptive error in case of wrong usage, so no need for us to do so. Agree?
| function pyconvert_rule_timedelta64(::Type{CompoundPeriod}, x::Py) | ||
| unit, count = pyconvert(Tuple, pyimport("numpy").datetime_data(x)) | ||
| value = reinterpret(Int64, pyconvert(Vector, x))[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is reinterpret safe here? Is there a better alternative to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought, pyconvert creates a new Julia Vector which is not mapped onto Python data. If that would be the case, we'd need to wrap the vector by a copy().
src/Convert/numpy.jl Outdated
| for T in (CompoundPeriod, Year, Month, Day, Hour, Minute, Second, Millisecond, Microsecond, Nanosecond, Week) | ||
| pyconvert_add_rule("numpy:timedelta64", T, pyconvert_rule_timedelta64, priority) | ||
| end |
MilesCranmerAug 23, 2024 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Julia is unlikely to unroll this loop, you should use Base.Cartesian.@nexprs here to avoid dynamic dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried my best, but I'm not sure how to test whether this will speed up things
src/Convert/numpy.jl Outdated
| args = T .== (Day, Second, Millisecond, Microsecond, Minute, Hour, Week) | ||
| pydatetime64(x.value .* args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to rewrite this with Base.Cartesian.@nif rather than doing a masked sum, since you know there will be only 1 element in the sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sum is dammed fast (16ns), and I couldn't beat it with a different version.
hhaensel commented Aug 23, 2024
Thanks for all the comments, I already learned some new things about Julia, which is always nice. I will go through them the next days and once we've agreed on the solutions write the test functions. Just want to re-raise my question from the previous PR. Do you think, we should stick with CompoundPeriod as default conversion type? |
Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>
MilesCranmer commented Jan 19, 2025
I see, thanks. In the meantime can you add tests and docs for the code changes? Ideally we should have 100% test coverage. |
hhaensel commented Jan 19, 2025
I did add tests for pytimedelta. using CondaPk CondaPkg.add("pandas")which I think is a reasonable approach. |
MilesCranmer commented Jan 20, 2025
Were the test files not committed? I don’t see any added tests |
hhaensel commented Jan 20, 2025
hhaensel commented Jan 20, 2025
When writing the tests for pytimedelta and pytimedelta64 I realised that my choice for keyword argument naming might be confusing for users. pytimedelta(second =1)but pyimport("datetime").timedelta(seconds =1)My choice was inspired from the Dates API (e.g. |
append 's' to keywords in pytimedelta and pytimedelta64
hhaensel commented Jan 20, 2025 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Changed the keywords. Tests should pass as soon as the other two PRs are merged. |
…me}), remove pydatetime64(::CompoundPeriod)
…nicalize, add global setting CANONICALIZE_TIMEDELTA64 for rule conversion
hhaensel commented Jan 21, 2025
Added functionality that the unit of pytimedelta64() is defined by the least unit specified, also for the case 0. julia> pytimedelta64(Year(0)) Python: np.timedelta64(0,'Y') julia> pytimedelta64(Week(0)) Python: np.timedelta64(0,'W') julia> pytimedelta64(Millisecond(0)) Python: np.timedelta64(0,'ms')and julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60))) 60 seconds julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60), canonicalize =true)) 1 minuteand julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60))) 60 seconds julia> PythonCall.Convert.CANONICALIZE_TIMEDELTA64[] =true true julia> pyconvert(Dates.CompoundPeriod, pytimedelta64(Second(60))) 1 minute |
hhaensel commented Jan 21, 2025
I hope you find these changes consistent. |
hhaensel commented Jan 21, 2025
There's an error for CondaPkg on Windows, but in principle, the tests pass. |
hhaensel commented Jan 21, 2025
I cherry-picked #589 in order to proceed with testing.
The docs are probably the only remaining topic. |
hhaensel commented Feb 19, 2025
@cjdoris@MilesCranmer Ping! Do you think we can finalize this PR? Just pushed two (final?) commits so that we now have
|
Beforerr commented Mar 4, 2025
Just friendly wondering why this could not be merged? |
MilesCranmer commented Mar 4, 2025
Would need @cjdoris to review first. I know he has been very busy lately, but I am sure he will review it as soon as he has time |
This PR replaces #334 and takes into account the major refactoring of PythonCall.
Particularly, it fixes#293.
What's new?
Python Constructors
Conversion to Julian types
DataFrame handling
I've set the priority of datetime, timedelta, datetime64 and timedelta64 to ARRAY, which allows for automatic Table conversion - I hope that's the intended way to do it.
Default Conversion
I chose to use
Dates.CompoundPeriodas result type of default conversion from timedelta64 as both types support year, month and minor period units. This is debatable, we could also change it toPeriod, hence the resulting type would depend on the input.Both Python and Julia do not convert between Year/Month and the other period types, so there is no danger with this choice to arrive at ill-determined intervals.
The difference is that Julia allows addition/subtraction of mixed types while Python/Numpy throws an error.
The difference to the previous PR is that all conversions rely on either builtin or numpy functions and do not use pandas.
Ordering of arguments for
pytimedelta()was chosen to be identical to the python version, while ordering forpytimedelta64()is strictly descending, exceptweekwhich comes last.EDIT: add comments in what's new code, added conversions
EDIT2: removed comment about datetime_data, I had misunderstood the meaning and have updated the code