Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 33.9k
GH-101291: Rearrange the size bits in PyLongObject#102464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-101291: Rearrange the size bits in PyLongObject #102464
Uh oh!
There was an error while loading. Please reload this page.
Conversation
markshannon commented Mar 6, 2023 • edited by gvanrossum
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by gvanrossum
Uh oh!
There was an error while loading. Please reload this page.
…Long_SignedDigitCount which might not be optimal, but is safe.
…ion of immortal ints and tagged medium ints.
gvanrossum left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As my flight might depart soon here's a first batch of review comments. Still to do longobject.c, and some modules.
Tools/build/umarshal.py Outdated
| ifnot (0<=n<=self.end-self.pos): | ||
| print(n, self.end, self.pos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug print()?
| ifnot (0<=n<=self.end-self.pos): | |
| print(n, self.end, self.pos) |
| @@ -0,0 +1,7 @@ | |||
| Rearrage bits in first field (after header) of PyLongObject. * Bits 0 and 1: | |||
| 1- sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use consistent spacing around binary -.
| 1- sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. | |
| 1- sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. |
Include/internal/pycore_long.h Outdated
| return (a->long_value.lv_tag | b->long_value.lv_tag) < (2 << NON_SIZE_BITS); | ||
| } | ||
| /* The value returned by this function will have at least one bit to spare, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"one bit to spare" feels ambiguous, since the return type is signed -- is the spare bit the sign bit, or should there be at least one additional spare bit? (I know in practice we have 4 spare bits including the sign, but still, I'm not sure whether a 63-bit digit would be acceptable or not, from this description (or others).)
| staticinlinebool | ||
| _PyLong_IsPositive(constPyLongObject*op) | ||
| { | ||
| return (op->long_value.lv_tag&SIGN_MASK) ==0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have #define SIGN_POSITIVE 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want these functions to be the only way to determine the sign.
Defining SIGN_POSITIVE will just encourage people to do the test elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fine. Next question: maybe we also need a _PyLong_IsNonZero? I see !_PyLong_IsZero a lot, and the ! is easily missed. (Or maybe that's just my old eyes.) Possibly also IsNonNegative and IsNonPositive.
Include/internal/pycore_long.h Outdated
| returnop->long_value.lv_tag >> NON_SIZE_BITS; | ||
| } | ||
| /* Equivalent to _PyLong_DigitCount(op) * _PyLong_NonZeroSign(op) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this is for algorithms where the old "signed size" representation worked well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for code that uses the "signed size" representation.
I make no judgement as to how well it works 🙂
| return (a->long_value.lv_tag&SIGN_MASK) == (b->long_value.lv_tag&SIGN_MASK); | ||
| } | ||
| #defineTAG_FROM_SIGN_AND_SIZE(sign, size) ((1 - (sign)) | ((size) << NON_SIZE_BITS)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe add a comment that this macro should only be used with literal or size_t arguments?
Include/internal/pycore_long.h Outdated
| #defineTAG_FROM_SIGN_AND_SIZE(sign, size) ((1 - (sign)) | ((size) << NON_SIZE_BITS)) | ||
| staticinlinevoid | ||
| _PyLong_SetSignAndSize(PyLongObject*op, intsign, Py_ssize_tsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this use DigitCount instead of Size, for consistency with earlier APIs? Same for the next one.
Include/internal/pycore_long.h Outdated
| staticinlinevoid | ||
| _PyLong_FlipSign(PyLongObject*op){ | ||
| unsigned intflipped_sign=2- (op->long_value.lv_tag&SIGN_MASK); | ||
| op->long_value.lv_tag &= ~7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to use some defined name instead of hardcoding 7? Perhaps ~((1 << NON_SIZE_BITS) - 1).
| returnPyLong_FromLong(i_result); | ||
| } | ||
| if (PyLong_CheckExact(item) ||PyBool_Check(item)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GitHub warning is at the wrong line, it applies to the PyLong_FromLong(i_result) two lines up. It does seem to warrant some attention. Similar below.
| { | ||
| if (PyLong_Check(v) &&PyLong_Check(w) &&Py_SIZE(v) &&Py_SIZE(w)){ | ||
| if (PyLong_Check(v) &&PyLong_Check(w) && | ||
| !_PyLong_IsZero((PyLongObject*)v) && !_PyLong_IsZero((PyLongObject*)w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need another convenience macro IsNonZero.
| staticinlineint | ||
| _PyLong_CompactSign(constPyLongObject*op) | ||
| { | ||
| assert(PyLong_Check(op)); | ||
| assert(_PyLong_IsCompact(op)); | ||
| return1- (op->long_value.lv_tag&SIGN_MASK); | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the new implementation of _PyLong_Sign(), if _PyLong_NonCompactSign() is removed? This gets rid of a branch in the proposed version of _PyLong_Sign().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They sure look identical to me. Maybe Mark has plans and maybe the compiler would optimize this anyway?
if (P(x)) return F(x); else return F(x); could just become return F(x);.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want the freedom to implement the "compact" and non-compact forms differently.
They have the same implementation at the moment, but that will change.
_PyLong_Sign() is part of the ABI, so we need to retain it. But almost all code using _PyLong_Sign() actually wants to know if an int is negative and should be using _PyLong_IsNegative().
gvanrossum left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the rest. I went over every diff chunk in longobject.c. Let's get this merged...
| Rearrage bits in first field (after header) of PyLongObject. * Bits 0 and 1: | ||
| 1 - sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. | ||
| * Bit 2 reserved (probably for the immortal bit) * Bits 3+ the unsigned | ||
| size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format as bullets?
| Rearrage bits in first field (after header) of PyLongObject. * Bits 0 and 1: | |
| 1 - sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. | |
| * Bit 2 reserved (probably for the immortal bit) * Bits 3+ the unsigned | |
| size. | |
| Rearrage bits in first field (after header) of PyLongObject: | |
| * Bits 0 and 1: 1 - sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers. | |
| * Bit 2 reserved (probably for the immortal bit). | |
| * Bits 3+ the unsigned size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I suspect it got reformatted by something.
| assert(PyLong_Check(value)); | ||
| neg=_PyLong_IsNegative((PyLongObject*)value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the blank line back.
| assert(PyLong_Check(value)); | |
| neg=_PyLong_IsNegative((PyLongObject*)value); | |
| assert(PyLong_Check(value)); | |
| neg=_PyLong_IsNegative((PyLongObject*)value); |
Include/internal/pycore_long.h Outdated
| /* Like _PyLong_DigitCount but asserts that op is non-negative */ | ||
| staticinlinePy_ssize_t | ||
| _PyLong_UnsignedDigitCount(constPyLongObject*op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not excited about this name; I keep having to look up how it differs from _PyLong_DigitCount, and it's not really related to _PyLong_SignedDigitCount. :-( Maybe _PyLong_NonNegativeDigitCount? Or perhaps better _PyLong_DigitCountOfNonNegative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed this for the extra check during implementation. _PyLong_UnsignedDigitCount is now the same as _PyLong_DigitCount, and should remain so.
I'll remove it.
Objects/longobject.c Outdated
| * care (see comment above). | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental reformat?
| *care (seecommentabove). | |
| */ | |
| *care (seecommentabove). | |
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| } while(0) | ||
| for (i=Py_SIZE(b) -1; i >= 0; --i){ | ||
| for (i=_PyLong_SignedDigitCount(b) -1; i >= 0; --i){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mybe we can prove that b is nonnegative here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe.
I'm not going to change any algorithms in this PR.
Hopefully, the more explicit semantics of the new API will allow someone to make some improvements in a future PR.
Objects/longobject.c Outdated
| shift=bsize >> 1; | ||
| if (kmul_split(a, shift, &ah, &al) <0) goto fail; | ||
| assert(Py_SIZE(ah) >0); /* the split isn't degenerate */ | ||
| assert(_PyLong_UnsignedDigitCount(ah) >0); /* the split isn't degenerate */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just check the sign, right?
| assert(_PyLong_UnsignedDigitCount(ah)>0); /* the split isn't degenerate */ | |
| assert(_PyLong_IsPositive(ah)); /* the split isn't degenerate */ |
Same below several occurrences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
| staticinlinebool | ||
| _PyLong_IsPositive(constPyLongObject*op) | ||
| { | ||
| return (op->long_value.lv_tag&SIGN_MASK) ==0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fine. Next question: maybe we also need a _PyLong_IsNonZero? I see !_PyLong_IsZero a lot, and the ! is easily missed. (Or maybe that's just my old eyes.) Possibly also IsNonNegative and IsNonPositive.
Objects/longobject.c Outdated
| returnNULL; | ||
| memset(z->long_value.ob_digit, 0, Py_SIZE(z) *sizeof(digit)); | ||
| memset(z->long_value.ob_digit, 0, _PyLong_UnsignedDigitCount(z) *sizeof(digit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since z was just created with a nonnegative size:
| memset(z->long_value.ob_digit, 0, _PyLong_UnsignedDigitCount(z) *sizeof(digit)); | |
| memset(z->long_value.ob_digit, 0, _PyLong_DigitCount(z) *sizeof(digit)); |
gvanrossum left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go for it!
sobolevn commented Mar 23, 2023
I've opened #102940 to fix two new warnings from this PR :) |
…2464) * Eliminate all remaining uses of Py_SIZE and Py_SET_SIZE on PyLongObject, adding asserts. * Change layout of size/sign bits in longobject to support future addition of immortal ints and tagged medium ints. * Add functions to hide some internals of long object, and for setting sign and digit count. * Replace uses of IS_MEDIUM_VALUE macro with _PyLong_IsCompact().
scoder commented Mar 28, 2023
What about this part? Looks like it was dropped on the floor along the way. |
| * 0-1: Sign bits value = (1-sign), ie. negative=2, positive=0, zero=1. | ||
| * 2: Reserved for immortality bit |
eduardo-elizondoApr 7, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need an immortality flag here, but we do need a static flag (immortality should be marked by the refcount and this marks if the object is static or not. Using this, we can do the static check at dealloc time to prevent the deallocation of the objects
| staticinlineint | ||
| _PyLong_IsNonNegativeCompact(constPyLongObject*op){ | ||
| assert(PyLong_Check(op)); | ||
| returnop->long_value.lv_tag <= (1 << NON_SIZE_BITS); |
eduardo-elizondoApr 7, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work if we set the second (immortal/static) bit, i.e: the immortal small int 1 since it will have an lv_tag of 1100 and return an incorrect value here.
I'll create a new PR to restructure this a bit to make it work with the new bit flag.
…2464) * Eliminate all remaining uses of Py_SIZE and Py_SET_SIZE on PyLongObject, adding asserts. * Change layout of size/sign bits in longobject to support future addition of immortal ints and tagged medium ints. * Add functions to hide some internals of long object, and for setting sign and digit count. * Replace uses of IS_MEDIUM_VALUE macro with _PyLong_IsCompact().
| CAUTION: Generic code manipulating subtypes of PyVarObject has to | ||
| aware that ints abuse ob_size's sign bit. | ||
| */ |
verhovskyApr 29, 2023 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't update this comment that documents _longobject, it's still talking about ob_size and PyVarObject
/* Long integer representation. The absolute value of a number is equal to SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
This PR rearranges the bits in what was
ob_size, to slightly speedup the most common operations and to prepare for storing the tagged 2-complement value directly in a future PR.The new layout is as follows:
The bulk of the change is removing all the uses of
Py_SIZEandPy_SETSIZE, and replacing them with a new set of inline functions.It disturbs me how much we use unchecked casts, but that's a separate issue...
This will, inevitably, break Cython generated code again.
Performance measurement shows no significant change: https://github.com/faster-cpython/benchmarking/tree/main/results/bm-20230302-3.12.0a5%2B-ce6bfb2