bpo-45490: Convert unicodeobject.h macros to static inline functions #31221

vstinner · 2022-02-08T19:51:40Z

- Convert unicodeobject.h macros to static inline functions.
- Reorder functions to declare functions before their first usage.
- Add a "kind" variable to the PyUnicode_READ_CHAR() and
  PyUnicode_MAX_CHAR_VALUE() functions to call PyUnicode_KIND() only
  once.
- PyUnicode_KIND() now returns an "enum PyUnicode_Kind".
- Simplify PyUnicode_GET_SIZE().
- Add assertions to PyUnicode_WRITE() on the max value.
- Add cast macros:
  - _PyASCIIObject_CAST()
  - _PyCompactUnicodeObject_CAST()
  - _PyUnicodeObject_CAST()
- The following functions are now declared as deprecated using
  Py_DEPRECATED(3.3):
  - PyUnicode_GET_SIZE()
  - PyUnicode_GET_DATA_SIZE()
  - PyUnicode_AS_UNICODE()
  - PyUnicode_AS_DATA()

  The implementations of these functions disable deprecation
  warnings in their body.
- PyUnicode_READ_CHAR() now uses PyUnicode_1BYTE_DATA(),
  PyUnicode_2BYTE_DATA() and PyUnicode_4BYTE_DATA().
- Replace "const PyObject*" with "PyObject*" in _decimal.c
  and pystrhex.c: PyUnicode_READY() can modify the object.
- Replace "const void *data" with "void *data" in some unicodedata.c
  and unicodeobject.c functions which use PyUnicode_WRITE(): data is
  used to modify the string.
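One pitfall that motivates this kind of conversion is that a macro may evaluate an argument more than once, while a static inline function evaluates each argument exactly once. The following is a minimal, self-contained sketch of that difference. It does not use the CPython headers: `demo_str`, `DEMO_READ_CLAMPED_MACRO()` and `demo_read_clamped()` are hypothetical names invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 1-byte string; this is not a CPython type. */
typedef struct {
    size_t length;
    const unsigned char *data;
} demo_str;

/* Macro version: "index" appears twice in the expansion, so an argument
   with a side effect (such as i++) is evaluated twice. */
#define DEMO_READ_CLAMPED_MACRO(s, index) \
    ((index) < (s)->length ? (s)->data[(index)] : 0u)

/* Static inline version: each argument is evaluated exactly once. */
static inline unsigned int
demo_read_clamped(const demo_str *s, size_t index)
{
    return index < s->length ? s->data[index] : 0u;
}

/* Return how far "i" advanced after one macro call with i++. */
static size_t
demo_macro_advance(void)
{
    static const unsigned char bytes[] = "abc";
    demo_str s = { 3, bytes };
    size_t i = 0;
    (void)DEMO_READ_CLAMPED_MACRO(&s, i++);  /* i++ runs twice */
    return i;
}

/* Return how far "i" advanced after one function call with i++. */
static size_t
demo_function_advance(void)
{
    static const unsigned char bytes[] = "abc";
    demo_str s = { 3, bytes };
    size_t i = 0;
    (void)demo_read_clamped(&s, i++);        /* i++ runs once */
    return i;
}
```

With the macro, the index ends up advanced by two; with the function, by one. The function also gives the compiler real parameter types to check.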

https://bugs.python.org/issue45490


vstinner · 2022-02-08T19:53:09Z

@erlend-aasland: All in one PR to convert (almost) all macros of Include/cpython/unicodeobject.h.

I created a single PR to show what can be done with PEP 670, but IMO it will be better to split this large PR into smaller PRs to ease review, and apply (minor) API changes / cleanup in following PRs (not do everything at once).

erlend-aasland · 2022-02-09T09:48:18Z

> I created a single PR to show what can be done with PEP 670, but IMO it will be better to split this large PR into smaller PRs to ease review, and apply (minor) API changes / cleanup in following PRs (not do everything at once).

Sounds good.

erlend-aasland · 2022-02-09T10:09:46Z

IMO, this is a great improvement when it comes to readability/maintainability.

erlend-aasland · 2022-02-09T10:10:31Z

Modules/_decimal/_decimal.c

 Return NULL if malloc fails and an empty string if invalid characters
 are found. */
 static char *
-numeric_as_ascii(const PyObject *u, int strip_ws, int ignore_underscores)


Do we really need to remove const? Ditto for the rest of the PR.

vstinner · 2022-02-09

If the "u" string is not ready, PyUnicode_READY() will modify it. It's not a read-only operation.
In Python 3.12, PyUnicode_WCHAR_KIND will be removed: https://www.python.org/dev/peps/pep-0623/
In the meantime, I prefer to stop lying: we do modify the object :-)
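The point about `const` can be sketched without the CPython headers. In the hypothetical example below, `demo_obj`, `demo_ready()` and `demo_first_char()` are invented names: `demo_ready()` plays the role of PyUnicode_READY() by lazily filling in a field of the object, which is why neither it nor its caller can honestly take a `const` pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with a lazily built buffer; not a CPython type. */
typedef struct {
    const char *source;
    char *cache;          /* NULL until demo_ready() has run */
} demo_obj;

/* Plays the role of PyUnicode_READY() in this sketch: it may modify the
   object, so the parameter is not const.
   Returns 0 on success, -1 if the allocation fails. */
static int
demo_ready(demo_obj *op)
{
    if (op->cache != NULL) {
        return 0;         /* already ready: nothing to modify */
    }
    size_t size = strlen(op->source) + 1;
    op->cache = malloc(size);
    if (op->cache == NULL) {
        return -1;
    }
    memcpy(op->cache, op->source, size);
    return 0;
}

/* The caller takes a non-const pointer too, because it calls demo_ready().
   Returns the first byte as an int, or -1 on allocation failure. */
static int
demo_first_char(demo_obj *op)
{
    if (demo_ready(op) < 0) {
        return -1;
    }
    return (unsigned char)op->cache[0];
}
```

Declaring `op` as `const demo_obj *` here would require casting the qualifier away before calling `demo_ready()`, which hides the mutation from readers and from the compiler.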

gpshead

not a whole review, just dropping some notes.

gpshead · 2022-02-09T19:30:41Z

Include/cpython/unicodeobject.h

+    return _PyASCIIObject_CAST(op)->length;
+}

 /* In the access macros below, "kind" may be evaluated more than once.


presumably update comments like these to just mention that they used to be macros with these caveats in previous python versions.

vstinner · 2022-02-09

Oh thanks, I didn't look at the comments at all. I was laser-focused on the macro code and on making sure that I didn't change its behavior :-)

gpshead · 2022-02-09T19:32:47Z

Include/cpython/unicodeobject.h

+    if (kind == PyUnicode_1BYTE_KIND) {
+        return PyUnicode_1BYTE_DATA(unicode)[index];
+    }
+    else if (PyUnicode_KIND(unicode) == PyUnicode_2BYTE_KIND) {


as a function this no longer needs to be called twice. (and the comment above becomes less true)
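The suggestion can be illustrated with a small mock that does not depend on the CPython headers. `demo_unicode`, `demo_get_kind()` and `demo_read_char()` are hypothetical names, and the call counter exists only to make the single lookup observable in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds mirroring the 1-, 2- and 4-byte representations. */
enum demo_kind {
    DEMO_1BYTE_KIND = 1,
    DEMO_2BYTE_KIND = 2,
    DEMO_4BYTE_KIND = 4
};

/* Hypothetical string object; not the CPython layout. */
typedef struct {
    enum demo_kind kind;
    size_t length;
    const void *data;
} demo_unicode;

/* Counts calls to demo_get_kind(); for demonstration only. */
static int demo_kind_calls = 0;

static inline enum demo_kind
demo_get_kind(const demo_unicode *u)
{
    demo_kind_calls++;
    return u->kind;
}

/* Look up the kind once, store it in a local variable, then dispatch. */
static inline uint32_t
demo_read_char(const demo_unicode *u, size_t index)
{
    assert(index < u->length);
    enum demo_kind kind = demo_get_kind(u);
    if (kind == DEMO_1BYTE_KIND) {
        return ((const uint8_t *)u->data)[index];
    }
    else if (kind == DEMO_2BYTE_KIND) {
        return ((const uint16_t *)u->data)[index];
    }
    assert(kind == DEMO_4BYTE_KIND);
    return ((const uint32_t *)u->data)[index];
}
```

Because `kind` is a local variable, every branch compares against the same value, and the accessor runs once per read regardless of which branch is taken.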

gpshead · 2022-02-09T20:30:22Z

Include/cpython/unicodeobject.h

 /* Use only if you know it's a string */
-#define PyUnicode_CHECK_INTERNED(op) \
-    (((PyASCIIObject *)(op))->state.interned)
+static inline int PyUnicode_CHECK_INTERNED(PyObject *op) {


to avoid the cast going away, consider doing what Py_INCREF did & add indirection through a macro for the cast:
#define PyUnicode_CHECK_INTERNED(op) \
    _PyUnicode_CHECK_INTERNED(_PyASCIIObject_CAST(op))

vstinner · 2022-02-09

The SC (Steering Council) asked not to add such a macro :-) You're right that without such a macro, there is a risk of introducing new compiler warnings.
If possible, I would prefer to keep PyObject* for functions in unicodeobject.c, since it's the type used for arguments in existing functions and the type returned by functions creating strings, like PyUnicode_New(), PyUnicode_FromString(), etc.
Maybe for this specific header file, we can avoid casts.
For me, PyASCIIObject is an implementation detail which should be hidden. If possible, it should even be moved to the internal C API, but that's a much larger topic which may require a PEP ;-)
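The two options discussed in this thread can be compared side by side in a self-contained sketch. Nothing below is CPython code: `demo_object`, `demo_ascii_object`, `DEMO_ASCII_CAST()` and the `demo_check_interned*` names are hypothetical. The first variant keeps a wrapper macro that casts its argument, the second takes the base object type and casts inside the function.

```c
#include <assert.h>

/* Hypothetical base object and derived layout; not the CPython structs. */
typedef struct {
    int refcnt;
} demo_object;

typedef struct {
    demo_object base;     /* first member, so the pointer cast is valid */
    unsigned int interned;
} demo_ascii_object;

#define DEMO_ASCII_CAST(op) ((demo_ascii_object *)(op))

/* Variant 1: the function takes the derived type, and a wrapper macro
   casts the argument, so callers may keep passing any pointer type. */
static inline int
demo_check_interned_impl(demo_ascii_object *op)
{
    return op->interned != 0;
}
#define demo_check_interned(op) \
    demo_check_interned_impl(DEMO_ASCII_CAST(op))

/* Variant 2: no wrapper macro. The function takes the base type and
   performs the cast internally; callers holding another pointer type
   need an explicit cast or get a compiler diagnostic. */
static inline int
demo_check_interned_obj(demo_object *op)
{
    return DEMO_ASCII_CAST(op)->interned != 0;
}
```

Variant 1 preserves source compatibility for callers that pass a differently typed pointer, at the price of keeping a macro. Variant 2 keeps a single typed entry point, which matches the preference expressed above for PyObject* parameters.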

vstinner · 2022-02-09T22:05:35Z

The change "PyUnicode_KIND() now returns an enum PyUnicode_Kind" adds new warnings:

comparison of integer expressions of different signedness: ‘enum PyUnicode_Kind’ and ‘int’ [-Wsign-compare]

Changing PyUnicode_KIND() should be done in a separate PR. I'm not sure what the best return type is. I would prefer not to add new compiler warnings!
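A hedged sketch of why the warning can appear: GCC normally chooses an unsigned type for an enum that has no negative enumerators, so comparing a value of that enum type with a plain `int` mixes signedness. The example below uses a hypothetical `enum demo_kind` rather than the real PyUnicode_Kind, and shows two ways to sidestep the comparison; whether either fits the actual API is a separate question.

```c
#include <assert.h>

/* Hypothetical enum; the names only mirror the PyUnicode_*_KIND constants. */
enum demo_kind {
    DEMO_WCHAR_KIND = 0,
    DEMO_1BYTE_KIND = 1,
    DEMO_2BYTE_KIND = 2,
    DEMO_4BYTE_KIND = 4
};

/* Writing `kind == expected` directly may be reported by -Wsign-compare
   when the compiler gives the enum an unsigned underlying type.
   Converting explicitly keeps both operands signed. */
static int
demo_kind_equals(enum demo_kind kind, int expected)
{
    return (int)kind == expected;
}

/* Alternative: the accessor keeps returning int, so existing callers
   that store the kind in an int compile without new warnings. */
static int
demo_kind_as_int(enum demo_kind kind)
{
    return (int)kind;
}
```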

vstinner · 2022-02-09T22:25:08Z

Macros not casting their arguments:

PyUnicode_1BYTE_DATA()
PyUnicode_2BYTE_DATA()
PyUnicode_4BYTE_DATA()
PyUnicode_AS_DATA()
PyUnicode_DATA()
PyUnicode_GET_DATA_SIZE()
PyUnicode_MAX_CHAR_VALUE()
PyUnicode_READ()
PyUnicode_READ_CHAR()
PyUnicode_WRITE()
_PyUnicodeWriter_Prepare()
_PyUnicodeWriter_PrepareKind()

Macro casting its argument to PyObject*:

PyUnicode_READY()

Macros using a cast in their implementation:

Cast to PyASCIIObject* (and sometimes to other types):
- PyUnicode_AS_UNICODE()
- PyUnicode_CHECK_INTERNED()
- PyUnicode_GET_LENGTH()
- PyUnicode_GET_SIZE()
- PyUnicode_IS_ASCII()
- PyUnicode_IS_COMPACT()
- PyUnicode_IS_COMPACT_ASCII()
- PyUnicode_IS_READY()
- PyUnicode_KIND()
- _PyUnicode_COMPACT_DATA()
Cast to PyUnicodeObject*:
- _PyUnicode_NONCOMPACT_DATA()

The majority of these macros use PyObject* for their Python str object parameter.

PyUnicode_READ() and PyUnicode_WRITE() expect (kind, data, index) and (kind, data, index, value) arguments: no Python object.
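To make the (kind, data, index) and (kind, data, index, value) calling convention concrete, here is a minimal mock written as static inline functions, including the kind of range assertions the PR description mentions for the write path. `demo_read()`, `demo_write()` and `enum demo_kind` are hypothetical names, and the exact limits asserted by the real PyUnicode_WRITE() are not shown in this thread, so the bounds below are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds mirroring the 1-, 2- and 4-byte representations. */
enum demo_kind {
    DEMO_1BYTE_KIND = 1,
    DEMO_2BYTE_KIND = 2,
    DEMO_4BYTE_KIND = 4
};

/* Read one code point from a raw buffer; no string object is involved. */
static inline uint32_t
demo_read(enum demo_kind kind, const void *data, size_t index)
{
    if (kind == DEMO_1BYTE_KIND) {
        return ((const uint8_t *)data)[index];
    }
    else if (kind == DEMO_2BYTE_KIND) {
        return ((const uint16_t *)data)[index];
    }
    assert(kind == DEMO_4BYTE_KIND);
    return ((const uint32_t *)data)[index];
}

/* Write one code point. "data" is a non-const pointer because the buffer
   is modified. The asserted upper bounds are assumptions of this sketch. */
static inline void
demo_write(enum demo_kind kind, void *data, size_t index, uint32_t value)
{
    if (kind == DEMO_1BYTE_KIND) {
        assert(value <= 0xffU);
        ((uint8_t *)data)[index] = (uint8_t)value;
    }
    else if (kind == DEMO_2BYTE_KIND) {
        assert(value <= 0xffffU);
        ((uint16_t *)data)[index] = (uint16_t)value;
    }
    else {
        assert(kind == DEMO_4BYTE_KIND);
        assert(value <= 0x10ffffU);
        ((uint32_t *)data)[index] = value;
    }
}
```

In a macro, such assertions would have to be squeezed into a single expression; in a function they are ordinary statements, and the `void *data` parameter documents that the buffer is written to.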

vstinner · 2022-02-23T23:31:32Z

This PR was an example. I updated PEP 670 from the discussion on this PR. If PEP 670 is accepted, I will rewrite this PR with smaller changes to ease the review.

vstinner added the skip news label Feb 8, 2022

the-knights-who-say-ni added the CLA signed label Feb 8, 2022

bedevere-bot added the awaiting core review label Feb 8, 2022

erlend-aasland reviewed Feb 9, 2022

gpshead reviewed Feb 9, 2022

vstinner mentioned this pull request Feb 21, 2022
PEP 670: clarify cast; don't change return type python/peps#2349
Merged

vstinner closed this Feb 23, 2022

vstinner deleted the unicode_static_inline branch February 23, 2022 23:30

This was referenced Apr 19, 2022
[C API] PEP 670: Convert macros to functions in the Python C API #89653
Closed
gh-89653: PEP 670: Convert unicodeobject.h macros to functions #91696
Closed
