Skip to content

Conversation

@vstinner
Copy link
Member

@vstinnervstinner commented May 19, 2024

@vstinnervstinnerforce-pushed the WIP_unicode_writer branch from fd7432e to b12f085CompareJune 7, 2024 19:33
@vstinner
Copy link
MemberAuthor

@erlend-aasland@serhiy-storchaka@encukou: Do you want to review this change?

@serhiy-storchakaserhiy-storchaka self-requested a review June 7, 2024 20:45
Comment on lines +1568 to +1570
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
*str* must be Python :class:`str` object.
*start* must be greater than or equal to 0,
and less than or equal to *end*.
*end* must be less than or equal to *str* length.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that this is called SemBr!

Breaking on comma may be too much, but I prefer to break at the sentence boundary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. You don't need to break at comma, but I often do to minimise future diffs.

Alternative suggestion:

Suggested change
*str* must be Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to *str* length.
*str* must be Python :class:`str` object.
*start* must be greater than or equal to 0, and less than or equal to *end*.
*end* must be less than or equal to *str* length.

vstinnerand others added 2 commits June 10, 2024 10:02
Co-authored-by: Erlend E. Aasland <[email protected]>
Co-authored-by: Erlend E. Aasland <[email protected]>
@vstinner
Copy link
MemberAuthor

@erlend-aasland: I addressed your review. Would you mind to review the updated PR?

Co-authored-by: Serhiy Storchaka <[email protected]>
Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except using TypeError and ValueError instead of SystemError, LGTM.

But I do not insist if other core developers prefer TypeError and ValueError. I just think that SystemError is more appropriate for programming errors in using the C API.

serhiy-storchaka
serhiy-storchaka previously approved these changes Jun 10, 2024
Copy link
Member

@malemburgmalemburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

@vstinner
Copy link
MemberAuthor

A more general question in the light of the free threading patches: Are these writer APIs thread-safe ?

I did nothing to make them safe, so I would say that they are not thread safe. You must not share a writer between two threads. Use one writer per thread.

@vstinner
Copy link
MemberAuthor

@malemburg@serhiy-storchaka: I modified the implementation to make write operations atomic. Either the whole string is written, or "nothing is written".

In the doc, I wrote "On error, leave the writer unchanged". In practice, it's more subtle, the internal buffer can be enlarged, or its kind (UCS1, UCS2 or UCS4) can change. But from the API consumer point of view, it's as if nothing was written.

I added two unit tests to show that it's still possible to use a writer after an error.

The hack is just to save/restore writer->pos internally in the 2 functions which are not atomic: WriteUTF8() and WriteFormat().

@serhiy-storchakaserhiy-storchaka self-requested a review June 10, 2024 16:14
@serhiy-storchakaserhiy-storchaka dismissed their stale reviewJune 10, 2024 16:15

I withdraw my approve because writing cannot be atomic.

Copy link
Contributor

@erlend-aaslanderlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; thanks!

}


PyObject*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: inconsistent with the rest of the PR.

Suggested change
PyObject*
PyObject*

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also took the liberty of exhaustively point where the 2-blank lines rule is missing (the file is huge so committing directly might help).

@vstinner
Copy link
MemberAuthor

@picnixz: I appreciate your remarks about coding style, but IMO you're going too far. I marked your "two empty lines" suggestions as resolved. This PR is already approved, twice. It's a complex API being discussed for 1 month and I would prefer to not waste my time on coding style. However, I fixed the "an Unicode" typos, thanks :-)

@vstinnervstinner merged commit 5c4235c into python:mainJun 17, 2024
@vstinnervstinner deleted the WIP_unicode_writer branch June 17, 2024 15:10
@vstinner
Copy link
MemberAuthor

The C API Working Group agreed to start with this minimum PyUnicodeWriter API. I merged my PR.

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

@vstinner@serhiy-storchaka@malemburg@picnixz@erlend-aasland