readline: fix character width calculation #13918
TimothyGu commented Jun 26, 2017 • edited
cjihrig commented Jun 26, 2017
Why not just create a new test?
src/node_i18n.cc Outdated
Why remove the u_isdefined() check?
My terminal (GNOME Terminal) actually displays unassigned characters (in a weird box form), so they are more than 0-width. Indeed, UAX #11 specifies that "Unassigned code points in ranges intended for CJK ideographs are classified as Wide", while "All other unassigned code points are by default classified as Neutral".
I removed the u_isdefined() check here so that these unassigned characters can use the generic ICU routine below, which does the right thing.
I suspected that was the case. Ok :)
jasnell left a comment
nice... thank you!
TimothyGu commented Jun 27, 2017
Err... why did I not think of that...
TimothyGu commented Jun 27, 2017
Tests for 0-width characters are added too.
Trott commented Jun 27, 2017
Stopped the Raspberry Pi devices in CI because they had been running for 2.5 hours and had no new console messages for over an hour. Not sure if this will self-correct or if we need Build WG intervention...
Trott commented Jun 27, 2017
Not looking like the Pi issue is self-correcting. Will have to either wait for the issue to get resolved before proceeding, or decide that this can land without the Pi run.
TimothyGu commented Jun 28, 2017
Given that the other machines are universally green (except for macOS's fickle nature) and this PR has no architecture-specific components, I'd say this can land w/o confirmation from RPi.
- Categorize all nonspacing marks (Mn) and enclosing marks (Me) as 0-width.
- Categorize all spacing marks (Mc) as non-0-width.
- Treat soft hyphens (a format character Cf) as non-0-width.
- Do not treat all unassigned code points as 0-width; instead, let ICU select the default for that character per UAX #11.
- Avoid getting the General_Category of a character multiple times as it is an intensive operation.
Refs: http://unicode.org/reports/tr11/
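The mark-related rules in the commit message can be illustrated with a rough JavaScript sketch. This is not the code from the PR, which lives in C++ (src/node_i18n.cc) and relies on ICU; `sketchWidth` and `ZERO_WIDTH_MARK` are hypothetical names, and the sketch assumes a runtime that supports Unicode property escapes in regular expressions. It deliberately ignores East Asian Wide/Fullwidth characters and control characters.

```javascript
// Illustrative sketch only: NOT Node's implementation, which is C++ using ICU.
// `sketchWidth` and `ZERO_WIDTH_MARK` are hypothetical names.

// Nonspacing marks (Mn) and enclosing marks (Me) take up no column of their own.
const ZERO_WIDTH_MARK = /^[\p{Mn}\p{Me}]$/u;

function sketchWidth(str) {
  let width = 0;
  // for...of iterates by code point rather than by UTF-16 code unit.
  for (const ch of str) {
    if (ZERO_WIDTH_MARK.test(ch)) continue; // Mn / Me: 0-width
    // Simplification: everything else counts as one column. Spacing marks (Mc)
    // and the soft hyphen (Cf) therefore stay non-0-width, as in the commit
    // message, but wide CJK characters are not handled here.
    width += 1;
  }
  return width;
}

console.log(sketchWidth('a\u0301')); // 1: 'a' plus a nonspacing acute accent
console.log(sketchWidth('\u00ad'));  // 1: soft hyphen is not treated as 0-width
```

A real implementation would also consult the East_Asian_Width property from UAX #11, which this sketch leaves out.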
jasnell commented Jun 29, 2017
PR-URL: #13918 Reviewed-By: James M Snell <[email protected]>
MylesBorins commented Aug 14, 2017
Should this be backported to
Fixes width calculation of non-spacing marks, commonly seen in Unicode Normalization Form D.
Example: 'a\u0301' ('á'.normalize('NFD')), 'ру́сский язы́к' (Unicode doesn't have many precomposed accented Cyrillic letters).
Outdated information
Not sure where to add tests for this feature though. readline has the following tests:
None of them seem to fit this bug, which is the glue between getStringWidth and readline. Hence the WIP.
The second commit changes how widths of certain characters are determined:
These decisions are made partially by following the behavior of GNOME Terminal. Testing on other terminals is of course welcome.
Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
readline
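To make the NFD example from the description concrete, here is a small sketch using only standard JavaScript string methods. The `codePoints` helper is a hypothetical name introduced for illustration; it assumes a Node.js build where String.prototype.normalize is functional (the default builds ship with ICU data).

```javascript
// 'á' as a single precomposed code point (U+00E1).
const composed = '\u00e1';
// NFD splits it into 'a' (U+0061) followed by COMBINING ACUTE ACCENT (U+0301).
const decomposed = composed.normalize('NFD');

// Hypothetical helper: list the code points of a string as hex strings.
const codePoints = (s) => [...s].map((c) => c.codePointAt(0).toString(16));

console.log(codePoints(composed));                // [ 'e1' ]
console.log(codePoints(decomposed));              // [ '61', '301' ]
console.log(composed.length, decomposed.length);  // 1 2
```

Both strings render as one column in a terminal, so a width calculation that counts every code point as one column would overestimate the decomposed form.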