Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.2k
string_decoder: fix bad utf8 character handling#7310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
164ca60 to 541bcc9Comparemscdex commented Jun 15, 2016
/cc @gagern |
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make these two assignments on separate lines here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this way describes the intention more succinctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is against the code style though, not sure why we don't lint it.
silverwindJun 15, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@indutny I'm not aware of a fitting rule in ESLint. Filed eslint/eslint#6424 for a rule proposal.
cjihrig commented Jun 15, 2016
LGTM pending CI and a style fix. |
mscdex commented Jun 15, 2016
test/parallel/test-string-decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? And if needed, maybe leave the old test too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just making capitalization consistent with all of the other hex buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be inclined to leave it just to make sure that lower-case is tested. (I'm sure that under the hood, string_decoder delegates to buffer or whatever. So it's probably far-fetched that we'd be in a situation where lower-case was broken but upper-case still worked. But it doesn't cost anything to have that tiny bit of extra coverage and the change seems aesthetic only.)
While I have an opinion, I don't feel strongly enough about this to protest or anything. If you do feel strongly, feel free to leave it as you have it.
mscdexJun 15, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. The hex string is being passed to the Buffer creation function (Buffer.from()), not to the StringDecoder functions. That function has plenty of lowercase and uppercase hex string tests in the appropriate test-buffer* test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mscdex Argh! You're right, of course. Ignore me.
mscdex commented Jun 15, 2016
CI is green. /cc @nodejs/collaborators |
indutny commented Jun 15, 2016
One comment, otherwise LGTM. |
541bcc9 to 52462c6CompareI thought of this while looking at nodejs#7310.
gagern commented Jun 16, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I did some more tests in gagern/node@f3d4f2a. Observations:
|
gagern commented Jun 16, 2016
I tried some more systematic testing in gagern/node@2e4bff1: using the bytes I'm not sure how much of this can be attributed to the same bug. I'm also not sure how much of this is a regression compared to 6.2.0. If you can afford the time, it might make sense to include some variant of this systematic testing code. If not, then at least try out the indicated problematic cases, and perhaps add them to the suite to guard against regressions. Is there some special test suite where lengthy tests can be placed, to be run occasionally without slowing down the main test suite? I'm also somewhat concerned about the use of |
gagern commented Jun 16, 2016
52462c6 to 023bb56Compare023bb56 to 985bd75Comparemscdex commented Jun 18, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I've now included fixes for the other test cases that @gagern had created, which I have now included. CI: https://ci.nodejs.org/job/node-test-pull-request/3025/ CI is green except for an unrelated test failure on Windows. |
985bd75 to bafb77fComparejasnell commented Jun 20, 2016
LGTM |
mscdex commented Jun 20, 2016
cjihrig commented Jun 20, 2016
Yea, LGTM. CI seemed happy. |
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't it still go negative here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because if execution has reached here, nb would have to be a 2, 3, or 4-byte character indicator. So self.lastNeed would range from 0 to 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, nb can't be 1.
mscdex commented Jun 24, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
One last CI before landing, just in case: https://ci.nodejs.org/job/node-test-pull-request/3064/ EDIT: CI is green, but a few CI nodes are offline at the moment. Still enough coverage IMHO. |
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: nodejs#7308 PR-URL: nodejs#7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
PR-URL: nodejs#7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
d3cea4b to 5e8cbd7CompareThis commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: #7308 PR-URL: #7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
PR-URL: #7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: #7308 PR-URL: #7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
PR-URL: #7310 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Fedor Indutny <[email protected]>
MylesBorins commented Jul 11, 2016
@mscdex lts? |
mscdex commented Jul 12, 2016
@thealphanerd This PR is only relevant if #6777 is also landed, however that PR is currently marked as |
MylesBorins commented Jul 12, 2016
@mscdex I'm going to mark this as dont-land for now as well. Would you like us to keep an issue in @nodejs/lts to keep track of the commits neccessary to backport the string_decoder changes? |
mscdex commented Jul 12, 2016
@thealphanerd Sure. |
gagern commented Jul 14, 2016
Will the 6.3.1 release be derived from current master, and hence contain this fix here? Or is the fix scheduled for 6.4.0? |
targos commented Jul 14, 2016
@gagern 6.3.0 already has the fix. |
Checklist
make -j4 test(UNIX) orvcbuild test nosign(Windows) passesAffected core subsystem(s)
Description of change
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete
character.
Fixes: #7308