Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.2k
string_decoder: support Uint8Array input to methods#11613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
test/parallel/test-string-decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mscdex What was/is testing? The 2 would always going to be out of range…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added by @EricPoker in 48f8869.
lib/buffer.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm really not a fan of this syntax.. but oh well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither. but what are the alternatives if I want to avoid the cost of property lookups? Splitting this into 6 lines? That’s something I can do if you prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah, this is fine as is. Just had to gripe about it ;-)
addaleax commented Mar 6, 2017
@mscdex Any further thoughts on this? Otherwise I’d like to land this in the next 1 or 2 days. |
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create hidden classes, which could add to the lookup overhead.
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a prototypeless object would be better, but I'm not sure that having a lookup object like this is best.
Taking into account my comment from below, I would suggest creating a prototypeless object with the properties assigned at the same time using Object.create(null,{... }).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it might be worthwhile comparing other lookup strategies, such as using a function that returns the correct function based on the encoding.
mscdex commented Mar 7, 2017
Shouldn't this be semver-major if we're now (explicitly) changing the behavior of strings passed to |
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the comment here. Is the suggestion that in the future it should throw on a string? Otherwise the comment seems at odds with the string check below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the comment here. Is the suggestion that in the future it should throw on a string? Otherwise the comment seems at odds with the string check below.
Yes… do you have different thoughts? It doesn’t really make sense to pass in a string here, does it?
joyeecheungMay 5, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just leave the behavior about string inputs as it is and make it throw in another PR? (that would be semver-major, I guess) (EDIT: OK this PR is already semver-major..)
mscdex commented Mar 7, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Also, have you benchmarked the |
jasnell commented Mar 17, 2017
ping @addaleax :-) |
7014a3e to db0181eCompareaddaleax commented Mar 20, 2017
@jasnell Thanks for the ping… I’ve rebased this and edited a bit, @mscdex was right to ask for individual benchmarks for the Here’s the current benchmark situation: I would be okay with accepting these, especially given how the improvements tend to affect the more common encodings (esp.
@mscdex I wouldn’t consider string input covered as part of the API, but it’s a reasonable point of view. I’m changing the label to be careful. |
lib/string_decoder.js Outdated
mscdexMar 20, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These additions concern me a bit because they make the function size exceed Crankshaft's max inlineable source size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mscdex … yeah. Do you have a better alternative in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to look into it.
jasnell commented Apr 4, 2017
mscdex commented Apr 4, 2017
@jasnell I plan on taking a look this week. |
mscdex commented Apr 7, 2017
FWIW I think I've found some more optimizations to avoid the current performance regressions and then some. I am running benchmarks now ... |
mscdex commented Apr 7, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Ok, here are the results (compared to the current node master branch) with this PR + my changes to |
addaleax commented Apr 9, 2017
@mscdex Are your modifications pushed somewhere? Are you okay with this change (possibly pending applying them)? |
mscdex commented Apr 9, 2017
Not yet, I wasn't sure where to push it for review. |
addaleax commented Apr 9, 2017
You can just push to this branch if you like. |
addaleax commented Apr 14, 2017
CI is green. @jasnell Do you mind taking another look? |
TimothyGu commented Apr 14, 2017
After #12223, I wonder if it would make sense to add support for all ArrayBuffer views instead of just Uint8Array, here rather than in a later PR. |
addaleax commented Apr 14, 2017
@TimothyGu I am not sure that makes sense … for something that decodes byte sequences, shouldn’t the input be an Uint8Array? |
lib/string_decoder.js Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it affect performance if we use constants with names instead of number literals for indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC yes.
Makes the string slice methods of buffers available on the binding object in addition to the `Buffer` prototype. This enables subsequent `string_decoder` changes to use these methods directly without performance loss, since all parameters are known by the string decoder in those cases.
This is a bit odd since `string_decoder` does currently not perform any type checking. Also, this adds an explicit check for `string` input, which does not really make sense but is relied upon by our test suite.
b67e1e9 to 525fabdCompareaddaleax commented May 5, 2017
Rebased. @mscdex Do my changes LGTY? This is basically only waiting for a second CTC member approval. |
| Local<Value> buffer_arg, | ||
| Local<Value> start_arg, | ||
| Local<Value> end_arg){ | ||
| Isolate* isolate = args.GetIsolate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could match StringSlice() above and instead do Isolate* isolate = env->isolate(); for consistency?
mscdex commented May 5, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
LGTM with one minor nit that shouldn't block this from landing. CI again: https://ci.nodejs.org/job/node-test-pull-request/7898/ |
| sequence.forEach((write)=>{ | ||
| output+=decoder.write(input.slice(write[0],write[1])); | ||
| for(constuseUint8arrayof[false,true]){ | ||
| sequences.forEach((sequence)=>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While at it, change this to a for-of loop?
TimothyGu commented May 5, 2017
Well maybe, but I feel it is plausible for the user to use a Uint16Array for UTF-16/UCS-2 input, for example |
mcollina left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a polyfill for https://github.com/rvagg/string_decoder (it can be down there too!)
BridgeAR commented Aug 26, 2017
Needs a rebase but I guess this is otherwise pretty much good to go? |
BridgeAR commented Sep 12, 2017
Ping @addaleax I would love to get this in and I think this only needs a rebase. Otherwise I would go ahead and close this. |
mscdex commented Sep 12, 2017
@BridgeAR benchmarks would probably need to be re-ran before merging because this was all done before TurboFan. |
addaleax commented Sep 13, 2017
@BridgeAR@mcollina’s review was dependent on there being a polyfill for the corresponding npm module, which I haven’t done yet; feel free to take this over if you like @mscdex I agree, but I wouldn’t expect much of a difference since I don’t think TurboFan had impact on how native bindings are called |
mscdex commented Sep 13, 2017
@addaleax I was referring more to the js-land stuff, especially the commit I pushed. |
BridgeAR commented Sep 23, 2017
mcollina commented Sep 25, 2017
@BridgeAR this can land independently, we pick the content from core releases, so we can fetch them But good catch on the other PR, I'll get it updated and landed. This is semver-major anyway, so we have time. |
BridgeAR commented Oct 2, 2017
Ping @addaleax |
BridgeAR commented Nov 22, 2017
Closing due to long inactivity. @addaleax please reopen if you want to follow up on this :-) (in that case the benchmarks should be rerun though). |
addaleax commented Nov 22, 2017
Yeah, I think there’s no point in pursuing this given that we now have |
This includes a bit of refactoring for the
Bufferinternals to keep up performance. Some quick benchmark results (only thestring-decoderbenchmark, excluding the bigger input/chunk sizes and with reducedn):Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
string_decoder, buffer