Skip to content

Conversation

@jasnell
Copy link
Member

@jasnelljasnell commented Oct 11, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

buffer

Description of change

Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another.

Originally part of a proposal to add a new unicode module. Decided to refactor the approach towrds individual PRs without a new module.

Refs: #8075
/cc @trevnorris@addaleax

@jasnelljasnell added the buffer Issues and PRs related to the buffer subsystem. label Oct 11, 2016
@nodejs-github-botnodejs-github-bot added build Issues and PRs related to build files or the CI. c++ Issues and PRs that require attention from people who are familiar with C++. tools Issues and PRs related to the tools directory. labels Oct 11, 2016
@jasnell
Copy link
MemberAuthor

@jasnelljasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Oct 11, 2016
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about possibly placing this in lib/internal/buffer-transcode.js and conditionally require()'ing it. purely cosmetic though, to prevent an extra level of indent. or you could just return early. :)

if(!process.binding('config').hasIntl)return;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We going to have a complaint about not supporting SharedArrayBuffer for this?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, perhaps. Not too worried about that for now.

@jasnell
Copy link
MemberAuthor

Updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give an example of transcoding is not possible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm – this could be a ThrowICUError function, right?

Copy link
Member

@addaleaxaddaleaxOct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but at some point this might become a member of the MaybeStackBuffer class? I realize that would conflict a bit with the MaybeStackBuffer<UChar> overload, maybe leave a TODO here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that final to residue from editing?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, slight brain malfunction there I think ;-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhhh this returns a string or a Buffer depending on the target encoding? I don’t think binary-to-text encodings should be allowed here, .toString() is the right method for them.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. I'll pull these back out.

@jasnell
Copy link
MemberAuthor

@addaleax ... updated! PTAL

Copy link
Member

@addaleaxaddaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use SwapBytes16?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ditto)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1024 seem kind of magic here, although I realize that is largely my fault. 😄 (Not sure if there’s anything to do about that)

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: s/from_enc/fromEncoding/ and s/to_enc/toEncoding/. Ditto for cnv_from and cnv_to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you adapted this code from elsewhere but using snprintf() to format the error message will be much more efficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a if (e.Empty()) return Local<Value>();.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be if (!ret.Empty()) buf->Release(); - it's leaking memory now when the buffer can't be created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: strict aliasing violation and prone to crashing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there is ample opportunity to share code between Ucs2FromUtf8 and Utf8FromUcs2, they are 80% identical.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. For now I'm more inclined to keep these separate as it makes finding and tweaking bugs a bit easier. I'll take another pass in a separate PR to condense things down.

src/util.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also reset length_?

src/util.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this into a common header, you might want to give it a slightly less generic name; e.g. SPREAD_BUFFER_ARG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the 'defines' block instead of commenting it out.

@jasnelljasnellforce-pushed the buffer-transcode branch 3 times, most recently from 9e0464b to 2da087eCompareOctober 13, 2016 15:11
@jasnell
Copy link
MemberAuthor

jasnell commented Oct 13, 2016

@jasnell
Copy link
MemberAuthor

jasnell commented Oct 13, 2016

Another CI run after cleanups: https://ci.nodejs.org/job/node-test-pull-request/4507/ ... that last run was less than successful....
Trying another: https://ci.nodejs.org/job/node-test-pull-request/4508/

@jasnell
Copy link
MemberAuthor

CI looks good. @bnoordhuis PTAL... LGTY?

@jasnell
Copy link
MemberAuthor

ping @bnoordhuis

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're returning empty handles as an indicator it's probably be more "V8-ish" have the return signature as a MaybeLocal<Value> instead. been trying to do that in other locations myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating Local<Value>'s but there's no HandleScope. if this callback is expected to always be called within an existing HandleScope (like MakeCallback), mind putting a comment at the top. also like MakeCallback (see src/node.h).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we know this is a v8::Object then can use e.As<Object>(). that's also more explicit that no extra handle is being created.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new v8::Maybe<T> API for v8::Object::Set() is annoying and ugly, but if if we're going to use some of the new API might as well use all of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're manipulating the original memory, why bother take a copy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this operation fails, do we want to abort or throw?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future performance enhancement, detect the alignment of the pointer and perform as many swaps that can be done in a single go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little confused by this whole object, but right here if we're converting ascii to latin1 shouldn't we be passing 'latin1' as the encoding argument to Buffer.from()?

@jasnelljasnellforce-pushed the buffer-transcode branch 3 times, most recently from 6f70820 to 65144bbCompareOctober 18, 2016 21:43
@jasnell
Copy link
MemberAuthor

@bnoordhuis ... ok, reworked the implementation with an eye towards simplification and reducing duplication. PTAL
@trevnorris and @addaleax ... if I could trouble each of you to take another look also, I'd appreciate it.

@jasnell
Copy link
MemberAuthor

@jasnell
Copy link
MemberAuthor

Thank you for the follow up review @addaleax. Will get this landed tomorrow if there are no further objections.

Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another. Originally part of a proposal to add a new unicode module. Decided to refactor the approach towrds individual PRs without a new module. Refs: nodejs#8075
@jasnell
Copy link
MemberAuthor

New CI run after squashing: https://ci.nodejs.org/job/node-test-pull-request/4665/

@jasnell
Copy link
MemberAuthor

green except for unrelated failures. landing

@jasnelljasnell dismissed bnoordhuis’s stale reviewOctober 25, 2016 17:11

PR updated after view

jasnell added a commit that referenced this pull request Oct 25, 2016
Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another. Originally part of a proposal to add a new unicode module. Decided to refactor the approach towrds individual PRs without a new module. Refs: #8075 PR-URL: #9038 Reviewed-By: Anna Henningsen <[email protected]>
@jasnell
Copy link
MemberAuthor

Landed in e8eaaa7

@jasnelljasnell closed this Oct 25, 2016
@srl295srl295 mentioned this pull request Oct 25, 2016
4 tasks
evanlucas pushed a commit that referenced this pull request Nov 3, 2016
Add buffer.transcode(source, from, to) method. Primarily uses ICU to transcode a buffer's content from one of Node.js' supported encodings to another. Originally part of a proposal to add a new unicode module. Decided to refactor the approach towrds individual PRs without a new module. Refs: #8075 PR-URL: #9038 Reviewed-By: Anna Henningsen <[email protected]>
@evanlucasevanlucas mentioned this pull request Nov 3, 2016
@addaleax
Copy link
Member

If this is backported to any of the other release lines, it needs to come with #9838

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bufferIssues and PRs related to the buffer subsystem.buildIssues and PRs related to build files or the CI.c++Issues and PRs that require attention from people who are familiar with C++.semver-minorPRs that contain new features and should be released in the next minor version.toolsIssues and PRs related to the tools directory.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

@jasnell@addaleax@bnoordhuis@trevnorris@MylesBorins@nodejs-github-bot