Skip to content

Conversation

@watilde
Copy link
Contributor

@watildewatilde commented Jan 23, 2017

Updates:

  • skip the parsing process if a pair of a key and a value is empty
  • add test cases for querystring and URLSearchParams

Fixes: #10454

/cc @nodejs/url

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

querystring, url, url-whatwg, test

@nodejs-github-botnodejs-github-bot added querystring Issues and PRs related to the built-in querystring module. dont-land-on-v7.x labels Jan 23, 2017
@watildewatildeforce-pushed the feature/fixes-querystring-parse branch from 7d22fc5 to 0701371CompareJanuary 23, 2017 21:13
@watildewatilde changed the title url: repeated '&' in a paramsString will be skippedquerystring, url: repeated '&' in a paramsString will be skippedJan 23, 2017
@TimothyGu
Copy link
Member

I believe the consensus was to freeze the querystring module, and put all new work onto URLSearchParams.

@jasnell, can you weigh in on this?

@mscdex
Copy link
Contributor

@TimothyGu I don't recall that. Perhaps you're thinking about the url module instead?

@TimothyGu
Copy link
Member

Regardless of any previous planning, I am not sure this PR (or any PRs that change the behavior of this module) is the way to go.

  1. A change in the behavior of this module is almost definitively semver-major. It may also cause ripple effects in the entire ecosystem, though I am unsure how much of an effect this might bring.
  2. querystring module has applications more than simply parsing and serializing URL query strings. The fact that sep, eq, and decodeURIComponent are modifiable means that other usage, including those that depend on the current behavior, is possible. I cannot find it right now, but I have even read a Node.js "tip" somewhere that encourages using this module for performant generic serialized format.
  3. Even if the goal to make querystring WHATWG-compliant is accepted, there are other issues whose fixes may or must be semver-major as well.
    1. Serialization of spaces as + rather than %20, left paren as ) rather than %27, etc.: url: Improve WHATWG URLSearchParams spec compliance #9484 (comment).
    2. This module returns an object/hashmap, which is fundamentally incompatible with the list format WHATWG URL Standard mandates. For example, qs.stringify(qs.parse('a=1&b=2&a=3')) returns 'a=1&a=3&b=2' instead of 'a=1&b=2&a=3'.
  4. Proposal to eventually move URLSearchParams parsing to native code.

@jasnell
Copy link
Member

I wouldn't describe it as consensus. In my opinion significant changes to the the existing url module should be avoided, particularly semver-major or semver-minor. If it's a bug fix then I'm good with it. I'd rather keep things pretty steady state but this change makes sense. It's also worth noting that the URLSearchParams implementation currently still makes use of the querystring module.

@joyeecheung
Copy link
Member

joyeecheung commented Jan 24, 2017

I think this looks more like a bugfix, albeit a breaking one...I doubt anyone would rely on the behavior of parsing && and getting an empty-ish object instead of treating it as a bug (although it's harmless for common usage AFAICT) ?

In fact you can't even reserialize to the original string with this bug. No error would be thrown if you only use the Node.js built-in module, but it could be a problem when it ends up in the hand of other modules in other languages.

>qs.parse('a=1&&b=1'){a: '1','': '',b: '1'}>qs.stringify(qs.parse('a=1&&b=1'))'a=1&=&b=1'

FWIW, in Python:

>>>parse_qs('a=1&&b=2'){'a': ['1'], 'b': ['2']}

PHP:

php > $foo = array(); php > parse_str('a=1&&b=2', $foo); php > var_dump($foo); array(2){["a"]=> string(1) "1" ["b"]=> string(1) "2" }

Ruby:

2.3.1:003 > CGI.parse("a=1&&b=2")=>{"a"=>["1"],"b"=>["2"]}

@joyeecheung
Copy link
Member

Regarding @TimothyGu 's concerns I think 1 might not be that significant seeing what other languages behaves. 2 would be more possible, but I can't think of a reason for relying on an empty key (if it's just for custom searialization), though some people could rely on an empty value, maybe changing || to && on L281 would be better?
As for 3&4, I think it doesn't hurt to fix the querystring module, whether URLSearchParams is gonna get reimplemented or not. It even makes more sense to land a fix in querystring if URLSearchParams is gonna be fixed separately.

@joyeecheungjoyeecheung added url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Jan 24, 2017
@watilde
Copy link
ContributorAuthor

I've tested the above case @joyeecheung mentioned, and it seems to work fine in this branch

$ ./out/Release/node > var qs = require('querystring') undefined > qs.parse('a=1&&b=1'){a: '1', b: '1' } > qs.stringify(qs.parse('a=1&&b=1')) 'a=1&b=1' 

About changing || to &&, we have cases that the key or the value can be empty like the following. I tried updating it once, but some of the test cases threw some errors.

> qs.parse('=a&b'){'': 'a', b: '' } 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just key.length > 0? Empty values are not that uncommon (e.g. qs.parse('b=&x=1'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a look at the spec and this bit of code, I think the current way is the best :D. The only one that needs to be ignored is &&, otherwise it doesn't match what the spec needs.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up the code and the spec hardly!

@joyeecheung
Copy link
Member

@jasnell
Copy link
Member

I agree that it's a bugfix. Parsing && as meaning an empty key makes exceedingly little sense. I'm definitely +1 on getting this fixed.

Copy link
Member

@jasnelljasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to also test the following cases:

`a=b&c&d=e` `a=b&c=&d=e` `a=b&=c&d=e` `a=b&=&c=d` 

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right! Adding the cases that the key or the value is empty sounds useful for revealing the current behaviour. I've added them :)

@jasnell
Copy link
Member

New CI since the test was updated: https://ci.nodejs.org/job/node-test-pull-request/6032/

@watilde
Copy link
ContributorAuthor

Some of the nice performance improvements in querystring were added! Resolved conflicts.

@watilde
Copy link
ContributorAuthor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be parsed as {'': '', a: 'b', c: 'd' }. Try new URLSearchParams('&=&').get('').

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I didn't know that, thanks! Then I will try to support the case.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated! The logic was changed to simple one :)

+ update state machine in parse + repeated sep should be adjusted + `&=&=` should be `{'': [ '', '' ] }` + add test cases for querystring and URLSearchParams Fixes: nodejs#10454
@watildewatildeforce-pushed the feature/fixes-querystring-parse branch from 666b945 to f81201aCompareFebruary 1, 2017 20:39
@watilde
Copy link
ContributorAuthor

I made improvements in performance again, and now it seems to be better than my last update.

@mscdex
Copy link
Contributor

mscdex commented Feb 1, 2017

@watilde That benchmark output looks truncated (the benchmark/compare.js script by default performs more runs than that). Having the output piped to RScript benchmark/compare.R will yield more useful information as @joyeecheung pointed out.

@watilde
Copy link
ContributorAuthor

watilde commented Feb 2, 2017

@mscdex That's right indeed! Then I just tried it, and here is the result:

@joyeecheung
Copy link
Member

@joaocgreis
Copy link
Member

Aborted CI, only arm-fanned was pending because of nodejs/build#611

@jasnell
Copy link
Member

jasnell pushed a commit that referenced this pull request Feb 2, 2017
* update state machine in parse * repeated sep should be adjusted * `&=&=` should be `{'': [ '', '' ] }` * add test cases for querystring and URLSearchParams Fixes: #10454 PR-URL: #10967 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Timothy Gu <[email protected]>
@jasnell
Copy link
Member

Landed in 4e259b2

@jasnelljasnell closed this Feb 2, 2017
@TimothyGu
Copy link
Member

Should this be marked as semver-major?

@mscdex
Copy link
Contributor

mscdex commented Feb 2, 2017

I was hoping to look at this this weekend before this landed...

@mscdexmscdex added the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 2, 2017
@jasnell
Copy link
Member

oy.. my apologies @mscdex... if you indicated that in the thread and I missed it, I do apologize.

@jasnell
Copy link
Member

if this ends up not being acceptable to you @mscdex, then we should be able to do a quick revert

@watildewatilde deleted the feature/fixes-querystring-parse branch February 2, 2017 21:21
@watildewatilde restored the feature/fixes-querystring-parse branch February 2, 2017 21:23
@watilde
Copy link
ContributorAuthor

Isn't this patch update since it's just bug-fix?

@mscdex
Copy link
Contributor

mscdex commented Feb 2, 2017

@watilde It might be a bug fix for the whatwg url module, but it's arguably not for the non-whatwg url module since there is no real spec that it adheres to. If this change were made to whatwg url's own querystring implementation that would be different and semver-major could just be applied to the non-whatwg url changes.

@jasnell#10967 (comment) I probably won't get time until this weekend to review this, so I don't think a 'quick revert' will happen unless else everyone agrees.

@joyeecheung
Copy link
Member

I think we can wail until @mscdex 's review to decide then. Personally I think it's fine to accept this if it doesn't introduce any new behavior other than fixing #10454 because the old behavior looks buggy to me (see #10967 (comment)).

@mscdex
Copy link
Contributor

@joyeecheung Calling it a 'bug fix' is debatable. One could also argue that stringify() should not include include key/value pairs that are both empty. I'd personally rather be on the safe side and land it in v8.0.0.

@joyeecheung
Copy link
Member

@mscdex Sorry for being vague, I am fine with marking this semver-major.

krydos pushed a commit to krydos/node that referenced this pull request Feb 25, 2017
* update state machine in parse * repeated sep should be adjusted * `&=&=` should be `{'': [ '', '' ] }` * add test cases for querystring and URLSearchParams Fixes: nodejs#10454 PR-URL: nodejs#10967 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Timothy Gu <[email protected]>
@jasnelljasnell mentioned this pull request Apr 4, 2017
@watildewatilde deleted the feature/fixes-querystring-parse branch April 11, 2017 18:58
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

querystringIssues and PRs related to the built-in querystring module.semver-majorPRs that contain breaking changes and should be released in the next major version.urlIssues and PRs related to the legacy built-in url module.whatwg-urlIssues and PRs related to the WHATWG URL implementation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants

@watilde@TimothyGu@mscdex@jasnell@joyeecheung@joaocgreis@Fishrock123@nodejs-github-bot