querystring: improve parse() and escape() performance #5012

mscdex · 2016-01-31T20:08:01Z

parse() performance is improved by ~20-200% with the various querystring-parse benchmarks.

Some optimization strategies used include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used
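The strategies listed above can be illustrated with a minimal sketch. This is not the code from this PR: `parseSketch`, `decodeSketch`, and `addPair` are hypothetical names, and the fallback for malformed escapes is an assumption made only to keep the example self-contained. It scans the input once, slices each key and value directly instead of calling split(), and only decodes a pair when a '%' or '+' was seen in it.

```javascript
// Hypothetical helper: decode '+' as space and percent-escapes.
// Assumption: malformed escapes are returned undecoded.
function decodeSketch(s) {
  try {
    return decodeURIComponent(s.replace(/\+/g, ' '));
  } catch (e) {
    return s;
  }
}

// Hypothetical helper: store a value, collecting repeated keys in an array.
function addPair(obj, key, value) {
  const existing = obj[key];
  if (existing === undefined) obj[key] = value;
  else if (Array.isArray(existing)) existing[existing.length] = value;
  else obj[key] = [existing, value];
}

function parseSketch(qs) {
  const obj = Object.create(null);
  let keyStart = 0;    // index where the current pair starts
  let eqIdx = -1;      // index of the first '=' in the current pair
  let encoded = false; // whether '%' or '+' was seen in the current pair

  // One pass over the string; the end of input is treated like a final '&'.
  for (let i = 0; i <= qs.length; ++i) {
    const c = i < qs.length ? qs.charCodeAt(i) : 38;
    if (c === 38) { // '&'
      if (i > keyStart) {
        let key = qs.slice(keyStart, eqIdx === -1 ? i : eqIdx);
        let value = eqIdx === -1 ? '' : qs.slice(eqIdx + 1, i);
        // Skip decoding entirely when nothing in the pair was encoded.
        if (encoded) {
          key = decodeSketch(key);
          value = decodeSketch(value);
        }
        addPair(obj, key, value);
      }
      keyStart = i + 1;
      eqIdx = -1;
      encoded = false;
    } else if (c === 61) { // '='
      if (eqIdx === -1) eqIdx = i;
    } else if (c === 37 || c === 43) { // '%' or '+'
      encoded = true;
    }
  }
  return obj;
}
```

For example, `parseSketch('a=1&b=2+3&a=%41&c')` yields `a` as `['1', 'A']`, `b` as `'2 3'`, and `c` as an empty string. Options such as custom separators, maxKeys, or a custom decoder are left out.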

escape() performance is improved a bit, up to ~15% with the various querystring-stringify benchmarks, by reducing the number of string concatenations and by avoiding a potential deopt when the input string ends on an incomplete multibyte character.

Also, a constant deopt in unescapeBuffer() is avoided by checking that the index passed to charCodeAt() is not out of bounds.
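A rough sketch of how an escape() routine might reduce concatenations and keep every charCodeAt() index in bounds is shown here. It is not the code from this PR: `escapeSketch`, `hexTable`, and `isUnreserved` are hypothetical names, and throwing a URIError on a truncated surrogate pair is an assumption chosen to match encodeURIComponent(). Runs of characters that need no escaping are appended with a single slice rather than one character at a time.

```javascript
// Precomputed "%XX" strings for every byte value.
const hexTable = new Array(256);
for (let i = 0; i < 256; ++i)
  hexTable[i] = '%' + (i < 16 ? '0' : '') + i.toString(16).toUpperCase();

// Characters left as-is: digits, letters, and ! ' ( ) * - . _ ~
function isUnreserved(c) {
  return (c >= 0x30 && c <= 0x39) ||
         (c >= 0x41 && c <= 0x5A) ||
         (c >= 0x61 && c <= 0x7A) ||
         c === 0x21 || c === 0x27 || c === 0x28 || c === 0x29 ||
         c === 0x2A || c === 0x2D || c === 0x2E || c === 0x5F || c === 0x7E;
}

// Assumes well-formed UTF-16 input apart from a truncated trailing pair.
function escapeSketch(str) {
  let out = '';
  let lastPos = 0; // start of the pending run of unescaped characters
  for (let i = 0; i < str.length; ++i) {
    let c = str.charCodeAt(i);
    if (isUnreserved(c)) continue;
    // Flush the whole unescaped run with one concatenation.
    if (i > lastPos) out += str.slice(lastPos, i);
    if (c < 0x80) {
      out += hexTable[c];
    } else if (c < 0x800) {
      out += hexTable[0xC0 | (c >> 6)] + hexTable[0x80 | (c & 0x3F)];
    } else if (c < 0xD800 || c >= 0xE000) {
      out += hexTable[0xE0 | (c >> 12)] +
             hexTable[0x80 | ((c >> 6) & 0x3F)] +
             hexTable[0x80 | (c & 0x3F)];
    } else {
      // Surrogate pair: check the bounds before reading the second half.
      ++i;
      if (i >= str.length) throw new URIError('URI malformed');
      const c2 = str.charCodeAt(i) & 0x3FF;
      c = 0x10000 + (((c & 0x3FF) << 10) | c2);
      out += hexTable[0xF0 | (c >> 18)] +
             hexTable[0x80 | ((c >> 12) & 0x3F)] +
             hexTable[0x80 | ((c >> 6) & 0x3F)] +
             hexTable[0x80 | (c & 0x3F)];
    }
    lastPos = i + 1;
  }
  if (lastPos === 0) return str; // nothing needed escaping
  if (lastPos < str.length) out += str.slice(lastPos);
  return out;
}
```

For well-formed input the result should match `encodeURIComponent(str)`, which makes the sketch easy to check against the built-in.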

jbergstroem · 2016-01-31T20:34:39Z

CI: https://ci.nodejs.org/job/node-test-commit/2009/

infusion · 2016-01-31T22:28:06Z

Why did you remove the str.length cache in escape()?

mscdex · 2016-01-31T22:34:29Z

@infusion It's not necessary with modern versions of v8.

evanlucas · 2016-02-01T11:52:59Z

lib/querystring.js

Just curious, what is the benefit of using NaN here?

charCodeAt() returns NaN for out of bounds indices. I was just keeping the same behavior here but avoiding the deopt.

ahh that makes sense. Thanks

Would there be drawbacks to changing the <= to < in the loop test? You'd have to replicate some of the out[outIndex++] assignments after the loop but it would keep the loop body simple. I guess you can also accomplish that with a s/NaN/0/.

I left it as-is to keep changes minimal. I typically don't like to duplicate code when reusing the loop logic like that is easy/simple enough.
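The pattern discussed in this thread can be sketched as follows. This is a simplified illustration, not the code in lib/querystring.js: `unescapeBufferSketch` and `hexValue` are hypothetical names, and '+' handling is omitted. The loop runs one step past the end (`<=`), substitutes NaN for the out-of-bounds read instead of calling charCodeAt() with a bad index, and lets the normal loop body flush any pending '%' state; the single sentinel byte written on that last step is dropped when slicing.

```javascript
// Hypothetical helper: value of a hex digit char code, or -1 if not hex.
// NaN fails every comparison, so it also yields -1.
function hexValue(c) {
  if (c >= 48 && c <= 57) return c - 48;   // '0'-'9'
  if (c >= 65 && c <= 70) return c - 55;   // 'A'-'F'
  if (c >= 97 && c <= 102) return c - 87;  // 'a'-'f'
  return -1;
}

function unescapeBufferSketch(s) {
  // One extra byte for the sentinel written on the final iteration.
  const out = Buffer.alloc(s.length + 1);
  let outIndex = 0;
  let state = 0;   // 0: literal, 1: after '%', 2: after '%' and one hex digit
  let hi = 0;      // value of the first hex digit
  let hiChar = 0;  // char code of the first hex digit

  for (let inIndex = 0; inIndex <= s.length; ++inIndex) {
    // Emulate charCodeAt()'s NaN result without an out-of-bounds call.
    const c = inIndex < s.length ? s.charCodeAt(inIndex) : NaN;
    switch (state) {
      case 0:
        if (c === 37) state = 1;  // '%'
        else out[outIndex++] = c; // NaN is stored as 0
        break;
      case 1:
        hiChar = c;
        hi = hexValue(c);
        if (hi < 0) {
          // Not an escape after all: emit '%' and the character literally.
          out[outIndex++] = 37;
          out[outIndex++] = c;
          state = 0;
        } else {
          state = 2;
        }
        break;
      case 2: {
        state = 0;
        const lo = hexValue(c);
        if (lo < 0) {
          out[outIndex++] = 37;
          out[outIndex++] = hiChar;
          out[outIndex++] = c;
        } else {
          out[outIndex++] = 16 * hi + lo;
        }
        break;
      }
    }
  }
  // Every path writes exactly one sentinel byte at the end; drop it.
  return out.slice(0, outIndex - 1);
}
```

With this shape, an input such as `'a%20b%zz%4'` decodes to `'a b%zz%4'` without any flush logic duplicated after the loop.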

mscdex · 2016-02-01T17:27:55Z

@bnoordhuis I've fixed the missing post-OptimizeFunctionOnNextCall function calls. Performance is still the same FWIW.

jasnell · 2016-02-01T17:41:57Z

LGTM

bnoordhuis · 2016-02-01T19:02:43Z

lib/querystring.js

I suspect you can eke out some more performance if you replace the calls to charCode() with their number literal equivalents.

dolphin278 · 2016-02-05T10:03:11Z

lib/querystring.js

@mscdex General question – isn't keys[keys.length] = key still better than keys.push(key)?
A few months ago I inspected the compiled code using IRHydra, and I remember a difference between pushing items and assigning by array length in a loop, in favor of the latter.
Does it matter anymore?

Testing with Chrome with v8 4.7 on jsperf shows that using arr[arr.length] = x is indeed much faster, but there wasn't as large of a performance gain in the node benchmark (including a new benchmark input I just added that has even more duplicate keys). However I've changed it anyway for the small performance increase it does provide.
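The two append styles compared here can be measured with a small sketch like the one below. The function names and the timing helper are hypothetical and are not part of the node benchmark suite; results will vary by V8 version and input, so no numbers are claimed.

```javascript
// Append with Array.prototype.push().
function collectWithPush(items) {
  const keys = [];
  for (let i = 0; i < items.length; ++i) keys.push(items[i]);
  return keys;
}

// Append by assigning at the current length.
function collectWithIndex(items) {
  const keys = [];
  for (let i = 0; i < items.length; ++i) keys[keys.length] = items[i];
  return keys;
}

// Hypothetical timing helper: elapsed milliseconds for `rounds` calls.
function timeIt(fn, items, rounds) {
  const start = process.hrtime();
  for (let r = 0; r < rounds; ++r) fn(items);
  const diff = process.hrtime(start);
  return diff[0] * 1e3 + diff[1] / 1e6;
}
```

Calling `timeIt(collectWithPush, items, 1e5)` and `timeIt(collectWithIndex, items, 1e5)` on the same input gives a rough comparison; both functions return identical arrays.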

mscdex · 2016-02-11T15:58:26Z

Can I get some more LGTMs on this one?

/cc @nodejs/collaborators

jasnell · 2016-02-11T16:00:50Z

Still LGTM

silverwind · 2016-02-11T16:07:30Z

LGTM pending CI.

CI: https://ci.nodejs.org/job/node-test-pull-request/1638/

mcollina · 2016-02-11T20:38:17Z

LGTM

This commit improves parse() performance by ~20-200% with the various querystring-parse benchmarks. Some optimization strategies used in this commit include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used

Before this, v8 would deopt when an out of bounds `inIndex` would get passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we directly emulate that behavior as well. Also, calls to charCodeAt() for constant strings have been replaced by the raw character codes and parser state is now stored as an integer instead of a string. Both of these provide a slight performance increase.

This commit improves escape() performance by up to 15% with the existing querystring-stringify benchmarks by reducing the number of string concatenations. A potential deopt is also avoided by making sure the index passed to charCodeAt() is within bounds.

mscdex · 2016-02-13T01:29:23Z

CI again since the last one had CI infrastructure issues: https://ci.nodejs.org/job/node-test-commit/2216/

This commit improves parse() performance by ~20-200% with the various querystring-parse benchmarks. Some optimization strategies used in this commit include:

* Combining multiple searches (for '&', '=', and '+') on the same string into a single loop
* Avoiding string.split()
* Minimizing creation of temporary strings
* Avoiding string decoding if no encoded bytes were found and the default string decoder is being used

PR-URL: #5012
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Roman Reiss <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>

Before this, v8 would deopt when an out of bounds `inIndex` would get passed to charCodeAt(). charCodeAt() returns NaN in such cases, so we directly emulate that behavior as well. Also, calls to charCodeAt() for constant strings have been replaced by the raw character codes and parser state is now stored as an integer instead of a string. Both of these provide a slight performance increase.

PR-URL: #5012
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Roman Reiss <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>

This commit improves escape() performance by up to 15% with the existing querystring-stringify benchmarks by reducing the number of string concatenations. A potential deopt is also avoided by making sure the index passed to charCodeAt() is within bounds.

PR-URL: #5012
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Roman Reiss <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>

mscdex · 2016-02-13T01:30:43Z

Landed in 00638ac, c8e650d, and a2a69a2.


* buffer:
  - You can now supply an encoding argument when filling a Buffer: Buffer#fill(string[, start[, end]][, encoding]). Supplying an existing Buffer will also work with Buffer#fill(buffer[, start[, end]]). See the API documentation for details on how this works. (Trevor Norris) #4935
  - Buffer#indexOf() no longer requires a byteOffset argument if you also wish to specify an encoding: Buffer#indexOf(val[, byteOffset][, encoding]). (Trevor Norris) #4803
* child_process: spawn() and spawnSync() now support a 'shell' option to allow for optional execution of the given command inside a shell. If set to true, cmd.exe will be used on Windows and /bin/sh elsewhere. A path to a custom shell can also be passed to override these defaults. On Windows, this option allows .bat and .cmd files to be executed with spawn() and spawnSync(). (Colin Ihrig) #4598
* http_parser: Update to http-parser 2.6.2 to fix an unintentionally strict limitation of allowable header characters. (James M Snell) #5237
* dgram: socket.send() now accepts an array of Buffers or Strings as the first argument. See the API docs for details on how this works. (Matteo Collina) #4374
* http: Fix a bug where handling headers will mistakenly trigger an 'upgrade' event where the server is just advertising its protocols. This bug can prevent HTTP clients from communicating with HTTP/2 enabled servers. (Fedor Indutny) #4337
* net: Added a listening Boolean property to net and http servers to indicate whether the server is listening for connections. (José Moreira) #4743
* node: The C++ node::MakeCallback() API is now reentrant and calling it from inside another MakeCallback() call no longer causes the nextTick queue or Promises microtask queue to be processed out of order. (Trevor Norris) #4507
* tls: Add a new tlsSocket.getProtocol() method to get the negotiated TLS protocol version of the current connection. (Brian White) #4995
* vm: Introduce new 'produceCachedData' and 'cachedData' options to new vm.Script() to interact with V8's code cache. When a new vm.Script object is created with 'produceCachedData' set to true, a Buffer with V8's code cache data will be produced and stored in the cachedData property of the returned object. This data in turn may be supplied back to another vm.Script() object with a 'cachedData' option if the supplied source is the same. Successfully executing a script from cached data can speed up instantiation time. See the API docs for details. (Fedor Indutny) #4777
* performance: Improvements in:
  - process.nextTick() (Ruben Bridgewater) #5092
  - path module (Brian White) #5123
  - querystring module (Brian White) #5012
  - streams module when processing small chunks (Matteo Collina) #4354


MylesBorins · 2016-03-10T21:28:17Z

Adding the LTS watch flag, but I think this will have to sit for a while before we know that this is stable.

Thoughts?

jasnell · 2016-03-11T00:45:19Z

Yeah, like the path changes, we'll want to let this sit for a good long time before backporting.

rvagg · 2016-03-14T01:47:55Z

-1 for LTS is my vote (not being absolute, if you two want to disagree with me). I'm leaning that way for perf changes, stronger for larger changes (not just in LOC but impact). Performance profile of LTS should be relatively stable over time and we owe it to users not to screw with things too much. While we can pick up edge cases with Stable releases, there's a whole different sector of users who use LTS that may experience totally different edge cases than users of Stable might.

MylesBorins · 2016-03-14T01:56:58Z

@rvagg I don't disagree.
What I do think could be interesting though is keeping track of larger changes like this and all the future regressions so that it will be easy to backport the entire lot if we need to (e.g. too many changes making the overall backporting process a nightmare).

MylesBorins · 2016-05-17T22:33:39Z

marking this don't land for now

mscdex added the querystring label Jan 31, 2016

evanlucas reviewed Feb 1, 2016

mscdex force-pushed the perf-querystring branch from e7bb9ef to ed61c27 February 1, 2016 17:26

bnoordhuis reviewed Feb 1, 2016

mscdex force-pushed the perf-querystring branch from ed61c27 to ceffae4 February 4, 2016 02:11

MylesBorins mentioned this pull request Feb 5, 2016
querystring: check that maxKeys is finite #5066
Merged

dolphin278 reviewed Feb 5, 2016

mscdex force-pushed the perf-querystring branch from ceffae4 to 6060e98 February 6, 2016 18:48

mscdex added 3 commits February 12, 2016 19:56

mscdex force-pushed the perf-querystring branch from 6060e98 to 3df8e85 February 13, 2016 00:57

mscdex closed this Feb 13, 2016

mscdex deleted the perf-querystring branch February 13, 2016 01:33

rvagg mentioned this pull request Feb 18, 2016
Release proposal: 5.7.0 (Stable) #5295
Merged

MylesBorins mentioned this pull request Mar 10, 2016
Audit commits not found on v4.x #5647
Closed

MylesBorins added the lts-watch-v4.x label Mar 10, 2016

MylesBorins added dont-land-on-v4.x and removed lts-watch-v4.x labels May 17, 2016

This was referenced Apr 24, 2023
[Snyk] Fix for 44 vulnerabilities aliscco/alisco-node#115
Open
[Snyk] Fix for 8 vulnerabilities aliscco/alisco-node#325
Open

11 participants