Skip to content

Conversation

@sethbrenith
Copy link
Contributor

@sethbrenithsethbrenith commented Jan 25, 2018

In the spirit of 17399, we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • http
Benchmark results
 improvement confidence p.value http\\check_invalid_header_char.js n=1000000 input="\177" -56.65 % *** 1.320705e-40 http\\check_invalid_header_char.js n=1000000 input="" -70.08 % *** 1.329603e-60 http\\check_invalid_header_char.js n=1000000 input="\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tFoo bar baz" 198.82 % *** 1.052799e-76 http\\check_invalid_header_char.js n=1000000 input="1" -50.00 % *** 7.800619e-22 http\\check_invalid_header_char.js n=1000000 input="20091" -6.44 % *** 7.491287e-04 http\\check_invalid_header_char.js n=1000000 input="ä,-æ-╪å`¢" -33.16 % *** 5.648190e-21 http\\check_invalid_header_char.js n=1000000 input="close" 22.88 % *** 1.497247e-25 http\\check_invalid_header_char.js n=1000000 input="en-US" 24.31 % *** 4.582121e-46 http\\check_invalid_header_char.js n=1000000 input="foo\\nbar" -5.84 % *** 2.581802e-08 http\\check_invalid_header_char.js n=1000000 input="group_acmeair" 19.81 % *** 2.764867e-37 http\\check_invalid_header_char.js n=1000000 input="gzip" 4.54 % * 1.921983e-02 http\\check_invalid_header_char.js n=1000000 input="Here is a value that is real..."(truncated) 280.31 % *** 8.462204e-37 http\\check_invalid_header_char.js n=1000000 input="keep-alive" 80.49 % *** 4.132616e-53 http\\check_invalid_header_char.js n=1000000 input="private" 41.84 % *** 1.856953e-15 http\\check_invalid_header_char.js n=1000000 input="SAMEORIGIN" 82.00 % *** 2.317592e-50 http\\check_invalid_header_char.js n=1000000 input="Sat, 07 May 2016 16:54:48 GMT" 199.94 % *** 2.698766e-29 http\\check_invalid_header_char.js n=1000000 input="text/html; charset=utf-8" 176.87 % *** 1.854601e-33 http\\check_invalid_header_char.js n=1000000 input="text/plain" 75.47 % *** 1.241438e-24 

@nodejs-github-botnodejs-github-bot added the http Issues or PRs related to the http subsystem. label Jan 25, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Rather than doing !/^[...]*$/.test(val), it might be simpler to do /[^...]/.test(val).
  2. It seems there are some regressions in the benchmark. If they still exist after swapping the conditions as I did above, we might want to add a fast case for things like the empty string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am relatively certain that there will still be a certain amount of regression even with that change. If not: wonderful. But in case the regressions continue to exist: I would like to keep the implementation and add the RegExp for val.length > 5.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response, and thanks for the feedback!

  1. Agreed, /[^...]/.test(val) is prettier. I'll update it.
  2. I think regressing some particular strings is okay if the expected overall throughput on realistic data is improved. I don't have real-world data, but I do have data from an AcmeAir run (which uses popular modules in relatively straightforward ways, so might be somewhat representative of real data), and it tends to call this method with longer strings where the regex is faster. The same sort of suggestions came up in the previous change to simplify checkIsHttpToken, and people ended up deciding that the readability win was worth regressing a few benchmark cases. Furthermore, V8 folks like @bmeurer and @psmarshall are committed to ensuring that idiomatic code is performant, and there is already an open bug for improving regex performance on short inputs.

Copy link
ContributorAuthor

@sethbrenithsethbrenithFeb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison between iterations (old is !/^[...]*$/.test(val), new is /[^...]/.test(val)):

 improvement confidence p.value http\\check_invalid_header_char.js n=1000000 input="\177" -10.43 % *** 1.145512e-11 http\\check_invalid_header_char.js n=1000000 input="" 20.55 % *** 5.135039e-20 http\\check_invalid_header_char.js n=1000000 input="\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tFoo bar baz" -1.96 % 3.430864e-01 http\\check_invalid_header_char.js n=1000000 input="1" 18.08 % *** 1.373559e-22 http\\check_invalid_header_char.js n=1000000 input="20091" 13.54 % *** 4.766386e-09 http\\check_invalid_header_char.js n=1000000 input="ä,-æ-╪å`¢" -9.09 % *** 2.598671e-09 http\\check_invalid_header_char.js n=1000000 input="close" 14.75 % *** 5.152685e-10 http\\check_invalid_header_char.js n=1000000 input="en-US" 15.73 % *** 4.681282e-09 http\\check_invalid_header_char.js n=1000000 input="foo\\nbar" -1.83 % 1.630593e-01 http\\check_invalid_header_char.js n=1000000 input="group_acmeair" -13.64 % *** 8.914585e-15 http\\check_invalid_header_char.js n=1000000 input="gzip" 14.60 % *** 7.965079e-26 http\\check_invalid_header_char.js n=1000000 input="Here is a value that is real..."(truncated) 55.78 % *** 9.914895e-30 http\\check_invalid_header_char.js n=1000000 input="keep-alive" 10.51 % *** 5.515997e-09 http\\check_invalid_header_char.js n=1000000 input="private" 9.98 % *** 4.858798e-09 http\\check_invalid_header_char.js n=1000000 input="SAMEORIGIN" 7.52 % *** 1.855577e-13 http\\check_invalid_header_char.js n=1000000 input="Sat, 07 May 2016 16:54:48 GMT" -7.75 % *** 1.633302e-04 http\\check_invalid_header_char.js n=1000000 input="text/html; charset=utf-8" -8.10 % *** 2.957433e-07 http\\check_invalid_header_char.js n=1000000 input="text/plain" 7.21 % *** 2.673265e-11 

These changes are generally pretty small compared to the change-versus-baseline numbers. I'll update the table in the description as soon as I push the updated code.

@BridgeAR
Copy link
Member

Ping @sethbrenith

@maclover7maclover7 added the wip Issues and PRs that are still a work in progress. label Feb 3, 2018
@sethbrenithsethbrenithforce-pushed the invalid-header-char-regex branch from e4b1747 to 54f84e9CompareFebruary 6, 2018 00:09
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data.
@sethbrenithsethbrenithforce-pushed the invalid-header-char-regex branch from 54f84e9 to 1179babCompareFebruary 8, 2018 16:37
@BridgeARBridgeAR added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed wip Issues and PRs that are still a work in progress. labels Feb 10, 2018
@BridgeAR
Copy link
Member

Copy link
Member

@mcollinamcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mcollina
Copy link
Member

This should no be backported to 4 or 6, as I suspect the performance might be completely different.

Copy link
Member

@bmeurerbmeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Feb 16, 2018
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: nodejs#18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
@BridgeAR
Copy link
Member

Landed in 862389b 🎉

@sethbrenith
Copy link
ContributorAuthor

Thanks everyone!

MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
In the spirit of [17399](#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: #18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
@MylesBorinsMylesBorins mentioned this pull request Feb 21, 2018
MylesBorins pushed a commit that referenced this pull request Feb 21, 2018
In the spirit of [17399](#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: #18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: nodejs#18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
@jasnell
Copy link
Member

Does not land cleanly on 8.x. Will need a backport PR in order to land.

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author readyPRs that have at least one approval, no pending requests for changes, and a CI started.httpIssues or PRs related to the http subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants

@sethbrenith@BridgeAR@mcollina@jasnell@bmeurer@TimothyGu@joyeecheung@maclover7@nodejs-github-bot