benchmark: make compare.R easier to understand #18373

AndreasMadsen · 2018-01-25T14:57:28Z

As discussed in #18112 (comment), this shows the variance of each comparison more clearly. It should also help us avoid over-running the benchmarks: if you see an accuracy of ±0.1%, you could probably spend fewer iterations on that benchmark ;)
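To make the "improvement" and "accuracy" columns concrete, here is a minimal sketch of how such numbers can be derived from two sets of benchmark rates. It is an assumption-laden illustration, not the code in compare.R: the function name `compare`, the sample data, and the use of a normal approximation for the interval are all hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare(old, new, risks=(0.05, 0.01, 0.001)):
    """Return (improvement in %, [± accuracy in % for each risk level]).

    Assumption: a normal approximation is used for the interval; a real
    script may use a t-distribution instead.
    """
    old_mean, new_mean = mean(old), mean(new)
    improvement = (new_mean - old_mean) / old_mean * 100
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(old) / len(old) + variance(new) / len(new))
    accuracy = []
    for risk in risks:
        z = NormalDist().inv_cdf(1 - risk / 2)  # two-sided quantile
        # Half-width of the interval, expressed relative to the old mean.
        accuracy.append(z * se / old_mean * 100)
    return improvement, accuracy


# Hypothetical operations-per-second samples for the old and new binary.
old = [100.0, 102.0, 98.0, 101.0, 99.0]
new = [103.0, 105.0, 101.0, 104.0, 102.0]
improvement, accuracy = compare(old, new)
print(f"{improvement:.2f} %", " ".join(f"±{a:.2f}%" for a in accuracy))
```

Read this way, an improvement is only convincing at a given risk level when its absolute value is larger than the matching ± accuracy.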

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

Affected core subsystem(s)

benchmark

example output:

| | confidence | improvement | accuracy (*) | (**) | (***) |
|---|---|---|---|---|---|
| fs/bench-readdir.js n=10000 | | -7.20 % | ±9.94% | ±13.23% | ±17.22% |
| fs/bench-readdirSync.js n=10000 | | -0.46 % | ±6.97% | ±9.27% | ±12.08% |
| fs/bench-realpath.js pathType="relative" n=10000 | | 1.62 % | ±4.25% | ±5.65% | ±7.35% |
| fs/bench-realpath.js pathType="resolved" n=10000 | | -1.68 % | ±3.89% | ±5.17% | ±6.74% |
| fs/bench-realpathSync.js pathType="relative" n=10000 | | 0.72 % | ±5.87% | ±7.81% | ±10.18% |
| fs/bench-realpathSync.js pathType="resolved" n=10000 | | 0.16 % | ±1.61% | ±2.15% | ±2.80% |
| fs/bench-stat.js statType="fstat" n=200000 | | -2.39 % | ±4.50% | ±6.00% | ±7.83% |
| fs/bench-stat.js statType="lstat" n=200000 | | -2.94 % | ±5.34% | ±7.11% | ±9.25% |
| fs/bench-stat.js statType="stat" n=200000 | | 2.30 % | ±4.29% | ±5.72% | ±7.45% |
| fs/bench-statSync.js statSyncType="fstatSync" n=1000000 | | -1.15 % | ±5.69% | ±7.57% | ±9.85% |
| fs/bench-statSync.js statSyncType="lstatSync" n=1000000 | | 0.47 % | ±2.15% | ±2.90% | ±3.84% |
| fs/bench-statSync.js statSyncType="statSync" n=1000000 | | 0.62 % | ±2.99% | ±3.98% | ±5.19% |
| fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="asc" | | 1.08 % | ±2.96% | ±3.94% | ±5.13% |
| fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="buf" | | -1.86 % | ±3.85% | ±5.13% | ±6.68% |
| fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="utf" | | -1.09 % | ±2.29% | ±3.04% | ±3.96% |
| fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="asc" | | -1.70 % | ±4.55% | ±6.07% | ±7.95% |
| fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="buf" | | -0.51 % | ±3.54% | ±4.71% | ±6.14% |
| fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="utf" | | 0.21 % | ±10.19% | ±13.56% | ±17.65% |
| fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="asc" | | -1.84 % | ±3.14% | ±4.17% | ±5.43% |
| fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="buf" | | -2.14 % | ±4.09% | ±5.44% | ±7.08% |
| fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="utf" | | 1.17 % | ±3.53% | ±4.70% | ±6.11% |
| fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="asc" | | 1.25 % | ±7.25% | ±9.65% | ±12.56% |
| fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="buf" | * | 6.19 % | ±5.13% | ±6.86% | ±9.01% |
| fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="utf" | | 2.12 % | ±6.49% | ±8.64% | ±11.25% |
| fs/readfile.js concurrent=1 len=1024 dur=5 | | -0.33 % | ±3.64% | ±4.84% | ±6.31% |
| fs/readfile.js concurrent=1 len=16777216 dur=5 | | -3.03 % | ±5.35% | ±7.12% | ±9.27% |
| fs/readfile.js concurrent=10 len=1024 dur=5 | | -0.98 % | ±2.13% | ±2.83% | ±3.68% |
| fs/readfile.js concurrent=10 len=16777216 dur=5 | | 0.14 % | ±1.72% | ±2.29% | ±2.98% |
| fs/readFileSync.js n=600000 | | -4.79 % | ±7.59% | ±10.10% | ±13.17% |
| fs/write-stream-throughput.js size=1024 encodingType="asc" dur=5 | | -1.60 % | ±8.77% | ±11.66% | ±15.18% |
| fs/write-stream-throughput.js size=1024 encodingType="buf" dur=5 | | 0.39 % | ±12.28% | ±16.34% | ±21.27% |
| fs/write-stream-throughput.js size=1024 encodingType="utf" dur=5 | | -0.77 % | ±5.77% | ±7.69% | ±10.03% |
| fs/write-stream-throughput.js size=1048576 encodingType="asc" dur=5 | | 0.18 % | ±0.99% | ±1.31% | ±1.71% |
| fs/write-stream-throughput.js size=1048576 encodingType="buf" dur=5 | * | 23.42 % | ±23.37% | ±31.12% | ±40.58% |
| fs/write-stream-throughput.js size=1048576 encodingType="utf" dur=5 | | 2.73 % | ±7.07% | ±9.45% | ±12.41% |
| fs/write-stream-throughput.js size=2 encodingType="asc" dur=5 | | 0.74 % | ±11.24% | ±14.96% | ±19.47% |
| fs/write-stream-throughput.js size=2 encodingType="buf" dur=5 | | -2.84 % | ±5.64% | ±7.52% | ±9.81% |
| fs/write-stream-throughput.js size=2 encodingType="utf" dur=5 | * | -15.35 % | ±14.18% | ±18.87% | ±24.58% |
| fs/write-stream-throughput.js size=65535 encodingType="asc" dur=5 | | -3.07 % | ±10.67% | ±14.21% | ±18.53% |
| fs/write-stream-throughput.js size=65535 encodingType="buf" dur=5 | * | -12.87 % | ±11.94% | ±15.88% | ±20.67% |
| fs/write-stream-throughput.js size=65535 encodingType="utf" dur=5 | | 3.28 % | ±9.81% | ±13.08% | ±17.06% |

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case there are 41 comparisons, you can thus expect the following amount of false-positive results:
  2.05 false positives, when considering a 5% risk acceptance (*, **, ***),
  0.41 false positives, when considering a 1% risk acceptance (**, ***),
  0.04 false positives, when considering a 0.1% risk acceptance (***)
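The false-positive figures in the warning follow from simple arithmetic: the number of comparisons multiplied by the accepted risk. The sketch below reproduces the numbers for 41 comparisons. The function name `expected_false_positives` is hypothetical, and the calculation assumes that no benchmark has a real difference.

```python
def expected_false_positives(comparisons, risks=(0.05, 0.01, 0.001)):
    """Expected number of false positives for each accepted risk level.

    Assumption: every comparison has no true difference, so each one
    is flagged by chance with probability equal to the risk.
    """
    return {risk: comparisons * risk for risk in risks}


result = expected_false_positives(41)
for risk, expected in result.items():
    print(f"{expected:.2f} false positives at a {risk:.1%} risk acceptance")
```

With 41 comparisons, roughly two single-star results are therefore expected by chance alone, which is about half of the four starred rows in the example output.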

AndreasMadsen · 2018-01-25T15:05:55Z

/cc @joyeecheung
/cc @jasnell @Matteo - who have been my "I don't understand statistics" test subjects.

joyeecheung

Not an R expert, but LGTM if benchmark CI is happy.

joyeecheung · 2018-01-25T15:16:16Z

Benchmark CI: https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/103/

Looking at https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh I think this should be using the new R script for the results, cc @gareth-ellis

joyeecheung · 2018-01-25T17:46:56Z

| | confidence | improvement | accuracy (*) | (**) | (***) |
|---|---|---|---|---|---|
| arrays/var-int.js n=25 type="Array" | | 4.97 % | ±13.05% | ±17.96% | ±24.65% |
| arrays/var-int.js n=25 type="Buffer" | | -0.85 % | ±12.22% | ±16.76% | ±22.88% |
| arrays/var-int.js n=25 type="Float32Array" | | -0.31 % | ±0.43% | ±0.58% | ±0.80% |
| arrays/var-int.js n=25 type="Float64Array" | | 0.34 % | ±0.50% | ±0.69% | ±0.95% |
| arrays/var-int.js n=25 type="Int16Array" | | 3.92 % | ±9.76% | ±14.00% | ±20.56% |
| arrays/var-int.js n=25 type="Int32Array" | | -0.21 % | ±0.48% | ±0.66% | ±0.93% |
| arrays/var-int.js n=25 type="Int8Array" | | -5.21 % | ±9.54% | ±13.70% | ±20.15% |
| arrays/var-int.js n=25 type="Uint16Array" | | -0.14 % | ±12.71% | ±17.42% | ±23.73% |
| arrays/var-int.js n=25 type="Uint32Array" | | -0.79 % | ±1.49% | ±2.13% | ±3.09% |
| arrays/var-int.js n=25 type="Uint8Array" | | 4.23 % | ±10.50% | ±15.08% | ±22.17% |
| arrays/zero-float.js n=25 type="Array" | | -1.47 % | ±4.95% | ±7.10% | ±10.41% |
| arrays/zero-float.js n=25 type="Buffer" | | 3.84 % | ±9.00% | ±12.92% | ±18.98% |
| arrays/zero-float.js n=25 type="Float32Array" | | 0.30 % | ±0.53% | ±0.73% | ±0.99% |
| arrays/zero-float.js n=25 type="Float64Array" | | -0.07 % | ±1.77% | ±2.50% | ±3.59% |
| arrays/zero-float.js n=25 type="Int16Array" | | 2.44 % | ±7.92% | ±11.26% | ±16.31% |
| arrays/zero-float.js n=25 type="Int32Array" | | -0.04 % | ±0.63% | ±0.88% | ±1.24% |
| arrays/zero-float.js n=25 type="Int8Array" | | 0.10 % | ±2.76% | ±3.78% | ±5.16% |
| arrays/zero-float.js n=25 type="Uint16Array" | | -0.72 % | ±1.92% | ±2.74% | ±3.98% |
| arrays/zero-float.js n=25 type="Uint32Array" | | 0.67 % | ±0.92% | ±1.27% | ±1.76% |
| arrays/zero-float.js n=25 type="Uint8Array" | | -7.27 % | ±11.99% | ±17.15% | ±25.07% |
| arrays/zero-int.js n=25 type="Array" | | 2.16 % | ±8.68% | ±11.95% | ±16.41% |
| arrays/zero-int.js n=25 type="Buffer" | | 3.86 % | ±8.99% | ±12.91% | ±18.97% |
| arrays/zero-int.js n=25 type="Float32Array" | | -0.26 % | ±0.57% | ±0.78% | ±1.07% |
| arrays/zero-int.js n=25 type="Float64Array" | | -0.42 % | ±0.65% | ±0.90% | ±1.25% |
| arrays/zero-int.js n=25 type="Int16Array" | | 4.87 % | ±9.30% | ±13.36% | ±19.64% |
| arrays/zero-int.js n=25 type="Int32Array" | | 0.16 % | ±1.78% | ±2.51% | ±3.61% |
| arrays/zero-int.js n=25 type="Int8Array" | | 4.88 % | ±9.88% | ±14.18% | ±20.82% |
| arrays/zero-int.js n=25 type="Uint16Array" | | 3.03 % | ±9.31% | ±13.27% | ±19.29% |
| arrays/zero-int.js n=25 type="Uint32Array" | | 1.28 % | ±2.26% | ±3.24% | ±4.74% |
| arrays/zero-int.js n=25 type="Uint8Array" | | 4.17 % | ±9.60% | ±13.79% | ±20.27% |

Be aware that when doing many comparisions the risk of a false-positive result increases.
In this case there are 30 comparisions, you can thus expect the following amount of false-positive results:
  1.50 false positives, when considering a 5% risk acceptance (*, **, ***),
  0.30 false positives, when considering a 1% risk acceptance (**, ***),
  0.03 false positives, when considering a 0.1% risk acceptance (***)
Notifying upstream projects of job completion
Finished: SUCCESS

The accuracy there probably means those benchmarks are just not that reliable in nature...
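As a rough way to reason about wide intervals like these, the half-width of a confidence interval shrinks approximately with the square root of the number of runs. The sketch below estimates how many runs would be needed to reach a target accuracy. The function name `runs_needed` and the run count of 30 are hypothetical; the thread does not state how many runs were used, and the estimate assumes the run-to-run variance stays the same.

```python
from math import ceil


def runs_needed(current_runs, current_accuracy, target_accuracy):
    """Estimate the runs required to reach target_accuracy (in %).

    Assumption: the interval half-width scales as 1 / sqrt(runs), so the
    required run count grows with the square of the accuracy ratio.
    """
    ratio = current_accuracy / target_accuracy
    return ceil(current_runs * ratio ** 2)


# Hypothetical: ±13.05% observed after 30 runs, aiming for ±2%.
print(runs_needed(30, 13.05, 2.0))
```

The same relation works in the other direction: a benchmark that already reports a very tight accuracy can be run fewer times without losing much.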

gareth-ellis · 2018-01-27T11:00:36Z

@joyeecheung / @AndreasMadsen

LGTM. I seem to be having issues with my email notifications at the moment.
I've also noticed that it's not particularly easy to confirm from the output what has been done. I should be able to update the script to make it very clear what change we make to the build under test, which should make this a lot clearer.

As there is the extra warning about false positives in the benchmark output, I think we can be sure that the change was in this build, but I think I can make it clearer in the future.

Note, the job is actually running https://github.com/nodejs/benchmarking/blob/core-benchmark/experimental/benchmarks/community-benchmark/run.sh

I'll get this into master, as running the job from a separate branch is going to lead to even more confusion. (I made some changes to try to reduce the output from the build, but I also need to get it to take stderr away, as we have a lot of warnings that make the rest of the output trickier to understand.)

joyeecheung · 2018-01-29T05:26:55Z

Landed in 368517c, thanks!

PR-URL: #18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

benchmark: make compare.R easier to understand
5bb948c

nodejs-github-bot added the benchmark label Jan 25, 2018

joyeecheung approved these changes Jan 25, 2018

jasnell approved these changes Jan 25, 2018

joyeecheung mentioned this pull request Jan 25, 2018
benchmark: cut down http benchmark run time #18379
Closed

maclover7 force-pushed the master branch from bb5575a to 993b716 January 26, 2018 22:03

cjihrig force-pushed the master branch from 993b716 to 082f952 January 26, 2018 22:36

gareth-ellis approved these changes Jan 27, 2018

BridgeAR approved these changes Jan 27, 2018

joyeecheung added the author ready label Jan 29, 2018

joyeecheung closed this Jan 29, 2018

joyeecheung removed the author ready label Jan 29, 2018

joyeecheung pushed a commit that referenced this pull request Jan 29, 2018
benchmark: make compare.R easier to understand
368517c

evanlucas mentioned this pull request Jan 30, 2018
v9.5.0 release proposal #18464
Merged

evanlucas pushed a commit that referenced this pull request Jan 30, 2018
benchmark: make compare.R easier to understand
b5ec6ea

MylesBorins added lts-watch-v6.x and removed lts-watch-v8.x labels Feb 27, 2018

MylesBorins pushed a commit that referenced this pull request Feb 27, 2018
benchmark: make compare.R easier to understand
f779a8b

MylesBorins added the dont-land-on-v6.x label Feb 27, 2018

MylesBorins removed the lts-watch-v6.x label Feb 27, 2018
