Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.2k
util: refactor format method.#12407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
jseijas commented Apr 14, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++){s = util.format(strPercents, 'test', 12)} console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16%vsemozhetbyt commented Apr 14, 2017
Thank you for the contribution! |
| break; | ||
| if(lastPos<i) | ||
| str+=f.slice(lastPos,i); | ||
| str+=tryStringify(arguments[a++]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this function was used only here (linter reports 'tryStringify' is defined but never used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I commited the changes. Also passed all tests.
lib/util.js Outdated
| continue; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter: Trailing spaces not allowed
vsemozhetbyt commented Apr 14, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
|
hiroppy commented Apr 14, 2017
Thanks for your pull request. I think you should write a test before refactoring. |
mscdex commented Apr 14, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
These changes as-is introduce performance regressions: I didn't bother running with I believe similar changes had been suggested in the past (although with a lookup table instead of a function with a switch) and there were performance regressions back then as well. I'm not sure if (or how much) inlining the helper function would make a positive difference. |
vsemozhetbyt commented Apr 14, 2017
jseijas commented Apr 14, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
I updated the PR with these changes:
About performance asked by @mscdex: Previous master: This PR Also I checked with my own script to compare between current master and this PR. This is the script that I used: And these are the results with my script: Previous master This PR: After 10 different runs, I always get similar times for the Short, Normal, and Less Parameters, but for the Percents (use of '%%%%%%%%%%' inside the string) I'm getting always best performance with the PR code. |
test/parallel/test-util-format.js Outdated
| assert.strictEqual(util.format('o: %j, a: %j',{}),'o:{}, a: %j'); | ||
| assert.strictEqual(util.format('o: %j, a: %j'),'o: %j, a: %j'); | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unneccesary newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
silverwind left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if perf is the same.
benjamingr left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, I'm fine with a ~5% regression in perf of util.format since it's not used in "hot paths" of node apps anyway and I think the new code is a lot more readable.
mscdex commented Apr 14, 2017
@benjamingr I wouldn't say it's not in a "hot path." I would bet at least logging modules and such make use of it. I would also bet that having consecutive %'s like that is uncommon. |
jseijas commented Apr 14, 2017
@mscdex You're right, consecutive %'s is uncommon. But the objective of this PR is not to improve the performance, as you can read in the description the objective is to make it more maintenable. I checked the perf in a Linux, a Windows and a docker, and I'm not able to reproduce the -5%. If you provide me a way to reproduce it I will put my hands on it! :) Also, the final target of this PR is, perhaps in the future, add more usable formats (%u, %o, %x, %X, %e and %E) without having duplicated code go total nightmare. |
benjamingr commented Apr 14, 2017
I was under the impression logging modules are never in hot paths anyway - but I'm ready to be educated :) |
mscdex commented Apr 14, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@benjamingr Obviously it depends on the application. It could easily be a bigger issue when you enable extensive debugging in your application in production (even temporarily). You could make a similar argument for other node core modules/functions that may/may not be directly used by core itself (e.g. @jseijas Here's what I used: Where |
Improved performance
jseijas commented Apr 15, 2017
@mscdex Changed the way of doing the refactor. With the new way: I don't understand why for %j the performance is decreased. But also while refactoring I found some odd behaviours with the performance, example: |
mscdex commented Apr 15, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@jseijas It's hard to say why offhand without running the benchmark with I would expect it to be somewhat slower (but not necessarily a regression) because the object type case is farther down in the switch and V8 matches each in order. |
vsemozhetbyt commented Apr 15, 2017
lib/util.js Outdated
| str+='%'; | ||
| lastPos=i=i+2; | ||
| continue; | ||
| if(f.charCodeAt(i+1)===37||a<argLen){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter: Infix operators must be spaced
Fix lint error: Infix operators must be spaced
benjamingr commented Apr 15, 2017
I don't know - I just want to see a case where the performance is meaningful in util.format. Even a single telemetry tool that logs requests and uses
I might be seeing this the wrong way - but
Of course, a faster implementation would parse the format string once and then compile it to a function with |
vsemozhetbyt commented Apr 15, 2017
jseijas commented Apr 15, 2017
@benjamingr Well, I just was doing several changes and learning how they affect to the performance. I'm with you, I never considered util.format as critical, I consider it more like an utility, and I think that could be interesting to evolve it to take into account the different formats of "printf". I think that can be interesting to have at least %e and %x provided by the core, without installing any other package. Even more interesting to have things like "%8.2f", but in this case a deeper modification of the function is needed. But if we think "ok, it's critical, we are using it for the logs of the app", then we can see that we got it improved a 15% for numbers and strings (the usual case of logs). For those that wants to stringify objects inside the messages, we are talking of a -5% performance, in my laptop it means 0.0001ms per message... then I have to store this log, so I don't think that this time is really relevant if we consider the I/O operation for storing. We can consider it relevant if our system is storing huge amounts of logs per second, but in this case I think there are a bunch of problems at architecture level of the app before even looking to this 5%. But well, is my point of view ^^ |
evanlucas commented Apr 15, 2017
I would prefer we not knowingly allow a performance regression to util.format(). It is used in things like |
jseijas commented Apr 16, 2017
Ok, I changed the code to focus on performance. Is still more readable than the original one, but now the performance is: So we have more readable code, and with better performance. Yay! Even with that, I'm not happy. I want to say:
|
jseijas commented Apr 16, 2017 • edited by gibfahn
Loading Uh oh!
There was an error while loading. Please reload this page.
edited by gibfahn
Uh oh!
There was an error while loading. Please reload this page.
One thing that I don't understand: There is a real problem with the no-replace performance, and is due to the fact that the behaviour of util.format(1,2) is totally different from util.format('', 1, 2), even returning the same result. If we take a look into the code the first is resolved with constobjects=newArray(arguments.length);for(varindex=0;index<arguments.length;index++){objects[index]=inspect(arguments[index]);}returnobjects.join(' ');and the second one with: while(a<argLen){constx=arguments[a++];if(x===null||(typeofx!=='object'&&typeofx!=='symbol')){str+=' '+x;}else{str+=' '+inspect(x);}}Where we see that the inspect is only called when the object is not null, and is an object or symbol, so no inspect will be called for strings or numbers. That's the reason why no-replace benchmark takes so much time compared with the other options. Replacing the first part of the util.format with this: vara=0;varstr='';if(typeoff!=='string'){while(a<arguments.length){if(a>0)str+=' ';constx=arguments[a++];if(x===null||(typeofx!=='object'&&typeofx!=='symbol')){str+=x;}else{str+=inspect(x);}}returnstr;}then we get this benchmark result for no-replace: @mscdex Is this fix good, or it something that I'm not taking into account and there is a good reason for doing inspect for all the parameters only at the no-replace part? I think that the differences should be the stylize, the "-0" case, and the replacements done in strings. But if we are defending that |
aqrln commented Apr 16, 2017
refack commented Apr 16, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
[Silly comment] how about adding |
mscdex commented Apr 16, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
@jseijas I think backwards compatibility should be preferred since then more branches can receive at least some amount of increased performance, but if backwards compatibility can't be met and you feel the performance improvement is great enough and is worth marking as semver-major, then so be it. @refack I don't think getting into that kind of habit with the node API is a good goal. If anything it's usually a sign that the changes should just go into a userland module instead. |
jseijas commented Apr 16, 2017
@mscdex Ok, then I'll make this change ina new PR, if this one is merged, because currently this one has not backwards impact. And also are different topics. To give more information about how critical is the impact that this have, running |
jasnell commented Apr 17, 2017
LGTM but I'd like @mscdex to sign off before it lands |
addaleax left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fresh CI: https://ci.nodejs.org/job/node-test-commit/9516/
I’m good with landing this as it is once CI comes back good.
jseijas commented May 23, 2017
Hi @mscdex! Do you need more changes to be made, or you are ok with this? Or any other suggestion. |
lib/util.js Outdated
| varlastPos=0; | ||
| for(vari=0;i<f.length;){ | ||
| if(f.charCodeAt(i)===37/*'%'*/&&i+1<f.length){ | ||
| if(f.charCodeAt(i)===37/*'%'*/&&f.length>i+1){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the "static" portion on the right-hand side is more traditional.
lib/util.js Outdated
| for(vari=0;i<f.length;){ | ||
| if(f.charCodeAt(i)===37/*'%'*/&&i+1<f.length){ | ||
| if(f.charCodeAt(i)===37/*'%'*/&&f.length>i+1){ | ||
| if(f.charCodeAt(i+1)!==37&&a>=arguments.length){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /*'%'*/ is missing here for consistency.
| returnstr; | ||
| }; | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintended whitespace change?
| case100: // 'd' | ||
| if(a>=argLen) | ||
| break; | ||
| if(lastPos<i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this style because lastPos will always be either less than or equal to i.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't understand this point. You meain only remain lastPos < i instead of i > lastPos ? Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I prefer lastPos < i
- Correct whitespace - Static parts of if before unstatic.
jseijas commented May 31, 2017
@mscdex Changes done! |
jseijas commented May 31, 2017
After rebase: |
silverwind commented Jun 1, 2017
mscdex commented Jun 1, 2017
LGTM |
silverwind commented Jun 2, 2017
Thanks! Landed in 4191962. |
Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++){s = util.format(strPercents, 'test', 12)} console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16% PR-URL: #12407 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Roman Reiss <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Brian White <[email protected]>Method format refactored to make it more maintenable, replacing the switch by a function factory, that returns the appropiated function given the character (d, i , f, j, s). Also, performance when formatting an string that contains several consecutive % symbols is improved. The test: `const numSamples = 10000000; const strPercents = '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%s%%%%%%%%%%%%%%%%%i%%%%%%%%%%%%%%%%%%%%%%%%%%'; var s; console.time('Percents'); for (let i = 0; i < numSamples; i++){s = util.format(strPercents, 'test', 12)} console.timeEnd('Percents');` Original time: 28399.708ms After refactor: 23763.788ms Improved: 16% PR-URL: #12407 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Roman Reiss <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Brian White <[email protected]>mscdex commented Jun 13, 2017
FWIW it looks like this could be causing issues due to the change in behavior (see #13665), so I've marked this as semver-major now. |
Method format refactored to make it more maintenable, replacing the
switch by a function factory, that returns the appropiated function
given the character (d, i , f, j, s).
Also, performance when formatting an string that contains several
consecutive % symbols is improved. The test:
Original time: 28399.708ms
After refactor: 23763.788ms
Improved: 16%
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)