Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.3k
test: fix test-inspector-port-zero-cluster#13373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: fix test-inspector-port-zero-cluster #13373
Uh oh!
There was an error while loading. Please reload this page.
Conversation
refack commented Jun 1, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Trott commented Jun 1, 2017
Does this fix a bug in the test or is this more of a workaround for a bug in Node.js? Seems like the latter to me, but maybe I'm wrong? |
refack commented Jun 1, 2017
Stress (fedora24 / -j 64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're in here fixing up the test, I wonder if we want to change this? It's not really guaranteed that the ports are sequential, is it? If nothing else, won't it wrap around to lower port numbers after 65535? I wonder if we want to change this to a check that we have three unique port numbers rather than sequential port numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are supposed to be sequential #13343 (comment)
It's a bug/feature of clusterhttps://github.com/nodejs/node/blob/master/lib/internal/cluster/master.js#L113
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooof, thanks for the pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test might be more fragile since with --inspect=0 we ask for ports from the ephemeral range, and with --inspect=3993 it's a less used range.
Trott commented Jun 1, 2017
The CI output in #13343 does not show two workers grabbing the same port so I'm not sure this is going to solve the issue. Maybe the worker that fails is conflicting with a pre-existing process on the host? In which case, I'm not sure there's anything we can do about it in the test itself? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.mustCall() failing in a worker won't do anything, so I wouldn't include it here because it gives the reader a false sense that it's an effective check. (You can test this by passing common.mustCall(fn, 1000) or something like that.)
refack commented Jun 1, 2017
I think it narrows down the test-case, to make sure the ports are assigned and are sequential.
Mark as flaky? Add a known issue (create a server listening to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.mustCall() failing in a worker won't do anything, so I wouldn't include it here because it gives the reader a false sense that it's an effective check. (You can test this by passing common.mustCall(fn, 1000) or something like that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at in again and added cluster.on('exit') with assertions that the worker exited properly.
Trott commented Jun 1, 2017
Maybe an OK workaround would be to check to see if the I'm +1 on creating a (Thanks for tackling this, by the way. I'd probably be going down a completely wrong rabbit hole.) |
refack commented Jun 1, 2017
Yes, makes sense.
I was in the neighborhood with #12941 |
refack commented Jun 1, 2017
@Trott I did a mix-and-match
|
refack commented Jun 1, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
P.S. test is still flaky unless we allow for more than 1 fail |
Trott commented Jun 1, 2017
I don't feel so strongly about this that I'd stop this from landing, but: I think blocking one of the ports with a listening socket intentionally kind of changes what this test is about. This is happy-path testing and if we want to check for problems, we should probably set up a different test. |
Trott commented Jun 1, 2017
@nodejs/testing |
Trott commented Jun 1, 2017
Oops, never mind! It is an additional test case. Ignore me. Sorry for the noise. |
refack commented Jun 1, 2017
Accept for asserting current behavior, I'm also not sure what this test is looking for. |
bnoordhuis left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #13343 (comment), if my hypothesis is correct, I don't think this PR will materially fix that while rather obscuring the actual feature under test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert.strictEqual(signal, null)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be shortened to cluster.on('error', assert.fail) but is this event actually emitted? If yes, the documentation in cluster.md is incomplete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: sentinel (ditto a few lines up.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't call process.exit(), call process.disconnect() and let the worker exit naturally.
refack commented Jun 3, 2017
Ack. Re: #13373 (comment) |
2b44385 to 90dadd9Comparerefack commented Jun 3, 2017
refack commented Jun 3, 2017
Gate is green, Full CI: https://ci.nodejs.org/job/node-test-pull-request/8458/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix comment after CI finished
refack commented Jun 3, 2017
@bnoordhuis@Trott@jasnell @nodejs/testing I've split this in two:
|
Trott commented Jun 3, 2017
I think @bnoordhuis is saying that the behavior is not a bug but rather expected behavior. If so a |
refack commented Jun 11, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
So as I see it:
I agree that (2) is an edge case (the real issue is #12941), and (3) has a fix so if you want I could kick them out. |
Trott commented Jun 11, 2017
If there's consensus that we really do want to provide a way for users to opt out, then my preference for 2 would be that we keep it after all but include a comment that explains something along the lines of "known limitation, currently working as intended, but we need a way to allow the user to opt out of sequential port assignment". If another PR is going to fix the issue in #3, then the test can go right into that PR, but I don't object to it being here instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this comment be clarified a bit? This makes it sound like "If Node.js were working correctly and also addressing this issue, then this test should fail." Maybe instead of "should", something like "This test currently fails with:" or maybe something even more verbose like:
With the current behavior of Node.js (at least as late as 8.1.0), this test fails with the following error:
AssertionError [ERR_ASSERTION]: worker 2 failed to bind port
Ideally, there would be a way for the user to opt out of sequential port assignment and this test would then pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
refack commented Jun 11, 2017
mutantcornholio commented Jun 14, 2017
I can't say that it would fix this specific test flakiness, but what if This approach still has some probability of race-conditions: even if we would guarantee that [checking the port and starting the test] would be atomic, tests will not start to listen on the port right away. But if the port is taken by some system or long-enough-running process, it could do the trick. Separating tests that use zero port and tests that use My opinion is we need to find a way to provide a stable infrastructure to the tests, not to rewrite the tests to the point they would be aware of the problem. |
Trott commented Jun 14, 2017
We will never be able to completely guarantee that a specified port is available for a test before the test actually tries to use it. So the task at hand is to determine a Good Enough solution. I think the solution we have now is adequate and adding more layers of engineering will likely make things worse, not better. Solution now is: If you are using If someone slips up and puts the test in |
mutantcornholio commented Jun 15, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Wait, shouldn't we disable port incrementing behavior in port 0? If we're letting OS decide which port to allocate, incrementing after that port is not a good idea IMHO. User have no control on that port range => it will lead to port collisions. If master started with |
refack commented Jun 15, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
It was discussed and for now we believe that deterministic worker port allocation makes more sense. #13343 (comment) Hopefully your second PR (manual override) will cover the other use cases. [edit] |
mutantcornholio commented Jun 15, 2017
Yeah, I think, users will be able to start master with
I thought that whole idea of |
refack commented Jun 16, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Validating that |
* re-implemented test to parse args instead of post binding (exit 12) * saved failing case as known issue PR-URL: nodejs#13373Fixes: nodejs#13343 Reviewed-By: James M Snell <[email protected]>
81bc47f to fe2caf6Comparerefack commented Jun 16, 2017
|
refack commented Jun 16, 2017
Extra stress on |
refack commented Jun 16, 2017
@mutantcornholio IMHO your input is as valuable as anyone else's 👍 |
* re-implemented test to parse args instead of post binding (exit 12) * saved failing case as known issue PR-URL: #13373Fixes: #13343 Reviewed-By: James M Snell <[email protected]>
* re-implemented test to parse args instead of post binding (exit 12) * saved failing case as known issue PR-URL: #13373Fixes: #13343 Reviewed-By: James M Snell <[email protected]>
Fixes: #13343
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
test,cluster,inspector