test: fix test-inspector-port-zero-cluster #13373

refack · 2017-06-01T17:08:14Z

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

test,cluster,inspector

Trott · 2017-06-01T17:17:17Z

Does this fix a bug in the test or is this more of a workaround for a bug in Node.js? Seems like the latter to me, but maybe I'm wrong?

refack · 2017-06-01T17:17:40Z

Stress (fedora24 / -j 64 inspector X 100) https://ci.nodejs.org/job/node-stress-single-test/1248/nodes=fedora24/

Trott · 2017-06-01T17:18:32Z

test/inspector/test-inspector-port-zero-cluster.js

While we're in here fixing up the test, I wonder if we want to change this? It's not really guaranteed that the ports are sequential, is it? If nothing else, won't it wrap around to lower port numbers after 65535? I wonder if we want to change this to a check that we have three unique port numbers rather than sequential port numbers?

They are supposed to be sequential #13343 (comment)
It's a bug/feature of cluster https://github.com/nodejs/node/blob/master/lib/internal/cluster/master.js#L113
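The sketch below makes the two candidate assertions concrete. All of its helpers are hypothetical, and none of these names exist in Node.js or its test suite. `sequentialPorts` is plain `base + n` arithmetic. It only shows why running past 65535 was a reasonable worry, and it does not describe what the cluster module actually does at that boundary.

```javascript
'use strict';
// Hypothetical helpers (not from Node.js or its test suite) contrasting the
// strict "sequential" check with the looser "unique" check for worker ports.

// Naive "master port + n" assignment, as plain arithmetic.
function sequentialPorts(basePort, count) {
  return Array.from({ length: count }, (_, i) => basePort + i + 1);
}

// TCP port numbers are 16-bit, so 65535 is the largest usable value.
function isValidPort(port) {
  return Number.isInteger(port) && port >= 0 && port <= 65535;
}

// Strict check: each port is exactly one above the previous one.
function areSequential(ports) {
  return ports.every((port, i) => i === 0 || port === ports[i - 1] + 1);
}

// Looser check suggested in the review: N distinct ports, in any order.
function areUnique(ports) {
  return new Set(ports).size === ports.length;
}
```

With a master on 65534, the arithmetic yields 65535, 65536 and 65537, and only the first is a valid port. The uniqueness check passes whether or not the ports are contiguous.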

Ooof, thanks for the pointer.

This test might be more fragile since with --inspect=0 we ask for ports from the ephemeral range, and with --inspect=3993 it's a less used range.
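`--inspect=0` follows the usual TCP convention, where port 0 means "let the OS pick". For a plain `net` server, the chosen port can be read back from `server.address()`. The small sketch below illustrates this. `getEphemeralPort` is a hypothetical helper, and it exercises `net` rather than the inspector itself.

```javascript
'use strict';
const net = require('net');

// Listen on port 0 and report which port the OS actually assigned.
function getEphemeralPort() {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', reject);
    server.listen(0, () => {
      const { port } = server.address();
      server.close(() => resolve(port));
    });
  });
}
```

The range the OS draws from is platform configuration (on Linux, the `net.ipv4.ip_local_port_range` sysctl). Ports next to an OS-assigned one therefore come from the same pool used for outgoing connections, which is the fragility described above.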

Trott · 2017-06-01T17:29:38Z

The CI output in #13343 does not show two workers grabbing the same port so I'm not sure this is going to solve the issue. Maybe the worker that fails is conflicting with a pre-existing process on the host? In which case, I'm not sure there's anything we can do about it in the test itself?

Trott · 2017-06-01T17:31:41Z

test/inspector/test-inspector-port-zero-cluster.js

common.mustCall() failing in a worker won't do anything, so I wouldn't include it here because it gives the reader a false sense that it's an effective check. (You can test this by passing common.mustCall(fn, 1000) or something like that.)
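A hypothetical stand-in shows why a failing call-count check in a worker goes unnoticed. `mustCallSketch` is invented here and is not the implementation in Node.js's `test/common`. A check of this shape runs in a `process.on('exit')` handler of whichever process created it. In a worker, a mismatch can therefore only change that worker's exit code, and nothing fails unless the master looks at that code.

```javascript
'use strict';
// Hypothetical "must be called N times" wrapper; NOT Node.js's common.mustCall.
function mustCallSketch(fn, expected = 1) {
  let actual = 0;
  process.on('exit', (code) => {
    // Runs in the process that created the wrapper. In a cluster worker,
    // this can only turn the worker's own exit code into 1.
    if (code === 0 && actual !== expected) {
      console.error(`Expected ${expected} call(s), got ${actual}`);
      process.exit(1);
    }
  });
  return function wrapped(...args) {
    actual++;
    return fn.apply(this, args);
  };
}
```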

refack · 2017-06-01T17:32:16Z

Does this fix a bug in the test or is this more of a workaround for a bug in Node.js? Seems like the latter to me, but maybe I'm wrong?

I think it narrows down the test-case, to make sure the ports are assigned and are sequential.
Maybe the code before [fork(), fork(), fork()] is more "real world" 🤔

The CI output in #13343 does not show two workers grabbing the same port so I'm not sure this is going to solve the issue. Maybe the worker that fails is conflicting with a pre-existing process on the host? In which case, I'm not sure there's anything we can do about it in the test itself?

Mark as flaky? Add a known issue (create a server listening to process.debugPort + 1 before a fork())
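The known-issue idea above depends on an already-bound port being unusable for a second listener. The minimal sketch below uses two plain `net` servers to show the error the second `listen()` reports. It does not start the inspector, and `demonstratePortClash` is a hypothetical name.

```javascript
'use strict';
const net = require('net');

// Bind a "blocker" on some port, then try to bind a second server to the
// same port; resolve with the error code the second listen() reports.
function demonstratePortClash() {
  return new Promise((resolve, reject) => {
    const blocker = net.createServer();
    blocker.listen(0, () => {
      const { port } = blocker.address();
      const second = net.createServer();
      second.once('error', (err) => {
        blocker.close();
        resolve(err.code); // expected: 'EADDRINUSE'
      });
      second.listen(port, () => {
        second.close();
        blocker.close();
        reject(new Error(`port ${port} was bound twice`));
      });
    });
  });
}
```

In the proposed known-issue test, the blocker would sit on `process.debugPort + 1`, so a worker's inspector would hit the clash instead of a second `net` server.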

Trott · 2017-06-01T17:32:45Z

test/inspector/test-inspector-port-zero-cluster.js

common.mustCall() failing in a worker won't do anything, so I wouldn't include it here because it gives the reader a false sense that it's an effective check. (You can test this by passing common.mustCall(fn, 1000) or something like that.)

Looked at it again and added cluster.on('exit') with assertions that the worker exited properly.

Trott · 2017-06-01T17:36:13Z

Maybe an OK workaround would be to check to see if the fork() failed and, if so, fork() again (perhaps checking that it failed because the address was in use before forking again)?

I'm +1 on creating a known_issues test while we're at it (if one doesn't already exist). And include a comment in this test indicating that the check-for-error-and-fork-again workaround can be removed once that known issue is fixed?
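The check-for-error-and-fork-again workaround is essentially a retry loop on the worker's exit code. Node.js documents exit code 12 ("Invalid Debug Argument") for an `--inspect` port that is invalid or unavailable, and the final commit message for this PR also mentions exit 12. The sketch below covers only the retry logic. `retryOnExitCode` is a hypothetical name, and `attempt` is any function that resolves to an exit code.

```javascript
'use strict';
// Exit code Node.js documents for an invalid or unavailable --inspect port.
const INVALID_DEBUG_ARGUMENT = 12;

// Hypothetical helper: call attempt() (which resolves to an exit code) and
// try again while it reports retryCode, up to maxAttempts times in total.
async function retryOnExitCode(attempt, {
  retryCode = INVALID_DEBUG_ARGUMENT,
  maxAttempts = 3,
} = {}) {
  let code;
  for (let i = 1; i <= maxAttempts; i++) {
    code = await attempt(i);
    if (code !== retryCode) return { code, attempts: i };
  }
  return { code, attempts: maxAttempts };
}
```

With cluster, `attempt` could be `() => new Promise((resolve) => cluster.fork().once('exit', resolve))`, since a worker's `'exit'` event passes the exit code first. Whether a re-forked worker gets a different port depends on the master's port assignment, which this sketch does not control.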

(Thanks for tackling this, by the way. I'd probably be going down a completely wrong rabbit hole.)

refack · 2017-06-01T17:44:34Z

Maybe an OK workaround would be to check to see if the fork() failed

Yes, makes sense.

(Thanks for tackling this, by the way. I'd probably be going down a completely wrong rabbit hole.)

I was in the neighborhood with #12941

refack · 2017-06-01T18:15:17Z

@Trott I did a mix-and-match

forking is back to non-deterministic Promise.all()
bind process.debugPort + 2 so worker2 should fail
added a ton of assertions

refack · 2017-06-01T18:16:23Z

P.S. test is still flaky unless we allow for more than 1 fail
...
~~Ohh, I have an idea~~

Trott · 2017-06-01T20:41:54Z

I don't feel so strongly about this that I'd stop this from landing, but: I think blocking one of the ports with a listening socket intentionally kind of changes what this test is about. This is happy-path testing and if we want to check for problems, we should probably set up a different test.

Trott · 2017-06-01T20:42:00Z

@nodejs/testing

Trott · 2017-06-01T20:44:09Z

I don't feel so strongly about this that I'd stop this from landing, but: I think blocking one of the ports with a listening socket intentionally kind of changes what this test is about. This is happy-path testing and if we want to check for problems, we should probably set up a different test.

Oops, never mind! It is an additional test case. Ignore me. Sorry for the noise.

refack · 2017-06-01T21:15:21Z

Oops, never mind! It is an additional test case. Ignore me. Sorry for the noise.

Except for asserting current behavior, I'm also not sure what this test is looking for.
I think I'll split it for testing proper worker setup (not actual connection), and the port clash as a known issue.

bnoordhuis

See #13343 (comment), if my hypothesis is correct, I don't think this PR will materially fix that while rather obscuring the actual feature under test.

bnoordhuis · 2017-06-03T10:52:34Z

test/inspector/test-inspector-port-zero-cluster.js

assert.strictEqual(signal, null)?

bnoordhuis · 2017-06-03T10:54:28Z

test/inspector/test-inspector-port-zero-cluster.js

Could be shortened to cluster.on('error', assert.fail) but is this event actually emitted? If yes, the documentation in cluster.md is incomplete.

bnoordhuis · 2017-06-03T10:58:51Z

test/inspector/test-inspector-port-zero-cluster.js

Typo: sentinel (ditto a few lines up.)

bnoordhuis · 2017-06-03T11:06:21Z

test/inspector/test-inspector-port-zero-cluster.js

Don't call process.exit(), call process.disconnect() and let the worker exit naturally.

refack · 2017-06-03T12:27:48Z

See #13343 (comment), if my hypothesis is correct, I don't think this PR will materially fix that while rather obscuring the actual feature under test.

Ack. Re: #13373 (comment)
I think I'm going to turn this into a failing known_issue and leave it at that.

refack · 2017-06-03T13:21:48Z

Gate CI: https://ci.nodejs.org/job/node-test-commit-linuxone/6351/

refack · 2017-06-03T13:23:52Z

Gate is green, Full CI: https://ci.nodejs.org/job/node-test-pull-request/8458/

refack · 2017-06-03T13:27:57Z

test/inspector/test-inspector-port-zero-cluster.js

Will fix comment after CI finished

refack · 2017-06-03T13:33:50Z

@bnoordhuis @Trott @jasnell @nodejs/testing I've split this in two:

RFC: The original test would skip if it detects a port clash. Not sure if that's better than marking it as flaky?
new known_issue to demonstrate a deterministic port clash
PTAL?

Trott · 2017-06-03T17:40:12Z

I think @bnoordhuis is saying that the behavior is not a bug but rather expected behavior. If so a known_issue test seems inappropriate. If the port must be 12345 and 12345 is in use by another process, there's nothing Node.js can do about that. Or am I missing something?

refack · 2017-06-11T22:29:39Z

So as I see it:

1. test/inspector/test-inspector-port-zero-cluster.js - works as designed
2. test/known_issues/test-inspector-cluster-port-clash.js - known limitation; we need to find a way to let the user opt out
3. test/known_issues/test-inspector-port-zero-cluster.js - just a bug that #13619 ("Fixed cluster inspect port logic") fixes

I agree that (2) is an edge case (the real issue is #12941), and (3) has a fix so if you want I could kick them out.

Trott · 2017-06-11T22:46:11Z

I agree that (2) is an edge case (the real issue is #12941), and (3) has a fix so if you want I could kick them out.

If there's consensus that we really do want to provide a way for users to opt out, then my preference for 2 would be that we keep it after all but include a comment that explains something along the lines of "known limitation, currently working as intended, but we need a way to allow the user to opt out of sequential port assignment".

If another PR is going to fix the issue in (3), then the test can go right into that PR, but I don't object to it being here instead.

Trott · 2017-06-11T22:48:53Z

test/known_issues/test-inspector-cluster-port-clash.js

Can this comment be clarified a bit? This makes it sound like "If Node.js were working correctly and also addressing this issue, then this test should fail." Maybe instead of "should", something like "This test currently fails with:" or maybe something even more verbose like:
With the current behavior of Node.js (at least as late as 8.1.0), this test fails with the following error:
AssertionError [ERR_ASSERTION]: worker 2 failed to bind port
Ideally, there would be a way for the user to opt out of sequential port assignment and this test would then pass.

refack · 2017-06-11T23:23:08Z

Quick sanity: https://ci.nodejs.org/job/node-test-commit-linuxone/6557/

mutantcornholio · 2017-06-14T06:38:42Z

I can't say that it would fix this specific test flakiness, but what if testpy checked whether all ports in the range starting at common.PORT are free and, if not, used another range? Seems like a better solution to me.

This approach still has some probability of race conditions: even if we guaranteed that [checking the port and starting the test] were atomic, tests do not start listening on the port right away. But if the port is taken by some system or long-running process, it could do the trick.

Separating tests that use zero port and tests that use common.PORT should lower the probability of this, from the other side.

My opinion is that we need to find a way to provide stable infrastructure for the tests, not rewrite the tests to the point where they are aware of the problem.
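The proposal was about the Python test runner, but the availability check itself is simple to sketch. The JavaScript below is hypothetical: `isPortFree` and `isRangeFree` are invented names. The check works by binding and releasing each port, so, as noted above, another process can still take a port between the check and the test's own bind.

```javascript
'use strict';
const net = require('net');

// Resolve true if a TCP listener can be bound to `port` right now.
// The answer may already be stale by the time a test binds the port.
function isPortFree(port) {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once('error', () => resolve(false));
    server.listen(port, () => server.close(() => resolve(true)));
  });
}

// Check `count` consecutive ports starting at `start`, one at a time.
async function isRangeFree(start, count) {
  for (let port = start; port < start + count; port++) {
    if (!(await isPortFree(port))) return false;
  }
  return true;
}
```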

Trott · 2017-06-14T20:08:53Z

I can't say that it would fix this specific test flakiness, but what if testpy checked whether all ports in the range starting at common.PORT are free and, if not, used another range? Seems like a better solution to me.

We will never be able to completely guarantee that a specified port is available for a test before the test actually tries to use it.

So the task at hand is to determine a Good Enough solution.

I think the solution we have now is adequate and adding more layers of engineering will likely make things worse, not better.

Solution now is: If you are using common.PORT, put the test in sequential.

If someone slips up and puts the test in parallel, it will still work nearly 100% of the time. We'll probably get a failure once a month or less in CI, at which point we'll ignore it or realize what is going on and move it to sequential.

PR changed considerably after last review

mutantcornholio · 2017-06-15T15:45:56Z

Wait, shouldn't we disable port incrementing behavior in port 0?

If we're letting the OS decide which port to allocate, incrementing after that port is not a good idea IMHO. The user has no control over that port range => it will lead to port collisions.

If master started with --inspect=0, let's start workers with --inspect=0 too.

refack · 2017-06-15T17:33:46Z

Wait, shouldn't we disable port incrementing behavior in port 0?
If we're letting the OS decide which port to allocate, incrementing after that port is not a good idea IMHO. The user has no control over that port range => it will lead to port collisions.
If master started with --inspect=0, let's start workers with --inspect=0 too.

It was discussed and for now we believe that deterministic worker port allocation makes more sense. #13343 (comment)

Hopefully your second PR (manual override) will cover the other use cases.

[edit]
As for port collisions, the user should be aware that worker forking might fail, and retry anyway...

mutantcornholio · 2017-06-15T18:38:15Z

Hopefully your second PR (manual override) will cover the other use cases.

Yeah, I think users will be able to start the master with --inspect=0 and use cluster.settings to pass --inspect=0 to workers.
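Later Node.js releases document an `inspectPort` option in `cluster.settings`, listed as added in v8.2.0. The sketch below uses that option to give every worker port 0. It assumes such a release, and it assumes the option accepts 0 the way `--inspect=0` does. It does not describe what this PR or #13761 implemented.

```javascript
'use strict';
const cluster = require('cluster');

if (cluster.isMaster) {
  // Assumes a Node.js release with the cluster `inspectPort` setting.
  // Port 0 asks the OS to pick a port for each worker's inspector instead
  // of the default process.debugPort + n scheme. It only matters when the
  // workers are started with an --inspect flag (by default they inherit
  // the master's execArgv). Newer releases name this setupPrimary().
  cluster.setupMaster({ inspectPort: 0 });
  // cluster.fork() calls made after this point use the updated settings.
}
```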

As for port collisions, the user should be aware that worker forking might fail, and retry anyway...

I thought the whole idea of port=0 is "OS, all I want is to avoid port collision".
But if you guys are sure this is the right way, sorry to bother :)

refack · 2017-06-16T10:56:25Z

Validating that test-inspector-port-zero-cluster.js has been solved: ~~https://ci.nodejs.org/job/node-test-commit-linuxone/6673/~~ https://ci.nodejs.org/job/node-test-commit-linuxone/6674/nodes=rhel72-s390x/

* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: nodejs#13373
Fixes: nodejs#13343
Reviewed-By: James M Snell <[email protected]>
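"Parse args instead of post binding" means reading the assigned inspector port from a worker's arguments instead of binding it. The hypothetical parser below works along those lines. The name `inspectPortFromArgs` and its regular expression are illustrative and are not the test's code. The last matching flag wins, and a bare `--inspect` with no value yields `null`.

```javascript
'use strict';
// Hypothetical: extract the inspector port from an execArgv-style array,
// accepting --inspect=PORT, --inspect=HOST:PORT, --inspect-brk=... and
// --inspect-port=...; later flags override earlier ones.
function inspectPortFromArgs(execArgv) {
  let port = null;
  for (const arg of execArgv) {
    const match = /^--inspect(?:-brk|-port)?=(?:.*:)?(\d+)$/.exec(arg);
    if (match) port = Number(match[1]);
  }
  return port;
}
```

Inside a worker, this would be called as `inspectPortFromArgs(process.execArgv)`.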

refack · 2017-06-16T11:17:19Z

test-inspector-port-zero-cluster.js removed since bug was fixed in 2777a7e
Pre land CI: https://ci.nodejs.org/job/node-test-commit/10603/

refack · 2017-06-16T16:19:09Z

Extra stress on master (30 cycles for all suites) https://ci.nodejs.org/job/node-stress-single-test/1306/nodes=rhel72-s390x/ ✔️

refack · 2017-06-16T23:59:45Z

But if you guys are sure this is the right way, sorry to bother :)

@mutantcornholio IMHO your input is as valuable as anyone else's 👍
"Sure" 🤣 I'm definitely not "sure", but it was given some thought, and currently that's the side of the tradeoff we estimate makes the most sense.



nodejs-github-bot added the dont-land-on-v4.x, inspector, and test labels Jun 1, 2017

refack mentioned this pull request Jun 1, 2017
Investigate flaky test-inspector-port-zero-cluster #13343
Closed

Trott reviewed Jun 1, 2017

jasnell approved these changes Jun 1, 2017

refack changed the title from "test: serialize forking" to "test: fix test-inspector-port-zero-cluster" Jun 3, 2017

refack mentioned this pull request Jun 3, 2017
test: rearrange inspector headers into convention #13428
Merged

bnoordhuis previously requested changes Jun 3, 2017

refack force-pushed the fix-test-inspector-port-zero-cluster branch from 2b44385 to 90dadd9 June 3, 2017 13:19

refack commented Jun 3, 2017

Trott reviewed Jun 11, 2017

refack mentioned this pull request Jun 13, 2017
Fixed cluster inspect port logic #13619
Closed

refack mentioned this pull request Jun 16, 2017
test: fix flaky test-inspector-port-zero-cluster #13711
Closed

test: fix test-inspector-port-zero-cluster
fe2caf6
* re-implemented test to parse args instead of post binding (exit 12)
* saved failing case as known issue

PR-URL: nodejs#13373
Fixes: nodejs#13343
Reviewed-By: James M Snell <[email protected]>

refack force-pushed the fix-test-inspector-port-zero-cluster branch from 81bc47f to fe2caf6 June 16, 2017 11:14

refack merged commit fe2caf6 into nodejs:master Jun 16, 2017

refack deleted the fix-test-inspector-port-zero-cluster branch June 16, 2017 16:19

addaleax mentioned this pull request Jun 17, 2017
v8.2.0 proposal #13744
Merged

mutantcornholio mentioned this pull request Jun 18, 2017
Added inspect port overriding in cluster #13761
Closed

addaleax mentioned this pull request Jun 21, 2017
v8.1.3 proposal #13861
Merged

MylesBorins referenced this pull request Jul 17, 2017
test: mark inspector-port-zero-cluster as flaky
2a29c07
Ref: #13343

MylesBorins added the lts-watch-v6.x label Jul 17, 2017

MylesBorins added dont-land-on-v6.x and removed lts-watch-v6.x labels Aug 14, 2017
