Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.2k
test: improve test-child-process-fork-dgram#8697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
mhdawson commented Sep 21, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
imyller left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
gibfahn commented Sep 21, 2016
Commit message nit:
Maybe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/guarranteed/guaranteed/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will update commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mhdawson The typo is in the test comment on line 12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to add the serverPort variable? It only seems to be used here, and makes things a bit more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its needed because the server is closed before we have stopped sending. Therefore, you can't call server.address().port. I found this the hard way while building up the changeset.
gibfahn commented Sep 21, 2016
So if we're now closing the child/parent server once they receive a message, the test should finish after two messages unless some are disappearing (this is UDP so I guess messages can fail). I haven't seen any dropped messages at all in any of the test runs, so having a 45s internal timeout (i.e. 45,000 messages) may be overkill? Changes LGTM if CI passes (probably also worth running a stress test on AIX as well to make sure). |
gibfahn commented Sep 21, 2016
Also just to document, this fixes #8271 |
mhdawson commented Sep 22, 2016
Already ran the stress run of 500 on AIX and all passed versus about at 1/10 failure before. CI run here: https://ci.nodejs.org/job/node-test-pull-request/4223/ |
mhdawson commented Sep 22, 2016
In terms of the timeout as long as it is shorter than the 60s from test harness I don't think it really matters as we will normally finish fast anyway. |
mhdawson commented Sep 22, 2016
Commit message updated. |
mhdawson commented Sep 22, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Lots of failures in the CI run, but none related to the test being changed. Several test tick processor related timeouts as well as checkouts failing across a number of platforms. |
mhdawson commented Sep 22, 2016
Another CI run: https://ci.nodejs.org/job/node-test-pull-request/4225/ |
mhdawson commented Sep 22, 2016
Ok second CI run only had failures (windows and smartos) related to test-tick-processor which were pre-existing and not related to the changes to this test. |
gibfahn commented Sep 23, 2016
@mhdawson You missed a nit on L20, otherwise LGTM. @misterdjules LGTY? (also @AndreasMadsen , but feel free to ignore if you don't think you could usefully review). |
AndreasMadsen commented Sep 23, 2016
@gibfahn Could you explain why this comment isn't a concern here:
|
gibfahn commented Sep 23, 2016
@AndreasMadsen I think the point is that due to the unreliable nature of the protocol, we have no grounds to expect that messages will reach both while both are listening. The system may route all messages to either child or parent. |
Previous implementation was subject to timing out as there was no guarantee that the parent or child would ever receive a message. Modified test so that once parent or child receives a message it closes its connection to the socket so we can guarantee the next messsage will be recieved by the other node. The test also left lingering processes when it timed out so pulled the timeout into the test itself so that it could properly cleanup in case of failure.
mhdawson commented Sep 23, 2016
@gibfahn thats for the catch, think I got the nit this time :) |
gibfahn commented Sep 23, 2016
LGTM |
imyller commented Sep 23, 2016
mhdawson commented Oct 11, 2016
There are 2 hung processes on test-osuosl-aix61-ppc64_be-1, one from Oct 3 and one from Oct 7 so I think the problem still exists unless it was fixed by something after Oct 7. |
mhdawson commented Oct 11, 2016
@gibfahn I think the two get separate copies of the handle so one closing may not affect the other. It should however be possible to write a test that validates it one way or the other by making sure it either passes/fails after one side has done the close. |
mhdawson commented Oct 11, 2016
Lastest process on test-osuosl-aix61-ppc64_be-1 is from Oct 8th. |
Trott commented Oct 13, 2016
Annnnnddddd....now it's failing consistently on freebsd in CI. So, this is probably a thing again. I guess someone should write some code to confirm the speculation in #8697 (comment) that the two copies get separate handles and then, if so, this can get a rebase and land? Or am I mistaken? /cc @mhdawson@gibfahn |
gibfahn commented Oct 13, 2016
@Trott Well we need to work out why it's failing on macOS 10.12 first I think |
Trott commented Oct 13, 2016
Ooof. OK, for fun, or at least "fun", CI stress test across all available platforms: https://ci.nodejs.org/job/node-stress-single-test/999/ |
mhdawson commented Oct 13, 2016
@gibfahn and I did discuss writing the test to confirm close behavior across processes. I think he'll be looking at that and the failures on macOS 10.12 soon. |
santigimeno commented Oct 13, 2016
I think the |
gibfahn commented Oct 14, 2016
@santigimeno This might well be the reason that the test was failing on AIX as well. If it fixes the test for all platforms, then I think it is preferable over this PR. |
Trott commented Oct 14, 2016
@santigimeno@gibfahn That unfortunately does not fix it on FreeBSD on our CI, but it is a good thing to add to the test anyway, IMO. And if it clears it up on AIX and/or macos, awesome. |
Trott commented Oct 14, 2016 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Perhaps this is a libuv issue ultimately? EDIT: Eh, or maybe not... |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697Fixes: nodejs#8949Fixes: nodejs#8271
Trott commented Oct 14, 2016
Alternative fix: #9098 Haven't tested it yet on AIX. About to run CI. But if someone is doing these tests locally and whatnot, maybe give it a shot? That alternative definitely works on macos and works on FreeBSD. |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697Fixes: nodejs#8949Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <[email protected]>
c133999 to 83c7a88Comparemhdawson commented Oct 18, 2016
Closing since alternative has landed |
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697Fixes: #8949Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <[email protected]>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697Fixes: nodejs#8949Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <[email protected]>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: nodejs#8697Fixes: nodejs#8949Fixes: nodejs#8271 PR-URL: nodejs#9098 Reviewed-By: Santiago Gimeno <[email protected]>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697Fixes: #8949Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <[email protected]>
`test-child-process-fork-dgram` is unreliable on some platforms, especially FreeBSD and AIX within the project's continuous integration testing. It has also been observed to be flaky on macos. * Confirm child has received the server before sending packets * Close the server instance on the parent or child after receiving a Refs: #8697Fixes: #8949Fixes: #8271 PR-URL: #9098 Reviewed-By: Santiago Gimeno <[email protected]>
Checklist
make -j4 test(UNIX), orvcbuild test nosign(Windows) passesAffected core subsystem(s)
test
Description of change
Previous implementation was subject to timing out as there
was no guarantee that the parent or child would ever
receive a message. Modified test so that once parent or
child receives a message it stops receiving them so that
we are guaranteed the next message will be received by the
other end.
The test also left lingering processes when it timed out
so pulled the timeout into the test itself so that it
could properly cleanup in case of failure.