Skip to content

Conversation

@santigimeno
Copy link
Member

The test would sometimes time out on some platforms. Take the initial
version of the test and instead of sending 100 handles, just send 2.
It still fails when run with the code before the change was introduced
and passes afterwards.

@santigimeno
Copy link
MemberAuthor

I've checked that it passes when run on 36b969f (where the test was initially introduced) and fails with the previous commit 6ac79bc in OS X and Debian Jessie 64.

@mscdexmscdex added child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests. labels Feb 6, 2016
@Trott
Copy link
Member

Trott commented Feb 6, 2016

This one has been failing a lot on pi2-raspbian-wheezy in CI a lot lately, so here's a stress test with this fix:

And here's a stress test using current master:

(CI is closed off from public view for another couple of days until the security release of node is out, but I'll summarize the results here.)

@Trott
Copy link
Member

Trott commented Feb 6, 2016

A change that just landed on master today was to mark this test as flaky in test/parallel/parallel.status so this PR should probably update that file too.

@Trott
Copy link
Member

Trott commented Feb 6, 2016

I stopped the master branch stress test because the test failed 5 times out of 9 runs. So there's the unsurprising confirmation that the test is highly flaky in current master.

With this fix in place, the stress test has run 250 times without a single failure. So there's confirmation that this fix improves the situation.

@Trott
Copy link
Member

Trott commented Feb 6, 2016

@santigimeno Once you push the change to the parallel.status file, I (or someone else) will run this through a full CI. (Nice work, as always!)

The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status.
@santigimenosantigimenoforce-pushed the child-process-fork-regr-gh-2847 branch from cf59b37 to 65de3e4CompareFebruary 6, 2016 21:53
@santigimeno
Copy link
MemberAuthor

@Trott PR updated. Thanks!

@Trott
Copy link
Member

Trott commented Feb 6, 2016

@Trott
Copy link
Member

Trott commented Feb 7, 2016

LGTM, but given the extent of the rewrite, I'd like to get a second from at least one of:

@mscdex
Copy link
Contributor

Has this been stress tested on Windows? Also, I know this commit didn't change this part, but I'd be a bit concerned about the low timeout value (especially on pi1).

@Trott
Copy link
Member

Trott commented Feb 7, 2016

Stress tests on:

console.log(err);
sendcount++;
});
// Send 2 handles to make `process.disconnect()` wait
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment. Where does the test call process.disconnect()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis The original version of that comment landed in 36b969f and was authored by @indutny. I think it might be referring to target.disconnect() being called in lib/internal/child_process.js in channel.onread() but I'm not actually sure.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what @Trott says. It was in the original version of the test. I can try to find out the reason

@mscdex
Copy link
Contributor

It looks like the pi1 stress test ran 0 times?

@Trott
Copy link
Member

Trott commented Feb 7, 2016

@mscdex That's what happens when I tell the test runner to run the test 000 times when I mean to type 999. Guess I better try that again. Here's hoping no typos this time: https://ci.nodejs.org/job/node-stress-single-test/442/nodes=pi1-raspbian-wheezy/console

@mhdawson
Copy link
Member

I'm ok with the change provided it still catches the original problem and it passes on AIX. CI run on AIX here: https://ci.nodejs.org/job/node-test-commit-aix/37/ (unfortunately is does take a while to run)

@mscdex
Copy link
Contributor

LGTM

@Trott
Copy link
Member

Trott commented Feb 9, 2016

I confirmed that the test still fails (as expected) in Node.js v3.3.1 (with ES6-isms removed from common.js) and passes in Node.js v4.0.0. (This is a bit redundant because @santigimeno already confirmed that it fails with the fix commit backed out.)

LGTM but awaiting AIX results...

@mhdawson
Copy link
Member

lgtm tests still passs on AIX.

Trott pushed a commit to Trott/io.js that referenced this pull request Feb 10, 2016
The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status. PR-URL: nodejs#5121 Reviewed-By: Brian White <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Fixes: nodejs#3635
@Trott
Copy link
Member

Landed in 4e4b260

Feels good to get that one off the flaky list...Thanks, @santigimeno !

@TrottTrott closed this Feb 10, 2016
rvagg pushed a commit that referenced this pull request Feb 15, 2016
The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status. PR-URL: #5121 Reviewed-By: Brian White <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Fixes: #3635
MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status. PR-URL: #5121 Reviewed-By: Brian White <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Fixes: #3635
MylesBorins pushed a commit that referenced this pull request Feb 18, 2016
The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status. PR-URL: #5121 Reviewed-By: Brian White <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Fixes: #3635
@MylesBorinsMylesBorins mentioned this pull request Feb 18, 2016
MylesBorins pushed a commit that referenced this pull request Mar 2, 2016
The test would sometimes time out on some platforms. Take the initial version of the test and instead of sending 100 handles, just send 2. It still fails when run with the code before the change was introduced and passes afterwards. Remove test from parallel.status. PR-URL: #5121 Reviewed-By: Brian White <[email protected]> Reviewed-By: Rich Trott <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Fixes: #3635
@refackrefack mentioned this pull request Apr 27, 2017
3 tasks
refack added a commit to refack/node that referenced this pull request May 19, 2017
According to the explanation in nodejs#3635#issuecomment-157714683 And as a continuation to nodejs#5422 we also ignore EMFILE "No more file descriptors are available,so no more files can be opened" PR-URL: nodejs#12698Fixes: nodejs#10286 Refs: nodejs#3635 (comment) Refs: nodejs#5178 Refs: nodejs#5179 Refs: nodejs#4005 Refs: nodejs#5121 Refs: nodejs#5422 Refs: nodejs#12621 (comment) Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Jul 17, 2017
According to the explanation in #3635#issuecomment-157714683 And as a continuation to #5422 we also ignore EMFILE "No more file descriptors are available,so no more files can be opened" PR-URL: #12698Fixes: #10286 Refs: #3635 (comment) Refs: #5178 Refs: #5179 Refs: #4005 Refs: #5121 Refs: #5422 Refs: #12621 (comment) Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

child_processIssues and PRs related to the child_process subsystem.testIssues and PRs related to the tests.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

@santigimeno@Trott@mscdex@mhdawson@bnoordhuis@jasnell@MylesBorins