Skip to content

Conversation

@MoLow
Copy link
Member

@MoLowMoLow commented Feb 19, 2023

Fixes: #46747

I believe this will address these test failures:
https://ci.nodejs.org/job/node-test-binary-windows-js-suites/19156/

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/test_runner

@MoLowMoLow added flaky-test Issues and PRs related to the tests with unstable failures on the CI. fast-track PRs that do not need to wait for 48 hours to land. test_runner Issues and PRs related to the test runner subsystem. labels Feb 19, 2023
@github-actions

This comment was marked as outdated.

@nodejs-github-botnodejs-github-bot added dont-land-on-v14.x needs-ci PRs that need a full CI run. labels Feb 19, 2023
@MoLowMoLow added the request-ci Add this label to start a Jenkins CI on a PR. label Feb 19, 2023
@github-actionsgithub-actionsbot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 19, 2023
@nodejs-github-bot
Copy link
Collaborator

@MoLow
Copy link
MemberAuthor

MoLow commented Feb 19, 2023

@nodejs/test_runner this only solves one issue, there seems to be another issue with the process not being alive long enough to pipe the entire stream, see
https://ci.nodejs.org/job/node-test-commit-plinux/nodes=rhel8-ppc64le/47011/
and #22088 (comment)

not ok 9 - should support a 'js' file as a custom reporter 21:45:52 --- 21:45:52 duration_ms: 1125.36918 21:45:52 failureType: 'testCodeFailure' 21:45:52 error: |- 21:45:52 The input did not match the regular expression /{"test:start":4,"test:pass":2,"test:fail":2,"test:plan":2,"test:diagnostic":\d+}$/. Input: 21:45:52 21:45:52 'custom.js{"test:start":1}' 21:45:52 21:45:52 code: 'ERR_ASSERTION' 21:45:52 actual: 'custom.js{"test:start":1}' 21:45:52 operator: 'match' 21:45:52 stack: |- 21:45:52 Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-plinux/nodes/rhel8-ppc64le/test/parallel/test-runner-reporters.js:93:14) 21:45:52 ItTest.runInAsyncScope (node:async_hooks:206:9) 21:45:52 ItTest.run (node:internal/test_runner/test:552:25) 21:45:52 ItTest.start (node:internal/test_runner/test:468:17) 21:45:52 node:internal/test_runner/test:811:71 21:45:52 node:internal/per_context/primordials:470:82 21:45:52 new Promise (<anonymous>) 21:45:52 new SafePromise (node:internal/per_context/primordials:438:29) 21:45:52 node:internal/per_context/primordials:470:9 21:45:52 Array.map (<anonymous>) 

I am trying to work on the additional issue on a separate PR

@nodejs-github-bot
Copy link
Collaborator

@cjihrig
Copy link
Contributor

Should we wait for the child process's 'close' event instead of 'exit'?

@MoLow
Copy link
MemberAuthor

MoLow commented Feb 19, 2023

Should we wait for the child process's 'close' event instead of 'exit'?

once the child process is not alive, transforming the output asynchronously is not enough for keeping the main process alive

I have solved that like this:

@@ -123,13 +125,16 @@ function setup(root){const rejectionHandler = createProcessEventHandler('unhandledRejection', root); const coverage = configureCoverage(root); - const exitHandler = () =>{+ const exitHandler = async () =>{ root.coverage = collectCoverage(root, coverage); root.postRun(new ERR_TEST_FAILURE( 'Promise resolution is still pending but the event loop has already resolved', kCancelledByParent)); hook.disable(); + const handle = setInterval(() =>{}, 1000);+ await finished(root.reporter);+ clearInterval(handle); process.removeListener('unhandledRejection', rejectionHandler); process.removeListener('uncaughtException', exceptionHandler)};

but that is kind of a hack

@MoLow
Copy link
MemberAuthor

if anyone wants to tackle this it reproduces running tools/test.py -J test/parallel/test-runner-reporters.js --repeat 1000

@cjihrig
Copy link
Contributor

it reproduces running

Interestingly, I haven't been able to get it to reproduce on my machine (M1 mac).

but that is kind of a hack

Definitely a hack, but if there is no way to keep the process alive, I'd be OK with doing something like that, but I think it should exist in run() and the interval period should be large.

@MoLow
Copy link
MemberAuthor

Interestingly, I haven't been able to get it to reproduce on my machine (M1 mac).

Mine is a M1 mac as well 😕

@MoLow
Copy link
MemberAuthor

I think it should exist in run()

@cjihrig I have tried that but it creates a deadlock - adding a timer prevents the process from reaching beforeExit since the event loop hasn't ended,
but the reporter/s are not ended/drained yet since root.postRun is only called within beforeExit
WDYT?

@MoLowMoLowforce-pushed the wait-for-parser-to-finish branch from afc52f4 to 6a24768CompareFebruary 21, 2023 06:42
Comment on lines 653 to 656
// In case the event loop has ended and reporter has not drained,
// we use a timer to keep the process alive until the reporter is done.
consthandle=setInterval(()=>{},TIMEOUT_MAX);
this.reporter.once('close',()=>clearInterval(handle));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the stream still working but not keeping the process alive? Is there a resource somewhere that we unref()d ?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #22088 (comment)
the reporters transform the output after the child process are done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do we have any idea which change introduced the current flakiness? After the CI was unlocked from the security release, a flurry of changes landed and then this test became flaky. Are we able to bisect it?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am bisecting now

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjihrig according to git bisect flakiness started after b4a962d landed, so it would have probably become flaky even without the latest test runner changes.
CC @debadree25 @nodejs/streams

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, investigating then, could you guide me if at any place in the test_runner code where pipeline maybe being used?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the timing condition we might be facing is within setupTestReporters
where getReportersMap is fulfilled after all the tests completed running

asyncfunctionsetupTestReporters(testsStream){
const{ reporters, destinations }=parseCommandLine();
constreportersMap=awaitgetReportersMap(reporters,destinations);
for(leti=0;i<reportersMap.length;i++){
const{ reporter, destination }=reportersMap[i];
compose(testsStream,reporter).pipe(destination);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah understood, thank you checking this

Copy link
MemberAuthor

@MoLowMoLowFeb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: I think the issue is in the test runner, not in change performed to streams

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far no success in thinking with reference to b4a962d only think that comes to mind is if anyway testsStream ended before somehow? but that also doesn't hold up much 😕 nonetheless will try explore a little more just incase

@MoLowMoLow removed the fast-track PRs that do not need to wait for 48 hours to land. label Feb 21, 2023
@MoLowMoLowforce-pushed the wait-for-parser-to-finish branch from 6a24768 to 6ea5d9cCompareFebruary 21, 2023 18:00
@MoLowMoLow removed the request for review from benjamingrFebruary 21, 2023 18:00
Copy link
Contributor

@cjihrigcjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the CI passes - definitely nicer than adding a keep alive timer.

@MoLowMoLowforce-pushed the wait-for-parser-to-finish branch from d92b981 to d326348CompareFebruary 21, 2023 18:10
@MoLowMoLow added request-ci Add this label to start a Jenkins CI on a PR. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 21, 2023
@github-actionsgithub-actionsbot removed the request-ci Add this label to start a Jenkins CI on a PR. label Feb 21, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@MoLowMoLow added the commit-queue Add this label to land a pull request using GitHub Actions. label Feb 21, 2023
@nodejs-github-botnodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Feb 21, 2023
@nodejs-github-bot
Copy link
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/46737 ✔ Done loading data for nodejs/node/pull/46737 ----------------------------------- PR info ------------------------------------ Title test_runner: bootstrap reporters before running tests (#46737) Author Moshe Atlow (@MoLow) Branch MoLow:wait-for-parser-to-finish -> nodejs:main Labels flaky-test, author ready, needs-ci, dont-land-on-v14.x, test_runner Commits 1 - test_runner: bootstrap reporters before running tests Committers 1 - Moshe Atlow PR-URL: https://github.com/nodejs/node/pull/46737 Fixes: https://github.com/nodejs/node/issues/46747 Reviewed-By: Benjamin Gruenbaum Reviewed-By: Colin Ihrig ------------------------------ Generated metadata ------------------------------ PR-URL: https://github.com/nodejs/node/pull/46737 Fixes: https://github.com/nodejs/node/issues/46747 Reviewed-By: Benjamin Gruenbaum Reviewed-By: Colin Ihrig -------------------------------------------------------------------------------- ℹ This PR was created on Sun, 19 Feb 2023 19:16:17 GMT ✔ Approvals: 2 ✔ - Benjamin Gruenbaum (@benjamingr): https://github.com/nodejs/node/pull/46737#pullrequestreview-1308076791 ✔ - Colin Ihrig (@cjihrig) (TSC): https://github.com/nodejs/node/pull/46737#pullrequestreview-1307932394 ✖ GitHub CI is still running ℹ Last Full PR CI on 2023-02-21T19:30:15Z: https://ci.nodejs.org/job/node-test-pull-request/49848/ - Querying data for job/node-test-pull-request/49848/ ✔ Last Jenkins CI successful -------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/4236769504

MoLow added a commit that referenced this pull request Feb 21, 2023
PR-URL: #46737Fixes: #46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
@MoLowMoLow merged commit b6f5a67 into nodejs:mainFeb 21, 2023
@MoLow
Copy link
MemberAuthor

Landed in ce49f79

@MoLowMoLow deleted the wait-for-parser-to-finish branch February 21, 2023 21:14
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737Fixes: nodejs#46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737Fixes: nodejs#46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
@MoLowMoLow added backport-open-v18.x and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Feb 25, 2023
MoLow added a commit to MoLow/node that referenced this pull request Feb 25, 2023
PR-URL: nodejs#46737Fixes: nodejs#46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
juanarbol pushed a commit that referenced this pull request Mar 3, 2023
PR-URL: #46737 Backport-PR-URL: #46839Fixes: #46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
@juanarboljuanarbol mentioned this pull request Mar 3, 2023
juanarbol pushed a commit that referenced this pull request Mar 5, 2023
PR-URL: #46737 Backport-PR-URL: #46839Fixes: #46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
targos pushed a commit that referenced this pull request Mar 13, 2023
PR-URL: #46737Fixes: #46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
targos pushed a commit that referenced this pull request Mar 14, 2023
PR-URL: #46737Fixes: #46747 Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Colin Ihrig <[email protected]>
@targostargos mentioned this pull request Mar 14, 2023
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

author readyPRs that have at least one approval, no pending requests for changes, and a CI started.flaky-testIssues and PRs related to the tests with unstable failures on the CI.needs-ciPRs that need a full CI run.test_runnerIssues and PRs related to the test runner subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parallel/test-runner-reporters is flaking

6 participants

@MoLow@nodejs-github-bot@cjihrig@benjamingr@targos@debadree25