Skip to content

Conversation

@benjamingr
Copy link
Member

This change makes the eventloopdelay test less flakey on some platforms
by replacing a blocking-wait with a busy-wait which means the eventloop
will always be positive increase.

This is a partial revert of #30787 that returns the test to the state it
had originally.

Example failure:

https://ci.nodejs.org/job/node-test-commit-linuxone/30460/nodes=rhel7-lto-s390x/testReport/junit/(root)/test/sequential_test_performance_eventloopdelay_/

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value: assert(histogram.min > 0) at Timeout.spinAWhile [as _onTimeout] (/home/iojs/build/workspace/node-test-commit-linuxone/test/sequential/test-performance-eventloopdelay.js:65:7) at listOnTimeout (node:internal/timers:559:17) at processTimers (node:internal/timers:502:7){generatedMessage: true, code: 'ERR_ASSERTION', actual: false, expected: true, operator: '==' } Node.js v18.0.0-pre 

This change makes the eventloopdelay test less flakey on some platforms by replacing a blocking-wait with a busy-wait which means the eventloop will always be positive increase. This is a partial revert of nodejs#30787 that returns the test to the state it had originally.
@benjamingrbenjamingr added test Issues and PRs related to the tests. async_hooks Issues and PRs related to the async hooks subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Jan 27, 2022
@benjamingrbenjamingr added the request-ci Add this label to start a Jenkins CI on a PR. label Jan 27, 2022
@nodejs-github-botnodejs-github-bot added the needs-ci PRs that need a full CI run. label Jan 27, 2022
@Trott
Copy link
Member

@nodejs/testing, especially @cjihrig

Trott
Trott previously approved these changes Jan 27, 2022
Copy link
Member

@TrottTrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI is green

@github-actionsgithub-actionsbot removed the request-ci Add this label to start a Jenkins CI on a PR. label Jan 27, 2022
@nodejs-github-bot

This comment has been minimized.

@targos
Copy link
Member

should we do a stress test?

@nodejs-github-bot
Copy link
Collaborator

@cjihrig
Copy link
Contributor

I'm not so sure about this fix. It's good if we can get the test passing, but I think either Node's event loop monitoring needs to be able to handle this scenario, or something else in this test is incorrect.

should we do a stress test?

I think so, yes.

@benjamingr
Copy link
MemberAuthor

I'm not so sure about this fix.

Me neither - if there is a way to synthetically get the histogram to report event loop delay that would be preferable.

@cjihrig
Copy link
Contributor

I'm curious how long this test has been flaky for. I've definitely noticed it lately, but the change being partially reverted here landed over two years ago. Maybe I just missed the failures in the past, but is it possible that a recent change introduced the flakiness?

@richardlau
Copy link
Member

I'm curious how long this test has been flaky for. I've definitely noticed it lately, but the change being partially reverted here landed over two years ago. Maybe I just missed the failures in the past, but is it possible that a recent change introduced the flakiness?

It seemed to start appearing around 22 December 2021 #41286 (comment).

@cjihrig
Copy link
Contributor

cjihrig commented Jan 27, 2022

It seemed to start appearing around 22 December 2021

23637e9 landed three days prior to that and could be worth investigating.

@richardlau
Copy link
Member

It seemed to start appearing around 22 December 2021

23637e9 landed three days prior to that and could be worth investigating.

#41286 (comment)

Walking backwards from commits landed on 22 December 2021, I've started two stress runs around 23637e9:

665b404: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/306/nodes=rhel7-s390x/ (✔️ no failures in 1000 runs) 23637e9: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/307/nodes=rhel7-s390x/ (❌ 100 failures in 1000 runs)

#41286 (comment) added some debug and it looks like in the failing case we end up with histogram.min being 0 which trips an assertion in the test. And at that point I don't know if it's the test or the implementation that's at fault.

@FlarnaFlarna added perf_hooks Issues and PRs related to the implementation of the Performance Timing API. and removed async_hooks Issues and PRs related to the async hooks subsystem. labels Jan 27, 2022
@TrottTrott dismissed their stale reviewJanuary 28, 2022 05:41

dismissing my review while this is discussed

@benjamingrbenjamingr added the blocked PRs that are blocked by other issues or PRs. label Jan 28, 2022
@benjamingr
Copy link
MemberAuthor

Adding 'blocked' so this doesn't land before other avenues have been explored.

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blockedPRs that are blocked by other issues or PRs.flaky-testIssues and PRs related to the tests with unstable failures on the CI.needs-ciPRs that need a full CI run.perf_hooksIssues and PRs related to the implementation of the Performance Timing API.testIssues and PRs related to the tests.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

@benjamingr@Trott@nodejs-github-bot@targos@cjihrig@richardlau@Flarna