Uh oh!
There was an error while loading. Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork 34.2k
build: clear stalled jobs on POSIX CI hosts#11246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uh oh!
There was an error while loading. Please reload this page.
Conversation
Trott commented Feb 8, 2017 • edited
Loading Uh oh!
There was an error while loading. Please reload this page.
edited
Uh oh!
There was an error while loading. Please reload this page.
Makefile Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about simply using pkill instead? Isn't that available on all *nix platforms in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not on our AIX hosts.
e537333 to f402ec6CompareTrott commented Feb 8, 2017
Trott commented Feb 8, 2017
CI is green. Removing /cc @nodejs/build for reviews |
mhdawson left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this, we've had to clean the AIX machines several times in the last few days so this is great to see.
Trott commented Feb 9, 2017
I'm terrible at grok'ing Makefile stuff. I put this as an order-only prerequisite. If someone knowledgable could indicate whether that's the right thing to do or not in this case, that would be appreciated. |
richardlau commented Feb 9, 2017
The If the target is declared phony, I'm not sure that it matters if it is order-only or not as a prerequisite. |
gibfahn commented Feb 9, 2017
Related to nodejs/build#591 As discussed in that issue, I'd rather have something in the test runner that kept track of processes that hadn't been cleaned up at the end of each run, and failed the job if there were any. However I do think this is better than what we currently have (later jobs randomly failing and Currently I don't think this is logging what processes it is cleaning up though, which is information I do think we need, so I'd be -1 on this without that. I'm also not sure under what circumstances processes get left behind, doesn't tools/test.py clean up once the TIMEOUT happens? I assumed it was for orphaned node subprocesses, but I'm not sure. |
Makefile Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just add the same line up to the awk above to list the processes that will be killed, i.e.:
ps awwwx | grep Release/node | grep -v grep ps awwwx | grep Release/node | grep -v grep | awk '{print $$1}'|$(XARGS)killTrott commented Feb 9, 2017
That seems right to me. Always seems to be subprocesses from cluster or debug tests. |
Trott commented Feb 9, 2017
Implemented changes per @richardlau and @gibfahn. |
gibfahn left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code addition LGTM, no idea about the Makefile syntax
santigimeno commented Feb 9, 2017
I agree on this. If a test is leaving processes behind I think it should fail.
I also agree on this. |
gibfahn commented Feb 9, 2017
@santigimeno So with the latest update this will now list the processes it's killing, so I'd say this is an improvement over what we currently do. Are you -1 on this landing as a step in the right direction? If we could fix |
santigimeno commented Feb 9, 2017
Sorry, I had overlooked the |
Trott commented Feb 9, 2017
Updated Re-running CI: https://ci.nodejs.org/job/node-test-pull-request/6317/ |
Trott commented Feb 9, 2017
@santigimeno wrote:
Queued up for a subsequent PR: Trott@48693fc |
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Trott commented Feb 10, 2017
Landed in 90ab68b |
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
jasnell commented Mar 7, 2017
would need backport PRs to land on v6 and v4 |
gibfahn commented Jun 17, 2017
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: nodejs#11246 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Trott commented Jun 18, 2017
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Sometimes, after a cluster or debug test fails, a fixture hangs around and holds onto a needed port, causing subsequent CI runs to fail. This adds a command I've been running manually when this occurs. The command will clear the stalled jobs before a CI run. PR-URL: #11246 Backport-PR-URL: #13754 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Josh Gavant <[email protected]>
Sometimes, after a cluster or debug test fails, a fixture hangs around
and holds onto a needed port, causing subsequent CI runs to fail. This
adds a command I've been running manually when this occurs. The command
will clear the stalled jobs before a CI run.
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
build