mhsmith (Member) commented Aug 31, 2025

Original PR title: "Change CI arguments: remove --randomize, add --fail-rerun"

--randomize can discover real problems, but they're often ordering dependencies between tests, which are difficult to diagnose, and usually have nothing to do with the PR on which they occur.

I assume this was the reason for removing --fail-rerun from the --fast-ci and --slow-ci arguments in #110849. However, this renders --randomize almost useless, because the failing test will usually pass on the rerun in a fresh process, and nobody will ever know that there was a failure.

Also, --rerun without --fail-rerun means that a test which ALWAYS fails the first time and passes the second time will still be treated as a pass. This seems unsafe.

So I propose removing --randomize and restoring --fail-rerun.

This will also allow iOS and Android to switch to --fast-ci on GitHub Actions and --slow-ci on the buildbots. They were previously unable to do this because of the frequent failures caused by --randomize, which were not hidden on the rerun because these platforms use --single-process mode.
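To make the --rerun / --fail-rerun interaction described above concrete, here is a minimal sketch of how a single test's two results map to the CI verdict. It is a simplified model rather than regrtest's actual code; `ci_outcome` and its parameters are hypothetical names used only for illustration.

```python
def ci_outcome(first_run_passed: bool, rerun_passed: bool,
               rerun: bool, fail_rerun: bool) -> str:
    """Simplified model of how one test's results map to the CI verdict.

    Hypothetical helper for illustration; not regrtest's real code.
    """
    if first_run_passed:
        return "pass"
    if not rerun:
        # No second chance: the first failure is final.
        return "fail"
    if not rerun_passed:
        # Failed twice: always a failure.
        return "fail"
    # Failed first, passed on rerun: the verdict depends on --fail-rerun.
    return "fail" if fail_rerun else "pass"


# A test that always fails the first time and passes the second time:
print(ci_outcome(False, True, rerun=True, fail_rerun=False))  # pass
print(ci_outcome(False, True, rerun=True, fail_rerun=True))   # fail
```

Without --fail-rerun, the first-run failure never reaches the final verdict, which is the case I consider unsafe.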

mhsmith requested a review from vstinner on August 31, 2025 18:17
mhsmith added the labels skip news, needs backport to 3.13 (bugs and security fixes), and needs backport to 3.14 (bugs and security fixes) on Aug 31, 2025
mhsmith removed the request for review from freakboy3742 on August 31, 2025 18:17
picnixz added the label infra (CI, GitHub Actions, buildbots, Dependabot, etc.) on Aug 31, 2025
mhsmith changed the title from "Change CI arguments: remove --randomize, add --fail-rerun" to "gh-137242: Change CI arguments: remove --randomize, add --fail-rerun" on Aug 31, 2025
mhsmith (Member, Author) commented Aug 31, 2025

> Also, --rerun without --fail-rerun means that a test which ALWAYS fails the first time and passes the second time will still be treated as a pass.

It looks like some tests are already doing that on some runners. I'll switch this PR back to draft until this has been resolved.

mhsmith marked this pull request as draft on August 31, 2025 18:57

vstinner (Member) left a comment

I dislike changing CI defaults.

# -j0 --fail-env-changed --rerun --fail-rerun --slowest --verbose3
if ns.use_mp is None:
    ns.use_mp = 0
ns.randomize = True

I would prefer to keep randomization for --fast-ci and --slow-ci options.

vstinner (Member) commented Sep 3, 2025

I would prefer adding an option to disable randomization, which could be used together with --fast-ci / --slow-ci.
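As a rough sketch of that idea, assuming a hypothetical opt-out flag named `--no-randomize`: the CI preset keeps turning randomization on, and the explicit flag overrides it afterwards. This is a toy argparse parser with an illustrative `parse_args` helper, not regrtest's real command-line code.

```python
import argparse


def parse_args(argv):
    """Toy parser: --fast-ci turns randomization on, --no-randomize wins."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fast-ci", action="store_true")
    parser.add_argument("-r", "--randomize", action="store_true")
    # Hypothetical opt-out flag, as suggested in this comment.
    parser.add_argument("--no-randomize", action="store_true")
    ns = parser.parse_args(argv)

    if ns.fast_ci:
        ns.randomize = True   # the CI preset keeps randomizing by default
    if ns.no_randomize:
        ns.randomize = False  # the explicit opt-out overrides the preset
    return ns


print(parse_args(["--fast-ci"]).randomize)                    # True
print(parse_args(["--fast-ci", "--no-randomize"]).randomize)  # False
```

Platforms that cannot tolerate random ordering could then pass both options, while every other CI job keeps the current defaults.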

ns.fail_env_changed = True
if ns.python is None:
    ns.rerun = True
    ns.fail_rerun = True

You can already pass --fail-rerun. Making it the default would make all CIs much stricter, and I dislike this idea. When I tried it a few years ago, I discovered tons of flaky tests, and it was a pain to fix all of them.

mhsmith changed the title from "gh-137242: Change CI arguments: remove --randomize, add --fail-rerun" to "gh-137242: Add a --no-randomize option, and use it in Android CI" on Sep 8, 2025
 self.list_tests = False
 self.single = False
-self.randomize = False
+self.randomize = None

I would prefer avoiding None for randomize and keeping a boolean instead. I wrote #138649, which is a clone of this PR with an additional change that avoids None. What do you think?

mhsmith (Member, Author)

That makes sense; I'll close this PR in favor of that one.
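For context on the None-versus-boolean point, this is roughly what a tri-state design looks like in a toy argparse parser. It is a sketch under assumptions, not the code of either PR; `parse_tristate` is a hypothetical name.

```python
import argparse


def parse_tristate(argv):
    """Toy parser where randomize starts as None, meaning 'not specified'."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--fast-ci", action="store_true")
    parser.add_argument("--randomize", dest="randomize",
                        action="store_const", const=True, default=None)
    parser.add_argument("--no-randomize", dest="randomize",
                        action="store_const", const=False)
    ns = parser.parse_args(argv)

    # The preset only applies when the user expressed no preference.
    if ns.fast_ci and ns.randomize is None:
        ns.randomize = True
    # Consumers expect a bool, so None has to be normalized away.
    if ns.randomize is None:
        ns.randomize = False
    return ns


print(parse_tristate(["--fast-ci"]).randomize)                    # True
print(parse_tristate(["--fast-ci", "--no-randomize"]).randomize)  # False
```

The extra normalization step, and the risk of a stray None reaching code that expects True or False, is the cost that keeping randomize a plain boolean avoids.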
