Flaky Test Management
Flaky tests — tests that sometimes pass and sometimes fail without code changes — create noise in CI. DAGZ detects and manages them automatically.
Test Selection and Flakiness
Flaky test handling is a useful "bonus feature" of test selection. If a code change doesn't affect a flaky test, that test isn't selected, so it doesn't run and can't clutter the job's results.
If a code change does cause a flaky test to be selected and run, the committer is more likely to have the context to investigate and fix it, rather than dismissing the failure as "flaky" noise.
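As a rough illustration of why unaffected flaky tests stay out of the results, the sketch below selects tests by intersecting each test's dependencies with the set of changed files. The dependency map, the `select_tests` name, and the file names are hypothetical; how DAGZ actually determines which tests a change impacts is not described here.

```python
def select_tests(dependencies: dict[str, set[str]], changed_files: set[str]) -> list[str]:
    """Return the tests whose dependencies overlap the changed files, sorted by name."""
    return sorted(test for test, deps in dependencies.items() if deps & changed_files)


# Hypothetical map from each test to the source files it exercises.
deps = {
    "test_checkout": {"cart.py", "payment.py"},
    "test_search_flaky": {"search.py"},  # a flaky test
    "test_profile": {"users.py"},
}
# A change to payment.py never selects the flaky search test, so it cannot add noise.
print(select_tests(deps, {"payment.py"}))  # ['test_checkout']
```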
DAGZ considers a test flaky when it fails on the first attempt but passes on a retry within the same job.
Retry Mechanism
When a test fails, DAGZ retries it in two phases:
- Immediate retry — the test is re-run right away, without teardown. This catches transient failures (timing issues, network blips) with minimal overhead.
- Deferred retry — at the end of the session, failed tests are retried again with full setup/teardown. This catches failures caused by leaked state from other tests.
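The two retry phases can be sketched as follows, assuming a hypothetical `run(test, phase)` callable that returns `True` on a pass. The `run_with_retries` name and the phase labels are illustrative, not DAGZ's API, and the sketch assumes the deferred retry only runs for tests that also failed the immediate retry.

```python
from typing import Callable

# Hypothetical runner: run(test_name, phase) -> True if the test passed.
# phase is "first", "immediate", or "deferred"; a real runner would skip
# teardown for "immediate" and do a full setup/teardown for "deferred".
Runner = Callable[[str, str], bool]


def run_with_retries(tests: list[str], run: Runner) -> dict[str, list[bool]]:
    """Run each test, retry failures immediately, then once more at session end."""
    attempts: dict[str, list[bool]] = {}
    still_failing: list[str] = []
    for test in tests:
        attempts[test] = [run(test, "first")]
        if not attempts[test][-1]:
            # Phase 1: immediate retry, right after the failure.
            attempts[test].append(run(test, "immediate"))
            if not attempts[test][-1]:
                still_failing.append(test)
    # Phase 2: deferred retry at the end of the session, only for tests still failing.
    for test in still_failing:
        attempts[test].append(run(test, "deferred"))
    return attempts


# Demo with scripted outcomes: each list is the sequence of results a test produces.
script = {
    "test_ok": [True],
    "test_timing": [False, True],               # passes on the immediate retry
    "test_leaked_state": [False, False, True],  # passes only on the deferred retry
    "test_broken": [False, False, False],       # fails every attempt
}
calls = {name: 0 for name in script}


def scripted_run(test: str, phase: str) -> bool:
    result = script[test][calls[test]]
    calls[test] += 1
    return result


print(run_with_retries(list(script), scripted_run))
```

Because deferred retries are collected and run only after every test has had its first attempt, a test that failed due to state leaked by a neighbour gets its last chance in a fresh context.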
Safety limits
- Max 10% of total tests — if more than 10% of tests are failing, retries stop. The failures are probably real, not flaky.
- Skip long-running tests — tests that take a disproportionate amount of time are not retried. The cost isn't worth it.
- Up to 2 retries per test — one immediate, one deferred.
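A minimal sketch of the safety checks, assuming a test counts as long-running when it takes more than 10 times the median test duration. That factor, and every function name here, is a hypothetical placeholder: the real threshold for a "disproportionate" runtime is not specified.

```python
def retries_allowed(failed: int, total: int, max_ratio: float = 0.10) -> bool:
    """Retries stop once more than max_ratio of all tests have failed."""
    return total > 0 and failed / total <= max_ratio


def is_too_slow(duration_s: float, median_s: float, factor: float = 10.0) -> bool:
    """Hypothetical rule for a 'disproportionate' runtime: slower than factor x the median."""
    return duration_s > factor * median_s


def should_retry(duration_s: float, median_s: float, failed: int, total: int,
                 retries_done: int, max_retries: int = 2) -> bool:
    """Combine the three limits: failure ratio, runtime, and retry budget."""
    return (retries_allowed(failed, total)
            and not is_too_slow(duration_s, median_s)
            and retries_done < max_retries)


print(should_retry(duration_s=2.0, median_s=1.0, failed=5, total=100, retries_done=0))   # True
print(should_retry(duration_s=2.0, median_s=1.0, failed=15, total=100, retries_done=0))  # False: 15% failing
print(should_retry(duration_s=60.0, median_s=1.0, failed=5, total=100, retries_done=0))  # False: too slow
```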
Result Classification
| Result | Meaning | Counts as |
|---|---|---|
| OK | Passed on first attempt | Passed |
| OK_FLAKY | Failed, then passed on retry | Passed (flagged as flaky) |
| FAILED | Failed on all attempts | Failed |
OK_FLAKY tests don't block the pipeline, but they're tracked separately.
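The classification rule in the table reduces to a few lines. This sketch takes one test's attempt history as a list of pass/fail booleans (first attempt first); the `Result` enum, `classify`, and `blocks_pipeline` are illustrative names, not part of DAGZ.

```python
from enum import Enum


class Result(Enum):
    OK = "OK"              # passed on the first attempt
    OK_FLAKY = "OK_FLAKY"  # failed first, passed on a retry (flagged as flaky)
    FAILED = "FAILED"      # failed on every attempt


def classify(attempts: list[bool]) -> Result:
    """Classify one test from its attempt outcomes, first attempt first."""
    if not attempts:
        raise ValueError("a test needs at least one attempt")
    if attempts[0]:
        return Result.OK
    if any(attempts[1:]):
        return Result.OK_FLAKY
    return Result.FAILED


def blocks_pipeline(result: Result) -> bool:
    """Only FAILED blocks the pipeline; OK_FLAKY counts as passed."""
    return result is Result.FAILED


print(classify([True]).value)                 # OK
print(classify([False, True]).value)          # OK_FLAKY
print(classify([False, False, False]).value)  # FAILED
```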
Dashboard
The dashboard provides:
- Top flaky tests — ranked by flaky rate, the tests that fail-then-pass most often
- Historical execution view — full logs and failure trend for each test over time
- Per-job flaky count — each job shows how many tests were flaky
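The ranking and per-job counts can be approximated as below, assuming "flaky rate" means the fraction of a test's recorded runs that ended as `OK_FLAKY`; the exact definition the dashboard uses is not stated here. The `top_flaky` and `flaky_count_per_job` helpers and the history format are hypothetical.

```python
from collections import Counter


def top_flaky(history: list[tuple[str, str]], n: int = 3) -> list[tuple[str, float]]:
    """Rank tests by the share of their runs that ended OK_FLAKY (hypothetical definition)."""
    runs = Counter(test for test, _ in history)
    flaky = Counter(test for test, result in history if result == "OK_FLAKY")
    rates = {test: flaky[test] / runs[test] for test in runs}
    # Highest rate first; ties broken alphabetically so the order is deterministic.
    ranked = sorted(rates.items(), key=lambda item: (-item[1], item[0]))
    return [(test, rate) for test, rate in ranked if rate > 0][:n]


def flaky_count_per_job(jobs: dict[str, list[str]]) -> dict[str, int]:
    """Count OK_FLAKY results in each job's result list."""
    return {job: results.count("OK_FLAKY") for job, results in jobs.items()}


history = [
    ("test_a", "OK_FLAKY"), ("test_a", "OK"), ("test_a", "OK_FLAKY"), ("test_a", "OK"),
    ("test_b", "OK"), ("test_b", "OK_FLAKY"), ("test_b", "OK"), ("test_b", "OK"),
    ("test_c", "OK"), ("test_c", "FAILED"),
]
print(top_flaky(history))  # [('test_a', 0.5), ('test_b', 0.25)]
```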
pytest Integration
DAGZ respects pytest markers:
- `@pytest.mark.xfail`: expected failures are tracked correctly (`XFAIL` when they fail as expected, `XPASS` when they unexpectedly pass).
- `XPASS` results are usually grouped with flaky tests: they indicate tests that are expected to fail but are currently passing, which can be a sign of flakiness or of a fix in progress.
- Retried tests are assigned to any available worker, so they get a clean process, which often resolves state-related flakiness.
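To make the `XFAIL`/`XPASS` grouping concrete, this sketch derives the outcome of an xfail-marked test and collects the results that would appear alongside flaky tests. The outcome strings match pytest's report labels; the grouping set and the function names are assumptions for illustration.

```python
def xfail_outcome(passed: bool) -> str:
    """Outcome label for a test marked @pytest.mark.xfail (non-strict)."""
    return "XPASS" if passed else "XFAIL"


# Hypothetical grouping: outcomes shown together with flaky tests.
FLAKY_GROUP = frozenset({"OK_FLAKY", "XPASS"})


def flaky_group(results: dict[str, str]) -> list[str]:
    """Return the tests whose outcome belongs in the flaky group, sorted by name."""
    return sorted(test for test, outcome in results.items() if outcome in FLAKY_GROUP)


results = {
    "test_retry_timing": "OK_FLAKY",
    "test_known_bug": xfail_outcome(passed=True),   # unexpectedly passing
    "test_other_bug": xfail_outcome(passed=False),  # failing as expected
    "test_stable": "OK",
}
print(flaky_group(results))  # ['test_known_bug', 'test_retry_timing']
```

Note that pytest's `xfail` marker also accepts `strict=True` (or the `xfail_strict` ini option), in which case an unexpected pass is reported as a failure rather than `XPASS`.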