Flaky Test Management

Flaky tests — tests that sometimes pass and sometimes fail without code changes — create noise in CI. DAGZ detects and manages them automatically.

Test Selection and Flakiness

Flaky test handling is a useful "bonus feature" of test selection: if a code change doesn't affect a flaky test, that test isn't run, so it can't clutter the job's results.

If a code change does cause a flaky test to run, the committer is more likely to have the context to investigate and fix it, rather than dismissing the failure as "flaky" noise.
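This page doesn't describe how DAGZ decides which tests a change affects. The sketch below assumes a precomputed map from each test to the source files it depends on, and selects only the tests whose dependencies overlap the changed files. `select_tests` and the dependency map are hypothetical, for illustration only.

```python
from typing import Dict, Set


def select_tests(test_deps: Dict[str, Set[str]], changed_files: Set[str]) -> Set[str]:
    """Return the tests whose recorded dependencies overlap the changed files."""
    return {test for test, deps in test_deps.items() if deps & changed_files}


# Hypothetical dependency map: the source files each test exercised.
test_deps = {
    "test_checkout": {"cart.py", "payment.py"},
    "test_search": {"search.py", "index.py"},
    "test_login_flaky": {"auth.py", "session.py"},
}

# A change to payment.py never selects the flaky login test, so it can't add noise.
print(sorted(select_tests(test_deps, {"payment.py"})))  # ['test_checkout']
```

Under this model, the flaky login test only runs when `auth.py` or `session.py` changes, which is exactly when its author has the most context.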

DAGZ considers a test flaky when it fails on the first attempt but passes on a retry within the same job.

Retry Mechanism

When a test fails, DAGZ retries it in two phases:

Immediate retry — the test is re-run right away, without teardown. This catches transient failures (timing issues, network blips) with minimal overhead.
Deferred retry — at the end of the session, failed tests are retried again with full setup/teardown. This catches failures caused by leaked state from other tests.
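The two phases can be sketched as a small session runner. This is a simplified model, not DAGZ's implementation: the `Case` type, its `setup`/`teardown` hooks, and `run_session` are hypothetical. A failing test is re-run immediately in the same fixture context, and if it still fails it is queued for one more run at the end of the session with a fresh setup and teardown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """A hypothetical test description: a body plus optional fixture hooks."""
    name: str
    body: Callable[[dict], None]                      # raises on failure
    setup: Callable[[], dict] = dict                  # builds a fresh fixture context
    teardown: Callable[[dict], None] = lambda ctx: None


def _passes(case: Case, ctx: dict) -> bool:
    try:
        case.body(ctx)
        return True
    except Exception:
        return False


def run_session(cases: List[Case]) -> Dict[str, List[bool]]:
    """Return each test's pass/fail result for every attempt, in run order."""
    attempts: Dict[str, List[bool]] = {}
    deferred: List[Case] = []
    for case in cases:
        ctx = case.setup()
        ok = _passes(case, ctx)
        attempts[case.name] = [ok]
        if not ok:
            # Phase 1: immediate retry, reusing the same fixture context (no teardown yet).
            ok = _passes(case, ctx)
            attempts[case.name].append(ok)
        case.teardown(ctx)
        if not ok:
            deferred.append(case)
    # Phase 2: deferred retry at the end of the session, with a full setup/teardown cycle.
    for case in deferred:
        ctx = case.setup()
        attempts[case.name].append(_passes(case, ctx))
        case.teardown(ctx)
    return attempts


calls = {"n": 0}


def transient_body(ctx):
    calls["n"] += 1
    assert calls["n"] > 1  # fails only on the very first call


def broken_body(ctx):
    assert False, "real bug"


results = run_session([Case("test_transient", transient_body), Case("test_broken", broken_body)])
print(results)
# {'test_transient': [False, True], 'test_broken': [False, False, False]}
```

The transient test is rescued by the cheap immediate retry and never reaches the deferred queue; only tests that keep failing pay for the extra setup and teardown.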

Safety limits

Max 10% of total tests — if more than 10% of tests are failing, retries stop. The failures are probably real, not flaky.
Skip long-running tests — tests that take a disproportionate amount of time are not retried. The cost isn't worth it.
Up to 2 retries per test — one immediate, one deferred.
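The three limits can be expressed as a single retry decision. This is a sketch under stated assumptions: the page doesn't define "disproportionate", so the `long_test_factor` of 5x the median test duration is a hypothetical stand-in, and `RetryPolicy` and `should_retry` are illustrative names, not DAGZ settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_failure_ratio: float = 0.10   # stop retrying once more than 10% of tests fail
    max_retries: int = 2              # one immediate retry + one deferred retry
    long_test_factor: float = 5.0     # hypothetical: "long" means > 5x the median duration


def should_retry(policy: RetryPolicy, total_tests: int, failed_tests: int,
                 retries_so_far: int, duration_s: float, median_duration_s: float) -> bool:
    # Widespread failure is probably a real regression, so retries stop.
    if total_tests > 0 and failed_tests / total_tests > policy.max_failure_ratio:
        return False
    # Re-running a disproportionately slow test costs more than it is worth.
    if duration_s > policy.long_test_factor * median_duration_s:
        return False
    # At most two retries per test.
    return retries_so_far < policy.max_retries


policy = RetryPolicy()
print(should_retry(policy, 200, 5, 0, 1.2, 1.0))    # True: 2.5% failing, normal duration
print(should_retry(policy, 200, 30, 0, 1.2, 1.0))   # False: 15% failing
print(should_retry(policy, 200, 5, 0, 60.0, 1.0))   # False: 60x the median duration
print(should_retry(policy, 200, 5, 2, 1.2, 1.0))    # False: both retries used
```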

Result Classification

Result	Meaning	Counts as
`OK`	Passed on first attempt	Passed
`OK_FLAKY`	Failed, then passed on retry	Passed (flagged as flaky)
`FAILED`	Failed on all attempts	Failed

OK_FLAKY tests don't block the pipeline, but they're tracked separately.
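The classification table maps directly onto a function over a test's attempts within one job. In this sketch (the names `Outcome`, `classify`, and `pipeline_passes` are illustrative, not a DAGZ API), the first entry is the original run and any later entries are retries; only `FAILED` blocks the pipeline.

```python
from enum import Enum
from typing import Iterable, List


class Outcome(Enum):
    OK = "OK"
    OK_FLAKY = "OK_FLAKY"
    FAILED = "FAILED"


def classify(attempts: List[bool]) -> Outcome:
    """attempts[0] is the first run; later entries are retries in the same job."""
    if not attempts:
        raise ValueError("a test needs at least one attempt")
    if attempts[0]:
        return Outcome.OK
    if any(attempts[1:]):
        return Outcome.OK_FLAKY   # failed first, passed on a retry
    return Outcome.FAILED


def pipeline_passes(outcomes: Iterable[Outcome]) -> bool:
    # OK_FLAKY counts as passed, so only FAILED blocks the pipeline.
    return all(o is not Outcome.FAILED for o in outcomes)


print(classify([True]).value)                 # OK
print(classify([False, False, True]).value)   # OK_FLAKY
print(classify([False, False, False]).value)  # FAILED
print(pipeline_passes([Outcome.OK, Outcome.OK_FLAKY]))  # True
```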

Dashboard

The dashboard provides:

Top flaky tests — ranked by flaky rate, the tests that fail-then-pass most often
Historical execution view — full logs and failure trend for each test over time
Per-job flaky count — each job shows how many tests were flaky
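The dashboard's two aggregate views can be approximated from a flat history of results. The sketch assumes a hypothetical record shape of `(job_id, test_name, outcome)` and defines a test's flaky rate as its `OK_FLAKY` runs divided by its total runs; the dashboard may compute it differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (job_id, test_name, outcome).
Record = Tuple[str, str, str]


def top_flaky(records: Iterable[Record], n: int = 10) -> List[Tuple[str, float]]:
    """Rank tests by flaky rate (OK_FLAKY runs / total runs), highest first."""
    runs: Counter = Counter()
    flaky: Counter = Counter()
    for _job, test, outcome in records:
        runs[test] += 1
        if outcome == "OK_FLAKY":
            flaky[test] += 1
    rates = [(test, flaky[test] / runs[test]) for test in runs if flaky[test]]
    rates.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return rates[:n]


def flaky_count_per_job(records: Iterable[Record]) -> Dict[str, int]:
    """Count OK_FLAKY results in each job (jobs with none report 0)."""
    counts: Dict[str, int] = {}
    for job, _test, outcome in records:
        counts.setdefault(job, 0)
        if outcome == "OK_FLAKY":
            counts[job] += 1
    return counts


history = [
    ("job-1", "test_a", "OK_FLAKY"),
    ("job-1", "test_b", "OK_FLAKY"),
    ("job-2", "test_a", "OK"),
    ("job-2", "test_b", "OK"),
    ("job-3", "test_a", "OK_FLAKY"),
    ("job-3", "test_c", "FAILED"),
]
print([(t, round(r, 2)) for t, r in top_flaky(history)])  # [('test_a', 0.67), ('test_b', 0.5)]
print(flaky_count_per_job(history))  # {'job-1': 2, 'job-2': 0, 'job-3': 1}
```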

pytest Integration

DAGZ respects pytest markers:

@pytest.mark.xfail — expected failures are tracked correctly (XFAIL when they fail as expected, XPASS when they unexpectedly pass)
XPASS results are usually grouped with flaky tests: they indicate tests that are expected to fail but currently pass, which can be a sign of flakiness or of a fix in progress.
Retried tests are assigned to any available worker, so they get a clean process — which often resolves state-related flakiness
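One way a report could apply that grouping is sketched below. The bucket names and the `group_results` helper are hypothetical; the only rule taken from this page is that `XPASS` sits alongside `OK_FLAKY`.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical report buckets; XPASS is grouped with OK_FLAKY as described above.
BUCKETS = {
    "OK": "passed",
    "XFAIL": "expected failure",   # failed as expected
    "OK_FLAKY": "needs attention",
    "XPASS": "needs attention",    # expected to fail but passed: flaky, or a fix in progress
    "FAILED": "failed",
}


def group_results(results: Dict[str, str]) -> Dict[str, List[str]]:
    """Group test names by report bucket, sorted by test name within each bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for test, outcome in sorted(results.items()):
        groups[BUCKETS[outcome]].append(test)
    return dict(groups)


report = group_results({
    "test_a": "OK",
    "test_b": "XPASS",
    "test_c": "OK_FLAKY",
    "test_d": "XFAIL",
    "test_e": "FAILED",
})
print(report["needs attention"])  # ['test_b', 'test_c']
```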
