Reproduce Flaky Tests in pytest

At work, I occasionally run into flaky tests in CI that prove troublesome to reproduce. Running the test by itself does not fail, nor does running all tests in its module. Sometimes it takes running the entire test suite to trigger the failure - and ours is a behemoth, currently taking about 10 minutes even parallelized 🫠. Clearly that is not a feasible way to debug the test.

My theory for why these tests usually fail in the full suite but not in isolation is randomness, or more specifically, the randomness within the test data generation. We use faker to generate fake data for our Django models, and importantly, we fix the faker seed at the start of the test run. Because the seed is fixed once per run, the data a given test draws depends on how many random values the tests before it have already consumed; the same test therefore sees different data in isolation than in the full suite, and for a given seed and position in the random stream, the flaky test may pass or fail.
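To make that concrete, here is a minimal sketch (plain faker with an illustrative seed of 0 - not our actual setup) of how a test's position in the seeded stream changes the data it sees:

from faker import Faker

Faker.seed(0)           # seed fixed once, as at the start of a run
fake = Faker()
first = fake.uuid4()    # the uuid a test draws when it runs first

Faker.seed(0)           # same seed, but now...
fake = Faker()
fake.name()             # ...an earlier test consumes a random value,
later = fake.uuid4()    # so the same call returns different data
assert first != later

Run in isolation, the test always draws from the same stream position, which is why it stubbornly refuses to fail on its own.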

This has been borne out in practice. Usually the test flakes on some specific random value; in one case, it was a uuid that happened to contain the substring b2b.

Reproducing flaky tests

Rather than running the entire test suite, I came up with the idea of re-running just the flaky test many times over, using pytest.mark.parametrize:

import pytest

# Each case re-runs the test body, drawing fresh values from the
# seeded faker stream, so one of the iterations can hit the bad value.
@pytest.mark.parametrize("iteration", range(5))
@pytest.mark.django_db
def test_flaky_test(iteration):
    pass

Run just the flaky test with pytest -k test_flaky_test, or point pytest at it directly with pytest path/to/test_module.py::test_flaky_test.

Sometimes a low number of iterations like 5 suffices; occasionally 100 or even 1,000 iterations are needed before the test flakes. Once the test fails reliably, debugging is straightforward, and the feedback loop is much faster.
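If you would rather not edit the test at all, the pytest-repeat plugin offers the same loop from the command line (an alternative to the parametrize trick above, not what I used); pairing it with -x stops the run as soon as the failure reproduces:

pip install pytest-repeat
pytest -k test_flaky_test --count 100 -x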