A few months ago I wrote about the new calculus of AI-based coding: how 10x velocity gains require corresponding investment in testing, deployment, and coordination, or the bottleneck just moves. I've been living that reality for a while now, and one piece of it has surprised me more than the others: the CI/CD pipeline. At ~100 commits a day, it stops behaving like a pipeline and starts behaving like a traffic jam.
A typical CI/CD pipeline batches changes to remain efficient. A batch of new commits gets built, tested, packaged, and deployed together. The longer the cycle, the larger these batches grow. At AWS, most CI/CD pipelines start with a test or pre-prod stage of some sort. This is the last chance for the CI/CD pipeline to test a "fully dressed" set of changes before they begin deploying to real production environments. If any problems are detected here, the whole batch is rejected and won't be allowed to continue to the production environment, where it could impact customers. This is analogous to the quality control inspection station in manufacturing, which rejects items that don't meet the quality bar.
However, unlike a manufacturing line where items are discrete, a CI/CD pipeline carries cumulative changes: each batch contains all the changes from all prior batches. So when a batch contains a defect, rejecting it alone is not enough. A subsequent batch must include a fix, typically a revert, to allow the pipeline to flow again.
Here's the intuition. A pipeline takes a few hours from commit to a verifiable signal in staging. Whatever commits land in that window go out as one batch. At normal velocity, batches are small and most of them are clean. At high velocity, batches get big, and the probability that something in the batch is broken goes up. When that happens, you revert and try again, but more commits have landed in the meantime, so the next attempt has an even bigger batch behind it. Bigger batch, bigger chance of another defect, longer queue piling up behind it.
It looks a lot like a traffic jam.
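A quick back-of-envelope makes the compounding visible (my arithmetic, not the simulator's): if a batch holds n independent commits, each defective with probability p, the whole batch is clean with probability (1 - p)^n, and n grows linearly with pipeline duration.

```python
# Probability that a batch of n commits is fully clean: (1 - p) ** n.
# Rough numbers only: this spreads ~100 commits/day evenly across 24
# hours and ignores the retry pile-up the simulator actually models.
def p_clean(batch_size: float, defect_rate: float) -> float:
    return (1 - defect_rate) ** batch_size

commits_per_hour = 100 / 24
for pipeline_hours in (1, 4, 12):
    n = commits_per_hour * pipeline_hours
    print(f"{pipeline_hours:2d}h pipeline, ~{n:5.1f}-commit batch: "
          f"{p_clean(n, 1 / 400):.1%} clean at 1-in-400")
```

Even at a good defect rate, the clean-batch probability slides from roughly 99% to below 90% as the batch grows, and that's before failed attempts start stacking commits behind them.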
The simulator
I wasn't sure if my intuition matched the math, so I wrote a small Monte Carlo simulator:
https://github.com/joemag1/DeploymentForHighVelocityTeamsSimulator.
It's deliberately simple. Commits arrive at ~100 per day, shaped like a real workday, which matches what we observe from high-velocity teams. Each commit independently has some probability of being defective. The pipeline takes N hours to detect a defect, and when a deployment fails, one bad commit gets reverted but the rest of the unresolved batch persists into the next attempt, along with everything that landed in the meantime. I sweep two parameters: pipeline duration (1 to 12 hours) and per-commit defect rate (1 in 400 to 1 in 40).
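The real thing is in the repo above; what follows is a minimal sketch of the same dynamics, with the workday shaping simplified to uniform Poisson arrivals (the names and the simplifications are mine, not the repo's):

```python
import numpy as np

def success_rate(pipeline_hours: float, defect_rate: float,
                 commits_per_day: float = 100.0, days: int = 2000,
                 seed: int = 0) -> float:
    """Fraction of pipeline attempts that deploy successfully.

    Each attempt's batch is whatever carried over from failed attempts
    plus the commits that landed during one pipeline-length window. A
    failure reverts exactly one bad commit; the rest roll forward.
    """
    rng = np.random.default_rng(seed)
    attempts = int(days * 24 / pipeline_hours)
    mean_arrivals = commits_per_day * pipeline_hours / 24
    carried_defects = 0   # defective commits stuck in the batch
    successes = 0
    for _ in range(attempts):
        arrivals = rng.poisson(mean_arrivals)
        carried_defects += rng.binomial(arrivals, defect_rate)
        if carried_defects == 0:
            successes += 1         # clean batch: it ships
        else:
            carried_defects -= 1   # revert one bad commit, retry
    return successes / attempts

for hours in (1, 12):
    for rate in (1 / 400, 1 / 100, 1 / 40):
        print(f"{hours:2d}h, 1-in-{round(1 / rate)}: "
              f"{success_rate(hours, rate):.1%}")
```

Even this stripped-down version shows the deadlock corner: at 1 in 40 and 12 hours, defective commits arrive faster than one-revert-per-attempt can clear them, so the queue never drains.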
The only way to actually shorten the pipeline is to make it do less work per run: incremental builds that only recompile what changed, test impact analysis that runs only the tests touching changed code, partial deploys that redeploy only the affected components. Most production pipelines I've encountered don't do any of these; at low velocity, rebuilding and retesting everything is simpler and cheap enough to win. At 100 commits a day, that math inverts, and the smart batch strategies only pay off once you've done the underlying work to make each pipeline run cheap.
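To make "less work per run" concrete, here's a toy version of test impact analysis. The dependency map and file names are hypothetical; real systems derive the map from the build graph or per-test coverage data.

```python
# Toy test impact analysis: run only the tests that touch changed code.
# The dependency map below is hand-written and hypothetical; real
# systems derive it from the build graph or per-test coverage data.
TEST_DEPS = {
    "tests/test_billing.py": {"src/billing.py", "src/currency.py"},
    "tests/test_auth.py":    {"src/auth.py"},
    "tests/test_api.py":     {"src/api.py", "src/auth.py"},
}

def impacted_tests(changed_files: set[str]) -> list[str]:
    """Select the tests whose dependencies overlap the change."""
    return sorted(test for test, deps in TEST_DEPS.items()
                  if deps & changed_files)

print(impacted_tests({"src/auth.py"}))
# ['tests/test_api.py', 'tests/test_auth.py']
```

The same select-only-what-changed idea is what makes incremental builds and partial deploys cheap.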
The simulation shows a cliff: deployment success rate falls off steeply as either parameter grows, and faster when both do. A few concrete points along that cliff:
- At 1 in 400 defects, deployment success drops from 97.6% at a 1-hour pipeline to 78.1% at 12 hours.
- At 1 in 100, it drops from 89.1% to 42.0% over the same range.
- At 1 in 40, it drops from 71.5% to 0.7%.
The same 12x increase in pipeline duration takes a comfortable team to merely strained, a strained team to broken, and a broken team to deadlocked. Pipeline duration matters disproportionately more as the defect rate climbs; the two axes don't act independently, they multiply.
Staying in the Valley of Calm
If the two axes multiply, the question is which one to push on. There are really two directions: lower the defect rate per commit, or shorten the time it takes to catch a defect once it's been committed. Most of the interventions I can think of land on one of those two axes.
On the defect rate side, my intuition is that most software teams operate somewhere around one defect per 40 commits. That's the rightmost column in the chart, and it's not a comfortable place to be at high velocity. Our team has put a lot of effort into pre-commit testing. We run our standard service canaries against the full system spun up on a developer's box, using fakes for external dependencies. I'd estimate we're somewhere in the one defect per 100 to 200 commits range now, which is meaningfully better but still not in the valley of calm. If I trust the simulation, getting to roughly 1 in 400 is where things really start to feel comfortable.
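A generic sketch of the pattern (every name here is hypothetical, not our actual harness): the full system runs in-process on the dev box, external dependencies are replaced by fakes, and the same canary check that would run in production runs before commit.

```python
# Sketch of a pre-commit canary against a locally spun-up service;
# all names are hypothetical, illustrating the pattern only.
class FakePaymentGateway:
    """In-process stand-in for an external dependency."""
    def charge(self, amount_cents: int) -> str:
        return "txn-fake-001"   # deterministic, no network calls

class OrderService:
    def __init__(self, gateway):
        self.gateway = gateway
    def place_order(self, amount_cents: int) -> str:
        return self.gateway.charge(amount_cents)

def canary_place_order(service) -> None:
    """The same end-to-end check a production canary would run."""
    assert service.place_order(499).startswith("txn-")

if __name__ == "__main__":
    # Pre-commit: full system on the dev box, fakes at the edges.
    canary_place_order(OrderService(FakePaymentGateway()))
    print("canary passed")
```

The point of the fakes is determinism: the canary exercises the real service wiring while the only nondeterministic parts, the external dependencies, are pinned down.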
The other axis is pipeline speed, and this is where I see the most untapped headroom. An hour for a build, another for packaging, two or three for deployment: those numbers feel reasonable when writing code is the bottleneck. They feel less reasonable at 100 commits a day. None of those phases is close to its true entitlement, and I suspect each could become an order of magnitude faster with techniques that are already well understood.
What I take away from playing with the simulator is that the 10x velocity gains from agentic coding aren't free. They come with a tax, paid in deployment friction, and the bill comes due faster than most teams expect.
The two knobs aren't equal. Pushing the defect rate from 1 in 100 to 1 in 400 is hard: it requires sustained investment in pre-commit testing, tighter review, and better tooling, and there are diminishing returns as you approach the limits of what humans and agents can catch before code lands. Pipeline speed is different. Most pipelines have an order of magnitude of headroom sitting there untouched, because the work to claim it (incremental builds, test impact analysis, partial deploys) only became worth doing once velocity got high enough to expose it.
That's the bet I'd make. The teams that turn agentic coding into real velocity won't be the ones with the smartest agents; they'll be the ones who looked at their CI/CD pipeline, saw the traffic jam forming, and rebuilt it before it brought everything to a halt.