Plan Slow, Ship Reflexes

The dominant story about where AI is heading is "let it think." Bigger reasoning models, longer deliberation at the moment of use, more compute spent per question while you wait. The mental image is a vast brain grinding on your problem in real time.

A pair of results — one from robotics and control, one from the guts of how language models generate text — quietly sketch a different architecture. Not one brain thinking hard in the moment. Two speeds: a slow, expensive mind that does its thinking offline, and a fast, cheap reflex that does the acting live. And the relationship between them changes where the value in an AI system actually sits.

Result One: The Expensive Planner Is a Factory, Not a Product

In control, there's a classic way to make an agent act well: at every moment, imagine many possible futures, score them, and take the first step of the best one. It works, and it's slow — call it ninety milliseconds of deliberation per decision. Far too slow for anything that has to act in real time.
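To make that loop concrete, here is a minimal sketch of the plan-by-imagining idea, which resembles what control engineers call model predictive control. Everything in it is an assumption for illustration: a one-dimensional world where the action simply shifts the position, a goal at zero, a five-value action set, and a three-step horizon. It enumerates every imagined future rather than sampling them, which is only feasible because the toy is tiny (125 futures per decision).

```python
import itertools

# Hypothetical discrete action set for the toy problem.
ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)


def rollout_cost(x, actions):
    """Simulate one imagined future and score it (lower is better)."""
    total = 0.0
    for a in actions:
        x = x + a        # assumed toy dynamics: the action shifts the position
        total += x * x   # assumed cost: squared distance from the goal at 0
    return total


def plan(x, horizon=3):
    """Enumerate every action sequence, keep the cheapest, return its first step."""
    best = min(itertools.product(ACTIONS, repeat=horizon),
               key=lambda seq: rollout_cost(x, seq))
    return best[0]


def run(x, steps=5):
    """Closed loop: re-plan from scratch at every step, then take one action."""
    for _ in range(steps):
        x = x + plan(x)
    return x


print(plan(3.0))   # first step of the best imagined future
print(run(3.0))    # position after five plan-and-act cycles
```

The cost of this approach grows with the number of futures considered at every single step, which is the per-decision slowness the paragraph above describes.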

Here's the result that matters. You can take that slow, deliberating planner and distill it into a cheap reflex policy — a small model that just maps the situation directly to an action in a third of a millisecond — and keep roughly 98.6% of the performance. Two hundred and sixty-five times faster, almost none of the quality lost.

Read that again, because it reframes what the expensive planner is for. The planner's job was never to act. Its job was to generate examples of good decisions that teach a fast policy how to behave. The deliberation is a training-time activity. The thing you actually deploy is the reflex. The slow mind is a data factory; the fast mind is the product.
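The figures hang together: ninety milliseconds divided by 265 is about 0.34 milliseconds, roughly the third of a millisecond quoted above. As for the mechanism, the sketch below shows one simple way a planner can act as a data factory. It is not the method from the control work, just an illustration under stated assumptions: a toy one-dimensional task, an exhaustive lookahead planner as the teacher, and a one-parameter clipped linear policy as the student, fitted by a small grid search over the imitation error. All names are hypothetical.

```python
import itertools

ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)  # hypothetical action set


def slow_planner(x, horizon=3):
    """Teacher: exhaustive lookahead on a toy 1-D task (goal at 0, action shifts x)."""
    def cost(seq):
        pos, total = x, 0.0
        for a in seq:
            pos += a
            total += pos * pos
        return total
    return min(itertools.product(ACTIONS, repeat=horizon), key=cost)[0]


def reflex(x, w):
    """Student: one multiply and a clip, no lookahead."""
    return max(-1.0, min(1.0, w * x))


def distill(states):
    """Label states with the teacher offline, then pick the w that imitates it best."""
    labels = [(x, slow_planner(x)) for x in states]   # the "data factory" step
    candidates = [-k / 20 for k in range(41)]         # w from 0 down to -2, step 0.05

    def imitation_error(w):
        return sum((reflex(x, w) - a) ** 2 for x, a in labels)

    return min(candidates, key=imitation_error)


train_states = [i * 0.25 for i in range(-12, 13)]     # x from -3 to 3
w = distill(train_states)

# Deploy only the reflex: the planner is never called at run time.
x = 3.0
for _ in range(10):
    x = x + reflex(x, w)

print(w, x)
```

The planner runs only inside `distill`. Once `w` is fitted, each live decision is one multiplication and a clip, the same split between training-time deliberation and run-time reflex that the paragraph describes.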

[Image: data center infrastructure running overnight. Caption: The factory runs in the idle hours; the reflex ships by morning.]

Result Two: Don't Wait — Predict the Next Step and Pre-Compute It

The second result comes from making language models generate faster, and it rhymes with the first in a way that's almost eerie. There's a well-known trick, speculative decoding, where a small draft model proposes several tokens and a big model verifies them in parallel. But even that has a hidden stall: the draft has to sit idle, waiting to find out how the verification came back before it can propose the next batch.
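A toy version of the draft-and-verify trick may help. The two "models" below are hypothetical arithmetic rules rather than real networks, decoding is greedy, and verification is written as a plain loop where a real system would score all drafted positions in one batched pass. The sketch shows the acceptance rule: keep the draft's tokens up to the first disagreement, then take the big model's token at that point.

```python
def target_next(tokens):
    """Hypothetical 'big' model: a deterministic toy rule over the token history."""
    return (sum(tokens) * 3 + len(tokens)) % 7


def draft_next(tokens):
    """Hypothetical 'small' model: agrees with the target except at every 4th position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 7


def greedy_decode(prompt, n_new):
    """Baseline: the big model generates one token at a time."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    goal = len(tokens) + n_new
    while len(tokens) < goal:
        # 1. The draft proposes k tokens, one after another.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target checks each proposed position (batched in a real system).
        accepted = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)   # correction token at the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target contributes one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]


print(speculative_decode([1, 2], 12) == greedy_decode([1, 2], 12))
```

Because every kept token is one the big model would have chosen anyway, the output matches plain greedy decoding. The draft only changes how many big-model steps are needed to get there.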

The fix: don't wait. While the verification is still running, have the draft predict the most likely outcomes and pre-compute its next move for each one. When the real result lands, the answer is usually already sitting there. It's branch prediction — the decades-old trick that makes CPUs fast — ported into AI. You don't deliberate at the moment you need the answer. You deliberate ahead, on the likely branch, and you only pay the slow cost on a miss.
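The branch-prediction analogy can be sketched as a cache keyed by the guessed verification outcome. This is a sequential simulation under heavy assumptions, not the published method: the models are hypothetical toy rules, only the single most likely outcome is pre-computed (every draft token accepted, plus the draft's guess at the bonus token), and the work that would overlap in time on real hardware simply runs in order here.

```python
def target_next(tokens):
    """Hypothetical 'big' model: a deterministic toy rule."""
    return (sum(tokens) * 3 + len(tokens)) % 7


def draft_next(tokens):
    """Hypothetical 'small' model: wrong whenever the history length is a multiple of 5."""
    guess = target_next(tokens)
    return guess if len(tokens) % 5 else (guess + 1) % 7


def propose(prefix, k):
    out = []
    for _ in range(k):
        out.append(draft_next(prefix + out))
    return out


def verify(prefix, proposal):
    """Return the tokens verification appends: accepted prefix plus one target token."""
    accepted = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [target_next(prefix + accepted)]


def decode_with_prefetch(prompt, n_new, k=3):
    tokens = list(prompt)
    goal = len(tokens) + n_new
    proposal = propose(tokens, k)
    hits = misses = 0
    while len(tokens) < goal:
        # While verification is "in flight": guess its outcome and draft ahead.
        guessed = proposal + [draft_next(tokens + proposal)]
        prefetched = {tuple(guessed): propose(tokens + guessed, k)}
        # Verification lands.
        outcome = verify(tokens, proposal)
        tokens.extend(outcome)
        cached = prefetched.get(tuple(outcome))
        if cached is not None:
            hits += 1
            proposal = cached               # next draft was already waiting
        else:
            misses += 1
            proposal = propose(tokens, k)   # pay the slow path only on a miss
    return tokens[:goal], hits, misses


tokens, hits, misses = decode_with_prefetch([1, 2], 20)
print(hits, misses)
```

A fuller version would pre-compute several likely outcomes rather than one. The structure stays the same: a lookup on the guessed branch, and a fresh draft only when the guess was wrong.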

Both results are the same idea wearing different clothes: move the thinking out of the critical moment. Either ahead in time, or offline entirely.

The Two-Speed Mind

Stack them and an architecture emerges that looks a lot like the old psychology distinction between fast and slow thinking — but engineered. A slow, deliberate system runs in the background, overnight, in the idle hours: it explores, it plans, it considers many futures, it spends real compute. A fast, reflexive system runs in the live moment: cheap, instant, distilled from everything the slow system worked out.

The expensive thinking doesn't get spent at the moment of action. It gets compounded into cheap acting. And there's a free lunch hiding in here: the slow system can run on compute that would otherwise sit idle, in nights, gaps, and wait states. Idle capacity spent on deliberation ahead of time buys lower latency at almost no extra cost.

"The point of thinking hard is to earn the right to stop thinking."

deliberate offline → distill → act fast → log exhaust → re-deliberate

Two Pieces That Make It Work

Two less obvious lessons from the same research make the two-speed mind practical rather than fragile.

Factorize what you'll need to repair. In the control work, the systems that kept the world-model (what reality is) separate from the policy (how to act) could be fixed when reality drifted — retrain only the part that models the world, leave the behavior alone. The monolithic, end-to-end systems that fused everything into one blob got worse when you tried to patch them; you couldn't touch one part without damaging the rest. Anything you'll need to maintain over years should be built so you can repair one component without re-validating the whole.
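As a rough illustration of that separation, the sketch below keeps a world-model and a policy as two objects and repairs only the first after a simulated breakage. The task, the class names, and the form of drift (an actuator whose effect flips sign) are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class WorldModel:
    """What reality is: the believed effect of one unit of action on position."""
    gain: float

    def predict(self, x, a):
        return x + self.gain * a


class Policy:
    """How to act: pick the action whose predicted next state is closest to the goal at 0."""
    CANDIDATES = (-1.0, -0.5, 0.0, 0.5, 1.0)

    def act(self, x, model):
        return min(self.CANDIDATES, key=lambda a: abs(model.predict(x, a)))


def run(x, policy, model, true_gain, steps=20):
    """Act in the real world, whose gain may differ from what the model believes."""
    for _ in range(steps):
        x = x + true_gain * policy.act(x, model)
    return x


policy = Policy()                   # built once, never modified below
stale_model = WorldModel(gain=1.0)  # correct before the drift
true_gain = -1.0                    # reality drifts: the actuator now pushes the other way

drifted = run(3.0, policy, stale_model, true_gain)              # stale beliefs, runs away from the goal
repaired = run(3.0, policy, WorldModel(gain=-1.0), true_gain)   # swap only the world-model

print(drifted, repaired)
```

The `Policy` object is identical in both runs. Only the belief about the world changed, so only that component would need re-validation.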

Cheap exhaust beats scarce expert data for staying current. The same world-model could be repaired after a simulated breakage using a hundred episodes of bad, clumsy interaction — not curated expert demonstrations, just diverse, low-quality logs of stuff happening. Everyone hoards expensive expert data. The cheap interaction exhaust nobody bothers to log is often what keeps a system adapted to a changing world. Whoever owns the boring exhaust owns the adaptation loop.
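Here is a small sketch of why unskilled logs can be enough to repair a world-model. The setup is invented for illustration: a one-dimensional world of the form x' = x + gain * a, one hundred short episodes of random actions with noisy outcomes, and an ordinary least-squares estimate of the gain. No expert demonstrations appear anywhere.

```python
import random


def collect_clumsy_logs(true_gain, episodes=100, steps=5, seed=0):
    """Log (state, action, next_state) from random, unskilled behaviour."""
    rng = random.Random(seed)
    logs = []
    for _ in range(episodes):
        x = rng.uniform(-3.0, 3.0)
        for _ in range(steps):
            a = rng.uniform(-1.0, 1.0)                       # no skill, just variety
            nxt = x + true_gain * a + rng.gauss(0.0, 0.05)   # noisy reality
            logs.append((x, a, nxt))
            x = nxt
    return logs


def fit_gain(logs):
    """Least-squares estimate of gain in the assumed model x' = x + gain * a."""
    num = sum(a * (nxt - x) for x, a, nxt in logs)
    den = sum(a * a for _, a, _ in logs)
    return num / den


logs = collect_clumsy_logs(true_gain=-1.0)
estimate = fit_gain(logs)
print(len(logs), round(estimate, 2))
```

What the estimator needs is coverage: actions spread across the range, so the effect of acting is visible in the data. Random flailing supplies that, and expert data, which tends to repeat the same few good moves, may supply less of it.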

[Image: long-exposure light trails representing speed. Caption: Slow deliberation, distilled into motion that looks instant.]

What This Changes

If the two-speed mind is the real shape of things, several assumptions shift:

Value migrates from the big model to the cheap reflex. The impressive deliberating system is infrastructure. The thing customers touch — the thing that has to be fast and cheap and reliable — is the distilled policy. Deployed deliberation may be quietly overrated.

Deliberation becomes a cadence, not a moment. The question stops being "how long does it think per query" and becomes "how often does it re-think and re-distill" — nightly, per incident, per drift. The unit of progress is the loop between slow thought and fast action.

The most efficient systems will think hardest where it matters and when it's safe to be slow. The fast/slow switch isn't fixed: spend deliberation where uncertainty is high and rely on reflexes where it's low. Think slow at the dangerous step; fire instantly at the routine one.
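One way to picture that switch is a gate in front of the reflex. The sketch below is a toy under assumptions of its own: the reflex is a lookup table distilled on a handful of states, uncertainty is approximated by distance to the nearest state the reflex was trained on, and the threshold is arbitrary. Real systems might use ensemble disagreement or a learned confidence score instead.

```python
import bisect


def slow_planner(x):
    """Stand-in for an expensive lookahead planner (hypothetical toy rule)."""
    return max(-1.0, min(1.0, -x))


# The reflex: a table distilled offline on states from -2.0 to 2.0.
KNOWN_STATES = [i * 0.5 for i in range(-4, 5)]
REFLEX_TABLE = {s: slow_planner(s) for s in KNOWN_STATES}


def uncertainty(x):
    """Assumed proxy: distance to the nearest state the reflex was distilled on."""
    i = bisect.bisect_left(KNOWN_STATES, x)
    neighbours = KNOWN_STATES[max(i - 1, 0): i + 1]
    return min(abs(x - s) for s in neighbours)


def act(x, threshold=0.25):
    """Fire the reflex on familiar states, fall back to deliberation on unfamiliar ones."""
    if uncertainty(x) > threshold:
        return "slow", slow_planner(x)
    nearest = min(KNOWN_STATES, key=lambda s: abs(x - s))
    return "fast", REFLEX_TABLE[nearest]


print(act(0.6))   # close to a trained state, so the reflex answers
print(act(7.0))   # far outside the trained range, so the planner is called
```

States that trip the gate are also natural candidates to log and fold into the next round of offline deliberation.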

"The winners won't run the biggest model live. They'll have the tightest loop between slow thought and fast action."

The Honest Takeaway

The industry is selling deliberation at the moment of action — pay more, wait longer, let the big model think while you stand there. These results suggest that's the expensive way to do it, and often the wrong one. The cheaper, sturdier shape is to push the hard thinking into the background, distill it into something fast, and act in the moment with a reflex that already knows the answer.

Deliberate slowly in the idle hours. Distill it into cheap reflexes. Act fast, log everything that happens, and feed it back into the next night's thinking. Plan slow. Ship reflexes. Let the night do the thinking so the moment doesn't have to.