Stop Letting Your Agent Thrash

Someone asked our agent a simple question. "How many entities are in my graph?" One number. The answer is a single count, the kind of thing a database returns in a few milliseconds.

It took two and a half minutes.

The model wasn't slow. The graph wasn't slow. What happened in between was the agent thrashing: calling a tool, reading the result, deciding it wasn't quite enough, calling another tool, narrowing, widening, second-guessing, and circling the answer it already had. No single step was wrong. The whole was absurd. A question a human would answer by glancing at a number instead became a two-minute exploration because nothing in the system told the agent when to stop.

That is the failure mode I want to talk about. Not hallucination, not a bad tool call. Thrashing: a loop with no floor, the agent feeling its way to an answer it already has. Some people call it looping, or a doom loop. The behavior is the same, and it is almost never the model's fault.

What thrashing actually is

An agent runs a loop. Observe, decide, act, observe again. It keeps going until it believes the task is done. That belief is the only thing ending the loop, and a model's sense of "done" is soft. Given the option, it will keep gathering. More context feels safer than less. So a loop with no floor under it will happily spend twenty calls building confidence it could have had in one.

We had built exactly that floor-less loop. When a question fit a known template, a fast path answered it. When it didn't, the system fell through to a general agent loop with no early budget. The only limits were a soft gate that parked the run for five minutes and a hard stop at a couple hundred calls. Those are not guardrails. Those are the edge of the cliff. Everything before the cliff was open road, and the agent drove the whole length of it to count to a number.

The instinct, when you watch this, is to reach for a smarter model. Surely a better reasoner would know it already had the answer. Sometimes. But we profiled it, and the lesson was humbling: on these queries, almost all of the wall-clock time was the model thinking and the loop turning. The actual graph work was under a second. You cannot buy your way out of a structural problem with a bigger engine. The engine was never the bottleneck. The loop was.

A loop needs a floor

The first fix is the least glamorous and the most important. Give the loop a budget.

Not a kill switch at the far edge, but a real budget the harness enforces from the first turn: a wall-clock ceiling and a tool-call ceiling, both modest, both checked before every call. When the agent hits it, the harness does not abort. It forces the agent to stop gathering and synthesize an answer from what it already has.

This one change turned a two-and-a-half-minute loop into a thirty-second honest answer. Sometimes that answer is partial. "Here is what I found in the time I had" is a far better outcome than a perfect answer that arrives after the user has given up. A budget is not a constraint on quality. It is a constraint on the model's appetite for certainty, which left unchecked is infinite.

This is a harness responsibility, not a model one. You should not be prompting the model to please hurry up. The scaffolding around the model should make hurrying structural.

A plan beats a loop

A budget stops the bleeding. It does not cure the disease. The agent was thrashing because it was solving the question and executing it at the same time, one tool call at a time, with no commitment to a route. Every call was a fresh decision, and fresh decisions are where loops go to wander.

The cure is to split those two jobs. The agent plans. A deterministic engine executes.

When a question comes in, the agent's first job is not to call a tool. It is to produce a plan: a small, explicit description of what would answer the question. Count these entities. Group them by kind. Keep the ones above a threshold. Then a real execution engine takes that plan and runs it, in one shot, against the data. No loop. No thrashing. The agent decided the route once, up front, and the engine drove it.

"How many entities are in my graph?" stops being an exploration and becomes a plan with one step: count, no filter. It executes in under a second, exact, every time. The same machinery composes upward. "Which kinds have more than two hundred and fifty entities?" is three operations stacked into one plan: enumerate the kinds, count each, keep the ones over the line. The agent never touches the data directly. It describes the shape of the answer, and the engine produces it.

The reason this is so much faster is not that any one step got cheaper. It is that you replaced N rounds of model-in-the-loop with one. The expensive part of an agent turn is the model, and a plan collapses many model turns into a single decision followed by deterministic work.

Why not just let the model write the query?

The obvious shortcut is to skip the plan and have the model emit a raw query. Hand it the database, let it write SQL or Cypher, run whatever comes out. It is tempting because it feels direct.

It is the wrong trade. The moment the model emits raw, executable queries, you have thrown away the three things that made the plan worth having. You lose validation, because there is no longer a closed set of operations you can check. You lose safety, because a raw query bypasses the permission boundary that a structured plan can be checked against. And you lose the bound, because freeform queries are exactly the open-ended surface you were trying to escape.

The discipline that matters is this: the plan is descriptive, and the engine is authoritative. The agent says what it wants in a vocabulary of operations the engine understands. The engine, not the model, decides how to execute that safely against real data. The model's job is judgment about intent. The engine's job is correctness, permissions, and speed. Keep those separated and you get the flexibility of an agent with the reliability of a query planner. Blur them and you are back to thrashing, just with a bigger blast radius.

What still belongs in the loop

None of this means the loop goes away. Some questions are genuinely open. "Why is this account at risk?" is not a count and not a grouping. It is synthesis across systems, the kind of reasoning that does need to gather, weigh, and reconsider. That work belongs in the agent loop, and it should stay there.

The point is not to eliminate the loop. It is to stop using it for things that were never open questions in the first place. A count is not a research project. A grouping is not an investigation. When the harness can recognize the analytical questions and route them to a plan, the loop is freed to do the work that actually needs a loop. And because the loop now has a budget, even the genuinely hard questions can no longer run forever.

That is the shape of a harness that respects the user's time. A floor under every loop. A plan in front of every analytical question. A clean line between what the model decides and what the engine executes. The model gets to be smart. The harness makes sure smart doesn't mean slow.

The model was never the problem. It was doing what models do, reaching for more. The job of the harness is to know when more is just thrashing, and to stop it before the user does.

What thrashing actually is

A loop needs a floor

A plan beats a loop

Why not just let the model write the query?

What still belongs in the loop

We're onboarding design partners now.