sixdegree

Progressive Disclosure for Agents

Giving an LLM 40 tools and hoping it picks the right one is the same mistake dashboards made for humans. Progressive disclosure fixes it, but for agents the mechanism is different.

Craig Tracey

Progressive disclosure is an old idea in interface design: show the most important information first, reveal detail as it becomes relevant. Your phone settings work this way. Good forms work this way. The principle exists because human attention is finite.

LLM context windows are also finite. And unlike humans, LLMs don't have the luxury of ignoring what's irrelevant. Every token in the context window gets attended to. Every tool definition competes for attention with every other tool definition. The cost isn't just cognitive; it's literal: more tokens, more latency, more money, worse results.

This makes progressive disclosure not just a design principle for agents, but an architectural requirement.

The 40-tool problem

Connect a few MCP servers to an agent and you quickly end up with dozens of tools in the context window. GitHub tools, Kubernetes tools, Slack tools, Jira tools, database tools. The agent sees all of them, all the time, regardless of what it's actually working on.

Ask it a question about a GitHub repository and it's choosing from a list that includes kubectl_rollout_restart, jira_create_issue, and slack_send_message. It usually picks the right tool. But "usually" isn't good enough when the cost of picking the wrong one is a hallucinated action against a system the agent had no business touching.

There's a subtler problem too. When an agent sees tools from multiple systems that do similar things, it has to reason about which one applies. GitHub, GitLab, and Zendesk all have a create_issue tool. The names are nearly identical. The descriptions overlap. But they operate on completely different entity types: a GitHub repository, a GitLab project, a Zendesk support queue. The only way to know which one to call is to know what kind of entity you're operating on. Without that knowledge, the agent is guessing.

You'd think semantic similarity would save you here. The tool names and descriptions should give the LLM enough signal to pick the right one. Some tools try to solve this with semantic filtering: embed the tool descriptions, embed the user's query, surface the closest matches. But semantic similarity is not truth. "Create issue" means something different in each of those systems, despite looking the same to an embedding model. Two tools can be semantically identical and operationally unrelated. You're filtering by vibes.
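A toy sketch makes the "filtering by vibes" point concrete. The snippet below stands in for an embedding model with bag-of-words cosine similarity, and the tool descriptions are invented for illustration; real embeddings are far better, but the failure mode is the same: two tools whose descriptions read almost identically score as near-duplicates even though they act on unrelated systems.

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (a stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical descriptions: nearly identical wording, operationally unrelated tools.
github_issue = "Create an issue in a repository to track a bug or task"
zendesk_issue = "Create an issue in a support queue to track a customer request"
k8s_restart = "Restart a rollout for a deployment"

# The two create_issue tools look like near-duplicates to a similarity filter,
# even though one acts on source code and the other on a support queue.
print(cosine(github_issue, zendesk_issue))  # high
print(cosine(github_issue, k8s_restart))    # low
```

Whatever the filter surfaces, the choice between the two create_issue tools still hinges on what kind of entity is in play, which is exactly the information similarity scores don't carry.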

Relying on the LLM to infer the right tool from name and description alone is asking it to solve an ambiguity that the interface created. The answer isn't better descriptions or smarter embeddings. It's not presenting the ambiguity in the first place.

And every tool the agent doesn't need is a tax on the ones it does. More tool definitions means more tokens spent on descriptions the agent has to read and discard. It means more surface area for confusion. It means the actually relevant tools get diluted in a sea of irrelevant ones.

Why static filtering doesn't work

The obvious fix is to curate the tool list per use case. Build a "GitHub agent" with only GitHub tools. Build a "Kubernetes agent" with only Kubernetes tools. Problem solved.

Except real work crosses boundaries. An engineer asking "what happened to the payments service" might need to check the GitHub repo for recent commits, the Kubernetes deployment for pod status, and PagerDuty for recent alerts. A static partition means you either have an agent that can't cross boundaries, or you're back to loading everything.

Hardcoded tool lists also create a maintenance burden. Every new integration means updating every agent configuration that might need it. Every removed tool means checking whether any workflow depends on it. The rigidity that makes static filtering simple is the same rigidity that makes it break.

The ontology is the disclosure mechanism

What makes progressive disclosure work for agents is that you don't decide upfront which tools to show. The ontology decides, based on what the agent discovers.

SixDegree maintains a live ontology of your engineering systems: repositories, services, deployments, teams, incidents, and the relationships between them. When an agent asks "what do we know about the payments service?", the ontology doesn't just return a description. It returns entities and their relationships. The payments service is a Kubernetes deployment. It's backed by a GitHub repository. That repository is owned by the platform team. The team's on-call rotation is in PagerDuty.

Each of those entities has a type. And each tool in the system declares which entity types it operates on. So when the ontology surfaces a GithubRepository, the agent gains access to github_create_issue, github_create_pull_request, and github_get_file_contents. When a KubernetesDeployment appears through a relationship, the Kubernetes tools become available. The toolset isn't configured. It emerges from what the agent has learned about the world.

This is what makes it dynamic rather than static. The agent doesn't start with a curated list of tools for "payments service investigations." It starts with the ontology, discovers entities and relationships, and the relevant tools show up as a consequence. The conversation drives the disclosure, not the other way around.

The result: an agent investigating a GitHub repository sees GitHub tools. It does not see the GitLab equivalents. It does not see Kubernetes tools until a Kubernetes entity appears through a relationship. The agent's context window contains exactly the tools that apply to the entities it knows about, and nothing else.

The mechanics

The implementation has two layers.

Per-tool entity types. Each MCP tool declares which entity types it requires. github_create_issue declares entities.sixdegree.ai/v1/GithubRepository. gitlab_create_merge_request declares entities.sixdegree.ai/v1/GitlabProject. This is metadata, not logic. The tool itself doesn't change. It just advertises what it's for.
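As a sketch of what that declaration layer might look like (the record shape is assumed, not SixDegree's actual schema; only the entity-type strings come from the examples above), each tool carries its required entity types as plain metadata:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical per-tool metadata: the tool advertises what it's for."""
    name: str
    description: str
    requires: frozenset[str]  # entity types this tool operates on

TOOLS = [
    ToolSpec("github_create_issue",
             "Create an issue in a GitHub repository",
             frozenset({"entities.sixdegree.ai/v1/GithubRepository"})),
    ToolSpec("gitlab_create_merge_request",
             "Open a merge request in a GitLab project",
             frozenset({"entities.sixdegree.ai/v1/GitlabProject"})),
]
```

Because the declaration is data rather than logic, adding a new tool never requires touching the disclosure machinery; the tool just states its entity types and the runtime does the rest.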

Runtime tracking. As the agent works, every ontology query result is inspected for entity types. When a GithubRepository entity appears in a search result, that type gets recorded. On the next turn, tools that require GithubRepository become visible. Tools that require entity types the agent hasn't encountered remain hidden.
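The tracking loop can be sketched in a few lines. This is an illustrative reconstruction, not the real implementation: the result shape (a list of dicts with a "type" field) and the tool-to-type mapping are assumptions made for the example.

```python
# Hypothetical mapping from tool name to the entity type it declares.
REQUIRES = {
    "github_create_issue": "entities.sixdegree.ai/v1/GithubRepository",
    "gitlab_create_merge_request": "entities.sixdegree.ai/v1/GitlabProject",
    "kubectl_rollout_restart": "entities.sixdegree.ai/v1/KubernetesDeployment",
}

class DisclosureTracker:
    """Records entity types seen in ontology results; gates tool visibility."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def observe(self, query_result: list[dict]) -> None:
        # Inspect every ontology query result for entity types.
        for entity in query_result:
            self.seen.add(entity["type"])

    def visible_tools(self) -> list[str]:
        # A tool is disclosed only once its required entity type has appeared.
        return [name for name, t in REQUIRES.items() if t in self.seen]

tracker = DisclosureTracker()
tracker.observe([{"type": "entities.sixdegree.ai/v1/GithubRepository",
                  "name": "payments"}])
print(tracker.visible_tools())  # only the GitHub tool is disclosed
```

The GitLab and Kubernetes tools stay hidden until an entity of their type shows up through a relationship, which is the dynamic behavior the previous section describes.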

The effect is that tool disclosure is dynamic, automatic, and driven by what the agent is actually doing rather than what someone predicted it might do.

This isn't about token budgets

Context windows are getting huge. Some models already support millions of tokens. The temptation is to conclude that progressive disclosure is a solved problem: just load everything; you have the room.

But bigger windows don't fix the ambiguity problem. An agent with a 10-million-token context window that sees three create_issue tools is just as confused as one with a 32k window. The confusion isn't caused by running out of space. It's caused by presenting options that the agent has no basis to choose between. More room for tools doesn't help when the problem is too many tools that look the same.

Larger windows also don't change the economics. Every token in the context is a token you're paying for, in cost and in latency. Loading 75 tool definitions when 10 are relevant isn't a rounding error at scale. It's waste that compounds across every request, every agent, every user.

But accuracy matters more than either. Fewer tools means less ambiguity. Less ambiguity means fewer wrong tool calls. Fewer wrong tool calls means fewer retries, less wasted computation, and less risk of the agent acting on the wrong system.

Progressive disclosure is not just for tools

The same principle extends beyond tool definitions. An agent that knows what entity it's operating on can also scope its retrieved context. The ownership information, the dependency graph, the recent change history, the runbook: all of these are entity-specific. Loading them for every entity type in every conversation is the same mistake as loading every tool.

The ontology is the key to both. It's what tells you which entity is in play, what's related to it, and therefore what context the agent actually needs. Tools, context, and relationships, all scoped by the entity at hand.

As we wrote in MCP Needs Connective Tissue, giving an agent tools without relationships turns every request into trial and error. Progressive disclosure takes that further: even with relationships, giving an agent all tools when it only needs some is a tax on every interaction.

The goal isn't to give agents access to everything. It's to give them access to exactly what's relevant, exactly when it becomes relevant.

Measuring it

Everything above is a design argument. But does progressive disclosure actually improve tool-calling accuracy? By how much? At what toolset sizes does it start to matter?

We built Boundary, an open-source framework for finding where LLM context breaks, to answer these questions empirically. Boundary runs reproducible benchmarks against LLM providers using 150 tool definitions across 16 services. It measures tool selection accuracy, cross-service confusion, latency, and token usage as the number of available tools scales from 25 to 150.

The tool-overload test includes disclosure mode comparison: you can benchmark the same model with all tools loaded upfront versus only the relevant tools disclosed. The results quantify exactly how much accuracy you lose by dumping every tool into the context window, and how much you recover with progressive disclosure.

If you're building agents with more than a handful of tools, run the benchmarks yourself and see what happens.


We're solving this at the platform level.

Design partners get early access to progressive disclosure, live ontology, and context-aware tool scoping. Limited spots.

Building agents with complex tool sets? See how SixDegree handles tool overload.