sixdegree

10 Best Practices for AI Context Management

Most teams managing AI context are using markdown files. Here's what better looks like.

Craig Tracey

At a CTO Lunch this week, we discussed how teams were managing context for their AI tools. About a dozen engineering leaders, companies ranging from 5 to 500 engineers. The answers were almost identical: a CLAUDE.md in the repo root, maybe a .cursor/rules directory, some internal wiki pages nobody keeps current.

Markdown files. That's the state of the art.

Here's what better looks like.

1. Context Is Infrastructure

Most teams treat context as a documentation problem. Write it down, keep it somewhere the model can find it. That framing produces markdown files. Context that actually serves agents is an engineering artifact. It should be designed, maintained, and tested like any other piece of infrastructure.

The test: if your context layer requires a human to update it, it will eventually be wrong. If it can't be queried programmatically, it's not infrastructure. It's documentation.
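To make the distinction concrete, here is a minimal sketch of context exposed as a programmatic query interface rather than a document. All names here are hypothetical, for illustration only.

```python
# Hypothetical sketch: context as a queryable store, not a markdown file.
from dataclasses import dataclass, field

@dataclass
class ContextStore:
    records: dict = field(default_factory=dict)  # service name -> attributes

    def query(self, service: str, attr: str):
        """Programmatic lookup: callers get structured answers, not prose."""
        return self.records.get(service, {}).get(attr)

store = ContextStore(records={
    "payments": {"owner": "platform-team", "tier": "critical"},
})

store.query("payments", "owner")  # "platform-team"
```

The point isn't the data structure; it's that a tool or agent can ask this store a question and get a machine-checkable answer, which a paragraph of prose cannot provide.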

2. Derive, Don't Author

The relationships between your services, teams, codebases, and infrastructure already exist. They're encoded in your GitHub history, your Jira projects, your deployment configs, your dependency manifests. They don't need to be written. They need to be derived. Anything that requires a human to keep current will eventually be wrong.

The failure mode for authored context is quiet and slow. The day it's written, it's accurate. Over the next six months it drifts. Nobody notices until an agent does something wrong that a human would immediately flag as out of date. By then you've lost trust in the agent, not in the process that let the context rot. Tribal knowledge has the same problem: it exists, it's just not where agents can reach it.
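As a sketch of derivation, the dependency list for a service can be pulled straight from its manifest instead of being hand-written. This assumes a package.json-style file; the manifest contents are invented for illustration.

```python
# Hypothetical sketch: derive a service's dependencies from its manifest
# rather than asking a human to keep a list current.
import json

manifest = """
{
  "name": "payments",
  "dependencies": {"pg": "^8.0", "stripe": "^12.1", "kafkajs": "^2.2"}
}
"""

def derive_dependencies(raw: str) -> dict:
    """Produce a structured context record from the source of truth."""
    data = json.loads(raw)
    return {"service": data["name"],
            "dependencies": sorted(data.get("dependencies", {}))}

derive_dependencies(manifest)
# {'service': 'payments', 'dependencies': ['kafkajs', 'pg', 'stripe']}
```

When the manifest changes, the derived record changes with it; there is no separate document to drift.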

3. Structure Beats Narrative

A paragraph describing what a service does is not the same as a structured record of what that service depends on, who owns it, and what deploys it.

Consider the difference:

  • Prose: "The payments service handles transaction processing and is owned by the platform team."
  • Structured:
    {
      "service": "payments",
      "owner": "platform-team",
      "dependencies": ["postgres", "stripe", "kafka"],
      "tier": "critical"
    }

Agents can traverse relationships. They can't reliably extract structure from prose. Build context that is queryable, not readable.
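A small sketch of what traversal buys you, using records shaped like the JSON above. The service names and graph are illustrative.

```python
# Sketch: with structured records, "what does checkout ultimately rely on?"
# is a mechanical graph walk, not an extraction problem.
services = {
    "payments": {"owner": "platform-team",
                 "dependencies": ["postgres", "stripe", "kafka"]},
    "checkout": {"owner": "storefront-team",
                 "dependencies": ["payments"]},
}

def transitive_deps(name: str, seen=None) -> set:
    """Walk the dependency graph transitively."""
    seen = seen if seen is not None else set()
    for dep in services.get(name, {}).get("dependencies", []):
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, seen)
    return seen

transitive_deps("checkout")
# {'payments', 'postgres', 'stripe', 'kafka'}
```

No equivalent operation exists over the prose version of the same facts.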

4. Keep It Live

A CONTEXT.md written at project kickoff reflects the organization at project kickoff. Six months later the team reorganized, the service split in two, and the deployment pipeline changed. The file didn't.

This isn't a hypothetical. It's what happens to every static catalog. The lag between organizational change and documentation update is measured in weeks, if it happens at all. An agent operating on stale context will confidently give wrong answers. It won't know it's wrong.
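One way to make freshness checkable rather than hoped-for: compare when a record was last derived against when its source of truth last changed. The timestamps and field names below are illustrative.

```python
# Sketch: staleness as a property you can test, not a surprise you discover.
from datetime import datetime

def is_stale(record_synced_at: datetime, source_changed_at: datetime) -> bool:
    """A derived record is stale if its source changed after the last sync."""
    return source_changed_at > record_synced_at

synced = datetime(2024, 1, 1)
changed = datetime(2024, 3, 15)   # e.g. the team reorganized after the last sync
is_stale(synced, changed)          # True: re-derive before serving to agents
```

A live context layer runs this check continuously; a CONTEXT.md can't.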

5. Scope to the Task

Loading everything available can be nearly as damaging as loading nothing. At any given step the agent needs a narrow slice of context, not a full org graph.

Most teams haven't built the mechanism for deciding what's relevant. The default is to load everything and let the model figure it out. This produces the same degradation you see with too many tools: quality collapses. We measured this directly: accuracy starts degrading between 25 and 50 tools, well before any API limit. It also burns tokens on every call. At scale that's not a performance problem, it's a cost problem.
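A minimal sketch of the missing mechanism: score each context record against the task and load only the top few. The scoring here is deliberately naive (keyword overlap) and the records are invented; a real system would use something richer.

```python
# Sketch: scope context to the task instead of loading the full org graph.
def scope(task: str, records: dict, limit: int = 2) -> list:
    """Rank records by overlap with the task description; keep the top few."""
    words = set(task.lower().split())
    scored = sorted(records,
                    key=lambda name: -len(words & records[name]))
    return scored[:limit]

records = {
    "payments": {"stripe", "transactions", "billing"},
    "search":   {"elasticsearch", "indexing"},
    "billing":  {"invoices", "billing", "stripe"},
}

scope("debug stripe billing failures", records)
# ['payments', 'billing'] — the search service never enters the window
```

Even this toy version cuts the loaded context from three records to two; with dozens of integrations the reduction compounds.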

6. Separate Context Types

Organizational context (who owns what), task context (what is this agent trying to do), and conversational context (what has been said in this session) are different things with different lifecycles.

Organizational context changes quarterly. Task context changes by request. Conversational context changes by turn. Treating them the same means they all get managed with the same staleness, the same storage, and the same retrieval strategy. That's why markdown files seem reasonable at first: everything goes in one place. The problem surfaces when organizational context is stale because it lives in the same file as last week's task notes, or when context from a previous session bleeds into the current one.
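The lifecycle differences above can be made explicit in the data model. The TTL values below are illustrative, mirroring the quarterly / per-request / per-turn cadence.

```python
# Sketch: one record type, three lifecycles. Each kind carries its own
# refresh horizon instead of sharing a single staleness policy.
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class ContextRecord:
    kind: str         # "organizational" | "task" | "conversational"
    payload: dict
    ttl: timedelta    # how long before this must be refreshed or dropped

org  = ContextRecord("organizational", {"owner": "platform-team"},
                     timedelta(days=90))
task = ContextRecord("task", {"goal": "rotate kafka creds"},
                     timedelta(hours=1))
turn = ContextRecord("conversational", {"last_msg": "use staging"},
                     timedelta(minutes=5))
```

Once lifecycles are explicit, storage and retrieval can differ per kind, and a session boundary can cleanly drop everything conversational.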

7. Context Should Evolve

Live isn't enough on its own. The more important property is that context can change as a session progresses.

A flat file gets loaded at the start and stays fixed regardless of where the conversation goes. An agent working through a complex infrastructure incident needs different context at step 12 than it needed at step 1. It knows more. The problem space has narrowed. The tools it needs have changed. A dynamic context layer reflects that: load what's relevant, retire what isn't, introduce new context as the task reveals itself. This is the same principle behind progressive disclosure for tools, applied to context.
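The load/retire cycle can be sketched as a working set the agent narrows as the task reveals itself. Step labels and records here are hypothetical.

```python
# Sketch: context as a mutable working set, not a fixed preamble.
class WorkingContext:
    def __init__(self):
        self.active = {}

    def load(self, key, record):
        self.active[key] = record      # introduce context when it becomes relevant

    def retire(self, key):
        self.active.pop(key, None)     # drop context the task has moved past

ctx = WorkingContext()
ctx.load("org-graph", {"scope": "all services"})        # step 1: broad view
ctx.load("payments-runbook", {"scope": "one service"})  # step 12: narrowed
ctx.retire("org-graph")                                 # the broad view is noise now

list(ctx.active)  # ['payments-runbook']
```

The flat-file equivalent would carry the full org graph through all twelve steps, paying for it on every call.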

8. Context Is a Process

Context isn't an input you configure before the session starts. It's a process that runs alongside the agent throughout: introducing new context as tasks evolve, pruning stale context to reduce noise, keeping the active window aligned with what the agent is actually working on.

The byproduct of doing this well is a significantly smaller average context window. The token savings are non-linear: the more integrations you have connected, the more you would have loaded, and the more you save by scoping it. Multiply that across thousands of agent sessions and the inference cost savings are material. This is one of those cases where the right engineering decision and the cheaper operating cost point in the same direction.
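A back-of-envelope version of the non-linear savings claim: if each connected integration contributes a roughly fixed token load when unscoped, cost grows with integration count while a scoped window stays near-constant. The numbers below are invented for illustration.

```python
# Sketch: per-call token savings from scoping, as a function of how many
# integrations are connected. Constants are illustrative, not measured.
TOKENS_PER_INTEGRATION = 2_000
SCOPED_BUDGET = 6_000   # a single task rarely needs more than a few integrations

def tokens_saved(n_integrations: int) -> int:
    unscoped = n_integrations * TOKENS_PER_INTEGRATION
    return max(0, unscoped - SCOPED_BUDGET)

tokens_saved(5)    # 4,000 tokens saved per call
tokens_saved(50)   # 94,000 — savings grow with every integration added
```

At five integrations the savings are modest; at fifty they dominate the bill, which is the sense in which the savings are non-linear in practice.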

9. Respect Access Boundaries

What an agent can see should reflect what the requesting user is allowed to see. Context pipelines that bypass normal access controls are a real attack surface.

This tends to get skipped in early implementations. The agent needs to query org data to do its job, so someone threads a service account through the whole pipeline. Works great in dev. In production, that service account can see everything: compensation data, unreleased roadmaps, customer data under NDA. The agent doesn't discriminate. It uses what it can see. This becomes a compliance problem at the worst possible time, usually during a security review for a deal you actually want.
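The fix is to thread the requesting user's identity through the pipeline and filter at read time. The permission model below is a toy and all names are hypothetical.

```python
# Sketch: the agent's view of context is derived from the user's permissions,
# not from an all-seeing service account.
RECORDS = {
    "deploy-configs": {"acl": {"eng"}},
    "compensation":   {"acl": {"hr"}},
    "roadmap-q3":     {"acl": {"leadership"}},
}

def visible_context(user_groups: set) -> list:
    """Return only the records the requesting user could see themselves."""
    return [name for name, rec in RECORDS.items() if rec["acl"] & user_groups]

visible_context({"eng"})
# ['deploy-configs'] — the agent sees exactly what the user sees
```

The same query run for an HR user returns a different slice; the agent never holds context the requester couldn't access directly.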

10. Track Provenance

When an agent makes a bad decision, you need to know what context it had when it made it. Most teams can't answer that question today.

Provenance means knowing where every piece of context came from and when. Not just that the agent used the org graph, but which version, at what timestamp, sourced from which systems. That's what makes agent behavior debuggable when something goes wrong and auditable when someone asks why. Enterprises will require it.
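A minimal sketch of what that record could look like: every piece of context carries its source system, version, and capture time. The field names and values are illustrative.

```python
# Sketch: context items that carry their own provenance, so a session log
# can answer "what did the agent know, from where, as of when?"
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Provenance:
    source: str            # system of record, e.g. "github"
    version: str           # snapshot or commit the record was derived from
    captured_at: datetime  # when the record was captured

@dataclass(frozen=True)
class ContextItem:
    payload: dict
    provenance: Provenance

item = ContextItem(
    payload={"service": "payments", "owner": "platform-team"},
    provenance=Provenance("github", "a1b2c3d",
                          datetime(2024, 6, 1, tzinfo=timezone.utc)),
)
```

Attaching provenance at ingestion is cheap; reconstructing it after a bad decision is usually impossible.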


The markdown problem isn't a tooling problem. It's a framing problem. Teams that frame context as documentation will keep producing documentation. Teams that frame it as infrastructure will build something that actually scales.

Thinking about context management at scale?

We're building the context layer for enterprise AI. Let's talk.