AI Context Management: One Graph, Every System of Record

AI context management isn't a curation problem. It's a substrate problem, and the existing category is solving it with the wrong architecture.

Categories that have tried to organize enterprise knowledge declaratively have hit the same wall again and again. Data catalogs, CMDBs, service catalogs, business glossaries. Each was the right answer to a real problem: humans needed structured ways to understand what existed in their organizations and what it meant. Each has decayed in the same way, in the gap between the maintenance cycles that keep curated artifacts current and the underlying systems, which move faster than any human can keep up with.

The AI context layer category is currently being built on the same model. Atlan, DataHub, Collibra, the data infrastructure category broadly. They're producing a new generation of curated artifacts (semantic models, metric definition stores, business glossaries with AI-assisted curation) and proposing them as the substrate that AI agents should read from. These tools were built for human consumers. Retrofitting them for agents doesn't change the underlying model.

The architecture hasn't changed. And the architecture is the problem.

This post argues that the enterprise AI context layer is one graph, derived continuously from every system of record an organization runs. By system of record we mean any system that holds authoritative state about some part of how the business runs. Not just the traditional CRM/ERP/warehouse triad, but the repositories, the on-call schedulers, the support tools, the internal applications that get built and retired, anything emitting events that describe what's true right now. The graph composes itself from those sources. It doesn't get curated into existence. The split between "data context" and "operational context" that the category is currently drawing isn't a real architectural distinction. It's an artifact of which vendors got to which territory first.

Context management vs context engineering

A quick note on terms, because the phrase "context management" is doing double duty.

Context engineering is the inference-time discipline of deciding what goes into the model's context window on a given turn: compaction, retrieval, tool result handling, sub-agent message routing. Anthropic's "Effective context engineering for AI agents" is the cleanest articulation of this. Context management is the upstream discipline of guaranteeing that the structured sources the agent reaches for are trustworthy, governed, and current. Engineering picks what to load. Management guarantees what's loadable is true. Both are necessary; their failure modes are different. The deeper comparison lives in a separate post: context engineering vs context management.

The rest of this post is about management.

The existing category got the substrate wrong

The data infrastructure category has spent the last two years pulling the term "context layer" toward a specific shape. Atlan's recent writing on context layers for AI agents is among the clearest articulations of this view. DataHub is explicitly positioning around "context management" as a category name. Collibra, Snowflake, ElixirData, and several smaller entrants are positioning around adjacent versions of the same architecture.

What they're building has real value. A metric definition store that resolves "active customer" consistently across Marketing, Finance, and Sales. An identity resolution layer that maps the same customer across Salesforce, Stripe, and HubSpot. A lineage graph that traces where a number came from and which transformations applied. These are real problems and the products solve real subsets of them.

But notice what the category's flagship features are about: active metadata, automated lineage, AI-assisted curation, continuous discovery. Every one is a mechanism for keeping curated artifacts from going stale. The entire feature surface is structured around fighting decay.

That's the tell. The underlying substrate is declarative. A human declares what a metric means, what an entity is, what depends on what, and the tooling works to drag those declarations back into alignment with systems that keep moving. It's the same pattern that decayed the CMDB and the service catalog before it.

The right move isn't to add more automation on top of curation. It's a different architectural starting point. The data infrastructure category begins with metadata governance over the data estate and works outward. A derived context layer begins with operational systems of record across every workflow agents act on (code, customers, contracts, incidents, deployments, deals, support, infrastructure) and composes a live enterprise state graph from all of it. The difference isn't curation versus derivation as techniques. The difference is what the architecture is denominated in.

The application layer is in flux. The substrate isn't.

There's a debate playing out across AI infrastructure right now about what happens to traditional SaaS as agents become the primary interface to enterprise systems. Some people think the major SaaS applications evolve toward agent-readable APIs and survive. Some think they get replaced by purpose-built databases with thin agent interfaces on top. Some think entirely new categories of tool emerge, vibe-coded internal applications that didn't exist three years ago and won't be on anyone's roadmap until someone builds them in a weekend.

We don't have a strong opinion on which scenario dominates. We have a strong opinion that the answer doesn't change the substrate problem.

Whether an organization is running Salesforce in 2030, or a vibe-coded internal CRM that emerged from a Saturday afternoon, or three best-of-breed tools that didn't exist when you read this, the same fact is true. Those systems will emit entities. They'll record state. They'll generate events. Agents will need to read across them coherently. The substrate that does that reading isn't denominated in any specific application. It's denominated in the property that organizations run systems and those systems emit state.

The data infrastructure category is bound to the existing application landscape in a way a derived context layer isn't. Atlan, DataHub, and the rest of the data infrastructure category started from metadata governance over the data estate. Their architectures are downstream of the data estate. Their connector breadth, even where it now reaches into business systems, is shaped by data-team buyer motions and data-estate-shaped pricing. The product gravity is denominated in data systems, even as the marketing surface expands. A derived context layer just needs the systems an organization actually runs, whatever those are, today or in five years, to be emitting state. New system comes online, new connector, graph absorbs it. Old system retires, connector retires, graph rebalances. The architecture persists across application-layer churn because it isn't a function of which applications are running.

This is also why the "data half vs operational half" framing the category has drawn is artifactual. The category is now expanding past that line, but the line was never structural in the first place. The framing assumes that enterprise systems sort cleanly into "data systems" (warehouses, BI tools, semantic layers) and "operational systems" (CRMs, support tools, CI/CD, infrastructure). They don't. Salesforce is a CRM (operational) and a primary source of revenue and pipeline truth (data). Stripe is billing infrastructure (operational) and a primary source of revenue reality (data). Workday is HR operations and the canonical source of headcount truth. The warehouse is itself a derived view over many of these systems. The clean line the category wants to draw between "data" and "operational" doesn't survive contact with the actual enterprise stack. The category's recent expansion toward "business context" and "operational metadata" is itself an acknowledgment that the line doesn't hold.

The right model is simpler. Every system of record is a source. The graph derives itself from all of them. Whether a particular SoR feels "data-shaped" or "operations-shaped" is incidental. It's a categorization that comes from how vendors organized their product catalogs, not from anything architecturally meaningful.

One graph, every SoR

The substrate we're describing has a specific shape. It's worth being precise about the architectural commitments, because the contrast with the curation approach is sharpest there.

The graph is derived from systems of record. Humans define the integration and identity rules; the graph state itself is derived continuously from those systems rather than maintained by hand. The integration layer subscribes to the events each SoR emits: webhooks, change feeds, polling for systems that don't push. The graph composes itself from those streams. When a new pull request opens in GitHub, an entity appears. When a Salesforce opportunity moves stages, the corresponding edge updates. When a customer's support ticket gets closed, the closure timestamp propagates. The graph reflects the current state of the underlying systems within a window measured in minutes, not maintenance cycles.
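A minimal sketch of what "the graph composes itself from those streams" could look like, assuming each connector normalizes its webhooks, change feeds, or polling results into a small set of event kinds. The event shapes, identifiers, and the `DerivedGraph` class are hypothetical illustrations, not SixDegree's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class DerivedGraph:
    """Graph state that is only ever changed by folding in SoR events."""
    nodes: dict = field(default_factory=dict)  # entity id -> attribute dict
    edges: set = field(default_factory=set)    # (source id, relation, target id)

    def apply(self, event: dict) -> None:
        # Every branch is idempotent, so a replayed webhook or a repeated
        # polling result leaves the graph unchanged.
        kind = event["kind"]
        if kind == "entity_upserted":
            self.nodes.setdefault(event["id"], {}).update(event["attrs"])
        elif kind == "edge_added":
            self.edges.add((event["src"], event["rel"], event["dst"]))
        elif kind == "edge_removed":
            self.edges.discard((event["src"], event["rel"], event["dst"]))
        else:
            raise ValueError(f"unknown event kind: {kind}")


# Hypothetical events mirroring the three examples in the text.
events = [
    {"kind": "entity_upserted", "id": "github:pr/1532", "attrs": {"state": "open"}},
    {"kind": "entity_upserted", "id": "salesforce:opp/006A", "attrs": {"name": "Acme renewal"}},
    {"kind": "edge_added", "src": "salesforce:opp/006A", "rel": "in_stage", "dst": "stage:proposal"},
    # The opportunity moves stages: the old edge goes away, the new one appears.
    {"kind": "edge_removed", "src": "salesforce:opp/006A", "rel": "in_stage", "dst": "stage:proposal"},
    {"kind": "edge_added", "src": "salesforce:opp/006A", "rel": "in_stage", "dst": "stage:negotiation"},
    # Placeholder timestamp, for illustration only.
    {"kind": "entity_upserted", "id": "zendesk:ticket/4812",
     "attrs": {"status": "closed", "closed_at": "2025-01-07T14:11:00Z"}},
]

graph = DerivedGraph()
for event in events:
    graph.apply(event)
```

The design choice worth noticing is that nothing writes to the graph except `apply`. There is no code path for a human to edit a node directly, which is what separates derived state from a curated artifact.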

Identity is resolved across systems. The same customer has different IDs in Salesforce, Stripe, HubSpot, Zendesk, and the warehouse. That's a feature of the underlying systems, not a bug. Identity resolution is the substrate's job, not the agent's. It happens at ingest time, not query time. When an agent asks about a customer, it gets the unified entity, and the provenance trail back to each source ID is queryable if it needs to be.

Identity resolution is load-bearing infrastructure. Every integration needs a strategy for emitting entities with stable identifiers that the graph can join across system boundaries. Done well, the agent never has to know that "service X" has eight different aliases or that "customer Acme" lives under five different IDs. Done poorly, every query breaks at the first traversal and the graph effectively isn't connected.
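As a rough illustration of ingest-time identity resolution, the sketch below maps records from several systems onto one canonical customer and keeps the provenance trail queryable. The matching rule (same normalized company domain means same customer) is an assumption chosen for brevity; real resolution rules are typically richer. All IDs and the `IdentityResolver` name are hypothetical.

```python
class IdentityResolver:
    """Maps (system, source id) pairs onto one canonical entity id at ingest."""

    def __init__(self) -> None:
        self._canonical_by_key = {}  # match key -> canonical id
        self._sources = {}           # canonical id -> [(system, source id), ...]

    def ingest(self, system: str, source_id: str, domain: str) -> str:
        # Assumed matching rule: same normalized company domain, same customer.
        key = domain.strip().lower()
        canonical = self._canonical_by_key.setdefault(key, f"customer:{key}")
        sources = self._sources.setdefault(canonical, [])
        if (system, source_id) not in sources:
            sources.append((system, source_id))
        return canonical

    def provenance(self, canonical: str) -> list:
        # The trail back to each source ID stays queryable.
        return list(self._sources.get(canonical, []))


resolver = IdentityResolver()
a = resolver.ingest("salesforce", "001xx0000012345", "acme.com")
b = resolver.ingest("stripe", "cus_hypothetical", "ACME.com ")
c = resolver.ingest("zendesk", "98765", "acme.com")
```

Because resolution happens in `ingest`, an agent querying the graph only ever sees the canonical ID, and `provenance` answers the follow-up question of which source records stand behind it.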

The graph is exposed through MCP. MCP has emerged as the default protocol surface for many agent systems, and a context layer that doesn't speak MCP is asking enterprise teams to bridge protocols themselves. There's craft in what makes an MCP server actually useful versus what just technically works: exposing a small, semantically coherent surface per entity type, rather than dumping the underlying API on the model. The MCP layer is where the architecture meets the agent.

Policy is enforced across ingest, storage, and query: query-time projection determines what each agent or user can see, and ingest-time controls determine what enters the graph in the first place. The same user asking the same question can get different graph projections depending on entitlement. The agent doesn't get to bypass policy by virtue of being an agent. Governance becomes meaningful, not theatrical, when the underlying graph is real and the policy enforcement is happening at the layer that actually sees the data.
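Query-time projection can be sketched as a filter applied between the graph and the caller. The field names, entitlement strings, and deny-by-default behavior below are assumptions made for illustration; the source does not specify a policy model.

```python
# Hypothetical mapping from field name to the entitlement required to see it.
REQUIRED_ENTITLEMENT = {
    "name": "crm:read",
    "arr_usd": "finance:read",
    "open_tickets": "support:read",
}


def project(entity: dict, entitlements: set) -> dict:
    """Return only the fields the caller is entitled to see.

    Fields with no declared entitlement are dropped (deny by default).
    """
    return {
        name: value
        for name, value in entity.items()
        if REQUIRED_ENTITLEMENT.get(name) in entitlements
    }


account = {"name": "Acme Corp", "arr_usd": 1_400_000, "open_tickets": 3, "notes": "internal"}

# The same entity, projected for two callers with different entitlements.
support_view = project(account, {"crm:read", "support:read"})
finance_view = project(account, {"crm:read", "finance:read"})
```

The function takes entitlements, not a caller type, so an agent acting on behalf of a user receives the same projection that user would.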

These architectural commitments are uniform across SoRs. They don't change based on whether the system is GitHub or Salesforce or Snowflake or a vibe-coded internal tool. The substrate doesn't have a different mode for data systems and another mode for operational systems. It has one mode, and it absorbs whatever SoRs the organization runs.

The shape of the integrations catalog reflects this. SixDegree runs 60+ connectors across sales, support, operations, comms, data, and infrastructure. Not because the product is many things, but because the substrate is one thing. Warehouse and metadata-vendor integrations follow the same architectural pattern as the rest and ship as the design partner pipeline pulls them in.

We've made a related argument elsewhere about why a live knowledge graph is the missing context layer for safe agentic AI. The architectural commitments above are what make that thesis concrete in production.

SoR coherence is the hard work; the graph follows from it.

Dimension	Curated context	Derived context
Substrate	Declared entities and relationships	Derived from system events
Maintenance	Continuous human curation	Continuous SoR integration
Staleness window	Days to months between updates	Minutes
Failure mode	Artifact drifts from reality	Connector breaks, and the break is visible
What happens when SoRs change	Curators update the artifact	Graph updates from the source
Source of truth	The curated artifact	The underlying SoR
Representative pattern	Data catalogs, CMDBs, service catalogs, business glossaries	SoR-derived graphs, live ontologies
Center of gravity	Data estate metadata governance	Operational intelligence graph

Why curation fails

The pattern is well documented historically, and it recurs across every declarative organizational-knowledge category.

CMDBs were the right idea. Operations teams genuinely needed to know what existed and what depended on what. The implementation was wrong. They were maintained by humans declaring state, and the infrastructure moved faster than the declarations. Within months, the graph drifted from reality.

Service catalogs followed the same arc, and the distinction between an internal developer portal (IDP) and a derived context layer is architectural, not a feature gap. Business glossaries and semantic layers are now repeating it: humans declare what metrics mean, the calculations change, the glossary drifts. Layering automation on top makes curation less manual, but the artifact is still separate from the systems it describes, with its own drift. The pattern is consistent: declarative substrates for organizational knowledge decay.

What a derived graph actually contains

The shape of an SoR-derived graph isn't theoretical. The questions agents need to answer across enterprise workloads cluster into a few recognizable patterns, and each pattern has the same structural property: the answer lives across multiple SoRs, can't be assembled by curation, and falls naturally out of a graph derived from the systems themselves.

Ownership across systems. Who owns this service, this account, this incident, this contract, this region? Ownership is recorded in different systems for different entity types: CODEOWNERS files, Salesforce account owner fields, on-call rotations, territory assignments, contract signatories. Each link in the chain comes from a different SoR. A derived graph composes the ownership picture from all of them continuously and reflects organizational changes the day they happen, not the next time someone updates a YAML file. The same problem appears for service ownership in engineering, account ownership in sales, and territory mapping in real estate. Different surfaces, same substrate.

Lineage and history. What's the current state of this thing, what was its previous state, what changed between the two, and what caused the change? This question shows up everywhere. Deployment lineage from the source repo through CI to the runtime environment. Contract history on an account. Pipeline stage transitions on a deal. Feature rollout sequences against a specific customer cohort. The graph has to reflect both current state and recent history, because cause-and-effect questions inherently require comparison across time. A static catalog can't do this. A derived graph differs from an IDP precisely on this dimension.
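One way to picture "current state plus recent history" is an append-only record per entity that can answer what the state was at a given time and what changed between two times. This is a sketch under simplifying assumptions: timestamps are minutes since midnight, events arrive in time order, and the version numbers are invented for illustration.

```python
from bisect import bisect_right


class EntityHistory:
    """Append-only state history for a single entity."""

    def __init__(self) -> None:
        self._times = []   # ascending timestamps
        self._states = []  # state snapshot recorded at each timestamp

    def record(self, at: int, state: dict) -> None:
        # Assumes events arrive in time order; a real system would need to
        # handle late or out-of-order events.
        self._times.append(at)
        self._states.append(dict(state))

    def state_at(self, at: int):
        # Latest snapshot recorded at or before `at`, or None if none exists.
        index = bisect_right(self._times, at)
        return self._states[index - 1] if index else None


def diff(before: dict, after: dict) -> dict:
    """Map each changed field to its (old, new) pair."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# Hypothetical deploy history for a payments service.
history = EntityHistory()
history.record(13 * 60, {"version": "1.4.1", "replicas": 6})
history.record(14 * 60 + 2, {"version": "1.5.0", "replicas": 6})

previous = history.state_at(14 * 60)      # state just before the deploy
current = history.state_at(14 * 60 + 11)  # state when the errors are noticed
changes = diff(previous, current)
```

A static catalog holds only the latest snapshot, so `diff` has nothing to compare against. Keeping the snapshots is what makes cause-and-effect questions answerable.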

Blast radius. Before any agent takes an action, the responsible question is what else this action affects. Blast radius isn't only an engineering concept. It's a general pattern. Changing a pricing tier has a blast radius across customer accounts. Pulling a feature has a blast radius across renewal motions and reference customers. Rolling back a service has a blast radius across consuming services and the on-calls responsible for them. Blast radius analysis is one of the clearest tests for whether a context layer is actually useful at runtime.
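Blast radius reduces to transitive reachability over "is consumed by" edges. The sketch below uses a breadth-first traversal; the service names and the shape of the `consumers` mapping are hypothetical.

```python
from collections import deque


def blast_radius(consumers: dict, start: str) -> set:
    """Return everything transitively affected by a change to `start`.

    `consumers` maps an entity to the entities that directly depend on it.
    """
    affected = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in consumers.get(node, []):
            # Skip the starting node and anything already seen, so cycles
            # in the dependency graph terminate.
            if dependent != start and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency edges, including a cycle back to payments.
consumers = {
    "payments": ["checkout-frontend", "invoicing"],
    "checkout-frontend": ["mobile-app"],
    "mobile-app": ["payments"],
}

affected = blast_radius(consumers, "payments")
```

The same function works unchanged if the nodes are pricing tiers and customer accounts instead of services, which is the sense in which blast radius is a pattern and not a domain-specific feature.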

Incident and decision history. Has this thing failed in this way before? What was the resolution? Are the same conditions recurring? This applies to production incidents, to customer escalations, to deal regressions, to security events. The graph needs PagerDuty, Linear, postmortem documents, Slack channels, support tickets, and account history, all correlated through identity. Without it, every agent investigation starts from scratch, which is the same problem tribal knowledge creates for humans, now magnified because agents have no prior experience to fall back on.

Cross-domain chains. The most consequential queries cross domains. Which PR introduced the bug that's affecting Acme Corp right now? Which deploy correlates with the support ticket spike that started Tuesday? Which feature rollout caused the conversion drop on the checkout flow? Which customer signals (escalations, churn risk, support volume) should resolve back to the engineering changes that caused them? Connecting code to customer and customer back to code requires bridging GitHub, the CI/CD platform, the deployment system, the customer analytics platform, and the support tool. No single system owns this chain. The shift from dashboards to conversations only works when the chain is real and queryable.
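A cross-domain chain is a path through the graph whose hops come from different systems. The sketch below finds the shortest such path from a customer to a pull request, assuming identity has already been resolved so that edges from different SoRs share node IDs. The edge list, relation names, and IDs are invented for illustration.

```python
from collections import deque


def find_chain(edges, start, is_goal):
    """Breadth-first search returning [node, relation, node, ...] or None."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate node, relation, node, so this is a node
        if is_goal(node):
            return path
        for rel, dst in adjacency.get(node, []):
            if dst not in visited:
                visited.add(dst)
                queue.append(path + [rel, dst])
    return None


# Hypothetical edges; each one would be emitted by a different connector.
edges = [
    ("customer:acme", "opened", "ticket:4812"),                # support tool
    ("ticket:4812", "references", "service:checkout"),         # support tool + catalog
    ("service:checkout", "running", "deploy:871"),             # deployment system
    ("deploy:871", "includes", "pr:1532"),                     # CI/CD + GitHub
    ("customer:acme", "owned_by", "user:jordan"),              # CRM
]

chain = find_chain(edges, "customer:acme", lambda node: node.startswith("pr:"))
```

No single system could return this path, because each hop is recorded somewhere different. The traversal is trivial once the edges share identifiers; producing those shared identifiers is the hard part.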

Identifier semantics across systems. Nothing has a stable cross-system identifier. The "service" in your catalog has a different name in Kubernetes, a different label in Datadog, a different repository name in GitHub, a different alert routing key in PagerDuty. The "customer" in Salesforce is a different ID in HubSpot, a different account in Zendesk, a different row in the warehouse. Identity resolution across systems is the load-bearing infrastructure underneath every other pattern. Without it, every cross-system query breaks at the first hop. MCP needs connective tissue for exactly this reason: the protocol assumes coherent identity across tools, but coherence doesn't happen by itself.

Two scenarios

A coding agent is asked to make a rollback decision. A new version of a payments service shipped at 14:02. By 14:11 the error rate on the dependent frontend has tripled. The agent has to assemble: which exact service deployed at 14:02, which version was previously running and whether it was healthy, which downstream services consume the affected APIs, what the historical error rate baseline looks like, who is on-call for both the regressing service and its dependents, what the blast radius of the rollback itself looks like, whether the deploy has pending verification jobs still running, whether there's a feature flag that could be disabled instead.

A CX agent is asked to assemble a churn-risk picture. A high-value enterprise customer at $1.4M ARR has a renewal in 52 days. The CSM's auto-flagged churn score just jumped from yellow to red. The agent has to assemble: who owns this account across sales, CS, and support; what the last QBR covered and what action items came out of it; which support tickets are open and which were recently closed; whether any P1 incidents touched this customer's deployment in the last quarter; what features they've requested that are still pending; which executive sponsor has the relationship; whether the renewal motion has started in the CRM and at what stage; what the blast radius of losing them looks like across reference accounts and pipeline.

Look at the systems each agent has to traverse. The rollback crosses GitHub, the CI/CD platform, Argo, Kubernetes, PagerDuty, Datadog, and the service ownership graph. The renewal crosses Salesforce, Zendesk, the incident tracker, Slack, the product analytics system, and the account ownership graph. The systems overlap. The same customer that appears in the renewal scenario is the affected downstream account in the rollback scenario, and the same engineer who owns the regressing service is the one whose on-call rotation determines who hears about it first.

Different SoRs, different agents, different surface queries. Same substrate. One graph, derived from every system the organization runs, with identity resolved across all of it, exposed to both agents through MCP.

A curation-based approach would require maintaining separate artifacts for each domain: a data catalog for the customer side, a service catalog for the engineering side, glossaries for each, lineage trees for each. Each artifact drifts from its underlying systems on its own schedule, and the cross-domain queries (the customer affected by the rollback, the engineering change that caused the support spike) become integration work between artifacts rather than the native shape of the substrate.

A derived graph spans both because it's downstream of the SoRs, not upstream of them.

Implementation patterns

The patterns that actually make derived context layers work at production scale are well-understood individually. The work is in combining them into a coherent platform rather than a bag of integrations.

SoR coherence is the hard work. Each operational system is modeled as an integration that emits clean, stable, identity-resolved entities and relationships continuously, and the graph composes itself from the union of those streams. This is where the real effort lives. The graph follows from it.

Live updates. A graph that refreshes nightly is a catalog with extra steps. For agents acting on live state, yesterday's graph can be fiction, so updates propagate within minutes. It's a correctness requirement, not a performance one.

Identity resolution. Every integration emits entities with stable identifiers the graph can join across system boundaries. Done well, the agent never sees that one service has eight aliases or that one customer lives under five IDs. Done poorly, every cross-system query breaks at the first hop.

Query-time governance. Policy is enforced at the layer that actually sees the data. The same question from two users can return different projections based on entitlement, and an agent doesn't get to bypass policy by virtue of being an agent.

Progressive disclosure. A full context layer can expose thousands of entities and hundreds of tools, and surfacing them all at once collapses model performance (we benchmarked it). Start with what the current task needs and let the graph reveal more as the conversation traverses. Progressive disclosure is what keeps the agent from drowning.

MCP as protocol. MCP is the default surface agents speak, so the graph exposes itself there. There's craft in what makes an MCP server useful: a small, semantically coherent set of tools per entity type, not the raw SoR API dumped on the model.
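The progressive-disclosure pattern described above can be sketched independently of any MCP SDK: keep a catalog of tools grouped by entity type, expose only the groups relevant to the task, and widen the surface as the conversation touches new entity types. The catalog contents, tool names, and the `type:id` convention for entity IDs are assumptions for illustration.

```python
# Hypothetical catalog: a small, coherent set of tools per entity type.
TOOL_CATALOG = {
    "service": ["get_service", "list_service_dependents"],
    "incident": ["get_incident", "list_related_incidents"],
    "customer": ["get_customer", "list_open_tickets"],
}


class ToolSurface:
    """Tracks which tools are currently visible to the agent."""

    def __init__(self, catalog: dict, initial_types: list) -> None:
        self._catalog = catalog
        self._active = list(initial_types)

    def visible_tools(self) -> list:
        return [tool for entity_type in self._active for tool in self._catalog[entity_type]]

    def observe(self, entity_ids: list) -> None:
        # When a tool result mentions a new entity type, reveal its tools.
        for entity_id in entity_ids:
            entity_type = entity_id.split(":", 1)[0]
            if entity_type in self._catalog and entity_type not in self._active:
                self._active.append(entity_type)


surface = ToolSurface(TOOL_CATALOG, ["service"])
initial = surface.visible_tools()

# A service lookup returned a linked incident, so incident tools appear.
surface.observe(["incident:INC-42", "unknown:1"])
expanded = surface.visible_tools()
```

In this sketch the agent starts with two tools instead of six, and the surface grows only along paths the conversation actually takes.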

Operating MCP servers in production covers the day-two operational realities of running this kind of system at enterprise scale. For tactical guidance on what to fix first if you're already running agents, the companion piece on AI context management best practices covers the disciplines that make the architecture above actually work.

Frequently asked questions

What is AI context management?

AI context management is the upstream discipline of maintaining a graph derived from systems of record that AI agents can read from at inference time. It guarantees that the structured sources the agent reaches for are accurate, current, governed, and coherent across systems. It operates alongside context engineering (the inference-time discipline of deciding what to load into the prompt on a given turn), but the two are distinct disciplines with different failure modes. For enterprise agents, context management is typically the binding constraint, because a perfectly engineered prompt against a stale source produces a confident wrong answer.

What's the difference between context engineering and context management?

Context engineering is the inference-time discipline of deciding what goes into the model's context window on a given turn: compaction, retrieval, tool result handling, sub-agent routing. Context management is the upstream discipline of guaranteeing that the structured sources the agent reaches for are governed, current, and trustworthy. Engineering picks what to load. Management guarantees what's loadable is true. Most teams need both; their failure modes are different.

Why isn't a data catalog enough?

Data catalogs are primarily metadata-governance systems. Even with active metadata and automated lineage, their center of gravity is curated understanding of the data estate. A derived context layer composes itself from systems of record continuously. The difference is the substrate, not the surface. Both can expose similar-looking metadata to agents, but one is a snapshot drifting from reality and the other is a live view of it. We cover the contrast in depth in context layer vs data catalog.

Do I need a context layer if I'm already using RAG?

RAG is excellent at passage retrieval. It falls apart on relational reasoning. "Who owns this service?" requires navigating an org hierarchy, not semantic similarity. "What's the blast radius of this deploy?" requires transitive closure over a dependency graph, not document matching. "Which incidents correlate with this incident type?" requires temporal reasoning and cross-system identity, not relevance ranking. RAG answers "what similar facts exist?" Graphs answer "what's connected?" They solve different problems, and most enterprises need both. We benchmarked where RAG falls short for agentic workloads.

Is SixDegree a competitor to Atlan or DataHub?

Not in the traditional sense. Atlan and DataHub start from metadata governance over the data estate. SixDegree builds a derived graph across all systems of record. The architectures are different. If you're running Atlan, we can integrate it as one of your systems of record; the relationship is integrative, not zero-sum.

What systems does SixDegree integrate with today?

SixDegree runs 60+ connectors across sales (Salesforce, HubSpot, Outreach, Gong), support (Zendesk, Intercom, Genesys, Qualtrics), operations (Workday, NetSuite, QuickBooks, Okta), comms (Slack, Notion, Confluence), data (Looker, Mixpanel, Google Analytics), and infrastructure (GitHub, Jira, Argo, Kubernetes, PagerDuty, Datadog, Stripe, and more). Warehouse and data-catalog integrations follow the same architectural pattern as the rest and ship as the design partner pipeline pulls them in.

How does MCP relate to context management?

MCP is the protocol surface through which agents discover and invoke tools and resources at inference time. It's how the context layer exposes itself to the model. By itself, MCP doesn't guarantee that the context being queried is accurate, current, or coherent across systems. That's the context management work. A poorly built MCP surface against a stale graph is still stale. MCP needs connective tissue underneath it to actually deliver on what the protocol promises.

The substrate decision

The enterprise AI context layer is one graph, derived continuously from every system of record an organization runs. Not two halves. Not data context versus operational context. One substrate, with uniform architectural commitments (graph-shaped, derived, MCP-exposed, identity-resolved, policy-enforced across ingest, storage, and query) that absorbs whatever SoRs the organization happens to be running.

The clearest way to state the architectural difference: Atlan, DataHub, and the rest of the data infrastructure category start from metadata governance over the data estate. SixDegree starts from operational systems of record and derives a live enterprise state graph across every workflow agents act on. Both can claim automation, both can claim graphs, both can claim MCP exposure. The distinction is what each architecture is denominated in. Once an organization's agentic workloads span beyond the data estate, the denomination matters more than the feature surface.

The data infrastructure category is building real products with real value, but its center of gravity is the wrong substrate for agentic workloads that span the enterprise. Curation has been failing at this category of problem for decades. The CMDBs, the service catalogs, the business glossaries. The pattern keeps recurring: declarative substrates for organizational knowledge decay. The AI context layer built on the same model will decay the same way. The right architecture is to skip the curation layer and let agents read from the SoRs directly, through a graph that the substrate keeps current on their behalf.

SixDegree is building that substrate. One graph, every SoR, every agent. The integrations catalog already runs across sales, support, operations, comms, data, and infrastructure. Whatever the application layer looks like in two years, or five, the graph absorbs it. The architecture is indifferent to which specific applications win or lose. The bet is on the structural property that organizations run systems, those systems emit state, and agents need to read across them coherently. That bet doesn't depend on predicting the next decade of enterprise software. It depends on the substrate being right.