sixdegree

AI Context Management: One Graph, Every System of Record

The enterprise AI context layer is one graph derived from every system of record an organization runs. Why the curation architectures are the wrong substrate.

AI context management isn't a curation problem. It's a substrate problem, and the existing category is solving it with the wrong architecture.

Categories that have tried to organize enterprise knowledge declaratively have hit the same wall again and again. Data catalogs, CMDBs, service catalogs, business glossaries. Each was the right answer to a real problem: humans needed structured ways to understand what existed in their organizations and what it meant. Each has decayed in the same way: in the gap between the maintenance cycles that keep curated artifacts current and the underlying systems, which move faster than any human can keep up with.

The AI context layer category is currently being built on the same model. Atlan, DataHub, Collibra, the data infrastructure category broadly. They're producing a new generation of curated artifacts (semantic models, metric definition stores, business glossaries with AI-assisted curation) and proposing them as the substrate that AI agents should read from. The category is sensing that the existing tools were built for human consumers and trying to retrofit them for agents.

The architecture hasn't changed. And the architecture is the problem.

This post argues that the enterprise AI context layer is one graph, derived continuously from every system of record an organization runs. By system of record we mean any system that holds authoritative state about some part of how the business runs. Not just the traditional CRM/ERP/warehouse triad, but the repositories, the on-call schedulers, the support tools, the internal applications that get built and retired, anything emitting events that describe what's true right now. The graph composes itself from those sources. It doesn't get curated into existence. The split between "data context" and "operational context" that the category is currently drawing isn't a real architectural distinction. It's an artifact of which vendors got to which territory first.

Context management vs context engineering

A quick note on terms, because the phrase "context management" is doing double duty.

Context engineering is the inference-time discipline of deciding what goes into the model's context window on a given turn: compaction, retrieval, tool result handling, sub-agent message routing. Anthropic's "Effective context engineering for AI agents" is the cleanest articulation of this. Context management is the upstream discipline of guaranteeing that the structured sources the agent reaches for are trustworthy, governed, and current. Engineering picks what to load. Management guarantees what's loadable is true. Both are necessary; their failure modes are different. The deeper comparison lives in a separate post: context engineering vs context management.

The rest of this post is about management.

The existing category got the substrate wrong

The data infrastructure category has spent the last two years pulling the term "context layer" toward a specific shape. Atlan's recent writing on context layers for AI agents is among the clearest articulations of this view. DataHub is explicitly positioning around "context management" as a category name. Collibra, Snowflake, ElixirData, and several smaller entrants are positioning around adjacent versions of the same architecture.

What they're building has real value. A metric definition store that resolves "active customer" consistently across Marketing, Finance, and Sales. An identity resolution layer that maps the same customer across Salesforce, Stripe, and HubSpot. A lineage graph that traces where a number came from and which transformations applied. These are real problems and the products solve real subsets of them.

But notice what the category's R&D effort is structured around. "Active metadata." "Automated lineage." "AI-assisted curation." "Continuous discovery." Every flagship feature is a mechanism for keeping curated artifacts from going stale. The entire feature surface of the category is structured around fighting decay.

That's a tell. The reason curation feels like manual labor that needs automation isn't that the tools are insufficiently clever. It's that the underlying substrate is declarative. A human declares what a metric means, what an entity is, what depends on what. The tool tries to automate as much of that maintenance as it can. The artifacts are constantly drifting away from the systems they describe, and the product's job is to drag them back into alignment.

This isn't a new pattern. It recurs across declarative enterprise-knowledge categories throughout enterprise software history. The CMDB graveyard is the canonical example: fifteen years of products promising to give enterprises a single source of truth about their infrastructure, all of them maintained by human curators, all of them stale within months of going live. Service catalogs followed the same arc. We've written elsewhere about why the issue isn't Backstage as a product; it's the catalog model as a substrate.

The right move isn't to add more automation on top of curation. It's a different architectural starting point. The data infrastructure category begins with metadata governance over the data estate and works outward. A derived context layer begins with operational systems of record across every workflow agents act on (code, customers, contracts, incidents, deployments, deals, support, infrastructure) and composes a live enterprise state graph from all of it. The difference isn't curation versus derivation as techniques. The difference is what the architecture is denominated in.

The application layer is in flux. The substrate isn't.

There's a debate playing out across AI infrastructure right now about what happens to traditional SaaS as agents become the primary interface to enterprise systems. Some people think the major SaaS applications evolve toward agent-readable APIs and survive. Some think they get replaced by purpose-built databases with thin agent interfaces on top. Some think entirely new categories of tool emerge, vibe-coded internal applications that didn't exist three years ago and won't be on anyone's roadmap until someone builds them in a weekend.

We don't have a strong opinion on which scenario dominates. We have a strong opinion that the answer doesn't change the substrate problem.

Whether an organization is running Salesforce in 2030, or a vibe-coded internal CRM that emerged from a Saturday afternoon, or three best-of-breed tools that didn't exist when you read this, the same fact is true. Those systems will emit entities. They'll record state. They'll generate events. Agents will need to read across them coherently. The substrate that does that reading isn't denominated in any specific application. It's denominated in the property that organizations run systems and those systems emit state.

The data infrastructure category is bound to the existing application landscape in a way a derived context layer isn't. Atlan, DataHub, and the rest of the data infrastructure category started from metadata governance over the data estate. Their architectures are downstream of the data estate. Their connector breadth, even where it now reaches into business systems, is shaped by data-team buyer motions and data-estate-shaped pricing. The product gravity is denominated in data systems, even as the marketing surface expands. A derived context layer just needs the systems an organization actually runs, whatever those are, today or in five years, to be emitting state. New system comes online, new connector, graph absorbs it. Old system retires, connector retires, graph rebalances. The architecture persists across application-layer churn because it isn't a function of which applications are running.

This is also why the "data half vs operational half" framing the category has drawn is artifactual. The category is now expanding past that line, but the line was never structural in the first place. The framing assumes that enterprise systems sort cleanly into "data systems" (warehouses, BI tools, semantic layers) and "operational systems" (CRMs, support tools, CI/CD, infrastructure). They don't. Salesforce is a CRM (operational) and a primary source of revenue and pipeline truth (data). Stripe is billing infrastructure (operational) and a primary source of revenue reality (data). Workday is HR operations and the canonical source of headcount truth. The warehouse is itself a derived view over many of these systems. The clean line the category wants to draw between "data" and "operational" doesn't survive contact with the actual enterprise stack. The category's recent expansion toward "business context" and "operational metadata" is itself an acknowledgment that the line doesn't hold.

The right model is simpler. Every system of record is a source. The graph derives itself from all of them. Whether a particular SoR feels "data-shaped" or "operations-shaped" is incidental. It's a categorization that comes from how vendors organized their product catalogs, not from anything architecturally meaningful.

One graph, every SoR

The substrate we're describing has a specific shape. It's worth being precise about the architectural commitments, because the contrast with the curation approach is sharpest there.

The graph is auto-derived from systems of record. Humans define the integration and identity rules; the graph state itself is derived from systems of record continuously, rather than maintained by hand. The integration layer subscribes to the events each SoR emits: webhooks, change feeds, polling for systems that don't push. The graph composes itself from those streams. When a new pull request opens in GitHub, an entity appears. When a Salesforce opportunity moves stages, the edge updates. When a customer's support ticket gets closed, the closure timestamp propagates. The graph reflects the current state of the underlying systems within a window measured in minutes, not maintenance cycles.
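The derivation loop described above can be sketched in a few lines. This is a minimal illustration, not SixDegree's implementation: the normalized event shape (`source`, `kind`, `id`, `attrs`, `links`) and the `Graph` class are hypothetical, and a production system would also need ordering guarantees, retries, and deletion handling.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Node key -> latest known attributes; edges are (src, relation, dst) triples.
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def apply(self, event: dict) -> None:
        """Upsert one normalized SoR event (hypothetical shape) into the graph."""
        key = f"{event['source']}:{event['kind']}:{event['id']}"
        # Merge attributes so a later event overwrites only the fields it carries.
        self.nodes.setdefault(key, {}).update(event.get("attrs", {}))
        for relation, target in event.get("links", []):
            self.edges.add((key, relation, target))


graph = Graph()
events = [
    {"source": "github", "kind": "pull_request", "id": "101",
     "attrs": {"state": "open"},
     "links": [("targets", "github:repo:payments")]},
    {"source": "salesforce", "kind": "opportunity", "id": "006A",
     "attrs": {"stage": "Proposal"}},
    # A later event for the same opportunity updates the node in place.
    {"source": "salesforce", "kind": "opportunity", "id": "006A",
     "attrs": {"stage": "Negotiation"}},
]
for event in events:
    graph.apply(event)
```

The point of the sketch is that nothing here is declared by a curator: the node for the opportunity exists, and carries its current stage, only because the source system emitted events saying so.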

Identity is resolved across systems at ingest time. The same customer has different IDs in Salesforce, Stripe, HubSpot, Zendesk, and the warehouse. That's a feature of the underlying systems, not a bug to be hidden from the graph. Identity resolution is the substrate's job, not the agent's. When an agent asks about a customer, it gets the unified entity, and the provenance trail back to each source ID is queryable if it needs to be.
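One common way to implement ingest-time identity resolution is a union-find structure over (system, local ID) pairs. The sketch below assumes a deliberately naive match rule (a shared billing domain, which is a hypothetical stand-in for real matching logic) and invented record IDs; it only illustrates how a unified entity and its provenance trail can coexist.

```python
class IdentityResolver:
    """Union-find over (system, local_id) pairs; a sketch, not a production matcher."""

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, ref):
        self.parent.setdefault(ref, ref)
        while self.parent[ref] != ref:
            # Path halving keeps lookups cheap as clusters grow.
            self.parent[ref] = self.parent[self.parent[ref]]
            ref = self.parent[ref]
        return ref

    def link(self, a, b) -> None:
        """Record that two source records refer to the same real-world entity."""
        self.parent[self.find(a)] = self.find(b)

    def provenance(self, ref) -> set:
        """Every source ID unified with `ref`, including `ref` itself."""
        root = self.find(ref)
        return {r for r in list(self.parent) if self.find(r) == root}


resolver = IdentityResolver()
seen_by_domain = {}  # hypothetical match key: billing email domain
records = [
    ("salesforce", "001xx", "acme.com"),
    ("stripe", "cus_9", "acme.com"),
    ("zendesk", "org_44", "acme.com"),
    ("hubspot", "7731", "globex.com"),
]
for system, local_id, domain in records:
    ref = (system, local_id)
    resolver.find(ref)  # register the record even if nothing matches yet
    if domain in seen_by_domain:
        resolver.link(ref, seen_by_domain[domain])
    else:
        seen_by_domain[domain] = ref
```

Calling `resolver.provenance(("stripe", "cus_9"))` returns the Salesforce, Stripe, and Zendesk references together, which is the queryable trail back to each source ID that the paragraph above describes.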

The graph is exposed through MCP. MCP has emerged as the default protocol surface for many agent systems, and a context layer that doesn't speak MCP is asking enterprise teams to bridge protocols themselves. There's craft in what makes an MCP server actually useful versus what just technically works: exposing a small, semantically coherent surface per entity type, rather than dumping the underlying API on the model. The MCP layer is where the architecture meets the agent.

Policy is enforced across ingest, storage, and query: query-time projection determines what each agent or user can see, and ingest-time controls determine what enters the graph in the first place. The same user asking the same question can get different graph projections depending on entitlement. The agent doesn't get to bypass policy by virtue of being an agent. Governance becomes meaningful, not theatrical, when the underlying graph is real and the policy enforcement is happening at the layer that actually sees the data.
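Query-time projection can be illustrated with a small filter over graph nodes. The entitlement table, role names, and node shape below are all hypothetical; a real deployment would derive entitlements from an identity provider rather than a hard-coded dictionary.

```python
# Hypothetical entitlement table: role -> {entity kind -> visible fields}.
ENTITLEMENTS = {
    "support_agent": {"customer": {"name", "open_tickets"}},
    "finance_agent": {"customer": {"name", "arr"}, "invoice": {"amount"}},
}


def project(nodes, role):
    """Return only the entity kinds and fields the role is entitled to see."""
    allowed = ENTITLEMENTS.get(role, {})
    view = {}
    for node in nodes:
        fields = allowed.get(node["kind"])
        if fields is None:
            continue  # the whole entity type is hidden from this role
        view[node["key"]] = {k: v for k, v in node["attrs"].items() if k in fields}
    return view


nodes = [
    {"key": "customer:acme", "kind": "customer",
     "attrs": {"name": "Acme", "arr": 1400000, "open_tickets": 3}},
    {"key": "invoice:inv_1", "kind": "invoice", "attrs": {"amount": 50000}},
]
support_view = project(nodes, "support_agent")
finance_view = project(nodes, "finance_agent")
```

The same underlying graph yields two different projections, and a role with no entitlements gets an empty view rather than a default-allow one.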

These architectural commitments are uniform across SoRs. They don't change based on whether the system is GitHub or Salesforce or Snowflake or a vibe-coded internal tool. The substrate doesn't have a different mode for data systems and another mode for operational systems. It has one mode, and it absorbs whatever SoRs the organization runs.

The shape of the integrations catalog reflects this. SixDegree runs 60+ connectors across sales, support, operations, comms, data, and infrastructure. Not because the product is many things, but because the substrate is one thing. Warehouse and metadata-vendor integrations follow the same architectural pattern as the rest and ship as the design partner pipeline pulls them in.

We've made a related argument elsewhere about why a live knowledge graph is the missing context layer for safe agentic AI. The architectural commitments above are what make that thesis concrete in production.

SoR coherence is the hard work; the graph follows from it.

Dimension | Curated context | Derived context
Substrate | Declared entities and relationships | Auto-derived from system events
Maintenance | Continuous human curation | Continuous SoR integration
Staleness window | Days to months between updates | Minutes
Failure mode | Artifact drifts from reality | Connector breaks, surface visible
What happens when SoRs change | Curators update the artifact | Graph updates from the source
Source of truth | The curated artifact | The underlying SoR
Representative pattern | Data catalogs, CMDBs, service catalogs, business glossaries | SoR-derived graphs, live ontologies
Center of gravity | Data estate metadata governance | Operational state graph

Why curation fails

The pattern is well documented, and it is the same across declarative organizational-knowledge categories.

Configuration management databases promised a single source of truth about infrastructure. They were the right idea. Operations teams genuinely needed to know what servers existed, what services ran on them, which depended on which. The implementation pattern was wrong. CMDBs were maintained by humans declaring the state, and the actual infrastructure moved faster than the declarations. Within months of any CMDB going live, the graph drifted from reality. The category absorbed billions of dollars in enterprise spending and is widely regarded as having decayed against its original promise. CMDBs are still deployed; engineers learned not to trust their accuracy.

Service catalogs are the current generation of the same mistake. The Backstage ecosystem, the IDP category broadly, all built on the same architecture: humans declare what services exist, what teams own them, what depends on what. The catalog goes stale the moment any service evolves faster than its catalog entry does. We've made this argument elsewhere about why static service catalogs fail and why the distinction between an IDP and a derived context layer is architectural, not a feature gap.

Business glossaries and curated semantic layers in the data infrastructure category are now repeating the pattern. Humans declare what metrics mean, the underlying calculations change, the glossary drifts. The category's response is to layer automation on top: AI-suggested updates, change detection against dbt models, lineage that updates from pipeline runs. None of which solves the substrate problem. They make the curation less manual, but the artifact is still a curated artifact, separate from the systems it describes, with its own maintenance lifecycle and its own drift.

The query optimization vs schema design analogy applies directly. Curation is to derivation as query optimization is to schema design. A clever query plan against a bad schema returns wrong answers fast. A clever curation tool against a fundamentally declarative substrate returns drift faster. The intervention that matters is the substrate, not the tools on top of it.

What a derived graph actually contains

The shape of an SoR-derived graph isn't theoretical. The questions agents need to answer across enterprise workloads cluster into a few recognizable patterns, and each pattern has the same structural property: the answer lives across multiple SoRs, can't be assembled by curation, and falls naturally out of a graph derived from the systems themselves.

Ownership across systems. Who owns this service, this account, this incident, this contract, this region? Ownership is recorded in different systems for different entity types: CODEOWNERS files, Salesforce account owner fields, on-call rotations, territory assignments, contract signatories. Each link in the chain comes from a different SoR. A derived graph composes the ownership picture from all of them continuously and reflects organizational changes the day they happen, not the next time someone updates a YAML file. The same problem appears for service ownership in engineering, account ownership in sales, and territory mapping in real estate. Different surfaces, same substrate.

Lineage and history. What's the current state of this thing, what was its previous state, what changed between the two, and what caused the change? This question shows up everywhere. Deployment lineage from the source repo through CI to the runtime environment. Contract history on an account. Pipeline stage transitions on a deal. Feature rollout sequences against a specific customer cohort. The graph has to reflect both current state and recent history, because cause-and-effect questions inherently require comparison across time. A static catalog can't do this. A derived graph differs from an IDP precisely on this dimension.
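The "current state, previous state, what changed" question can be sketched as an append-only history with point-in-time lookup. The `History` class and the integer timestamps are illustrative assumptions; the sketch also assumes events arrive in time order, which real connectors do not guarantee.

```python
import bisect


class History:
    """Append-only state history per entity; assumes events arrive in time order."""

    def __init__(self):
        self._times = {}
        self._states = {}

    def record(self, key, ts, state):
        self._times.setdefault(key, []).append(ts)
        self._states.setdefault(key, []).append(dict(state))

    def as_of(self, key, ts):
        """Latest recorded state at or before `ts`, or None if none exists."""
        i = bisect.bisect_right(self._times.get(key, []), ts)
        return self._states[key][i - 1] if i else None

    def diff(self, key, t0, t1):
        """Fields whose value differs between the states at t0 and t1."""
        before = self.as_of(key, t0) or {}
        after = self.as_of(key, t1) or {}
        return {f: (before.get(f), after.get(f))
                for f in before.keys() | after.keys()
                if before.get(f) != after.get(f)}


history = History()
# Timestamps are arbitrary ordered integers chosen for the example.
history.record("svc:payments", 1400, {"version": "v41", "healthy": True})
history.record("svc:payments", 1402, {"version": "v42", "healthy": True})
history.record("svc:payments", 1411, {"version": "v42", "healthy": False})
changes = history.diff("svc:payments", 1401, 1411)
```

Here `changes` reports both the version bump and the health flip, which is the comparison across time that a snapshot-only catalog cannot answer.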

Blast radius. Before any agent takes an action, the responsible question is what else this action affects. Blast radius isn't only an engineering concept. It's a general pattern. Changing a pricing tier has a blast radius across customer accounts. Pulling a feature has a blast radius across renewal motions and reference customers. Rolling back a service has a blast radius across consuming services and the on-calls responsible for them. Blast radius analysis is one of the clearest tests for whether a context layer is actually useful at runtime.
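In graph terms, a blast radius query is a breadth-first walk over "is depended on by" edges. The sketch below uses a hand-written adjacency map with invented node keys; in a derived graph those edges would come from the connectors rather than a literal dictionary.

```python
from collections import deque


def blast_radius(dependents, start, max_hops=None):
    """Breadth-first walk over 'is depended on by' edges; returns node -> hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_hops is not None and seen[node] >= max_hops:
            continue  # do not expand past the requested depth
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]  # the changed entity is not part of its own blast radius
    return seen


# Hypothetical edges: key -> entities that depend on it.
dependents = {
    "svc:payments": ["svc:checkout", "svc:billing-ui"],
    "svc:checkout": ["customer:acme"],
    "svc:billing-ui": ["customer:acme", "oncall:web-team"],
}
affected = blast_radius(dependents, "svc:payments")
direct_only = blast_radius(dependents, "svc:payments", max_hops=1)
```

The hop count matters in practice: direct dependents usually need a pre-action check, while second-hop entities such as the affected customer may only need a notification.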

Incident and decision history. Has this thing failed in this way before? What was the resolution? Are the same conditions recurring? This applies to production incidents, to customer escalations, to deal regressions, to security events. The graph needs PagerDuty, Linear, postmortem documents, Slack channels, support tickets, and account history, all correlated through identity. Without it, every agent investigation starts from scratch, which is the same problem tribal knowledge creates for humans, now magnified because agents have no prior experience to fall back on.

Cross-domain chains. The most consequential queries cross domains. Which PR introduced the bug that's affecting Acme Corp right now? Which deploy correlates with the support ticket spike that started Tuesday? Which feature rollout caused the conversion drop on the checkout flow? Which customer signals (escalations, churn risk, support volume) should resolve back to the engineering changes that caused them? Connecting code to customer and customer back to code requires bridging GitHub, the CI/CD platform, the deployment system, the customer analytics platform, and the support tool. No single system owns this chain. The shift from dashboards to conversations only works when the chain is real and queryable.
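A cross-domain chain is a path query once identity is resolved. The following sketch finds the shortest chain from a support ticket to a pull request; the node keys, the edge list, and the prefix-based goal test are all assumptions made for the example.

```python
from collections import deque


def find_chain(edges, start, goal_prefix):
    """Shortest path from `start` to the first node whose key starts with `goal_prefix`."""
    adjacency = {}
    for src, dst in edges:
        # Treat edges as undirected: a chain can be walked from either end.
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node.startswith(goal_prefix):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical edges, each contributed by a different system of record.
edges = [
    ("zendesk:ticket:881", "salesforce:account:acme"),
    ("salesforce:account:acme", "k8s:deployment:checkout-prod"),
    ("k8s:deployment:checkout-prod", "argo:rollout:checkout-42"),
    ("argo:rollout:checkout-42", "github:pr:1207"),
]
chain = find_chain(edges, "zendesk:ticket:881", "github:pr:")
```

Each hop in `chain` crosses a system boundary, so the query only succeeds if every connector emitted its edge and the identities on both sides were resolved to the same key.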

Identifier semantics across systems. Nothing has a stable cross-system identifier. The "service" in your catalog has a different name in Kubernetes, a different label in Datadog, a different repository name in GitHub, a different alert routing key in PagerDuty. The "customer" in Salesforce is a different ID in HubSpot, a different account in Zendesk, a different row in the warehouse. Identity resolution across systems is the load-bearing infrastructure underneath every other pattern. Without it, every cross-system query breaks at the first hop. MCP needs connective tissue for exactly this reason: the protocol assumes coherent identity across tools, but coherence doesn't happen by itself.

Two scenarios

A coding agent is asked to make a rollback decision. A new version of a payments service shipped at 14:02. By 14:11 the error rate on the dependent frontend has tripled. The agent has to assemble: which exact service deployed at 14:02, which version was previously running and whether it was healthy, which downstream services consume the affected APIs, what the historical error rate baseline looks like, who is on-call for both the regressing service and its dependents, what the blast radius of the rollback itself looks like, whether the deploy has pending verification jobs still running, whether there's a feature flag that could be disabled instead.

A CX agent is asked to assemble a churn-risk picture. A high-value enterprise customer at $1.4M ARR has a renewal in 52 days. The CSM's auto-flagged churn score just jumped from yellow to red. The agent has to assemble: who owns this account across sales, CS, and support; what the last QBR covered and what action items came out of it; which support tickets are open and which were recently closed; whether any P1 incidents touched this customer's deployment in the last quarter; what features they've requested that are still pending; which executive sponsor has the relationship; whether the renewal motion has started in the CRM and at what stage; what the blast radius of losing them looks like across reference accounts and pipeline.

Look at the systems each agent has to traverse. The rollback crosses GitHub, the CI/CD platform, Argo, Kubernetes, PagerDuty, Datadog, and the service ownership graph. The renewal crosses Salesforce, Zendesk, the incident tracker, Slack, the product analytics system, and the account ownership graph. The systems overlap. The same customer that appears in the renewal scenario is the affected downstream account in the rollback scenario, and the same engineer who owns the regressing service is the one whose on-call rotation determines who hears about it first.

Different SoRs, different agents, different surface queries. Same substrate. One graph, derived from every system the organization runs, with identity resolved across all of it, exposed to both agents through MCP.

A curation-based approach would require maintaining separate artifacts for each domain: a data catalog for the customer side, a service catalog for the engineering side, glossaries for each, lineage trees for each. Each artifact drifts from its underlying systems on its own schedule, and the cross-domain queries (the customer affected by the rollback, the engineering change that caused the support spike) become integration work between artifacts rather than the native shape of the substrate.

A derived graph spans both because it's downstream of the SoRs, not upstream of them.

Implementation patterns

The patterns that actually make derived context layers work at production scale are well-understood individually. The work is in combining them into a coherent platform rather than a bag of integrations.

The exposure layer is MCP. It's emerged as the default protocol surface for many agent systems, and a context layer that doesn't speak MCP is asking enterprise teams to bridge protocols themselves. A good MCP surface exposes a small, semantically coherent set of tools per entity type rather than dumping the underlying SoR API on the model.

Auto-derivation is the only sustainable maintenance model. Manually curated graphs are the failure mode the CMDB and service catalog categories spent fifteen years demonstrating. Each operational system is modeled as a coherent integration that emits entities, relationships, and updates continuously. The graph composes itself from the union of those streams. This is where the SoR coherence work lives: getting each integration to emit a clean, stable, identity-resolved view of its source system. The graph follows from it.

Live updates versus batch ETL is a category boundary. A batch-derived graph that refreshes nightly is a catalog with extra steps. For operational agents acting on live state, yesterday's graph can be fiction. This isn't a performance requirement. It's a correctness requirement. A rollback agent reading yesterday's service dependency graph can cause outages. A renewal agent reading last week's ticket queue can miss the escalation.
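Treating freshness as a correctness requirement suggests failing closed when the backing source is too old. This is a minimal sketch of such a guard; `require_fresh`, `StaleContextError`, and the five-minute limit are hypothetical names and values, not part of any product API.

```python
from datetime import datetime, timedelta, timezone


class StaleContextError(RuntimeError):
    """Raised when an agent would otherwise act on out-of-date graph state."""


def require_fresh(last_synced, max_age, now=None):
    """Return the context age, or raise if it exceeds `max_age`."""
    now = now or datetime.now(timezone.utc)
    age = now - last_synced
    if age > max_age:
        raise StaleContextError(f"context is {age} old; limit is {max_age}")
    return age


# Example: a rollback agent refuses to act on a dependency graph older than five minutes.
checked_at = datetime(2025, 1, 1, 14, 11, tzinfo=timezone.utc)
age = require_fresh(
    last_synced=datetime(2025, 1, 1, 14, 9, tzinfo=timezone.utc),
    max_age=timedelta(minutes=5),
    now=checked_at,
)
```

The design choice is that the agent gets an explicit error instead of a confident answer built on stale state, and the acceptable age is set per action rather than globally.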

Progressive disclosure keeps the agent from drowning. A complete derived context layer can expose thousands of entities and hundreds of MCP tools. Surfacing all of them on every turn collapses model performance. We benchmarked exactly how badly that goes in MCP tool overload. The pattern that works is to start with the entities and tools relevant to the current task and let the graph reveal more as the conversation traverses. Progressive disclosure for agents is the architectural pattern; it's the difference between a useful context layer and an overwhelming one.
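Progressive disclosure can be approximated by scoping the tool list to the entity types the conversation has already reached. The registry and tool names below are invented for illustration and do not use any MCP SDK; they only show the selection logic.

```python
# Hypothetical tool registry: tool name -> entity type it operates on.
TOOLS = {
    "get_service": "service",
    "list_service_dependents": "service",
    "get_oncall": "oncall",
    "get_customer": "customer",
    "list_open_tickets": "customer",
    "get_invoice": "invoice",
}


def visible_tools(touched_types, registry=TOOLS):
    """Expose only tools for entity types the conversation has reached so far."""
    return sorted(name for name, kind in registry.items() if kind in touched_types)


touched = {"service"}
first_turn = visible_tools(touched)

# Traversing from a service to its on-call rotation reveals the on-call tools.
touched.add("oncall")
later_turn = visible_tools(touched)
```

On the first turn the model sees two tools instead of six; the set grows only as traversal makes more of the graph relevant.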

Identifier resolution is load-bearing infrastructure. Every integration needs a strategy for emitting entities with stable identifiers that the graph can join across system boundaries. Done well, the agent never has to know that "service X" has eight different aliases or that "customer Acme" lives under five different IDs. Done poorly, every query breaks at the first traversal and the graph effectively isn't connected.

Operating MCP servers in production covers the day-two operational realities of running this kind of system at enterprise scale. For tactical guidance on what to fix first if you're already running agents, the companion piece on AI context management best practices covers the disciplines that make the architecture above actually work.

Frequently asked questions

What is AI context management?

AI context management is the upstream discipline of maintaining a graph derived from systems of record that AI agents can read from at inference time. It guarantees that the structured sources the agent reaches for are accurate, current, governed, and coherent across systems. It operates alongside context engineering (the inference-time discipline of deciding what to load into the prompt on a given turn), but the two are distinct disciplines with different failure modes. For enterprise agents, context management is typically the binding constraint, because a perfectly engineered prompt against a stale source produces a confident wrong answer.

What's the difference between context engineering and context management?

Context engineering is the inference-time discipline of deciding what goes into the model's context window on a given turn: compaction, retrieval, tool result handling, sub-agent routing. Context management is the upstream discipline of guaranteeing that the structured sources the agent reaches for are governed, current, and trustworthy. Engineering picks what to load. Management guarantees what's loadable is true. Most teams need both; their failure modes are different.

Why isn't a data catalog enough?

Data catalogs are primarily metadata-governance systems. Even with active metadata and automated lineage, their center of gravity is curated understanding of the data estate. A derived context layer composes itself from systems of record continuously. The difference is the substrate, not the surface. Both can expose similar-looking metadata to agents, but one is a snapshot drifting from reality and the other is a live view of it. We cover the contrast in depth in context layer vs data catalog.

Do I need a context layer if I'm already using RAG?

Probably yes. Plain document RAG handles passage retrieval well but doesn't reliably handle relationship traversal. Questions like 'who owns this service,' 'what's the blast radius of this change,' or 'which incidents have touched this customer' require structural answers that document retrieval can't produce. A RAG pipeline against the documents your catalog points to returns semantically similar text without ever resolving the relational truth underneath. We benchmarked where RAG falls short for agentic workloads; the conclusion is that retrieval and reasoning over relationships are different problems.

Is SixDegree a competitor to Atlan or DataHub?

Not in the traditional sense. Atlan, DataHub, and the data infrastructure category start from the data estate and metadata governance. SixDegree builds a derived graph across all systems of record, data ones included. A customer running Atlan today has a data system of record that SixDegree can integrate alongside the others; the architectures are different but the surface areas can overlap. The architectural critique stands (curation as the substrate doesn't scale), but the relationship at the customer level is integrative, not zero-sum.

What systems does SixDegree integrate with today?

SixDegree runs 60+ connectors across sales (Salesforce, HubSpot, Outreach, Gong), support (Zendesk, Intercom, Genesys, Qualtrics), operations (Workday, NetSuite, QuickBooks, Okta), comms (Slack, Notion, Confluence), data (Looker, Mixpanel, Google Analytics), and infrastructure (GitHub, Jira, Argo, Kubernetes, PagerDuty, Datadog, Stripe, and more). Warehouse and data-catalog integrations follow the same architectural pattern as the rest and ship as the design partner pipeline pulls them in.

How does MCP relate to context management?

MCP is the protocol surface through which agents discover and invoke tools and resources at inference time. It's how the context layer exposes itself to the model. By itself, MCP doesn't guarantee that the context being queried is accurate, current, or coherent across systems. That's the context management work. A poorly built MCP surface against a stale graph is still stale. MCP needs connective tissue underneath it to actually deliver on what the protocol promises.

The substrate decision

The enterprise AI context layer is one graph, derived continuously from every system of record an organization runs. Not two halves. Not data context versus operational context. One substrate, with uniform architectural commitments (graph-shaped, auto-derived, MCP-exposed, identity-resolved, policy-enforced across ingest, storage, and query) that absorbs whatever SoRs the organization happens to be running.

The clearest way to state the architectural difference: Atlan, DataHub, and the rest of the data infrastructure category start from metadata governance over the data estate. SixDegree starts from operational systems of record and derives a live enterprise state graph across every workflow agents act on. Both can claim automation, both can claim graphs, both can claim MCP exposure. The distinction is what each architecture is denominated in. Once an organization's agentic workloads span beyond the data estate, the denomination matters more than the feature surface.

The data infrastructure category is building real products with real value, but its center of gravity is the wrong substrate for agentic workloads that span the enterprise. Curation has been failing at this category of problem for decades. The CMDBs, the service catalogs, the business glossaries. The pattern keeps recurring: declarative substrates for organizational knowledge decay. The AI context layer built on the same model will decay the same way. The right architecture is to skip the curation layer and let agents read from the SoRs directly, through a graph that the substrate keeps current on their behalf.

SixDegree is building that substrate. One graph, every SoR, every agent. The integrations catalog already runs across sales, support, operations, comms, data, and infrastructure. Whatever the application layer looks like in two years, or five, the graph absorbs it. The architecture is indifferent to which specific applications win or lose. The bet is on the structural property that organizations run systems, those systems emit state, and agents need to read across them coherently. That bet doesn't depend on predicting the next decade of enterprise software. It depends on the substrate being right.
