Service Catalog vs Live Ontology: Why Static Catalogs Fail
Service catalogs promised to be the single source of truth for your infrastructure. Here's why they go stale, what a live ontology provides instead, and when each approach actually makes sense.

I recently spoke with an infrastructure lead at a large media company. His team runs Backstage. They are actively evaluating Cortex as a replacement. And they still cannot answer a basic question: who owns Datadog?
"Two teams claimed to own Datadog," he told me. "There's no authoritative source of truth."
This is not an edge case. It is the norm. And it reveals a structural problem with how service catalogs work.
The Ownership Problem Nobody Talks About
Service catalogs like Backstage, Cortex, and OpsLevel solve a real problem. In any organization past about fifty engineers, nobody has a complete mental model of the system. A service catalog gives you a central place to register services, declare ownership, and list dependencies.
The concept is sound. The implementation model is where it breaks.
At this media company, ServiceNow was supposed to track ownership. "It was a policy," the infrastructure lead explained. "Sometimes it didn't happen." The result is predictable: ownership data exists in some entries, is missing in others, and is outright wrong in enough places to make the whole system untrustworthy.
This is not a discipline problem. It is a structural one. Service catalogs are declarative documents. Someone writes a catalog-info.yaml, checks it into a repo, and the catalog ingests it. This works exactly as long as humans keep those files up to date. Which is to say, it works for about a quarter before drift sets in.
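To make the drift concrete, here is a minimal sketch, with hypothetical service and team names (not a real Backstage API), of what happens when a declared catalog entry stops matching observed reality: the YAML is a snapshot, while the infrastructure keeps moving.

```python
# Hypothetical data, for illustration only: what a catalog-info.yaml
# declares, versus what deploy logs and runtime config actually show.
declared = {
    "payments-api": {"owner": "team-payments", "dependsOn": ["billing-db"]},
    "legacy-report": {"owner": "team-analytics", "dependsOn": []},
}

# Observed facts: legacy-report was quietly handed off to another team,
# and payments-api grew a dependency nobody went back to declare.
observed = {
    "payments-api": {"owner": "team-payments",
                     "dependsOn": ["billing-db", "fraud-svc"]},
    "legacy-report": {"owner": "team-platform", "dependsOn": []},
}

def find_drift(declared, observed):
    """Return (service, field, declared_value, observed_value) tuples."""
    drift = []
    for svc, decl in declared.items():
        obs = observed.get(svc, {})
        for field in ("owner", "dependsOn"):
            if decl.get(field) != obs.get(field):
                drift.append((svc, field, decl.get(field), obs.get(field)))
    return drift

for svc, field, want, got in find_drift(declared, observed):
    print(f"{svc}: declared {field}={want!r}, observed {field}={got!r}")
```

A catalog only ever sees the left-hand side of this comparison; the drift is invisible until someone goes looking for it.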
Different Tools, Same Blind Spots
The landscape of "system of record" tools is crowded, and none of them actually provides a complete picture.
Service catalogs (Backstage, Cortex, OpsLevel) answer "what services exist and who owns them." They do not answer "what depends on this service" or "what breaks if it changes," because those relationships live in systems the catalog does not connect to. Your dependency graph is only as good as what someone remembered to declare in YAML.
CMDBs (ServiceNow, Device42) track IT assets. They are rarely real-time, rarely complete, and not designed for the kind of cross-domain correlation that modern infrastructure requires. ServiceNow does not connect to GitHub, Kubernetes, or your identity provider in any meaningful way. It is a database of records that someone was supposed to keep current.
Observability platforms (Datadog, Dynatrace) show metrics and traces, which is essential during incidents. But they do not connect to your identity provider, your ticketing system, or your cloud IAM policies. They tell you what is slow. They do not tell you who is responsible or what else is affected.
The result is that teams have Backstage AND Datadog AND ServiceNow, and they still cannot answer "what breaks if we deprecate this service" without a week of Slack archaeology.
When Change Communication Depends on People, Not Process
The infrastructure lead I spoke with described another failure mode that catalogs cannot solve. When his team makes a change that affects other teams, good communication about that change only happens "when a PM is involved." It is person-dependent, not process-dependent.
Think about what that means. The blast radius of a change (who needs to know, what might break, which downstream consumers are affected) is determined by whether someone with the right organizational awareness happens to be paying attention. If the PM is on vacation, or if the change seems minor enough to skip the usual process, affected teams find out when things break.
No service catalog addresses this. Catalogs can tell you that Service A exists and Team B owns it. They cannot tell you that Service A's API is consumed by three other services through a shared library, that one of those services has a hardcoded timeout that will trigger cascading failures if latency increases, and that the team responsible for that service is in a different timezone and will not see the Slack notification until morning.
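Mechanically, the blast-radius question is a reverse reachability query over the dependency graph. A minimal sketch, assuming hypothetical service names and that the edges come from observed connections rather than declared YAML:

```python
from collections import deque

# consumers[x] = services observed calling x directly (hypothetical data).
consumers = {
    "service-a": ["service-b", "service-c", "service-d"],
    "service-c": ["service-e"],
}

def blast_radius(service):
    """Everything transitively affected if `service` changes or breaks."""
    affected, queue = set(), deque([service])
    while queue:
        for dependent in consumers.get(queue.popleft(), []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# service-e is affected only indirectly, through service-c -- exactly the
# kind of second-hop consumer a person-dependent process misses.
print(sorted(blast_radius("service-a")))
```

The traversal is trivial. The hard part, and the part catalogs skip, is keeping the edge list true to what is actually running.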
What a Live Ontology Provides
A live ontology takes a different approach. Instead of asking humans to declare what exists and how it connects, it discovers that information directly from the infrastructure, continuously.
Discovery vs. declaration. A live ontology queries your actual infrastructure: Kubernetes clusters, cloud providers, CI/CD systems, monitoring tools, and source repositories. It builds the graph from what it finds. No YAML required. No ownership forms to fill out. The representation tracks reality because it is derived from reality.
Relationships from reality. Dependencies are derived from actual runtime behavior and configuration, not from someone's memory of the architecture. Network policies, service mesh configurations, database connections, and API calls are all observable facts. When someone asks "what depends on this service," the answer comes from what is actually connected, not from what was declared eighteen months ago.
Cross-domain correlation. A live ontology connects entities across domains that catalogs typically silo. The Git repository, the CI pipeline, the container image, the Kubernetes deployment, the monitoring dashboard, and the on-call rotation are all related. A live ontology surfaces those relationships without requiring someone to manually wire them together.
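In the simplest case, correlation is a join across per-domain records keyed by service. A sketch with hypothetical record shapes (in practice the correlation keys themselves must be inferred, which is the hard part):

```python
# Hypothetical records from four siloed systems, each already tagged
# with a service name for the sake of illustration.
repos       = {"checkout": "github.com/acme/checkout"}
deployments = {"checkout": "k8s/prod/checkout-deployment"}
dashboards  = {"checkout": "datadog-dashboard/checkout"}
oncall      = {"checkout": "team-storefront-primary"}

def correlate(service):
    """Stitch one service's entities across domains into a single view."""
    return {
        "repo": repos.get(service),
        "deployment": deployments.get(service),
        "dashboard": dashboards.get(service),
        "oncall": oncall.get(service),
    }

print(correlate("checkout"))
```

Each of these four facts lives in a different tool today; the value is in having the join already materialized when an incident starts.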
This is the difference between asking "who owns this?" and getting an answer from a stale YAML file, versus asking the same question and getting an answer derived from Git commit history, deployment records, and on-call schedules.
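One way to derive ownership from those signals is a weighted vote across commit, deploy, and on-call activity. A sketch with made-up teams and illustrative weights (not a recommendation for how to weight real signals):

```python
from collections import Counter

# Hypothetical observed signals for one service: who committed to its
# repo, who deployed it, who answers its pages.
signals = [
    ("team-alpha", "commit"), ("team-alpha", "commit"),
    ("team-beta", "commit"),
    ("team-alpha", "deploy"),
    ("team-alpha", "oncall"),
]
# Illustrative weights: answering pages is a stronger ownership signal
# than an occasional commit.
weights = {"commit": 1, "deploy": 3, "oncall": 5}

def infer_owner(signals):
    """Score each team by weighted signal count; return the top team."""
    scores = Counter()
    for team, kind in signals:
        scores[team] += weights[kind]
    return scores.most_common(1)[0][0]

print(infer_owner(signals))  # team-alpha: 2 + 3 + 5 = 10, vs team-beta: 1
```

Under this framing, the "two teams claim Datadog" dispute resolves itself: the scores show who actually operates it, and a near-tie is itself useful information.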
When Static Catalogs Still Make Sense
It would be dishonest to claim that a live ontology replaces every function of a service catalog. There are things that only humans know and that cannot be discovered automatically:
- Business context. What this service does in business terms, who its customers are, what revenue it affects. This is human knowledge by nature.
- Intended architecture. Sometimes you need to express what the system should look like, not just what it looks like now. Catalogs can serve as a target state document.
- Compliance metadata. Data classification, regulatory obligations, and policy decisions are not discoverable facts. They require human judgment and declaration.
A catalog is the right tool when the information is human-originated and changes slowly. An ontology is the right tool when the information is machine-observable and changes frequently.
The Hybrid Approach
The most effective setup combines both. Use a live ontology as the foundation: the accurate, always-current graph of what actually exists and how it connects. Layer human-curated metadata on top where it adds real value: business context, architectural intent, compliance annotations.
This inverts the maintenance burden. Instead of humans maintaining everything and machines checking their work, machines maintain the factual substrate and humans contribute only what machines cannot discover. The surface area for human maintenance shrinks to a fraction of what it was.
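The layering can be as simple as a merge in which discovered facts form the base and human annotations add only the fields machines cannot observe. A sketch with hypothetical field names:

```python
# Machine-discovered substrate: regenerated continuously, never hand-edited.
discovered = {
    "owner": "team-storefront",          # inferred from deploys and on-call
    "dependencies": ["cart", "pricing"], # derived from observed connections
    "runtime": "k8s/prod",
}

# Human-curated overlay: only what machines cannot know.
curated = {
    "business_purpose": "Handles all checkout revenue",
    "data_classification": "PCI",
}

def merged_view(discovered, curated):
    """Discovery provides the facts; human fields layer on top
    (and would win if a key ever collided)."""
    view = dict(discovered)
    view.update(curated)
    return view

print(merged_view(discovered, curated))
```

The humans' maintenance surface is now two fields that change perhaps once a year, instead of the whole record.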
The infrastructure lead I spoke with would no longer need to resolve which of two teams owns Datadog by asking around. The system would show who has active integrations, who deploys to it, who responds to its alerts. Ownership would be an observable fact, not a policy that "sometimes didn't happen."
Practical Questions to Ask
If you are evaluating your approach to system understanding, consider:
How much of your catalog is discoverable? If 80% of your catalog entries could be derived from your actual infrastructure, you are spending 80% of your catalog maintenance effort on work that could be automated.
How many tools do you consult during an incident? If engineers are switching between the catalog, the cloud console, Kubernetes dashboards, and monitoring tools to build a mental model, you do not have a single source of truth. You have a table of contents.
Can you answer "what breaks if we change this" without asking around? If the answer requires Slack threads, tribal knowledge, or hoping a PM is involved, your system understanding is person-dependent. That is fragility, not process.
SixDegree takes the live ontology approach, using molecules to continuously discover and connect entities across your infrastructure. The result is a real-time graph that reflects what is actually running, how it is connected, and what depends on what, without asking engineers to maintain YAML files they will inevitably forget about.
The service catalog was the right idea at the wrong layer of abstraction. The goal was never to have a well-maintained registry. The goal was to understand your systems. Starting from reality, rather than asking humans to describe it, is how you get there.