Open Semantic Interchange (OSI): Can enterprises really implement it?


Snowflake, Salesforce, dbt Labs, Databricks, and fifteen other industry leaders just finalized a common specification for how AI systems, BI tools, and data platforms read and share your business definitions. The OSI spec is done. The harder question, whether enterprises can actually adopt it, starts now.


The problem has a name: semantic drift

Picture a board meeting where Finance pulls Q1 revenue at $48M and Marketing pulls it at $51M. Both numbers are technically correct. Finance excludes trial conversions; Marketing does not. Neither definition is wrong; they are just inconsistent, baked into separate dashboards, written in separate SQL, and maintained by separate teams with separate incentives. This is semantic drift: the slow divergence of business definitions across tools, teams, and time.
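To make the arithmetic concrete, here is a minimal Python sketch of the board-meeting scenario. The ledger rows, the source labels, and both function names are hypothetical, chosen only so the totals match the $48M and $51M figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevenueLine:
    source: str       # hypothetical label, e.g. "subscription" or "trial_conversion"
    amount_musd: int  # amount in millions of USD


# Hypothetical Q1 ledger, sized so the totals match the figures in the text.
Q1_LEDGER = [
    RevenueLine("subscription", 48),
    RevenueLine("trial_conversion", 3),
]


def finance_revenue(ledger):
    """Finance's definition: trial conversions are excluded."""
    return sum(line.amount_musd for line in ledger if line.source != "trial_conversion")


def marketing_revenue(ledger):
    """Marketing's definition: every line counts."""
    return sum(line.amount_musd for line in ledger)


finance_total = finance_revenue(Q1_LEDGER)      # 48
marketing_total = marketing_revenue(Q1_LEDGER)  # 51
```

Both functions read the same rows and both are defensible. The drift lives entirely in the filter, which is exactly the kind of detail that stays invisible when each team maintains its own SQL.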

In a world where decisions were made by humans reading dashboards, semantic drift was expensive but manageable. When an executive noticed the discrepancy, someone could explain it. As enterprises route AI agents through their data stacks, that same drift becomes actively dangerous. An LLM asked “what was our Q1 revenue?” does not know which version to use. It infers from schema and column names, writes its best-guess SQL, and returns a number that may or may not be accurate, and there is often no reliable way to tell from the output alone.

This is the structural problem the Open Semantic Interchange is designed to address.


What OSI actually specifies

The OSI specification, finalized at v1.0 in January 2026, is a vendor-neutral, Apache 2.0-licensed, YAML-based format for representing semantic layer constructs. It covers five core entities: datasets (mappings between physical data and logical business models), metrics (quantitative calculations such as sums, ratios, and period-over-period comparisons), dimensions (categorical attributes for slicing), relationships (join logic and cardinality between datasets), and contexts (scopes that shape how definitions apply across different use cases).

The critical distinction is what OSI is not. It is not a semantic layer tool. It is an interchange format, in the same way that OpenAPI is not an API but a standard for describing APIs. A team using dbt’s MetricFlow, Cube, Snowflake Cortex Analyst, or Salesforce Data Cloud can express their definitions in OSI-compliant YAML and publish them. Any downstream consumer that understands OSI (an AI agent, a BI platform, a data catalog) can read those definitions deterministically, without guessing.
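A rough way to picture the five entities and the "read deterministically, without guessing" property is the Python sketch below. It is not the OSI schema: every class name, field name, table name, and expression here is an assumption made for illustration, and the real YAML structure should be taken from the specification itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    name: str
    table: str  # physical table behind the logical dataset (hypothetical)


@dataclass
class Metric:
    name: str
    dataset: str
    expression: str


@dataclass
class Dimension:
    name: str
    dataset: str
    column: str


@dataclass
class Relationship:
    left: str
    right: str
    on: str
    cardinality: str  # e.g. "many_to_one"


@dataclass
class Context:
    name: str
    overrides: Dict[str, str] = field(default_factory=dict)  # metric name -> expression


@dataclass
class SemanticModel:
    datasets: Dict[str, Dataset]
    metrics: Dict[str, Metric]
    dimensions: Dict[str, Dimension]
    relationships: List[Relationship]
    contexts: Dict[str, Context]


def resolve_metric(model: SemanticModel, name: str, context: Optional[str] = None) -> str:
    """Return the governed expression for a metric, or fail loudly instead of guessing."""
    if name not in model.metrics:
        raise KeyError(f"metric {name!r} is not defined; refusing to guess")
    expression = model.metrics[name].expression
    if context is not None:
        expression = model.contexts[context].overrides.get(name, expression)
    return expression


# Hypothetical model with one entity of each kind.
MODEL = SemanticModel(
    datasets={"orders": Dataset("orders", "analytics.fct_orders")},
    metrics={"revenue": Metric("revenue", "orders", "SUM(amount)")},
    dimensions={"region": Dimension("region", "orders", "region_code")},
    relationships=[
        Relationship("orders", "customers", "orders.customer_id = customers.id", "many_to_one")
    ],
    contexts={"finance": Context("finance", {"revenue": "SUM(amount_excluding_trials)"})},
)
```

A consumer built this way either returns the published expression or raises an error. Calling resolve_metric(MODEL, "revenue", context="finance") returns the Finance-scoped expression, while asking for an undefined metric fails rather than falling back to an inference from column names.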

The dbt Labs framing captures the design intent well: OSI enables a metrics-as-code workflow where “revenue” is defined once, version-controlled, and consumed consistently everywhere. That is not a product pitch. It is a description of what the interchange format makes technically possible.

[Figure: OSI as semantic interchange layer. Semantic sources (dbt MetricFlow: metric definitions in code; Snowflake: Cortex Analyst models; Cube and Salesforce: governed semantic layers; Databricks: Unity Catalog semantics) feed the OSI spec (Apache 2.0, YAML format, GitHub; datasets, metrics, dimensions, relationships, contexts and scopes; vendor-neutral YAML; version-controlled, Git-native definitions), which is read by consumers (AI agents: LLM orchestration layers; BI tools: ThoughtSpot, Hex, Sigma; data catalogs: Atlan, Collibra, DataHub; custom consumers: any OSI-compatible reader).]

OSI sits between semantic layer producers and AI/BI consumers, providing a vendor-neutral exchange format.


Seven things the spec finalization actually means

The working group is real and unusually broad. The OSI initiative spans Alation, Atlan, BlackRock, Collibra, Cube, DataHub, Databricks, Hex, Honeydew, Informatica, Instacart, Mistral AI, Omni, Salesforce, Snowflake, ThoughtSpot, and more. This is not a single vendor trying to colonize a standard through consortium theater. It is a genuine multi-stakeholder effort, which matters for longevity.

Governance will move to Apache. Founding members have committed to donating the project to the Apache Software Foundation as it matures. That shifts control away from any single vendor and provides a governance model that enterprise legal and procurement teams can reason about without fear of lock-in.

dbt open-sourcing MetricFlow was a structural signal, not a PR move. When dbt Labs open-sourced MetricFlow, it was making the engine that powers its semantic layer freely available and OSI-interoperable. Platform teams already running dbt at scale can now treat their metric definitions as portable, not tied to dbt’s commercial offering.

No vendor has shipped production import tooling yet. The spec is finalized. The tooling is not. Phase 2 of the OSI roadmap, native support in 50-plus platforms, is scheduled for Q2 through Q4 2026. Enterprises adopting OSI today are early. There is real first-mover advantage in understanding the spec now; there is also real integration risk in depending on it before the ecosystem’s tooling matures.

The LLM grounding payoff is measurable. dbt’s internal testing found natural language query accuracy at roughly 83% when queries are backed by a governed semantic layer, versus approximately 40% when the LLM writes raw SQL against a schema. That gap is precisely what OSI is designed to close at scale, across heterogeneous platforms.

The enterprise edge is still underspecified. As Brooklyn Data’s analysis notes, clean metrics (the kind that map to a single aggregation over one dataset) represent perhaps 80% of typical analytics. The remaining 20% is where enterprises actually live: cross-domain calculations, business-rule-heavy definitions, regulatory carve-outs, and metrics that span three systems. That 20% resists clean YAML representation. It is also the 20% that matters most for board-level decisions.

The Databricks entry matters more than it appears. Databricks joining the working group after the initial announcement, alongside its Unity Catalog business semantics general availability, signals that OSI is not a Snowflake-led play. When the two dominant lakehouse platforms both commit to the same interchange format, the semantic layer ecosystem has a real convergence point for the first time.


Deep breakdown: governance is the actual implementation problem

Here is where most platform teams will get into trouble. OSI solves the interchange problem. It does not solve the ownership problem, and ownership is where semantic models go to die.

Consider what adopting OSI at enterprise scale actually requires. You do not have zero metric definitions today. You have too many. “Revenue” likely lives in your dbt project, your Salesforce reports, your Looker explores, several Tableau calculated fields, and at least three Excel models that Finance is unwilling to retire. The problem is not the absence of definitions; it is an abundance of conflicting ones. Consolidating them into an OSI-compliant canonical model requires three things that a YAML spec cannot provide: cross-functional alignment, named ownership, and an active change process.

Salesforce’s engineering team put this directly: a semantic layer without named owners degrades quickly. Definitions go stale, certifications lapse, and governance becomes theater. The technology enforces format and structure. It cannot enforce accountability.
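The "abundance of conflicting definitions" problem can at least be surfaced mechanically before anyone argues about it. The sketch below assumes you have already exported an inventory of (metric, tool, definition) rows by hand or by script. The rows, the crude normalization, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory rows: (metric name, tool, definition as found in that tool).
INVENTORY = [
    ("revenue", "dbt", "SUM(amount)"),
    ("revenue", "Looker", "sum( amount )"),
    ("revenue", "Tableau", "SUM(amount) - SUM(trial_amount)"),
    ("active_customers", "dbt", "COUNT(DISTINCT customer_id)"),
]


def normalize(expression: str) -> str:
    """Crude normalization: lowercase and strip all whitespace.

    Good enough to ignore formatting noise; it does not understand SQL and
    would mangle case-sensitive string literals.
    """
    return "".join(expression.lower().split())


def find_conflicts(inventory):
    """Return {metric: {normalized definition: [tools]}} for metrics with more than one definition."""
    by_metric = defaultdict(lambda: defaultdict(list))
    for metric, tool, expression in inventory:
        by_metric[metric][normalize(expression)].append(tool)
    return {
        metric: dict(definitions)
        for metric, definitions in by_metric.items()
        if len(definitions) > 1
    }


conflicts = find_conflicts(INVENTORY)
```

Here the dbt and Looker definitions of revenue collapse to the same normalized form, while the Tableau field stands apart, so conflicts contains a single entry for revenue. Deciding which definition wins is still a conversation between named owners; the script only tells you which conversations to schedule.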

The versioning question is equally underestimated. Organizations run on metrics that change. Revenue recognition rules shift with new product lines. Customer definitions evolve with market segments. Regulatory requirements rewrite retention calculations overnight. An OSI-compliant definition of “ARR” published today may be meaningfully wrong by Q3 if there is no change process attached to it. The spec supports versioning as a concept. Making versioning operational (deciding who approves changes, how downstream consumers are notified, and what deprecation looks like) is governance work that platform teams will have to design themselves.
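What "operational" versioning might look like can be sketched in a few lines of Python. This is one possible design under stated assumptions, not anything the OSI spec prescribes: the rule that only the named owner may approve a change, the callback-style notifications, and every name and expression below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Optional


@dataclass
class MetricVersion:
    version: int
    expression: str
    approved_by: str
    effective_from: date
    deprecated_on: Optional[date] = None


@dataclass
class GovernedMetric:
    name: str
    owner: str
    versions: List[MetricVersion] = field(default_factory=list)
    subscribers: List[Callable[[str, MetricVersion], None]] = field(default_factory=list)

    def publish(self, expression: str, approved_by: str, effective_from: date) -> MetricVersion:
        """Add a new version: check approval, deprecate the old version, notify consumers."""
        if approved_by != self.owner:
            raise PermissionError(f"{approved_by!r} cannot approve changes to {self.name!r}")
        if self.versions:
            # The previous version is deprecated on the day the new one takes effect.
            self.versions[-1].deprecated_on = effective_from
        new_version = MetricVersion(len(self.versions) + 1, expression, approved_by, effective_from)
        self.versions.append(new_version)
        for notify in self.subscribers:
            notify(self.name, new_version)
        return new_version


# Hypothetical usage: one downstream consumer records every change it is told about.
notifications = []
arr = GovernedMetric(name="arr", owner="finance-lead")
arr.subscribers.append(lambda name, v: notifications.append((name, v.version)))
arr.publish("SUM(mrr) * 12", "finance-lead", date(2026, 1, 1))
arr.publish("SUM(mrr_excluding_pilots) * 12", "finance-lead", date(2026, 7, 1))
```

The three governance questions each map to one line of publish: the approval check, the deprecation date, and the notification loop. In practice the approval step would more likely be a pull-request review and the notification a webhook or catalog event, but the shape of the decision is the same.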

[Figure: Enterprise OSI implementation phases. 01 Audit: inventory existing metric definitions across all tools, teams, and BI platforms. 02 Consolidate: resolve conflicts, assign named owners, establish canonical definitions per domain. 03 Publish in OSI: express canonical definitions as OSI-compliant YAML via your semantic layer tooling. 04 Govern continuously: version control changes, notify consumers on updates, deprecate stale definitions. Annotation at step 03: most teams start here, skipping steps 01 and 02.]

Most teams jump to publishing OSI YAML before the governance foundation exists to sustain it.

There is also a subtler adoption challenge that the spec’s working group has not fully addressed: the consumption side requires coordination across teams that have historically operated independently. A data catalog team, a BI vendor, and an AI platform team each need to implement OSI readers. Each will do it on their own timeline. Even if your semantic model is perfectly expressed in OSI YAML by mid-2026, the full value of the interchange format depends on your downstream consumers having shipped their OSI-compatible readers. You do not control that schedule.


Contrarian take: we have been here before

It is worth remembering that the semantic web was supposed to solve exactly this problem in 2001. OWL, RDF, and SPARQL were all rigorous, well-designed specifications for representing and querying structured knowledge. The enterprise adoption rate was effectively zero outside of specific academic and government domains. The technology was not the problem. The incentive structure was.

The semantic web asked enterprises to invest significant effort in structured knowledge representation so that search engines could return marginally better results. The value case was diffuse, and the incumbent behavior (humans reading documents and resolving ambiguity in meetings) was good enough. OSI asks enterprises to invest similar organizational effort so that AI agents can retrieve business context without hallucinating metric calculations. That is a fundamentally different value case.

The forcing function is concrete. Organizations routing AI agents through their financial, sales, and operational data need those agents to return the right number, not a plausible number. The difference between 83% accuracy and 40% accuracy in natural language queries against your data stack is not abstract. It is a number your CFO will have an opinion about when the AI-generated revenue forecast disagrees with the board deck.

That said, the history is a useful calibration. Standards with broad working group support and solid technical foundations can still fail to achieve adoption if the integration path is too steep or the organizational change required is underestimated. OSI’s Phase 2 roadmap targets 50-plus native platform integrations. But native support in tools is not the same as governance inside organizations. The tooling will arrive on schedule or close to it. The institutional work (ownership, process, accountability) is harder to roadmap.


Tool worth attention: MetricFlow (now open source)

dbt Labs’ decision to open-source MetricFlow alongside its OSI commitment is the most practically significant move for platform teams evaluating where to start.

MetricFlow is a metrics-as-code framework. You define metrics in YAML adjacent to your dbt models, and MetricFlow handles query generation including multi-dimensional slicing, time granularity, and cross-dataset joins. The definition lives in version control. The output is consistent regardless of which tool is consuming it. Downstream BI tools and AI agents query the MetricFlow service layer rather than the underlying tables directly, which means the business logic is centralized, not duplicated.

What makes this OSI-relevant: MetricFlow-defined metrics are expressible in OSI-compliant format, which means any tool that adopts OSI can read the same definitions your dbt project already maintains. For organizations already running dbt at scale, MetricFlow is the most natural OSI on-ramp available today. The spec works better when a credible reference implementation already exists, and MetricFlow is that implementation.


The question platform teams need to answer before anything else

Before your team begins mapping a semantic model to OSI YAML, the prior question is simpler and harder: can you list your top 20 business metrics, name who owns each one, and describe the last time each definition was formally reviewed? For most enterprise data platforms, the honest answer is no.

If that is your situation, the OSI migration conversation is premature. You are not ready to exchange semantics you have not yet governed. Adopting OSI in that state produces an OSI-compliant YAML file that encodes your existing confusion in a new format, which is worse than the status quo because it looks like progress.
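The readiness question above is simple enough to turn into a check you can run against a spreadsheet export. The sketch below is one hedged way to do it; the record shape, the 365-day staleness threshold, and the sample metrics are assumptions, not anything defined by OSI.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class MetricRecord:
    name: str
    owner: Optional[str]           # named owner, or None if nobody claims it
    last_reviewed: Optional[date]  # date of the last formal review, if any


def governance_gaps(records, today: date, max_age_days: int = 365):
    """Return {metric name: [problems]} for every metric that fails the readiness question."""
    gaps = {}
    for record in records:
        problems = []
        if not record.owner:
            problems.append("no named owner")
        if record.last_reviewed is None:
            problems.append("never formally reviewed")
        elif today - record.last_reviewed > timedelta(days=max_age_days):
            problems.append("review is stale")
        if problems:
            gaps[record.name] = problems
    return gaps


# Hypothetical inventory of three board-level metrics.
RECORDS = [
    MetricRecord("revenue", "cfo-office", date(2026, 1, 15)),
    MetricRecord("arr", None, date(2024, 6, 1)),
    MetricRecord("churn_rate", "cs-ops", None),
]

gaps = governance_gaps(RECORDS, today=date(2026, 3, 1))
```

In this sample only revenue passes. An empty result for your top metrics is a reasonable signal that the OSI conversation can start; a long one is the audit backlog.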

The right sequence is audit first, then consolidate, then express in OSI, then govern the ongoing change process. Skipping to step three because the tooling exists is how you end up with a spec adoption story that looks good in a blog post and breaks the first time an AI agent actually queries a metric that changed six months ago.

The rule: If you cannot govern it, do not standardize it yet. Start with the ten metrics that appear in board presentations, assign a named owner to each, version the definitions in Git, and establish a lightweight change process. That foundation is what makes OSI adoption durable. The spec gives you the exchange format. Governance is the only work that cannot be outsourced to a working group.


References: OSI specification (GitHub) · OSI v1.0 announcement (Snowflake) · OSI industry launch (BusinessWire) · dbt on OSI and MetricFlow · MetricFlow open-sourced (dbt Labs) · Ending semantic drift (Salesforce) · Where OSI stands today (Brooklyn Data) · Semantic layer accuracy benchmarks (dbt Labs) · OSI expands partners (Snowflake) · open-semantic-interchange.org
