Open Knowledge Format (OKF)
How Amble maps to and interchanges with Google Cloud's Open Knowledge Format v0.1 — the entity graph as a vendor-neutral, agent-readable markdown bundle, plus the industry-vocabulary crosswalk (OSI, OpenLineage, DCAT, Croissant, Frictionless).
## Overview
The **Open Knowledge Format (OKF) v0.1** (`GoogleCloudPlatform/knowledge-catalog/okf`) is a vendor-neutral, "format not platform" convention for representing knowledge as plain markdown so humans, agents, and existing tools can all read it without bespoke SDKs. Amble treats OKF as its **knowledge-interchange format**: a way to export a tenant's knowledge graph as a portable, git-diffable, LLM-ingestible bundle, and to import OKF bundles back into the entity graph.
The key insight — and the reason this is cheap — is that **Amble already ships ~90% of an OKF producer/consumer**. The Obsidian-interop vault export/import path (`app/api/entities/{export,import}/vault/route.ts` + `generateEntityMarkdown`/`parseEntityMarkdown` in `features/entities/type-spec/entity-markdown.ts`) already emits `{type-slug}/{slug}.md` files with YAML frontmatter, a non-empty `type`, tags, and a markdown body. OKF alignment is therefore an **in-place hardening of that single path into OKF v0.1 conformance**, never a new module or store. See ADR-0056 (`documents/adr/0056-open-knowledge-format-alignment.md`) and [Obsidian Interop](/docs/features/obsidian-interop) for the underlying machinery.
> **Status:** proposed (Phase 0 funded). This page documents the target integration and the conformance contract. The export/import routes referenced as `/okf` are the renamed-in-place successors to the live `/vault` routes; until Phase 0 lands they live at `/vault`.
## What OKF is (the spec in one screen)
- A **bundle** is a directory of UTF-8 markdown files. Each non-reserved `.md` is a **Concept** = YAML frontmatter + free-form markdown body.
- A **Concept ID** is the file path minus `.md` (e.g. `tables/users.md` → `tables/users`). It is the stable identifier consumers index on.
- The **only required field is a non-empty, free-text `type`**. Type values are never centrally registered; consumers MUST tolerate unknown types.
- **Recommended frontmatter** (priority order): `title`, `description` (one sentence), `resource` (URI to the underlying asset), `tags` (list), `timestamp` (ISO 8601 last meaningful change). Producers MAY add arbitrary keys; consumers **MUST preserve unknown keys**.
- The **graph** is built from plain markdown links in the body (absolute bundle-relative `/path.md` RECOMMENDED; relative allowed). Edges are untyped — the relationship lives in the surrounding prose. A **Citation** is a link to an external source backing a claim.
- **Reserved filenames:** `index.md` (progressive-disclosure listing; frontmatter-free except the bundle root may declare `okf_version`) and `log.md` (chronological change history, newest-first, with `**Creation**` / `**Update**` / `**Deprecation**` bold-word verbs).
- **Conformance is purely structural (§9):** parseable frontmatter + non-empty `type`. Consumers MUST NOT reject on missing optional fields, unknown types, extra keys, broken links, or a missing `index.md`.
A real concept doc from the GA4 sample bundle:
```markdown
---
type: BigQuery Table
resource: https://bigquery.googleapis.com/v2/projects/bigquery-public-data/datasets/ga4_obfuscated_sample_ecommerce/tables/events_*
title: Events table (Google Analytics BigQuery Export)
description: Contains Google Analytics event export data.
tags:
- events
- BigQuery
- ecommerce
timestamp: '2026-05-28T22:53:05+00:00'
---
# Overview
...
# Schema
| Column | Type | Description |
|--------|------|-------------|
...
# Joins
Joined with [users](/tables/users.md) on `user_pseudo_id`.
```
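The structural rules above are small enough to check in a few lines. The following is a minimal sketch, not Amble's parser: `conceptIdFromPath`, `isReserved`, and `checkConcept` are hypothetical names, and the frontmatter handling is a deliberate simplification that only looks for a top-level `type:` line instead of parsing YAML in full.

```typescript
// Sketch of the OKF structural check: frontmatter present + non-empty `type`.
// Assumption: frontmatter is delimited by `---` lines and `type` is a top-level scalar.

const RESERVED_FILES = new Set(["index.md", "log.md"]);

/** Concept ID = bundle-relative path minus the `.md` suffix. */
function conceptIdFromPath(path: string): string {
  return path.replace(/^\/+/, "").replace(/\.md$/, "");
}

/** True when the last path segment is a reserved OKF filename. */
function isReserved(path: string): boolean {
  const name = path.split("/").pop() ?? "";
  return RESERVED_FILES.has(name);
}

type ConformanceResult =
  | { ok: true; type: string }
  | { ok: false; reason: "no-frontmatter" | "empty-type" };

function checkConcept(source: string): ConformanceResult {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(source);
  if (!match) return { ok: false, reason: "no-frontmatter" };

  // Only `type` is inspected; every other key is ignored, never rejected.
  const typeLine = /^type:[ \t]*(.*)$/m.exec(match[1] ?? "");
  const raw = (typeLine?.[1] ?? "").trim();
  const type = raw.replace(/^(['"])(.*)\1$/, "$2").trim(); // strip matching quotes

  return type ? { ok: true, type } : { ok: false, reason: "empty-type" };
}
```

Under this sketch the GA4 example passes with `type` equal to `BigQuery Table`, and a file with valid frontmatter but no `type` key fails with `empty-type`. Missing optional fields, extra keys, and broken links never enter the check.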
## Where OKF sits in the standards landscape
OKF is at a **higher altitude than the catalog/semantic standards** it gets compared to — it *wraps and points at* them rather than competing to be one.
| Band | Standard | Answers | Serialization |
|---|---|---|---|
| Agent-narrative | **OKF**, `llms.txt`, `AGENTS.md`, MCP Resources | "Give an LLM/agent readable context over a knowledge graph" | Markdown + YAML |
| Semantic layer | **OSI** (Snowflake/Salesforce/dbt/Google/AWS, 2025), dbt MetricFlow, Cube | "Define metrics, dimensions, relationships as code" | YAML / JSON |
| Catalog / metadata | DCAT, schema.org/Dataset, Croissant, Frictionless Data Package | "Describe a dataset/table/column rigorously" | RDF / JSON-LD / JSON Schema |
| Lineage | OpenLineage | "Emit run/job/dataset lineage events" | JSON events |
| Physical | Apache Iceberg, Parquet | "Lay bytes on disk" | Binary / manifest |
OKF abandons the schema-registry approach of the catalog band: its only required field is a free-text `type`, its graph is plain markdown links, and its philosophy is "format not platform." It points at the rigorous artifacts via `resource` URIs. **The strategic read for Amble:** keep the DB-driven generic engine, and borrow the shared ecosystem vocabulary (`dataset`, `resource`, `field`, `metric`/`dimension`/`measure`, `lineage`, `knowledge graph`) at the naming layer to be legible to agents trained on this ecosystem — a cheap future-proofing bet as the largest vendors converge on OSI for AI readiness.
## The mapping: Amble primitive ↔ OKF concept ↔ industry analog
| Amble primitive | OKF v0.1 | Industry analog |
|---|---|---|
| `entities` row (`title`, `description`, `content` jsonb, `tags`) | **One Concept doc** (`<type-slug>/<slug>.md`); `type`=`entity_type_slug`, `title`=`title`, body=`description`, extra frontmatter→`content` keys (preserved verbatim) | schema.org/Dataset record; Frictionless row |
| `entity_types.slug` / `.name` | A Concept with `type: "Entity Type"`; the slug IS what other concepts emit as their `type` | DCAT `Catalog` / schema.org `DataCatalog`. **OKF `type` ≈ `entity_type_slug` — both free-text, no central registry** |
| `FieldDefinition` + `json_schema.properties` | Rendered in a `# Fields` / `# Schema` body table — **not** exploded into frontmatter | Frictionless `fields[]`; Croissant `Field`; schema.org `variableMeasured` — **keep the word "field"** |
| `entity_relations` (`from`/`to`/`relationship_type`) | Untyped markdown link in body (`/type-slug/slug.md`), gloss = relationship type **+ `amble_relationship_type` per link for lossless round-trip** | Croissant `references`; Frictionless `foreignKeys`; OSI `relationships` (MISMATCH: Amble typed, OKF untyped) |
| `external_data_sources` / `integration_connectors` | A Concept with `type: "Data Source"` / `"Dataset"` / `"Metric"`; `resource` = `<externalSourcePrefix>:<resource>` or source URL | DCAT `dcat:Dataset`/`Distribution`; `resource` term shared by OKF + Frictionless + DCAT |
| `external_data_points` + ADR-0052 `metric` data-source kind | Concept `type: "Metric"`; latest/history in a `# Metrics` body section | dbt MetricFlow / Cube / OSI `metric`/`measure`/`dimension` |
| `criteria_sets` + `entity_responses` dimensions | Scored rubric in a `# Scoring` body section or `amble_*` keys; promoted score → `amble_score` | OSI `metric` + `dimension` (a measure with `weight`/`scale`) |
| `activities` (Critical Rule 4 — every write logs) | **`log.md`** — `## YYYY-MM-DD` newest-first, bold-word verbs from the activity verb | **OpenLineage event stream — the cleanest 1:1 in the mapping** |
| Views / navigation | **`index.md`** — synthesized progressive-disclosure listing; root declares `okf_version: "0.1"` | `llms.txt` curated index; DCAT Catalog listing |
| Core loop: Action / Session / `session_events` | (not exported as concepts) | **OpenLineage Job / Run / RunEvent** — Amble's strongest live-standard alignment; document the crosswalk |
| `features/context/` (corrections + lessons) | (prompt-context layer) | OSI `context` construct — naming already matches |
**Identity / round-trip:** OKF Concept ID (`tasks/q2-review`) ↔ `{entity_type_slug}/{slug}`. The existing import keys on `(entity_type_slug, title)`; OKF-faithful import additionally carries `external_source = "okf:<bundleId>"` + `external_id = <conceptId>` for idempotent re-import via the existing `(tenant_id, external_source, external_id)` identity.
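As an illustration of why this identity makes re-import idempotent, here is a small sketch. `conceptIdForEntity`, `identityKey`, and `FakeEntityStore` are hypothetical names; the in-memory map stands in for the `(tenant_id, external_source, external_id)` identity and is not how Amble's store is implemented.

```typescript
// Sketch: the OKF Concept ID doubles as `external_id`, namespaced by bundle.

function conceptIdForEntity(entityTypeSlug: string, slug: string): string {
  return `${entityTypeSlug}/${slug}`;
}

/** Serializes the (tenant_id, external_source, external_id) triple into one map key. */
function identityKey(tenantId: string, bundleId: string, conceptId: string): string {
  const externalSource = `okf:${bundleId}`;
  return JSON.stringify([tenantId, externalSource, conceptId]);
}

/** In-memory stand-in for the entities table, keyed on the identity triple. */
class FakeEntityStore {
  private rows = new Map<string, { title: string }>();

  upsert(key: string, title: string): "created" | "updated" {
    const outcome = this.rows.has(key) ? "updated" : "created";
    this.rows.set(key, { title });
    return outcome;
  }

  get size(): number {
    return this.rows.size;
  }
}

const store = new FakeEntityStore();
const key = identityKey("tenant-1", "ga4-sample", conceptIdForEntity("tasks", "q2-review"));
const first = store.upsert(key, "Q2 review");            // "created"
const second = store.upsert(key, "Q2 review (edited)");  // "updated", still one row
```

Because the key is derived from the bundle and the file path rather than the title, renaming a concept's `title` between imports updates the same row instead of creating a duplicate.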
## Vocabulary crosswalk — two sharply separated tiers
**Tier (i) — ALIGN NOW (external/agent-facing prose only, cheap):**
- An "industry vocabulary crosswalk" paragraph in [`architecture.mdx`](/docs/architecture) and [`data-model.mdx`](/docs/data-model) mapping Amble terms → OKF/OSI/OpenLineage/DCAT/Croissant.
- The **Action / Session / Entity ≈ OpenLineage Job / Run / Dataset** crosswalk stated explicitly — Amble's strongest standards alignment, currently undocumented.
- "Knowledge graph" in agent-facing prose for "the tenant's entity graph." Frontend user copy stays on "records / data / data types" ([Critical Rule 12](/docs/architecture) — already aligns with DCAT "data types").
- Refresh `/llms.txt`, `/llms-full.txt`, and MCP **tool descriptions** (prose only — never tool names or param keys, which are an API contract).
**Tier (ii) — LEAVE ALONE (internal code primitives):** do **not** rename `entities`, `entity_types`, `entity_relations`, `EntityRecord`, `FieldDefinition`, `criteria_sets`. They are stable, load-bearing (≈34k-symbol blast radius), and "entity" is blessed in code/API/admin by Rule 12. Vocabulary aligns in *prose*, not symbols.
**Two collisions to disambiguate:**
- **"bundle"** already means a *plugin/extension npm package* in Amble ([bundles](/docs/features/bundles), `features/custom/bundles/`). Always namespace the OKF artifact as **"OKF knowledge bundle"** and never route the importer through the plugin-bundle install path.
- **"entity"** in OSI/MetricFlow means a *join key*; in Amble it is a graph node. Disambiguate in any semantic-layer-facing doc.
## How it works
### Producer — OKF as a tenant-scoped VIEW over canonical Postgres
The **OKF Bundle Producer** is the in-place upgrade of the existing vault export route (`app/api/entities/export/vault/route.ts` → `app/api/entities/export/okf/route.ts`, with the `vault` path kept as a dated-deprecated 308 alias). Posture: **Postgres stays canonical; the bundle is a stateless projection computed on request** — the exact posture of `/llms.txt`. No `okf_concepts`/`okf_bundles` table. One tenant (optionally workspace) = one bundle; every read flows through `requireAuth()` tenant scope, so a caller only ever exports their own graph.
What it emits, extending `generateEntityMarkdown` (the single serializer):
- **Per entity:** `<type-slug>/<slug>.md` with OKF-recommended frontmatter in priority order (`type`, `title`, `description`, `resource`, `tags`, `timestamp`) + namespaced `amble_*` identity keys (`amble_id`, `amble_type_slug`, `amble_visibility`, `amble_external_source?`, `amble_score?`). `resource` is present only for entities wrapping an external asset.
- **`content` scalars** render in a `# Fields` body table — not exploded into frontmatter; the `description` prose is appended verbatim (stored as markdown by the Obsidian-interop layer, so it is OKF-faithful).
- **Relations** → bundle-relative links (`/type-slug/slug.md`) in a `# Relations` section, gloss = `relationship_type`, plus `amble_relationship_type` for lossless Amble→Amble round-trip.
- **`index.md`** synthesized per-directory + root (frontmatter-free except root `okf_version: "0.1"`).
- **`log.md`** from `activities`, newest-first, bold-word verbs.
Conformance is **by construction**: `type` is always the non-empty slug, frontmatter is always emitted, no entity is ever rejected for missing optional data.
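To make the emitted shape concrete, the sketch below renders one concept file. It is illustrative only and is not `generateEntityMarkdown`: `SketchEntity`, `SketchRelation`, and `renderConcept` are hypothetical, the frontmatter `description` is assumed to be the first non-empty line of the prose, the order of body sections is an assumption, and the on-disk encoding of `amble_relationship_type` is left out because this page does not specify it. Strings are quoted with `JSON.stringify`, which yields valid YAML double-quoted scalars.

```typescript
interface SketchRelation {
  relationshipType: string;
  targetTypeSlug: string;
  targetSlug: string;
  targetTitle: string;
}

interface SketchEntity {
  id: string;
  typeSlug: string;
  slug: string;
  title: string;
  description: string; // markdown prose
  tags: string[];
  updatedAt: string; // ISO 8601
  resource?: string; // only for entities wrapping an external asset
  content: Record<string, string | number | boolean>;
  relations: SketchRelation[];
}

function renderConcept(e: SketchEntity): { path: string; markdown: string } {
  const q = (v: string): string => JSON.stringify(v);
  const oneLine = (e.description.split("\n").find((l) => l.trim() !== "") ?? "").trim();

  // Frontmatter in OKF priority order, then namespaced amble_* keys.
  const fm: string[] = ["---", `type: ${q(e.typeSlug)}`, `title: ${q(e.title)}`];
  if (oneLine) fm.push(`description: ${q(oneLine)}`);
  if (e.resource) fm.push(`resource: ${q(e.resource)}`);
  if (e.tags.length > 0) fm.push("tags:", ...e.tags.map((t) => `  - ${q(t)}`));
  fm.push(`timestamp: ${q(e.updatedAt)}`);
  fm.push(`amble_id: ${q(e.id)}`, `amble_type_slug: ${q(e.typeSlug)}`, "---");

  // Escape pipes so scalar values cannot break the table layout.
  const cell = (v: string | number | boolean): string => String(v).replace(/\|/g, "\\|");
  const body: string[] = [e.description];
  const fields = Object.entries(e.content);
  if (fields.length > 0) {
    body.push("", "# Fields", "", "| Field | Value |", "|---|---|");
    for (const [k, v] of fields) body.push(`| ${cell(k)} | ${cell(v)} |`);
  }
  if (e.relations.length > 0) {
    body.push("", "# Relations", "");
    for (const r of e.relations) {
      body.push(`- ${r.relationshipType}: [${r.targetTitle}](/${r.targetTypeSlug}/${r.targetSlug}.md)`);
    }
  }
  return { path: `${e.typeSlug}/${e.slug}.md`, markdown: [...fm, "", ...body, ""].join("\n") };
}
```

Since `type` is always written from the type slug and the frontmatter fence is always emitted, every file this function produces passes the structural conformance rule regardless of which optional fields are empty.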
### Consumer — permissive, idempotent import
The **OKF Bundle Consumer** is the in-place upgrade of the existing vault import route (`app/api/entities/import/vault/route.ts` → `.../okf/route.ts`). It already works (two-pass create + `syncWikilinkRelations`); OKF-conformance is mostly **permissive parsing**:
- Parse frontmatter with Zod **`.passthrough()`, never `.strict()`** — only `type` required; preserve unknown keys verbatim into `content`.
- **Never reject** on missing optionals, unknown `type`, extra keys, broken links, or missing `index.md`. Reject only on unparseable frontmatter / empty `type`.
- **Broken links queue, never throw** — record as deferred relations; unresolved-after-pass-2 surface as `conflicts[]` (reuse the `IntegrationConflict` shape). Handle both absolute (`/x.md`) and relative (`../x.md`) link forms.
- Idempotency via `external_source="okf:<bundleId>"` + `external_id=<conceptId>`.
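The link-handling rule above (resolve both absolute and relative forms, queue what does not resolve) can be sketched as follows. `resolveLink` and `collectLinks` are hypothetical helpers, and the `conflicts` entries use an invented minimal shape rather than the real `IntegrationConflict` type, which is not reproduced on this page.

```typescript
/** Resolve a markdown href to a Concept ID, or null for external / non-concept links. */
function resolveLink(fromPath: string, href: string): string | null {
  if (/^[a-z][a-z0-9+.-]*:/i.test(href) || href.startsWith("#")) return null; // URL scheme or in-page anchor
  const clean = href.split(/[#?]/)[0] ?? "";
  if (!clean.endsWith(".md")) return null;

  // Absolute links start from the bundle root; relative links from the file's directory.
  const out: string[] = clean.startsWith("/") ? [] : fromPath.split("/").slice(0, -1);
  for (const seg of clean.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null; // would escape the bundle root
      out.pop();
      continue;
    }
    out.push(seg);
  }
  return out.join("/").replace(/\.md$/, "");
}

interface LinkReport {
  resolved: Array<{ from: string; to: string }>;
  conflicts: Array<{ from: string; href: string }>; // deferred, never thrown
}

/** `files` maps bundle-relative path to markdown body. */
function collectLinks(files: Record<string, string>): LinkReport {
  const known = new Set(Object.keys(files).map((p) => p.replace(/\.md$/, "")));
  const report: LinkReport = { resolved: [], conflicts: [] };
  for (const [path, body] of Object.entries(files)) {
    const from = path.replace(/\.md$/, "");
    const linkRe = /\[[^\]]*\]\(([^)\s]+)\)/g;
    let m: RegExpExecArray | null;
    while ((m = linkRe.exec(body)) !== null) {
      const href = m[1] ?? "";
      const target = resolveLink(path, href);
      if (target === null) continue; // external citation or non-concept link
      if (known.has(target)) report.resolved.push({ from, to: target });
      else report.conflicts.push({ from, href });
    }
  }
  return report;
}
```

External citations are skipped rather than treated as broken, so only links that point inside the bundle and fail to resolve end up in `conflicts`.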
**Data-integrity guardrails (required):**
- Default to a **dry-run report** (types-to-create, stubs-to-create) for human approval before apply. Today the importer *skips* unknown types (`import/vault/route.ts:58-64`); OKF permissiveness means we may *create* them, so a typo'd `type` is a real type-sprawl foot-gun.
- Never auto-create stub entities for broken-link targets by default.
- Import stays `system_admin`-gated.
- **Do NOT graft the integration framework's bidirectional/conflict/readback sync machinery onto OKF.** OKF is one-way interchange, not a sync engine.
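A dry-run report of the kind described above might look like the following sketch. `ParsedConcept`, `DryRunReport`, and `planImport` are hypothetical, and the code assumes the concept's `type` string is compared as-is against existing type slugs; the real importer may normalize it first.

```typescript
interface ParsedConcept {
  conceptId: string;
  type: string;
  linkTargets: string[]; // Concept IDs this concept links to
}

interface DryRunReport {
  conceptsToUpsert: number;
  typesToCreate: string[];  // unknown types: a typo shows up here before anything is written
  stubsToCreate: string[];  // broken-link targets: reported, not created by default
  applied: boolean;
}

function planImport(
  concepts: ParsedConcept[],
  existingTypeSlugs: Set<string>,
  opts: { apply?: boolean } = {},
): DryRunReport {
  const ids = new Set(concepts.map((c) => c.conceptId));
  const types = new Set<string>();
  const stubs = new Set<string>();

  for (const c of concepts) {
    if (!existingTypeSlugs.has(c.type)) types.add(c.type);
    for (const target of c.linkTargets) {
      if (!ids.has(target)) stubs.add(target);
    }
  }

  return {
    conceptsToUpsert: concepts.length,
    typesToCreate: Array.from(types).sort(),
    stubsToCreate: Array.from(stubs).sort(),
    applied: opts.apply === true, // dry-run unless the caller opts in explicitly
  };
}
```

With an existing type `tasks` and an incoming concept whose `type` is misspelled as `taks`, the report lists `taks` under `typesToCreate`, which gives the approver a chance to catch the typo before a new type is created.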
## For agents
- **Export your tenant's knowledge graph as an OKF bundle:** `POST /api/entities/export/okf` (scoped by `entityTypeSlug` / `entityIds` / `limit`). Reuses `entities:read` — there is no dedicated `exportOkf` MCP tool and no `okf:read` scope by design (per `feedback_fewer_declarative_tools` and ADR-0056).
- **Import an OKF bundle:** `POST /api/entities/import/okf` (`system_admin`, dry-run by default). The operating procedure lives in the `okf-import` skill, not a tool.
- **Discoverability** rides `/llms.txt`, `documents/AGENT-NATIVE.md`, and the skill — the same agent-entry-point family as the rest of the platform.
- A bundle is a **snapshot**. For *live* operation over a tenant's graph, the MCP entity tools (`filterEntities`, `getEntity`, `createEntity`, …) remain superior. The bundle's value is **portability, offline ingestion, and vendor-neutral interchange**.
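For orientation, a request to the export route could be assembled as in the sketch below. Only the route path and the scope parameter names come from this page; sending the scope as a JSON body, the bearer-token header, and the `buildExportRequest` helper are assumptions, and the response format is not shown because it is not documented here. Until Phase 0 lands, the live route is the `/vault` path rather than `/okf`.

```typescript
interface ExportScope {
  entityTypeSlug?: string;
  entityIds?: string[];
  limit?: number;
}

interface SketchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

/** Builds (but does not send) the export request. Auth and body encoding are assumptions. */
function buildExportRequest(baseUrl: string, token: string, scope: ExportScope): SketchRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/entities/export/okf`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`, // assumed auth scheme
      },
      body: JSON.stringify(scope),
    },
  };
}

// Usage, not executed here:
//   const { url, init } = buildExportRequest("https://amble.example", token, { entityTypeSlug: "tasks", limit: 50 });
//   const res = await fetch(url, init);
```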
## Design decisions
- **Harden in place, never a new module** — the vault path is the single serializer; a `features/entities/okf/` module would be a second serializer (`no-parallel-systems`). See ADR-0056.
- **Stateless projection, no second store** — no `okf_concepts`/`okf_bundles` table; the bundle is computed on request like `/llms.txt`.
- **Permissiveness is load-bearing** — `.passthrough()`, broken-link tolerance, and unknown-type tolerance are encoded as regression tests so a future `.strict()` tightening can't silently reject valid bundles.
- **Lossless typed edges via `amble_relationship_type`** — no LLM prose→edge classifier (over-engineered, not in spec).
- **Fund Phase 0 only** — a static bundle is thin value over the live MCP surface; gate later phases on a named consumer, and treat Google Cloud Knowledge Catalog interop as verification-led / BLOCKED-until-contract-documented, never a build.
## Related modules
- [Obsidian Interop](/docs/features/obsidian-interop) — the existing OKF-shaped vault export/import + wikilink machinery
- [Entity System](/docs/features/entity-system) — entities, types, `json_schema`, relations
- [Source Sync](/docs/features/source-sync) and [Webhooks](/docs/features/webhooks) — named data sources / `external_data_sources`
- [Skills](/docs/features/skills) — where the `okf-import` operating procedure lives
- [Data model](/docs/data-model) and [Architecture](/docs/architecture) — the industry vocabulary crosswalk
- ADR-0056 (this decision), ADR-0052 (metric data source kind), ADR-0029 (tenant declarations)