Company Brain

A source-backed, retrieval-optimized knowledge graph that external agents query over MCP — canonical node types, fused retrieval via getContextPack, and a goal-loop ingestion contract.

# Company Brain

The **company brain** is not a new system. It is the Amble entity graph used as a durable,
source-backed context layer that agents — in-app and external (via MCP) — query to understand who
the user is, what they're working on, who and what matters, what was decided, what's open, and
which sources justify each conclusion.

It is assembled from primitives the platform already owns:

- **Nodes** = `entities` rows; **node kinds** = `entity_types` (DB-driven `json_schema`).
- **Edges** = typed `entity_relations` (graph traversal + provenance links).
- **Documents** = `document` entities + chunked, embedded `document_chunks` (hybrid RAG).
- **Retrieval** = live pgvector + Postgres FTS + graph, fused by `getContextPack`.

No parallel store, no new engine. See `.claude/rules/no-parallel-systems.md`.

## The `@amble/company-brain` bundle

Install the bundle (`features/custom/bundles/company-brain/manifest.json`) to seed a tenant with
the canonical knowledge-graph node types. Each carries provenance, confidence, freshness, and
sensitivity in its `json_schema` so retrieval can rank by trust and recency.

| Type | What it holds |
| --- | --- |
| `person` | People — relationship, role, preferences, open loops |
| `organization` | Companies / orgs / teams |
| `project` | Active, time-bound work with a definition-of-done |
| `decision` | Statement, rationale, alternatives, consequences, status |
| `note` | Atomic, concept-oriented note (`kind`: insight / preference / procedure / meeting / commitment / report / state / source-manifest / validation-report / completion-audit) |
| `topic` | Concept/theme node for concept-oriented linking |
| `context-pack` | A curated retrieval bundle for a bounded task — "use this when…" + agent instructions |
| `source` | Provenance anchor (document / url / integration / manual) that other records cite |
| `source-item` | An individual captured item (email thread, article, message) |

**Canonical edges:** `person —works_at→ organization`, `person —involved_in→ project`,
`project —produced_decision→ decision`, `decision —sourced_from→ source`, `note —about→ topic`,
`note —mentions→ (person|organization|project)`, `(any) —sourced_from→ source`,
`context-pack —includes→ (any)`.
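
The edge list above can be treated as data. The sketch below is illustrative only: `EdgeRule`, `CANONICAL_EDGES`, and `isCanonicalEdge` are hypothetical names, `"*"` stands in for `(any)`, and nothing here claims that Amble enforces these rules at write time.

```typescript
// Hypothetical sketch: the canonical edges expressed as a lookup table.
type NodeType =
  | "person" | "organization" | "project" | "decision" | "note"
  | "topic" | "context-pack" | "source" | "source-item";

interface EdgeRule {
  from: NodeType | "*";            // "*" mirrors "(any)" in the edge list
  relation: string;
  to: readonly NodeType[] | "*";
}

const CANONICAL_EDGES: readonly EdgeRule[] = [
  { from: "person", relation: "works_at", to: ["organization"] },
  { from: "person", relation: "involved_in", to: ["project"] },
  { from: "project", relation: "produced_decision", to: ["decision"] },
  { from: "note", relation: "about", to: ["topic"] },
  { from: "note", relation: "mentions", to: ["person", "organization", "project"] },
  // Covers `decision sourced_from source` as well as every other type.
  { from: "*", relation: "sourced_from", to: ["source"] },
  { from: "context-pack", relation: "includes", to: "*" },
];

// True when (from, relation, to) matches one of the canonical rules.
function isCanonicalEdge(from: NodeType, relation: string, to: NodeType): boolean {
  return CANONICAL_EDGES.some(
    (rule) =>
      (rule.from === "*" || rule.from === from) &&
      rule.relation === relation &&
      (rule.to === "*" || rule.to.indexOf(to) !== -1),
  );
}
```

A check like this could run before an agent proposes a new relation, so that non-canonical edges are at least flagged for review.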

**Provenance field group** (on every claim-bearing type): `summary`, `aliases[]`, `lifecycle`
(active / area / resource / archived — PARA-style relevance routing), `confidence`
(high / medium / low), `source`, `source_url`, `last_verified_at`, `sensitivity`
(public / internal / confidential). These are `json_schema` content properties — not `FieldConfig`
keys — so the strict entity-type write-gate never strips them.
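
As a rough illustration, the provenance field group could be typed as below. The field names and enum values come from the list above; which fields are optional, the ISO 8601 timestamp format, and the whole `trustScore` formula (confidence weight with a 90-day half-life) are assumptions made for this sketch, not Amble's ranking logic.

```typescript
type Lifecycle = "active" | "area" | "resource" | "archived";
type Confidence = "high" | "medium" | "low";
type Sensitivity = "public" | "internal" | "confidential";

interface ProvenanceFields {
  summary: string;
  aliases: string[];
  lifecycle: Lifecycle;
  confidence: Confidence;
  source: string;
  source_url?: string;        // assumed optional
  last_verified_at: string;   // assumed to be an ISO 8601 timestamp
  sensitivity: Sensitivity;
}

// Hypothetical weights; the real ranking inputs are not documented here.
const CONFIDENCE_WEIGHT: Record<Confidence, number> = { high: 1, medium: 0.6, low: 0.3 };

const MS_PER_DAY = 86400000;

// Hypothetical score: confidence weight, halved every `halfLifeDays`
// since the record was last verified.
function trustScore(p: ProvenanceFields, now: Date, halfLifeDays: number = 90): number {
  const ageMs = now.getTime() - new Date(p.last_verified_at).getTime();
  const ageDays = Math.max(0, ageMs / MS_PER_DAY);
  return CONFIDENCE_WEIGHT[p.confidence] * Math.pow(0.5, ageDays / halfLifeDays);
}
```

The point of the sketch is only that `confidence` and `last_verified_at` together give a retrieval layer enough signal to order records by trust and recency.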

## Retrieval — `getContextPack` (the front door)

`getContextPack` is a read-only, external MCP tool that fuses the three existing retrieval paths
(hybrid entity search, hybrid document-chunk search, and a graph walk over `entity_relations`) into
one ranked, cited, token-budgeted payload, so an agent makes **one call**, not three.

```
getContextPack({ query, typeSlug?, maxEntities?, maxDocuments?, maxConnections? })
→ {
    entities:    [{ id, title, type, typeSlug, href, content }],   // hybrid semantic+FTS, compact refs
    documents:   [{ documentId, documentTitle, page, score, snippet }], // hybrid chunk RAG, cited
    connections: [{ id, title, type, typeSlug, href, relationshipType, viaEntityId }], // graph neighbors
    summary:     { entityCount, documentCount, connectionCount }
  }
```

It returns **compact references** (just-in-time retrieval, per Anthropic's context-engineering
guidance) — dereference a full record with `getEntity`. It is tenant-scoped and never crosses a
tenant boundary. Implementation: `features/tools/context-pack-tools.ts` (composition over
`searchEntitiesHybrid` + `searchDocumentsHybrid` + an `entity_relations` graph walk).
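
For a TypeScript client, the payload above might be typed as follows. The field names are taken from the signature; the primitive types (`string`, `number`) and the `renderContext` helper are assumptions for illustration, not the types exported by `context-pack-tools.ts`.

```typescript
interface ContextPackRequest {
  query: string;
  typeSlug?: string;
  maxEntities?: number;
  maxDocuments?: number;
  maxConnections?: number;
}

interface EntityRef { id: string; title: string; type: string; typeSlug: string; href: string; content: string; }
interface DocumentHit { documentId: string; documentTitle: string; page: number; score: number; snippet: string; }
interface ConnectionRef extends Omit<EntityRef, "content"> { relationshipType: string; viaEntityId: string; }

interface ContextPack {
  entities: EntityRef[];
  documents: DocumentHit[];
  connections: ConnectionRef[];
  summary: { entityCount: number; documentCount: number; connectionCount: number };
}

// Hypothetical helper: flatten a pack into short, cited lines for a prompt.
function renderContext(pack: ContextPack): string {
  const lines: string[] = [];
  for (const e of pack.entities) lines.push(`[entity] ${e.title} (${e.typeSlug}) ${e.href}`);
  for (const d of pack.documents) lines.push(`[doc] ${d.documentTitle} p.${d.page}: ${d.snippet}`);
  for (const c of pack.connections) lines.push(`[link] ${c.title} via ${c.relationshipType}`);
  return lines.join("\n");
}
```

Because the pack holds compact references, a helper like this keeps the prompt small; anything that needs full detail is still fetched with `getEntity`.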

## For agents — how to consume the brain

1. **Lookup, don't dump.** Call `getContextPack({ query })` first. Expand only what you need with
   `getEntity`. Keep context small.
2. **Cite.** Every claim in the brain carries provenance — surface it. Ground or abstain: if the
   brain doesn't know, say "not in the brain", never guess.
3. **Respect sensitivity.** Honor each record's `sensitivity` when deciding what to surface.
4. **Write back with discipline.** Agent-created records get the same provenance, confidence, and
   `sourced_from` edges as human writes. Canonicalize before creating — `searchEntities` /
   `getContextPack` to find an existing node and update it (append provenance, add an alias) rather
   than creating a duplicate.
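
The "canonicalize before creating" rule in step 4 can be sketched as a small planning function. Everything here is hypothetical: `BrainNode`, `planWrite`, and the normalization rule (trim, lowercase, collapse whitespace) are illustrative choices, and `candidates` stands for whatever `searchEntities` or `getContextPack` returned.

```typescript
interface BrainNode { id: string; typeSlug: string; title: string; aliases: string[]; }

type WritePlan =
  | { action: "update"; id: string; aliases: string[] }
  | { action: "create"; typeSlug: string; title: string };

// Assumed normalization: trim, lowercase, collapse internal whitespace.
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ");
}

// Find a node of the same type whose title or any alias matches after normalization.
function findExisting(candidates: BrainNode[], typeSlug: string, name: string): BrainNode | undefined {
  const key = normalize(name);
  const hits = candidates.filter(
    (n) =>
      n.typeSlug === typeSlug &&
      (normalize(n.title) === key || n.aliases.some((a) => normalize(a) === key)),
  );
  return hits[0];
}

// Prefer updating an existing node (recording a new surface form as an alias)
// over creating a duplicate.
function planWrite(candidates: BrainNode[], typeSlug: string, name: string): WritePlan {
  const surface = name.trim();
  const hit = findExisting(candidates, typeSlug, surface);
  if (!hit) return { action: "create", typeSlug, title: surface };
  const known = hit.title === surface || hit.aliases.indexOf(surface) !== -1;
  return { action: "update", id: hit.id, aliases: known ? hit.aliases : hit.aliases.concat([surface]) };
}
```

The update branch is where an agent would also append provenance and a `sourced_from` edge; that part is omitted because the write tools' signatures are not given here.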

## Zero-instruction operation — the ambient brain

The brain is **ambient**: an external agent (Codex CLI, Claude Code on a tech team) connecting to
a tenant's MCP server learns to use Amble as its company brain + control plane *without* a repo
rule, `AGENTS.md`, or explicit instruction. The mechanism is the MCP spec's `instructions` field,
which the server returns in its `initialize` response (`InitializeResult.instructions`) and which
every client therefore receives. Amble sets it in
`features/mcp/amble-server.ts` (`AMBLE_MCP_INSTRUCTIONS`): retrieve-first via `getContextPack`,
write canonical records back, pick up work via the goal-loop tools, report via
`reportAgentActivity`, and pull operating method via `ambleGetSkill`. The instructions are
tenant-agnostic, because the connecting URL already pins the tenant.
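
To make the mechanism concrete, here is a minimal sketch of an `initialize` response that carries an `instructions` string, and of a client folding it into its context. The instruction text is a placeholder (the real `AMBLE_MCP_INSTRUCTIONS` content is not reproduced in this document), and the server name, version, and helper names are hypothetical.

```typescript
// The slice of an MCP initialize response that matters here.
interface InitializeResultSlice {
  protocolVersion: string;
  serverInfo: { name: string; version: string };
  capabilities: Record<string, unknown>;
  instructions?: string;
}

// Placeholder text only; it paraphrases the behaviors listed above.
const EXAMPLE_INSTRUCTIONS: string = [
  "Retrieve first with getContextPack.",
  "Write canonical records back.",
  "Pick up work via the goal-loop tools.",
  "Report progress with reportAgentActivity.",
  "Pull the operating method with ambleGetSkill.",
].join("\n");

// Server side: attach the instructions to the initialize response.
function buildInitializeResult(instructions: string): InitializeResultSlice {
  return {
    protocolVersion: "2025-03-26",
    serverInfo: { name: "amble", version: "0.0.0" }, // hypothetical values
    capabilities: { tools: {} },
    instructions,
  };
}

// Client side: an agent appends the server's instructions to its own prompt.
function systemContextFrom(result: InitializeResultSlice, basePrompt: string): string {
  return result.instructions ? `${basePrompt}\n\n${result.instructions}` : basePrompt;
}
```

Since the text arrives during the handshake, no per-agent configuration file has to mention Amble at all.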

## The one command — method + goal both live in Amble

Everything an agent needs is pulled from Amble over MCP, so every agent (Claude Code, Codex,
OpenClaw) gets the same latest version with no local copy to drift:

- **The method** is the `company-brain-ingestion` **platform skill** (`features/skills/platform-skills.ts`)
  — the operating playbook (compiler process, canonicalize, provenance, validation gates, retrieval
  audit, safety). Pull it with `ambleGetSkill("company-brain-ingestion")`.
- **The goal** is the `company-brain-ingestion` **goal-loop template** (`features/entities/lib/goal-loop-templates.ts`)
  — a generic, every-tenant template. Stand it up with
  `createGoalFromTemplate({ templateSlug: "company-brain-ingestion" })`.
- **The launcher** is the repo skill `.claude/skills/company-brain/SKILL.md` — the thin bootstrap a
  Claude Code / Codex agent invokes to kick the loop off.

Filling the brain is then a **goal loop**, not a one-shot script: the agent reads its contract with
`getGoalBrief` / `getGoalLog` (both external reads) and runs
`listGoalWork → claimGoalWork → closeGoalWork(proof)`. It claims only `controlled_execute` work,
renews each claim before the ~12h lease lapses, reports via `reportAgentActivity`, and resumes from
`getAgentHandoffPacket`. The full, copy-ready prompt lives at
`documents/prompts/company-brain-ingestion-codex.md`.
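
Two of the loop's rules, claim only `controlled_execute` work and renew within the lease, reduce to small pure checks. The sketch below assumes a work-item shape (`id`, `tier`, `claimedBy`), an exact 12-hour lease, and a one-hour renewal margin; none of these are confirmed by the goal-loop tools' actual schemas.

```typescript
// Assumed shape of an item returned by listGoalWork.
interface GoalWorkItem {
  id: string;
  tier: string;
  claimedBy?: string;
}

const LEASE_MS = 12 * 60 * 60 * 1000; // "~12h" in the text; the exact value is an assumption

// Keep only unclaimed items at the tier an external agent may execute.
function claimable(items: GoalWorkItem[]): GoalWorkItem[] {
  return items.filter((i) => i.tier === "controlled_execute" && i.claimedBy === undefined);
}

// True inside the renewal window: close to expiry, but not yet expired.
// After expiry the claim is gone and the item has to be claimed again.
function shouldRenew(claimedAt: Date, now: Date, marginMs: number = 60 * 60 * 1000): boolean {
  const expiresAt = claimedAt.getTime() + LEASE_MS;
  const t = now.getTime();
  return t >= expiresAt - marginMs && t < expiresAt;
}
```

An agent would call `claimable` on each `listGoalWork` result before `claimGoalWork`, and poll `shouldRenew` while long-running work is in flight.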

## Rollout — provision a tenant (Sprinter, Marbella, OCI)

1. **Install the node types.** `POST /api/plugins` with
   `{ action: "install", install: { source: "manifest", manifestName: "@amble/company-brain", workspaceId } }`
   (admin / operator) — idempotent; seeds the 9 canonical types on that tenant.
2. **Provision a key.** An API key with scope `tools:execute` and a role granting
   `entities.team.read` + `entities.team.update`, owned by a current member of the tenant.
3. **Point the agent at it.** `POST /api/mcp/t/{tenantSlug}/server` with the Bearer key — the
   server instructions orient the agent automatically.
4. **Arm the loop.** `createGoalFromTemplate` leaves the goal `draft`; arming it (draft → running)
   is a one-time human step in the Amble UI. Thereafter the loop is agent-driven at the
   `controlled_execute` tier.
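
Steps 1 and 3 can be sketched as request builders. The paths and the install body come from the steps above; `baseUrl`, the `HttpRequestSpec` shape, and the function names are hypothetical, and how the admin / operator install call is authenticated is not specified here, so no credential is attached to it.

```typescript
interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

function stripTrailingSlash(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "");
}

// Step 1: install the @amble/company-brain bundle on a workspace.
function installBundleRequest(baseUrl: string, workspaceId: string): HttpRequestSpec {
  return {
    method: "POST",
    url: `${stripTrailingSlash(baseUrl)}/api/plugins`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      action: "install",
      install: { source: "manifest", manifestName: "@amble/company-brain", workspaceId },
    }),
  };
}

// Step 3: the tenant-pinned MCP endpoint, called with the Bearer API key.
function mcpServerRequest(baseUrl: string, tenantSlug: string, apiKey: string): HttpRequestSpec {
  return {
    method: "POST",
    url: `${stripTrailingSlash(baseUrl)}/api/mcp/t/${encodeURIComponent(tenantSlug)}/server`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

Keeping the builders pure makes it easy to review exactly what a provisioning script will send before it touches a tenant.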

## Design decisions

- **Extend, don't add.** The brain is entity-graph + documents + retrieval composition — not a new
  table or memory store (`no-parallel-systems`). `getContextPack` is the only new code; it wraps
  existing RPCs.
- **Provenance lives in the schema and on edges**, reusing the value-level `field_sources` /
  `lineage` model and `sourced_from` relations — not a bespoke provenance type.
- **External reads, gated.** `getGoalBrief`/`getGoalLog`/`getContextPack` are `entities.team.read`
  and tenant-scoped; goal-WRITE tools stay internal.
- **Ambient by server instructions, not a new tool.** The "use Amble as your brain" contract rides
  the MCP spec's `instructions` field — no dedicated tool, no per-agent config (`thin-harness`,
  `feedback_fewer_declarative_tools`).
- **Method and goal are data, pulled over MCP.** The ingestion playbook is a platform *skill* and
  the loop is a goal-loop *template* — both version-controlled, both retrieved live, so no agent
  carries a stale local copy.

## Related

- [Knowledge Loops](/docs/features/knowledge-loops) — the feed → synthesize → review → eval cycle
- [Shared Knowledge](/docs/features/shared-knowledge) — team memory injected into agent prompts
- [Tool System](/docs/features/tool-system) — the MCP tool surface
- [Document Processing](/docs/features/document-processing) — chunking + hybrid retrieval
- [Bundles](/docs/features/bundles) — how `@amble/company-brain` installs
