Phase 6b — Flatten FieldConfig.extraction (Tasks as Source of Truth)
Drop the nested FieldConfig.extraction shape. Tasks become the single source of truth for instructions, agents, dependencies, consensus, refinement, and approval gating.
# Phase 6b — Flatten FieldConfig.extraction
## Problem
Phase 6a moved all runtime extraction triggers (`entity/created`, `status_transition`, `document_ready`) onto the unified **tasks → sessions → session_events** pipeline. The runtime is now 100% unified: entities go through `triggerTask()` → `session-executor` → `executeAgentSession()` → `LocalSprinterAgent`.
But the **config layer is still double-booked**. Every per-field extraction setting lives in two places:
1. **`FieldConfig.extraction`** on `entity_types.config.fields[fieldName].extraction` — the legacy nested shape with `instructions`, `agentSlug`, `sources`, `dependsOn`, `required`, `validation`, `maxSteps`, `consensus`, `refinement`, `requiresApproval`.
2. **`tasks` table row** — projected from `FieldConfig.extraction` by `syncSystemTasks()` when the entity type is saved. Instructions → `tasks.instructions`, agentSlug → `tasks.agent_slug`, dependsOn → `tasks.depends_on` (slug-mapped), consensus/refinement/etc → `tasks.metadata`.
`syncSystemTasks()` is a one-way projection. Any edit has to happen on the field config side; the task row is a derived artifact. This has hardened into the following problems:
1. **Two sources of truth, one write path.** Editing the task row directly (e.g. via the task editor) gets clobbered the next time the entity type is saved. There is no way to author a task without authoring a field config shape that projects into it.
2. **The task editor is unusable for field extractions.** The existing task editor UI is the natural home for editing instructions, dependencies, consensus strategies, etc. But because the field config projection overwrites it, no one can use it for extraction tasks.
3. **Runtime callers cannot pass per-field overrides cleanly.** `feedback-rerun.ts` is the last runtime caller of `runEntityWorkflow()` — it needs to pass `fieldOverrides: { [fieldName]: { instructions, parentResponseId, responseSource } }` so the agent gets the rejection feedback and the new response links back to the rejected parent. Tasks + sessions have no equivalent per-field override mechanism today, so feedback-rerun can't migrate.
4. **Readers are scattered.** 13 platform files read `FieldConfig.extraction` directly: legacy workflow compiler, dep sorting, completion status, UI filtering, data table column badges, type spec compiler, admin tool input schemas, API route validation, entity-bento, form-utils, clean-field-configs. Every one of them is an anchor preventing the drop.
5. **Data duplication in the DB.** Entity type configs carry the full extraction JSON even though tasks already persist the same data. Two places to audit, two places to backfill, two places to get wrong.
6. **Phase 6c is blocked.** Phase 6c needs to delete `run-workflow.ts`, drop `workflow_runs` / `workflow_node_runs` / `extraction_runs` / `extraction_results`, migrate heartbeat off `resumeWorkflowRunsForAgent()`, and unify response sessions. None of that can start while the config layer still projects into tasks rather than owning them directly.
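Problem 1 can be reproduced in a few lines. The sketch below is illustrative only: `projectFieldToTask` and the trimmed-down `ExtractionConfig` / `TaskRow` shapes are hypothetical stand-ins for what `syncSystemTasks()` does, not the real implementation.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ExtractionConfig {
  instructions: string;
  agentSlug?: string;
}

interface TaskRow {
  slug: string;
  instructions: string;
  agent_slug: string | null;
}

// One-way projection: the task row is always rebuilt from the field config.
function projectFieldToTask(fieldName: string, extraction: ExtractionConfig): TaskRow {
  return {
    slug: `extract-${fieldName}`,
    instructions: extraction.instructions,
    agent_slug: extraction.agentSlug ?? null,
  };
}

const fieldConfig: ExtractionConfig = { instructions: "Extract the company name." };

// 1. Saving the entity type projects the field config into a task row.
let task: TaskRow = projectFieldToTask("company_name", fieldConfig);

// 2. Someone edits the task row directly (e.g. in the task editor).
task = { ...task, instructions: "Extract the legal company name." };

// 3. The entity type is saved again; the projection overwrites the direct edit.
task = projectFieldToTask("company_name", fieldConfig);
```

After step 3 the task row again carries the field config's instructions, so the direct edit from step 2 is lost.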
## Solution
**Flatten. Drop `FieldConfig.extraction` entirely. The task row is the source of truth for instructions, agent, dependencies, consensus, refinement, sources, and approval gating.**
Five interlocking changes make this safe:
1. **Per-field overrides on tasks.** Extend `triggerTask()` with a `fieldOverrides` parameter. The override map is persisted into parent session metadata and projected down into matching child sessions (by `output_config.fieldNames`). `executeAgentSession()` unpacks the override and threads it into the system prompt and the response-tool factory. This unblocks `feedback-rerun.ts`.
2. **Feedback-rerun migrates off `runEntityWorkflow()`.** `feedback-rerun.ts` resolves the field's system extraction task via `tasks.entity_type_id + slug=extract-{field}`, calls `triggerTask()` with `fieldOverrides`, and no longer touches `run-workflow.ts`. With that, **all runtime callers that needed per-field overrides are on tasks**.
3. **Legacy workflow compiler reads from tasks.** `features/workflows/compile.ts` (already `@deprecated`) is rewritten to take `TaskRecord[]` instead of `FieldConfig` records. `run-workflow.ts` loads tasks at the call site and passes them down. The two remaining `runEntityWorkflow()` callers (`POST /api/workflows/[entityId]` and the `triggerWorkflow` admin tool) keep working unchanged — they'll be deleted together in Phase 6c. `CriteriaSetDimension.extraction` (scoring dimensions) is a separate code path and is **out of scope** for Phase 6b.
4. **Field Config editor swaps inline editing for a task-editor link.** The ~330-line inline extraction editor in `features/entities/components/admin/field-config-editor.tsx` is replaced with a single "Edit extraction task" button per field. The button opens the existing task editor for that field's system task (`slug = extract-{fieldName}`). `humanInput`, `relation`, `displayType`, and `statusMap` stay on `FieldConfig`.
5. **Platform readers migrate off `.extraction` or are removed.** Every reader is either rewritten to query tasks, removed (deprecated dead code), or swapped to a different derived signal. The concrete per-file plan is in `docs/superpowers/plans/2026-04-11-unified-sessions-phase6b.md`. After all readers are off, `FieldConfig.extraction` is deleted from `features/entities/types.ts`, the two Zod schemas are deleted, and a data-repair migration strips the `extraction` key from every row of `entity_types.config.fields.*`.
## Design
### Architecture — current vs. target
**Current (double-booked):**
```
entity_types.config.fields.{field}.extraction = { instructions, agentSlug, ... }
                                        │
                          syncSystemTasks() projects ↓
                                        │
               tasks rows with slug=extract-{field} (derived copy)
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            ▼                           ▼                           ▼
runEntityWorkflow() via       task-dispatch.ts via          compile.ts reads
compile.ts reading             getClaimableTasks()       FieldConfig.extraction
FieldConfig
```
**Target (tasks as the root):**
```
             tasks rows with slug=extract-{field} = SOURCE OF TRUTH
instructions, agent_slug, depends_on, metadata.{consensus,refinement,maxSteps,...}
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            ▼                           ▼                           ▼
     task editor UI    task-dispatch.ts + session-executor  compile.ts reads
   (Create/edit task)     (runtime pipeline, unchanged)      tasks directly

feedback-rerun → triggerTask({ fieldOverrides }) → session-executor →
  executeAgentSession reads fieldOverride from session metadata →
  injects into system prompt + response tool factory
```
### API — `triggerTask()` gains `fieldOverrides`
```typescript
// features/tasks/server/trigger.ts
export interface FieldOverride {
  /** Additional guidance appended to the agent's prompt for this field. */
  instructions?: string;
  /** Parent response to link the new response to (for feedback reruns). */
  parentResponseId?: string;
  /** Tag applied to the response to distinguish extraction vs feedback reruns. */
  responseSource?: "extraction" | "feedback";
}

export interface TriggerTaskInput {
  taskId: string;
  entityId?: string;
  tenantId: string;
  triggeredBy: "manual" | "event" | "cron" | "heartbeat";
  userInput?: Record<string, unknown>;
  /** Per-field runtime overrides, keyed by field name. */
  fieldOverrides?: Record<string, FieldOverride>;
}
```
Inside `triggerTask()`:
1. Parent session persists the full `fieldOverrides` map in its `metadata.fieldOverrides`.
2. For each child session, if the `output_config.fieldNames` of the child's task overlaps the `fieldOverrides` keys, copy only the matching override(s) into that child's `metadata.fieldOverride`. This keeps each child's payload small and self-contained.
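The projection in step 2 can be sketched as a pure function. This is a minimal sketch under stated assumptions: `ChildTask` and `pickOverridesForChild` are hypothetical names, and the spec does not say whether `metadata.fieldOverride` stores a single override or a map keyed by field name. The sketch returns a keyed map, which collapses to one entry for single-field tasks.

```typescript
interface FieldOverride {
  instructions?: string;
  parentResponseId?: string;
  responseSource?: "extraction" | "feedback";
}

// Hypothetical minimal view of a child task; only the part used here.
interface ChildTask {
  id: string;
  output_config?: { fieldNames?: string[] };
}

// Returns the subset of overrides whose keys appear in the child's fieldNames,
// or undefined when nothing matches (so no metadata key needs to be written).
function pickOverridesForChild(
  child: ChildTask,
  fieldOverrides: Record<string, FieldOverride> | undefined,
): Record<string, FieldOverride> | undefined {
  if (!fieldOverrides) return undefined;
  const matched: Record<string, FieldOverride> = {};
  for (const name of child.output_config?.fieldNames ?? []) {
    // Own-property check so names like "constructor" never match by accident.
    const override = Object.prototype.hasOwnProperty.call(fieldOverrides, name)
      ? fieldOverrides[name]
      : undefined;
    if (override) matched[name] = override;
  }
  return Object.keys(matched).length > 0 ? matched : undefined;
}

const overrides: Record<string, FieldOverride> = {
  summary: { instructions: "Be concise.", responseSource: "feedback" },
};

// Child that writes "summary" receives the override; the other child gets nothing.
const forSummaryChild = pickOverridesForChild(
  { id: "t1", output_config: { fieldNames: ["summary", "title"] } },
  overrides,
);
const forTitleChild = pickOverridesForChild(
  { id: "t2", output_config: { fieldNames: ["title"] } },
  overrides,
);
```

Returning `undefined` for non-matching children keeps their session metadata untouched, which matches the goal of small, self-contained child payloads.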
### API — `executeAgentSession()` reads the override
`execute.ts` reads `session.metadata.fieldOverride` after loading the session. If present:
1. Append `override.instructions` to the `taskInstructions` array before the system prompt is built.
2. Pass `{ responseSource, parentResponseId }` into `buildInternalAgentTools({ resolved, tenantId, responseContext })` so the response tool factory can tag the new response correctly.
`buildInternalAgentTools()` gains an optional `responseContext` parameter that is threaded into `createResponseToolDefinitions()`. The existing response tool code already knows how to tag `responseSource` and `parentResponseId` — we just surface them through the call chain.
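The two steps above amount to a small transformation from session metadata to prompt and tool inputs. The following is a sketch, not the actual `execute.ts` code: `applyFieldOverride`, `ResponseContext`, and `AppliedOverride` are hypothetical names, and it assumes a child session that writes one field, so that `metadata.fieldOverride` resolves to a single `FieldOverride`.

```typescript
interface FieldOverride {
  instructions?: string;
  parentResponseId?: string;
  responseSource?: "extraction" | "feedback";
}

// Hypothetical shape for the value handed to buildInternalAgentTools().
interface ResponseContext {
  responseSource: "extraction" | "feedback" | undefined;
  parentResponseId: string | undefined;
}

interface AppliedOverride {
  taskInstructions: string[];
  responseContext: ResponseContext | undefined;
}

// Appends override instructions (if any) and derives the response context.
// Without an override, the inputs pass through unchanged.
function applyFieldOverride(
  taskInstructions: string[],
  override: FieldOverride | undefined,
): AppliedOverride {
  if (!override) return { taskInstructions, responseContext: undefined };
  return {
    taskInstructions: override.instructions
      ? [...taskInstructions, override.instructions]
      : taskInstructions,
    responseContext: {
      responseSource: override.responseSource,
      parentResponseId: override.parentResponseId,
    },
  };
}

const applied = applyFieldOverride(["Extract the summary."], {
  instructions: "Reviewer feedback: shorten to two sentences.",
  parentResponseId: "resp_123",
  responseSource: "feedback",
});
const untouched = applyFieldOverride(["Extract the summary."], undefined);
```

Keeping this step pure makes it straightforward to unit-test the override handling without spinning up an agent session.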
### API — `compile.ts` takes tasks
```typescript
// features/workflows/compile.ts (rewritten)
interface CompileFromTasksParams {
  tasks: TaskRecord[]; // Children of the system extraction parent task
  schemaProperties?: Record<string, Record<string, unknown>>;
  criteriaSets?: CompileCriteriaSet[]; // Scoring dimensions (unchanged path)
  defaultAgentSlug?: string;
}

export function compileEntityWorkflowDefinition(
  params: CompileFromTasksParams,
): WorkflowDefinition;
```
- Each task becomes a `WorkflowNodeDefinition`.
- Instructions from `task.instructions`.
- Agent from `task.agent_slug`.
- Dependencies from `task.depends_on` (already slug-prefixed with `extract-`).
- Consensus from `task.metadata.consensus` → consensus node.
- Refinement from `task.metadata.refinement` → refinement node.
- Max steps from `task.metadata.maxSteps`.
- Required / requiresApproval from `task.metadata.required` / `task.metadata.requiresApproval` (these move from field config into task metadata as part of the migration).
- `humanInput` fields become `wait_human` nodes — tasks with `assigned_to` set and no `agent_slug`.
- Scoring dimensions (`CriteriaSetDimension.extraction`) still flow through `criteriaSets` parameter — untouched.
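The per-task mapping in the list above can be sketched as follows. The real `TaskRecord` and `WorkflowNodeDefinition` types are not shown in this spec, so `TaskRecordSketch`, `NodeSketch`, and `taskToNode` are hypothetical, as are the `false` defaults for `required` and `requiresApproval`. Consensus and refinement nodes are left out to keep the example short.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface TaskRecordSketch {
  slug: string;
  instructions: string | null;
  agent_slug: string | null;
  assigned_to: string | null;
  depends_on: string[];
  metadata: { maxSteps?: number; required?: boolean; requiresApproval?: boolean };
}

interface NodeSketch {
  id: string;
  kind: "agent" | "wait_human";
  instructions: string;
  agentSlug: string | null;
  dependsOn: string[];
  maxSteps: number | undefined;
  required: boolean;
  requiresApproval: boolean;
}

function taskToNode(task: TaskRecordSketch, defaultAgentSlug?: string): NodeSketch {
  // humanInput fields: assigned to a person and no agent configured.
  const isHuman = task.assigned_to !== null && task.agent_slug === null;
  return {
    id: task.slug,
    kind: isHuman ? "wait_human" : "agent",
    instructions: task.instructions ?? "",
    agentSlug: isHuman ? null : (task.agent_slug ?? defaultAgentSlug ?? null),
    dependsOn: [...task.depends_on], // already slug-prefixed with "extract-"
    maxSteps: task.metadata.maxSteps,
    required: task.metadata.required ?? false, // assumed default
    requiresApproval: task.metadata.requiresApproval ?? false, // assumed default
  };
}

const agentNode = taskToNode(
  {
    slug: "extract-revenue",
    instructions: "Find annual revenue.",
    agent_slug: null,
    assigned_to: null,
    depends_on: ["extract-company_name"],
    metadata: { maxSteps: 8, required: true },
  },
  "default-extractor",
);

const humanNode = taskToNode({
  slug: "extract-owner_notes",
  instructions: null,
  agent_slug: null,
  assigned_to: "user_1",
  depends_on: [],
  metadata: {},
});
```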
`run-workflow.ts` updates its call:
```typescript
const tasks = await loadSystemExtractionTasks(tenantId, entityTypeId);
const workflowDef = compileEntityWorkflowDefinition({
  tasks,
  schemaProperties,
  criteriaSets,
  defaultAgentSlug,
});
```
### API — `feedback-rerun.ts` migrated
```typescript
// features/inngest/functions/feedback-rerun.ts (rewritten)
const task = await loadSystemFieldTask(tenantId, entityTypeId, fieldName);
if (!task) return { skipped: true, reason: "no_field_task" };

const { parentSessionId } = await triggerTask({
  taskId: task.id,
  entityId,
  tenantId: event.data.tenantId,
  triggeredBy: "event",
  fieldOverrides: {
    [fieldName]: { instructions, parentResponseId, responseSource: "feedback" },
  },
});

await inngest.send({
  name: EVENT_NAMES.SESSION_EXECUTE,
  data: { parentSessionId, tenantId, triggeredBy: "feedback-rerun" },
});
```
### Field Config editor UI swap
The `FieldConfigEditor` drawer retains:
- Label, display type, status map
- `humanInput` toggle
- `relation` / `connection` config
It gains:
- Per-field "Edit extraction task" link, a button that:
  - If a task exists (`slug = extract-{fieldName}`, `is_system = true`): opens the task editor for that task.
  - If no task exists: opens the task editor in "create" mode, pre-filled with the field name.
It loses:
- The ~330-line inline extraction block (instructions, sources, dependsOn, agent, consensus, refinement, approval, maxSteps, required).
The task editor already exists and already edits `tasks` table rows. No new UI module is needed — just a link.
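The button's two branches reduce to a lookup by slug. A minimal sketch, assuming the editor can be opened from a plain descriptor: `SystemTaskRef`, `EditorTarget`, and `resolveEditorTarget` are hypothetical names, not existing modules.

```typescript
// Hypothetical minimal view of a task row; only the columns used here.
interface SystemTaskRef {
  id: string;
  slug: string;
  is_system: boolean;
}

type EditorTarget =
  | { mode: "edit"; taskId: string }
  | { mode: "create"; prefill: { slug: string; fieldName: string } };

// Decides whether the button opens an existing system task or "create" mode.
function resolveEditorTarget(fieldName: string, tasks: SystemTaskRef[]): EditorTarget {
  const slug = `extract-${fieldName}`;
  const existing = tasks.find((t) => t.is_system && t.slug === slug);
  return existing
    ? { mode: "edit", taskId: existing.id }
    : { mode: "create", prefill: { slug, fieldName } };
}

const knownTasks: SystemTaskRef[] = [
  { id: "task_1", slug: "extract-revenue", is_system: true },
  { id: "task_2", slug: "extract-notes", is_system: false }, // not a system task
];

const editTarget = resolveEditorTarget("revenue", knownTasks);
const createTarget = resolveEditorTarget("notes", knownTasks);
```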
### Seed scripts
The following product seed scripts currently set `extraction` on field configs:
- `scripts/update-ember-company-fields.mjs`
- `scripts/enrich-oci-entity-types.mjs`
- `scripts/audit-oci-data.mjs`
- `scripts/seed-lead-gen-types.mjs`
- `scripts/seed-entity-types.mjs`
- `scripts/seed-oci-content-architecture.mjs`
- `scripts/polish-ember-tenant.mjs`
- `scripts/seed-nathan-excavating-types.mjs`
- `scripts/seed-rock-hill-v2.mjs`
- `scripts/oci-launch-seed.mjs`
They get a shared helper `seedExtractionTask({ tenantId, entityTypeId, fieldName, instructions, agentSlug, dependsOn, metadata })` that inserts task rows directly, bypassing the field config path. Callers are mechanically rewritten: each `field.extraction = { ... }` becomes a `seedExtractionTask(...)` call.
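The row-building half of such a helper might look like the sketch below. It only illustrates the mapping from the helper's arguments to a task row. `buildExtractionTaskRow` and `SeedTaskRow` are hypothetical, the `tenant_id` column on `tasks` is assumed, and so is the choice that callers pass `dependsOn` as field names that get mapped to `extract-` slugs. The actual insert call is left out because the database client is not specified here.

```typescript
interface SeedExtractionTaskInput {
  tenantId: string;
  entityTypeId: string;
  fieldName: string;
  instructions: string;
  agentSlug?: string;
  dependsOn?: string[];
  metadata?: Record<string, unknown>;
}

// Hypothetical row shape; column names follow the ones mentioned in this spec.
interface SeedTaskRow {
  tenant_id: string; // assumed column
  entity_type_id: string;
  slug: string;
  is_system: boolean;
  instructions: string;
  agent_slug: string | null;
  depends_on: string[];
  metadata: Record<string, unknown>;
}

function buildExtractionTaskRow(input: SeedExtractionTaskInput): SeedTaskRow {
  return {
    tenant_id: input.tenantId,
    entity_type_id: input.entityTypeId,
    slug: `extract-${input.fieldName}`,
    is_system: true,
    instructions: input.instructions,
    agent_slug: input.agentSlug ?? null,
    // Assumes callers pass field names; they are mapped to task slugs here.
    depends_on: (input.dependsOn ?? []).map((field) => `extract-${field}`),
    metadata: input.metadata ?? {},
  };
}

// Before: field.extraction = { instructions, dependsOn: ["company_name"], consensus }
// After:
const seededRow = buildExtractionTaskRow({
  tenantId: "tenant_1",
  entityTypeId: "et_1",
  fieldName: "revenue",
  instructions: "Find annual revenue.",
  dependsOn: ["company_name"],
  metadata: { consensus: { strategy: "majority" } },
});
```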
### Data-repair migration
A forward-only migration:
```sql
-- supabase/migrations/{timestamp}_flatten_field_config_extraction.sql

-- 1. Ensure every field with .extraction has a corresponding task row.
--    This runs the projection one final time for any entity types whose config
--    has drifted away from tasks. Uses a PL/pgSQL loop over entity_types.
DO $$
DECLARE
  et RECORD;
  field_key TEXT;
  field_cfg JSONB;
BEGIN
  FOR et IN SELECT id, tenant_id, config FROM entity_types LOOP
    FOR field_key, field_cfg IN SELECT * FROM jsonb_each(et.config -> 'fields') LOOP
      IF field_cfg ? 'extraction' THEN
        -- Upsert the task row if missing (mirrors syncSystemTasks behavior)
        -- … full logic in the migration file
      END IF;
    END LOOP;
  END LOOP;
END $$;

-- 2. Strip .extraction from every field config in entity_types.
UPDATE entity_types
SET config = jsonb_set(
  config,
  '{fields}',
  (
    SELECT jsonb_object_agg(
      key,
      CASE WHEN value ? 'extraction' THEN (value - 'extraction') ELSE value END
    )
    FROM jsonb_each(config -> 'fields')
  )
)
WHERE config -> 'fields' IS NOT NULL
  AND EXISTS (
    SELECT 1
    FROM jsonb_each(config -> 'fields')
    WHERE value ? 'extraction'
  );
```
After the migration, every entity type config has `extraction` fully stripped, and every row of `tasks` that was previously a projection now stands on its own.
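For tests or a post-migration check, the effect of step 2 can be mirrored in application code. This is a sketch under assumptions rather than part of the migration: `stripExtraction`, `hasExtraction`, and the JSON types are hypothetical names that describe the expected end state of one `entity_types.config` value.

```typescript
type FieldConfigJson = Record<string, unknown>;

interface EntityTypeConfigJson {
  fields?: Record<string, FieldConfigJson>;
  [key: string]: unknown;
}

// Returns a copy of the config with the `extraction` key removed from every field.
// Other field keys and other top-level config keys are preserved.
function stripExtraction(config: EntityTypeConfigJson): EntityTypeConfigJson {
  if (!config.fields) return config;
  const fields: Record<string, FieldConfigJson> = {};
  for (const [name, field] of Object.entries(config.fields)) {
    const { extraction: _dropped, ...rest } = field;
    fields[name] = rest;
  }
  return { ...config, fields };
}

// True if any field still carries an `extraction` key (should be false post-migration).
function hasExtraction(config: EntityTypeConfigJson): boolean {
  return Object.values(config.fields ?? {}).some((field) => "extraction" in field);
}

const beforeConfig: EntityTypeConfigJson = {
  icon: "building",
  fields: {
    revenue: { displayType: "currency", extraction: { instructions: "Find annual revenue." } },
    notes: { humanInput: true },
  },
};
const afterConfig = stripExtraction(beforeConfig);
```

Running `hasExtraction` over every entity type config after the migration gives a cheap way to confirm that no row was missed.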
## Trade-offs
**Cost:**
- Large refactor surface: 15+ files changed, every seed script listed under "Seed scripts" rewritten, one data-repair migration.
- One-way drop. Rolling back requires reviving `FieldConfig.extraction`, reprojecting from tasks, and resyncing.
- Task editor UX is currently "plain" — not optimized for field-extraction authoring ergonomics. The inline editor was more ergonomic for writing instructions. We accept the regression because unifying the two editors is the whole point.
**Benefit:**
- Single source of truth. Edit once, in one place, with one permission model.
- Unblocks Phase 6c's deletion of `run-workflow.ts` and its audit tables.
- Feedback reruns stop using a parallel pipeline.
- Task editor becomes the universal authoring surface for all agent work units — field extractions, triggered automations, document processing, heartbeats.
- Config drift is eliminated. No more "saved the field config but the task row was stale."
**Why not:**
- **Why not keep `FieldConfig.extraction` as a thin view-of-tasks?** That would only rename the storage. The user is explicit: drop it entirely. A hydration layer would preserve the double-source-of-truth coupling we're trying to break.
- **Why not do this in Phase 6c instead?** Phase 6c deletes `run-workflow.ts`, and `run-workflow.ts` currently reads `FieldConfig.extraction` through `compile.ts`. We'd have to solve the same problem there anyway. Splitting earlier (6b) lets us land the config flatten without a big-bang deletion PR.
- **Why not extend `triggerTask()` with a richer "override the whole task" interface?** `fieldOverrides` as designed matches the existing `WorkflowFieldOverride` shape 1:1. Adding a richer override surface is unnecessary scope. If we need one later, we can add it then.
- **Why not drop `compile.ts` now?** It's still used by two non-feedback callers (`POST /api/workflows/[entityId]` and the `triggerWorkflow` admin tool). Those migrate in Phase 6c together with `run-workflow.ts` itself. Rewriting `compile.ts` to read tasks is a small surgical change that keeps the deprecated path working through the transition.
## Acceptance Criteria
- [ ] `FieldConfig.extraction` no longer exists on the TypeScript type (`features/entities/types.ts`).
- [ ] `ExtractionConfigSchema` and `ExtractionConsensusSchema` no longer exist in `features/schemas/json-column-schemas.ts`.
- [ ] The duplicate extraction schema in `app/api/entity-types/[slug]/route.ts` is deleted.
- [ ] The admin tool `createEntityType` input schema no longer includes `extraction`.
- [ ] No platform file under `features/` reads `.extraction` on a `FieldConfig` object (verified with grep).
- [ ] `features/workflows/compile.ts` takes `TaskRecord[]`, not `FieldConfig`. Tests updated.
- [ ] `features/inngest/functions/feedback-rerun.ts` calls `triggerTask()` with `fieldOverrides`, not `runEntityWorkflow()`. Tests updated.
- [ ] `triggerTask()` accepts a `fieldOverrides` parameter. Parent session persists it; child sessions receive matching overrides via `metadata.fieldOverride`.
- [ ] `executeAgentSession()` reads `metadata.fieldOverride` and threads `instructions` into the system prompt and `{ responseSource, parentResponseId }` into the response tool factory. Tests updated.
- [ ] `FieldConfigEditor` renders an "Edit extraction task" button per field instead of the inline editor; clicking it opens the task editor for that field's system task.
- [ ] Every seed script listed under "Seed scripts" creates tasks via a shared helper instead of writing `extraction` on field configs.
- [ ] A forward migration exists that (a) ensures tasks exist for every historic `.extraction` entry and (b) strips `.extraction` from every entity type config row.
- [ ] `pnpm test` green, `pnpm typecheck` clean, `pnpm build` green.
- [ ] `documents/CHANGELOG.md` has a Phase 6b entry; `documents/MIGRATION-STATUS-UNIFIED-SESSIONS.md` has a Phase 6b section; `CLAUDE.md` deprecated-systems section updated.
- [ ] PR 620 updated and pushed. Spec status set to `completed`.
## Out of Scope
- **`POST /api/workflows/[entityId]` and `triggerWorkflow` admin tool migrations off `runEntityWorkflow()`** — deferred to Phase 6c together with `run-workflow.ts` deletion.
- **Heartbeat migration** (`resumeWorkflowRunsForAgent()` → `getClaimableTasks()`) — Phase 6c.
- **Dropping `workflow_runs` / `workflow_node_runs` / `extraction_runs` / `extraction_results` tables** — Phase 6c (destructive).
- **Response-session unification** — Phase 6c.
- **`CriteriaSetDimension.extraction` (scoring-dimension extraction)** — parallel surface, not a field config. Addressed separately if ever.
- **Task editor UX improvements for field-extraction authoring** — the current editor is usable but not optimized. Can be polished once the surface is unified.
- **Backfilling task rows from scratch for entity types that never had an `.extraction` entry** — the data-repair migration handles what exists in the DB today; net-new types are authored via the task editor going forward.
## Related Files
See `files-touched` in frontmatter. The implementation plan at `docs/superpowers/plans/2026-04-11-unified-sessions-phase6b.md` has per-task file lists and concrete test + code diffs.