Documentation source

Delegation Readiness

Versioned, shape-agnostic delegation-readiness scoring via the criteria-set primitive.

# Delegation Readiness

Every task entity in `human_only` or `trialing` delegation state gets a daily
readiness assessment — a 0..100 score that ranks the delegate-page queue and
shows reviewers a per-dimension breakdown of _why_ the score looks the way it
does. Scores are submitted as responses against the platform-level
`delegation-readiness` criteria set; there is no flat `readiness_score` field.

## Overview

The score answers: _"If I delegate this task to an agent today, will it go
well?"_ The stock rubric is four dimensions (volume, time efficiency, agent
coverage, proven pattern) at 0..25 each, summing to 0..100. Tenants can fork
the criteria set to add or reweight dimensions without touching platform code.

## Key Concepts

- **Criteria set (`delegation-readiness`)** — seeded into every tenant via
  `supabase/migrations/20260425000000_*.sql` and
  `seedDelegationReadinessCriteriaSet` (called from `createTenant`). Scoped
  to `entity_type_slug='task'`.
- **Response** — one row in `entity_responses` per assessment. Latest
  promoted response per task is the source of truth. Older responses are
  kept for score-progression.
- **Denorm cache** — `entities.content.readiness_cache = { total, scored_at,
  response_id }`. Populated by a DB trigger on promotion. Shape-agnostic: no
  per-dimension fields, so custom rubrics render without cache migration.
- **Evaluator** — the `delegation-readiness-sweep` action (a `public.actions`
  row seeded per tenant) fires nightly at 04:00 UTC via `taskCron`. The row's
  assigned agent (`delegation-evaluator`, a system agent) runs a single tool
  call (`scoreDelegationReadinessSweep`) that invokes `evaluateTenant()`.
  Admins pause, reschedule, or swap the agent from `/tasks` — no code change.

## How It Works

```text
         ┌────────────────────────┐
         │ public.actions row     │  slug: delegation-readiness-sweep
         │ (trigger_type: cron)   │  schedule: 0 4 * * *
         └──────────┬─────────────┘
                    │ taskCron scans every minute
                    ▼
         ┌────────────────────────┐
         │ triggerTask() →        │
         │ session-executor       │
         └──────────┬─────────────┘
                    │ resolves delegation-evaluator agent
                    ▼
         ┌────────────────────────┐
         │ score_delegation_      │  single-shot tool call
         │ readiness_sweep tool   │
         └──────────┬─────────────┘
                    │ evaluateTenant(tenantId)
                    ▼
         ┌────────────────────────┐
         │ scoreDelegationReadiness │  pure rubric + context
         │ (readiness.ts)         │  for each eligible task
         └──────────┬─────────────┘
                    │ submit + promote
                    ▼
         ┌────────────────────────┐
         │ entity_responses       │  source of truth
         └──────────┬─────────────┘
                    │ AFTER promotion
                    ▼
         ┌────────────────────────┐
         │ trg_update_task_       │  writes cache
         │ readiness_cache        │
         └──────────┬─────────────┘
                    ▼
         content.readiness_cache = { total, scored_at, response_id }
```

Queue reads (`getDelegationQueue`) rank by `readiness_cache.total` with a
fallback to `getLatestReadinessForTask(taskId)` for tasks the evaluator
hasn't swept yet. The wizard's Brief step loads the response row and
iterates `criteria_snapshot` + `values` to render dimension-by-dimension.

## API Reference

### `scoreDelegationReadiness({ tenantId, taskEntityId, submittedByAgentRef })`

Fetches the task's content + tenant context, computes the pure 4-dim
breakdown, submits and promotes a response. Returns
`{ responseId, breakdown, cache }`.

### `getLatestReadinessForTask({ tenantId, taskEntityId })`

Returns the latest promoted readiness response projected into the cache
shape (`{ total, scored_at, response_id }`), or `null` if the tenant has
not yet scored this task.

### `evaluateTenant(tenantId)`

Per-tenant sweep: scores every eligible task, skipping those scored within the
last 24h. Lives in `features/custom/server/work-model/readiness-sweep.ts`.
Invoked by the `scoreDelegationReadinessSweep` tool during the nightly
action, and available for direct server-side invocation (e.g. manual "re-score
now"). Returns `{ tenantId, scored, skipped, failed, errors }`.

### `DELEGATION_READINESS_CRITERIA_SET` constant

Exported from `features/custom/seeds/delegation-readiness-criteria-set.ts`
— used by the seed migration + unit tests to guarantee the rubric shape
stays consistent.

## Customizing the Rubric (per tenant)

The seed creates a baseline row per tenant. Admins can edit dimensions via
the criteria-set editor (`/admin` → Criteria Sets). The evaluator keeps
writing the same four dimensions, but:

- The promotion trigger's total-derivation order is **`normalized_score`
  → `values.total` → sum of numeric values → 0**, so weighted dimensions
  take effect.
- The cache carries only `{ total, scored_at, response_id }` — no need to
  migrate per-task cache when dimensions change.
- The wizard's breakdown panel iterates `criteria_snapshot` + `values`
  generically — custom dimension labels, rubrics, tooltips, and scales
  render without code changes.

If you need to override the evaluator's rubric (not just the criteria-set
shape), replace `scoreDelegationReadiness` in `features/custom/server/work-model/readiness.ts`
— the function is already in the venture-specific `features/custom/`
tree for exactly this reason.

## For Agents

Agents submit readiness scores the same way the evaluator does:

```ts
import { scoreDelegationReadiness } from "@/features/custom/server/work-model/readiness";

await scoreDelegationReadiness({
  tenantId,
  taskEntityId,
  submittedByAgentRef: "my-agent-slug",
});
```

The `submitResponse` AI tool is the chat/LLM path — agents without direct
server-action access can still submit by calling the tool with the right
`criteria_set_id` and dimension values. The response status flips to
`promoted` via `promote_dimension_response` once a reviewer approves, and
the cache populates automatically.

## Design Decisions

- **Cache is shape-agnostic** — `{ total, scored_at, response_id }` only.
  Per-dimension values live on the response row. See
  `documents/work/2026-04-24-task-delegation-criteria-sets/decisions.md`.
- **Evaluator is deterministic, not LLM-driven** — the pure rubric doesn't
  benefit from an LLM round-trip. Cost + latency without accuracy gain.
- **Daily cadence, not on-read** — re-scoring on every task read would
  double-query every queue render. Once per tenant per day amortizes the
  fetch cost.
- **`source = 'extraction'`** — the closest-fit existing response source;
  we don't introduce a new CHECK value until the next cut validates the
  category's shape.

## Related Modules

- `content/docs/features/command-center.mdx` — where the queue + wizard
  ship as UI blocks.
- `content/docs/features/tasks.mdx` — the task entity + delegation state
  machine this scoring feeds into.
- `content/docs/features/agent-system.mdx` — how evaluator-sourced
  responses attribute (`submitted_by_agent` text ref).

Documentation source

Delegation Readiness

Versioned, shape-agnostic delegation-readiness scoring via the criteria-set primitive.

# Delegation Readiness

Every task entity in `human_only` or `trialing` delegation state gets a daily
readiness assessment — a 0..100 score that ranks the delegate-page queue and
shows reviewers a per-dimension breakdown of _why_ the score looks the way it
does. Scores are submitted as responses against the platform-level
`delegation-readiness` criteria set; there is no flat `readiness_score` field.

## Overview

The score answers: _"If I delegate this task to an agent today, will it go
well?"_ The stock rubric is four dimensions (volume, time efficiency, agent
coverage, proven pattern) at 0..25 each, summing to 0..100. Tenants can fork
the criteria set to add or reweight dimensions without touching platform code.

## Key Concepts

- **Criteria set (`delegation-readiness`)** — seeded into every tenant via
  `supabase/migrations/20260425000000_*.sql` and
  `seedDelegationReadinessCriteriaSet` (called from `createTenant`). Scoped
  to `entity_type_slug='task'`.
- **Response** — one row in `entity_responses` per assessment. Latest
  promoted response per task is the source of truth. Older responses are
  kept for score-progression.
- **Denorm cache** — `entities.content.readiness_cache = { total, scored_at,
  response_id }`. Populated by a DB trigger on promotion. Shape-agnostic: no
  per-dimension fields, so custom rubrics render without cache migration.
- **Evaluator** — the `delegation-readiness-sweep` action (a `public.actions`
  row seeded per tenant) fires nightly at 04:00 UTC via `taskCron`. The row's
  assigned agent (`delegation-evaluator`, a system agent) runs a single tool
  call (`scoreDelegationReadinessSweep`) that invokes `evaluateTenant()`.
  Admins pause, reschedule, or swap the agent from `/tasks` — no code change.

## How It Works

```text
         ┌────────────────────────┐
         │ public.actions row     │  slug: delegation-readiness-sweep
         │ (trigger_type: cron)   │  schedule: 0 4 * * *
         └──────────┬─────────────┘
                    │ taskCron scans every minute
                    ▼
         ┌────────────────────────┐
         │ triggerTask() →        │
         │ session-executor       │
         └──────────┬─────────────┘
                    │ resolves delegation-evaluator agent
                    ▼
         ┌────────────────────────┐
         │ score_delegation_      │  single-shot tool call
         │ readiness_sweep tool   │
         └──────────┬─────────────┘
                    │ evaluateTenant(tenantId)
                    ▼
         ┌────────────────────────┐
         │ scoreDelegationReadiness │  pure rubric + context
         │ (readiness.ts)         │  for each eligible task
         └──────────┬─────────────┘
                    │ submit + promote
                    ▼
         ┌────────────────────────┐
         │ entity_responses       │  source of truth
         └──────────┬─────────────┘
                    │ AFTER promotion
                    ▼
         ┌────────────────────────┐
         │ trg_update_task_       │  writes cache
         │ readiness_cache        │
         └──────────┬─────────────┘
                    ▼
         content.readiness_cache = { total, scored_at, response_id }
```

Queue reads (`getDelegationQueue`) rank by `readiness_cache.total` with a
fallback to `getLatestReadinessForTask(taskId)` for tasks the evaluator
hasn't swept yet. The wizard's Brief step loads the response row and
iterates `criteria_snapshot` + `values` to render dimension-by-dimension.

## API Reference

### `scoreDelegationReadiness({ tenantId, taskEntityId, submittedByAgentRef })`

Fetches the task's content + tenant context, computes the pure 4-dim
breakdown, submits and promotes a response. Returns
`{ responseId, breakdown, cache }`.

### `getLatestReadinessForTask({ tenantId, taskEntityId })`

Returns the latest promoted readiness response projected into the cache
shape (`{ total, scored_at, response_id }`), or `null` if the tenant has
not yet scored this task.

### `evaluateTenant(tenantId)`

Per-tenant sweep: scores every eligible task, skipping those scored within the
last 24h. Lives in `features/custom/server/work-model/readiness-sweep.ts`.
Invoked by the `scoreDelegationReadinessSweep` tool during the nightly
action, and available for direct server-side invocation (e.g. manual "re-score
now"). Returns `{ tenantId, scored, skipped, failed, errors }`.

### `DELEGATION_READINESS_CRITERIA_SET` constant

Exported from `features/custom/seeds/delegation-readiness-criteria-set.ts`
— used by the seed migration + unit tests to guarantee the rubric shape
stays consistent.

## Customizing the Rubric (per tenant)

The seed creates a baseline row per tenant. Admins can edit dimensions via
the criteria-set editor (`/admin` → Criteria Sets). The evaluator keeps
writing the same four dimensions, but:

- The promotion trigger's total-derivation order is **`normalized_score`
  → `values.total` → sum of numeric values → 0**, so weighted dimensions
  take effect.
- The cache carries only `{ total, scored_at, response_id }` — no need to
  migrate per-task cache when dimensions change.
- The wizard's breakdown panel iterates `criteria_snapshot` + `values`
  generically — custom dimension labels, rubrics, tooltips, and scales
  render without code changes.

If you need to override the evaluator's rubric (not just the criteria-set
shape), replace `scoreDelegationReadiness` in `features/custom/server/work-model/readiness.ts`
— the function is already in the venture-specific `features/custom/`
tree for exactly this reason.

## For Agents

Agents submit readiness scores the same way the evaluator does:

```ts
import { scoreDelegationReadiness } from "@/features/custom/server/work-model/readiness";

await scoreDelegationReadiness({
  tenantId,
  taskEntityId,
  submittedByAgentRef: "my-agent-slug",
});
```

The `submitResponse` AI tool is the chat/LLM path — agents without direct
server-action access can still submit by calling the tool with the right
`criteria_set_id` and dimension values. The response status flips to
`promoted` via `promote_dimension_response` once a reviewer approves, and
the cache populates automatically.

## Design Decisions

- **Cache is shape-agnostic** — `{ total, scored_at, response_id }` only.
  Per-dimension values live on the response row. See
  `documents/work/2026-04-24-task-delegation-criteria-sets/decisions.md`.
- **Evaluator is deterministic, not LLM-driven** — the pure rubric doesn't
  benefit from an LLM round-trip. Cost + latency without accuracy gain.
- **Daily cadence, not on-read** — re-scoring on every task read would
  double-query every queue render. Once per tenant per day amortizes the
  fetch cost.
- **`source = 'extraction'`** — the closest-fit existing response source;
  we don't introduce a new CHECK value until the next cut validates the
  category's shape.

## Related Modules

- `content/docs/features/command-center.mdx` — where the queue + wizard
  ship as UI blocks.
- `content/docs/features/tasks.mdx` — the task entity + delegation state
  machine this scoring feeds into.
- `content/docs/features/agent-system.mdx` — how evaluator-sourced
  responses attribute (`submitted_by_agent` text ref).