Delegation Readiness
Versioned, shape-agnostic delegation-readiness scoring via the criteria-set primitive.
Delegation Readiness
Every task entity in human_only or trialing delegation state gets a daily
readiness assessment — a 0..100 score that ranks the delegate-page queue and
shows reviewers a per-dimension breakdown of why the score looks the way it
does. Scores are submitted as responses against the platform-level
delegation-readiness criteria set; there is no flat readiness_score field.
Overview
The score answers: "If I delegate this task to an agent today, will it go well?" The stock rubric is four dimensions (volume, time efficiency, agent coverage, proven pattern) at 0..25 each, summing to 0..100. Tenants can fork the criteria set to add or reweight dimensions without touching platform code.
Key Concepts
- Criteria set (
delegation-readiness) — seeded into every tenant viasupabase/migrations/20260425000000_*.sqlandseedDelegationReadinessCriteriaSet(called fromcreateTenant). Scoped toentity_type_slug='task'. - Response — one row in
entity_responsesper assessment. Latest promoted response per task is the source of truth. Older responses are kept for score-progression. - Denorm cache —
entities.content.readiness_cache = { total, scored_at, response_id }. Populated by a DB trigger on promotion. Shape-agnostic: no per-dimension fields, so custom rubrics render without cache migration. - Evaluator — the
delegation-readiness-sweepaction (apublic.actionsrow seeded per tenant) fires nightly at 04:00 UTC viataskCron. The row's assigned agent (delegation-evaluator, a system agent) runs a single tool call (score_delegation_readiness_sweep) that invokesevaluateTenant(). Admins pause, reschedule, or swap the agent from/tasks— no code change.
How It Works
┌────────────────────────┐
│ public.actions row │ slug: delegation-readiness-sweep
│ (trigger_type: cron) │ schedule: 0 4 * * *
└──────────┬─────────────┘
│ taskCron scans every minute
▼
┌────────────────────────┐
│ triggerTask() → │
│ session-executor │
└──────────┬─────────────┘
│ resolves delegation-evaluator agent
▼
┌────────────────────────┐
│ score_delegation_ │ single-shot tool call
│ readiness_sweep tool │
└──────────┬─────────────┘
│ evaluateTenant(tenantId)
▼
┌────────────────────────┐
│ scoreDelegationReadiness │ pure rubric + context
│ (readiness.ts) │ for each eligible task
└──────────┬─────────────┘
│ submit + promote
▼
┌────────────────────────┐
│ entity_responses │ source of truth
└──────────┬─────────────┘
│ AFTER promotion
▼
┌────────────────────────┐
│ trg_update_task_ │ writes cache
│ readiness_cache │
└──────────┬─────────────┘
▼
content.readiness_cache = { total, scored_at, response_id }Queue reads (getDelegationQueue) rank by readiness_cache.total with a
fallback to getLatestReadinessForTask(taskId) for tasks the evaluator
hasn't swept yet. The wizard's Brief step loads the response row and
iterates criteria_snapshot + values to render dimension-by-dimension.
API Reference
scoreDelegationReadiness({ tenantId, taskEntityId, submittedByAgentRef })
Fetches the task's content + tenant context, computes the pure 4-dim
breakdown, submits and promotes a response. Returns
{ responseId, breakdown, cache }.
getLatestReadinessForTask({ tenantId, taskEntityId })
Returns the latest promoted readiness response projected into the cache
shape ({ total, scored_at, response_id }), or null if the tenant has
not yet scored this task.
evaluateTenant(tenantId)
Per-tenant sweep: scores every eligible task, skipping those scored within the
last 24h. Lives in features/custom/server/work-model/readiness-sweep.ts.
Invoked by the score_delegation_readiness_sweep tool during the nightly
action, and available for direct server-side invocation (e.g. manual "re-score
now"). Returns { tenantId, scored, skipped, failed, errors }.
DELEGATION_READINESS_CRITERIA_SET constant
Exported from features/custom/seeds/delegation-readiness-criteria-set.ts
— used by the seed migration + unit tests to guarantee the rubric shape
stays consistent.
Customizing the Rubric (per tenant)
The seed creates a baseline row per tenant. Admins can edit dimensions via
the criteria-set editor (/admin → Criteria Sets). The evaluator keeps
writing the same four dimensions, but:
- The promotion trigger's total-derivation order is
normalized_score→values.total→ sum of numeric values → 0, so weighted dimensions take effect. - The cache carries only
{ total, scored_at, response_id }— no need to migrate per-task cache when dimensions change. - The wizard's breakdown panel iterates
criteria_snapshot+valuesgenerically — custom dimension labels, rubrics, tooltips, and scales render without code changes.
If you need to override the evaluator's rubric (not just the criteria-set
shape), replace scoreDelegationReadiness in features/custom/server/work-model/readiness.ts
— the function is already in the venture-specific features/custom/
tree for exactly this reason.
For Agents
Agents submit readiness scores the same way the evaluator does:
import { scoreDelegationReadiness } from "@/features/custom/server/work-model/readiness";
await scoreDelegationReadiness({
tenantId,
taskEntityId,
submittedByAgentRef: "my-agent-slug",
});The submitResponse AI tool is the chat/LLM path — agents without direct
server-action access can still submit by calling the tool with the right
criteria_set_id and dimension values. The response status flips to
promoted via promote_dimension_response once a reviewer approves, and
the cache populates automatically.
Design Decisions
- Cache is shape-agnostic —
{ total, scored_at, response_id }only. Per-dimension values live on the response row. Seedocuments/work/2026-04-24-task-delegation-criteria-sets/decisions.md. - Evaluator is deterministic, not LLM-driven — the pure rubric doesn't benefit from an LLM round-trip. Cost + latency without accuracy gain.
- Daily cadence, not on-read — re-scoring on every task read would double-query every queue render. Once per tenant per day amortizes the fetch cost.
source = 'extraction'— the closest-fit existing response source; we don't introduce a new CHECK value until the next cut validates the category's shape.
Related Modules
content/docs/features/command-center.mdx— where the queue + wizard ship as UI blocks.content/docs/features/tasks.mdx— the task entity + delegation state machine this scoring feeds into.content/docs/features/agent-system.mdx— how evaluator-sourced responses attribute (submitted_by_agenttext ref).