Sprinter Docs

Delegation Readiness

Versioned, shape-agnostic delegation-readiness scoring via the criteria-set primitive.

Delegation Readiness

Every task entity in human_only or trialing delegation state gets a daily readiness assessment — a 0..100 score that ranks the delegate-page queue and shows reviewers a per-dimension breakdown of why the score looks the way it does. Scores are submitted as responses against the platform-level delegation-readiness criteria set; there is no flat readiness_score field.

Overview

The score answers: "If I delegate this task to an agent today, will it go well?" The stock rubric is four dimensions (volume, time efficiency, agent coverage, proven pattern) at 0..25 each, summing to 0..100. Tenants can fork the criteria set to add or reweight dimensions without touching platform code.

Key Concepts

  • Criteria set (delegation-readiness) — seeded into every tenant via supabase/migrations/20260425000000_*.sql and seedDelegationReadinessCriteriaSet (called from createTenant). Scoped to entity_type_slug='task'.
  • Response — one row in entity_responses per assessment. Latest promoted response per task is the source of truth. Older responses are kept for score-progression.
  • Denorm cacheentities.content.readiness_cache = { total, scored_at, response_id }. Populated by a DB trigger on promotion. Shape-agnostic: no per-dimension fields, so custom rubrics render without cache migration.
  • Evaluator — the delegation-readiness-sweep action (a public.actions row seeded per tenant) fires nightly at 04:00 UTC via taskCron. The row's assigned agent (delegation-evaluator, a system agent) runs a single tool call (score_delegation_readiness_sweep) that invokes evaluateTenant(). Admins pause, reschedule, or swap the agent from /tasks — no code change.

How It Works

         ┌────────────────────────┐
         │ public.actions row     │  slug: delegation-readiness-sweep
         │ (trigger_type: cron)   │  schedule: 0 4 * * *
         └──────────┬─────────────┘
                    │ taskCron scans every minute

         ┌────────────────────────┐
         │ triggerTask() →        │
         │ session-executor       │
         └──────────┬─────────────┘
                    │ resolves delegation-evaluator agent

         ┌────────────────────────┐
         │ score_delegation_      │  single-shot tool call
         │ readiness_sweep tool   │
         └──────────┬─────────────┘
                    │ evaluateTenant(tenantId)

         ┌────────────────────────┐
         │ scoreDelegationReadiness │  pure rubric + context
         │ (readiness.ts)         │  for each eligible task
         └──────────┬─────────────┘
                    │ submit + promote

         ┌────────────────────────┐
         │ entity_responses       │  source of truth
         └──────────┬─────────────┘
                    │ AFTER promotion

         ┌────────────────────────┐
         │ trg_update_task_       │  writes cache
         │ readiness_cache        │
         └──────────┬─────────────┘

         content.readiness_cache = { total, scored_at, response_id }

Queue reads (getDelegationQueue) rank by readiness_cache.total with a fallback to getLatestReadinessForTask(taskId) for tasks the evaluator hasn't swept yet. The wizard's Brief step loads the response row and iterates criteria_snapshot + values to render dimension-by-dimension.

API Reference

scoreDelegationReadiness({ tenantId, taskEntityId, submittedByAgentRef })

Fetches the task's content + tenant context, computes the pure 4-dim breakdown, submits and promotes a response. Returns { responseId, breakdown, cache }.

getLatestReadinessForTask({ tenantId, taskEntityId })

Returns the latest promoted readiness response projected into the cache shape ({ total, scored_at, response_id }), or null if the tenant has not yet scored this task.

evaluateTenant(tenantId)

Per-tenant sweep: scores every eligible task, skipping those scored within the last 24h. Lives in features/custom/server/work-model/readiness-sweep.ts. Invoked by the score_delegation_readiness_sweep tool during the nightly action, and available for direct server-side invocation (e.g. manual "re-score now"). Returns { tenantId, scored, skipped, failed, errors }.

DELEGATION_READINESS_CRITERIA_SET constant

Exported from features/custom/seeds/delegation-readiness-criteria-set.ts — used by the seed migration + unit tests to guarantee the rubric shape stays consistent.

Customizing the Rubric (per tenant)

The seed creates a baseline row per tenant. Admins can edit dimensions via the criteria-set editor (/admin → Criteria Sets). The evaluator keeps writing the same four dimensions, but:

  • The promotion trigger's total-derivation order is normalized_scorevalues.total → sum of numeric values → 0, so weighted dimensions take effect.
  • The cache carries only { total, scored_at, response_id } — no need to migrate per-task cache when dimensions change.
  • The wizard's breakdown panel iterates criteria_snapshot + values generically — custom dimension labels, rubrics, tooltips, and scales render without code changes.

If you need to override the evaluator's rubric (not just the criteria-set shape), replace scoreDelegationReadiness in features/custom/server/work-model/readiness.ts — the function is already in the venture-specific features/custom/ tree for exactly this reason.

For Agents

Agents submit readiness scores the same way the evaluator does:

import { scoreDelegationReadiness } from "@/features/custom/server/work-model/readiness";

await scoreDelegationReadiness({
  tenantId,
  taskEntityId,
  submittedByAgentRef: "my-agent-slug",
});

The submitResponse AI tool is the chat/LLM path — agents without direct server-action access can still submit by calling the tool with the right criteria_set_id and dimension values. The response status flips to promoted via promote_dimension_response once a reviewer approves, and the cache populates automatically.

Design Decisions

  • Cache is shape-agnostic{ total, scored_at, response_id } only. Per-dimension values live on the response row. See documents/work/2026-04-24-task-delegation-criteria-sets/decisions.md.
  • Evaluator is deterministic, not LLM-driven — the pure rubric doesn't benefit from an LLM round-trip. Cost + latency without accuracy gain.
  • Daily cadence, not on-read — re-scoring on every task read would double-query every queue render. Once per tenant per day amortizes the fetch cost.
  • source = 'extraction' — the closest-fit existing response source; we don't introduce a new CHECK value until the next cut validates the category's shape.
  • content/docs/features/command-center.mdx — where the queue + wizard ship as UI blocks.
  • content/docs/features/tasks.mdx — the task entity + delegation state machine this scoring feeds into.
  • content/docs/features/agent-system.mdx — how evaluator-sourced responses attribute (submitted_by_agent text ref).

On this page