Video Studio
Provider-agnostic AI video generation — storyboard planner, model catalog, async render pipeline, Studio UI, and agent tools for creating tenant-scoped video clips.
## Overview
Video Studio gives agents and humans a storyboard-first pipeline for generating short-form AI video clips. A creative brief becomes an editable, inspectable storyboard plan before any expensive generation begins. Generation runs asynchronously in an Inngest worker and produces a stored artifact that is observable at every stage, cost-attributed, and tenant-scoped with row-level security (RLS).
The platform is deliberately provider-agnostic: the video model catalog is keyed by Vercel AI Gateway slugs, so model swaps are configuration changes rather than code changes. Phase 1 routes all traffic through the system gateway credential; per-tenant bring-your-own-key (BYOK) and direct-provider transports are a catalogued follow-up.
The same pipeline is reachable from two front doors: three agent tools (`planVideoStoryboard`, `generateVideo`, `getVideoRenderStatus`) and a `/studio` UI (brief → editable storyboard cards → generate → inline player). See ADR-0049 for the decision record.
## Key Concepts
### Render
A render (`video_renders` table row) is the central artifact. It owns the storyboard, the generation parameters, the render status, and the storage path of the produced clip.
```typescript
type VideoRenderStatus = "draft" | "queued" | "generating" | "complete" | "failed";
type VideoAspectRatio = "9:16" | "16:9" | "1:1";
```
The status machine is linear: `draft → queued → generating → complete | failed`. A render in `draft` has a planned storyboard but has not been sent to the generator. Flipping to `queued` happens through the single seam in `features/videos/server/queue.ts` — nowhere else.
### Storyboard
A storyboard is a structured plan stored in the render's `storyboard` jsonb column. It is the inspectable, editable artifact that separates the "think" step from the "spend" step.
```typescript
interface StoryboardScene {
  visual: string;          // What the viewer sees: visual direction for this beat
  onScreenText: string;    // Caption / overlay text (empty = no overlay)
  narration: string;       // Voiceover line (informational in Phase 1)
  durationSeconds: number; // Beat length, 1–30 s
}

interface Storyboard {
  hookOptions: string[];     // Ranked opening-hook options (best first)
  selectedHook: number;      // Index into hookOptions
  scenes: StoryboardScene[]; // 1–12 ordered beats
  callToAction: string;
  styleNotes: string;        // Visual style / mood direction
}
```
Phase 1 renders the full board as one clip via a composed prompt (`composeRenderPrompt`). The scene schema anticipates per-scene generation and stitching — that is a catalogued follow-up.
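The flattening step can be pictured with a small sketch. The real output format of `composeRenderPrompt` is not shown in this document, so the function below is named `composeRenderPromptSketch` and its line layout is an assumption. It leaves narration out because narration is described as informational in Phase 1.

```typescript
interface StoryboardScene {
  visual: string;
  onScreenText: string;
  narration: string;
  durationSeconds: number;
}

interface Storyboard {
  hookOptions: string[];
  selectedHook: number;
  scenes: StoryboardScene[];
  callToAction: string;
  styleNotes: string;
}

// Hypothetical flattening: one line per section, one line per scene.
// The prompt layout is illustrative, not the project's actual format.
function composeRenderPromptSketch(board: Storyboard): string {
  // Fall back to the top-ranked hook if selectedHook is out of range.
  const hook = board.hookOptions[board.selectedHook] ?? board.hookOptions[0] ?? "";
  const sceneLines = board.scenes.map((scene, index) => {
    const overlay = scene.onScreenText
      ? ` On-screen text: "${scene.onScreenText}".`
      : "";
    return `Scene ${index + 1} (${scene.durationSeconds}s): ${scene.visual}.${overlay}`;
  });
  return [
    `Style: ${board.styleNotes}`,
    `Opening hook: ${hook}`,
    ...sceneLines,
    `Call to action: ${board.callToAction}`,
  ].join("\n");
}
```

Keeping the composition in one pure function means the prompt can be inspected or snapshot-tested without calling the video model.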
### Status Machine
| Status | Meaning |
|--------|---------|
| `draft` | Storyboard planned; generation not started |
| `queued` | Inngest event fired; worker has not picked it up yet |
| `generating` | Worker is calling `experimental_generateVideo` |
| `complete` | Video stored in Supabase Storage; `output_url` populated |
| `failed` | Generation failed; `error_message` populated |
`ACTIVE_VIDEO_RENDER_STATUSES` (`["queued", "generating"]`) is the set the client poller checks against to decide whether to keep polling.
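The table can also be read as a transition map. The sketch below encodes the documented linear flow. `ALLOWED_TRANSITIONS`, `canTransition`, and `isActiveStatus` are illustrative names, not exports confirmed by this document.

```typescript
type VideoRenderStatus = "draft" | "queued" | "generating" | "complete" | "failed";

// Documented set of statuses that keep the client poller running.
const ACTIVE_VIDEO_RENDER_STATUSES: readonly VideoRenderStatus[] = ["queued", "generating"];

// Hypothetical map derived from the documented flow:
// draft -> queued -> generating -> complete | failed.
const ALLOWED_TRANSITIONS: Record<VideoRenderStatus, readonly VideoRenderStatus[]> = {
  draft: ["queued"],
  queued: ["generating"],
  generating: ["complete", "failed"],
  complete: [], // terminal
  failed: [],   // terminal
};

// True when moving from `from` to `to` follows the documented flow.
function canTransition(from: VideoRenderStatus, to: VideoRenderStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// True when a render in this status should keep the poller alive.
function isActiveStatus(status: VideoRenderStatus): boolean {
  return ACTIVE_VIDEO_RENDER_STATUSES.includes(status);
}
```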
### Model Catalog
`VIDEO_MODEL_CATALOG` in `features/videos/lib/video-models.ts` is the authoritative list of supported video models. Each entry carries its gateway slug (used as the public model id), supported durations, supported aspect ratios, per-second cost in cents, and whether the model produces native audio.
```typescript
interface VideoModelEntry {
  id: string;                   // Gateway slug; also the tool input model id
  name: string;
  provider: string;
  supportedDurations: number[]; // Seconds; first entry is the default
  aspectRatios: string[];       // First entry is the default
  costPerSecondCents: number;
  nativeAudio: boolean;
  recommended?: boolean;
}
```
Video models are intentionally not in the `ai_models` table: they have no token pricing, no text capability, and their cost unit is seconds rather than tokens.
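As an illustration of the "first entry is the default" convention, here is a minimal sketch of a lookup plus parameter resolution. The catalog entries are placeholders (the slugs, durations, and prices are invented for the example), and `resolveRenderParams` is a hypothetical helper. Whether the real code rejects unsupported values or falls back silently is not stated here, so the sketch assumes rejection.

```typescript
interface VideoModelEntry {
  id: string;
  name: string;
  provider: string;
  supportedDurations: number[];
  aspectRatios: string[];
  costPerSecondCents: number;
  nativeAudio: boolean;
  recommended?: boolean;
}

// Hypothetical entries: placeholder slugs, durations, and prices, not real catalog data.
const EXAMPLE_CATALOG: VideoModelEntry[] = [
  {
    id: "example/video-model-a",
    name: "Example Model A",
    provider: "example",
    supportedDurations: [8, 4],
    aspectRatios: ["9:16", "16:9"],
    costPerSecondCents: 10,
    nativeAudio: true,
    recommended: true,
  },
  {
    id: "example/video-model-b",
    name: "Example Model B",
    provider: "example",
    supportedDurations: [5, 10],
    aspectRatios: ["16:9", "1:1"],
    costPerSecondCents: 4,
    nativeAudio: false,
  },
];

// Look up an entry by gateway slug; undefined when not found.
function getVideoModel(id: string): VideoModelEntry | undefined {
  return EXAMPLE_CATALOG.find((entry) => entry.id === id);
}

// Hypothetical helper: use the first listed value when the caller gives none,
// and reject values the model does not support.
function resolveRenderParams(
  model: VideoModelEntry,
  requested: { durationSeconds?: number; aspectRatio?: string } = {},
): { durationSeconds: number; aspectRatio: string } {
  const durationSeconds = requested.durationSeconds ?? model.supportedDurations[0];
  const aspectRatio = requested.aspectRatio ?? model.aspectRatios[0];
  if (!model.supportedDurations.includes(durationSeconds)) {
    throw new Error(`${model.id} does not support ${durationSeconds}s clips`);
  }
  if (!model.aspectRatios.includes(aspectRatio)) {
    throw new Error(`${model.id} does not support aspect ratio ${aspectRatio}`);
  }
  return { durationSeconds, aspectRatio };
}
```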
## How It Works
### Brief → Storyboard → Queue → Inngest → Storage → Poll
1. **Brief submitted.** The Studio UI or `planVideoStoryboard` tool calls `createVideoRenderAction` with a title, brief, aspect ratio, and target duration. An optional `sourceEntityId` grounds the plan in a real data record.
2. **Storyboard planned.** `planStoryboard` calls `safeGenerateObject` with a structured prompt that includes entity context (if supplied) and the brief. The LLM produces a `Storyboard` conforming to `storyboardSchema`. The render is written to `video_renders` in `draft` status with the storyboard stored in the `storyboard` jsonb column.
3. **User reviews / edits.** In the Studio UI, the storyboard scenes are displayed as editable cards. The user can modify scene visuals, on-screen text, narration, select a different hook, and adjust the CTA before generating.
4. **Generate triggered.** `generateVideoRenderAction` (or the `generateVideo` tool with a `renderId`) validates that the render is in `draft`, then calls `queueVideoRender`, which updates the row to `queued` and fires the `video/render.requested` Inngest event. This is the only path that flips status to `queued`.
5. **Inngest worker runs.** The `videoRender` function in `features/inngest/functions/video-render.ts` picks up the event, marks the row `generating`, calls `experimental_generateVideo` via `gateway.videoModel(modelId)`, uploads the returned video buffer to Supabase Storage, and marks the row `complete` with `output_url`. On failure the row is marked `failed` with `error_message`.
6. **Client polls.** `useVideoRenders` (React Query, 3 s refetch when any render is active) keeps the Studio UI current. The `getVideoRenderStatus` agent tool is the equivalent for agent callers.
### Gateway-Only Transport (Phase 1)
All video generation in Phase 1 goes through the Vercel AI Gateway via the system credential. The tenant `blockSystemDefaults` gateway setting is honored: if a tenant has blocked platform defaults, video generation is unavailable until a tenant-level key is configured. This mirrors the behaviour of other gateway-gated features. Per-tenant BYOK and direct-provider transports are tracked in `documents/work/2026-06-11-video-studio/followups.md`.
### Cost Attribution
Each completed render records a cost event via `recordCostEvent` with two new fields: `videoSeconds` (actual clip length) and `videoCostPerSecondCents` (the model catalog rate at generation time). This extends the existing cost system — no parallel table. The extension mirrors the per-image cost path already in `features/cost/`.
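The arithmetic implied by the two fields is a simple product. The sketch below is an assumption about how a consumer might total video spend. `VideoCostFields`, `videoCostCents`, and `totalVideoCostCents` are hypothetical names, and the actual signature of `recordCostEvent` is not shown in this document.

```typescript
// The two documented video fields on a cost event.
interface VideoCostFields {
  videoSeconds: number;            // actual clip length
  videoCostPerSecondCents: number; // catalog rate at generation time
}

// Cost of one render in cents: seconds multiplied by the per-second rate.
function videoCostCents(fields: VideoCostFields): number {
  return fields.videoSeconds * fields.videoCostPerSecondCents;
}

// Total across renders that may have used models with different rates.
function totalVideoCostCents(events: readonly VideoCostFields[]): number {
  return events.reduce((sum, event) => sum + videoCostCents(event), 0);
}
```

For example, an 8-second clip at 10 cents per second costs 80 cents. Storing the rate on the event, rather than looking it up later, keeps historical totals stable if catalog prices change.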
## API Reference
### Server Actions (`features/videos/server/actions.ts`)
| Function | Description |
|----------|-------------|
| `listVideoRendersAction()` | List the tenant's renders, newest first (limit 50). |
| `getVideoRenderAction(renderId)` | Fetch a single render by ID (tenant-scoped). |
| `createVideoRenderAction(input)` | Create a draft render and plan its storyboard in one step. Returns the `VideoRenderRow`. |
| `generateVideoRenderAction(renderId)` | Validate the render is in `draft`, queue it, return the updated row. |
| `updateStoryboardAction(renderId, storyboard)` | Persist user edits to the storyboard (draft status only). |
### Queue Seam (`features/videos/server/queue.ts`)
`queueVideoRender(renderId, tenantId)` is the only function that transitions a render to `queued`. It updates the DB row and fires the Inngest event. All callers — server actions, tools, and any future API route — go through this seam.
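A minimal sketch of what such a seam might look like, with the database and event client injected so the example stays self-contained. The `QueueDeps` interface, the conditional update, and the event payload shape are assumptions. Only the event name `video/render.requested` and the `draft` to `queued` transition come from this document.

```typescript
type VideoRenderStatus = "draft" | "queued" | "generating" | "complete" | "failed";

interface RenderRow {
  id: string;
  tenantId: string;
  status: VideoRenderStatus;
}

// Hypothetical dependencies standing in for the database and the Inngest client.
interface QueueDeps {
  // Sets status to "queued" only when the row belongs to the tenant and is in
  // "draft". Resolves to the updated row, or null when nothing matched.
  markQueuedIfDraft(renderId: string, tenantId: string): Promise<RenderRow | null>;
  sendEvent(event: {
    name: "video/render.requested";
    data: { renderId: string; tenantId: string }; // payload shape is assumed
  }): Promise<void>;
}

// Sketch of the single seam: update the row first, then fire the event.
async function queueVideoRenderSketch(
  deps: QueueDeps,
  renderId: string,
  tenantId: string,
): Promise<RenderRow> {
  const row = await deps.markQueuedIfDraft(renderId, tenantId);
  if (!row) {
    throw new Error(`Render ${renderId} is not a draft owned by this tenant`);
  }
  await deps.sendEvent({ name: "video/render.requested", data: { renderId, tenantId } });
  return row;
}
```

Making the update conditional on `draft` means a second call for the same render fails instead of firing a duplicate event, which matches the documented behaviour of `generateVideo` on a non-draft render.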
### Storyboard Planner (`features/videos/server/storyboard.ts`)
| Function | Description |
|----------|-------------|
| `planStoryboard(brief, options)` | LLM-driven storyboard generation via `safeGenerateObject`. Returns a validated `Storyboard`. |
| `loadEntityContext(entityId, tenantId)` | Fetch entity content for grounding the storyboard. Returns a formatted string. |
### Type Utilities (`features/videos/types.ts`)
| Function | Description |
|----------|-------------|
| `composeRenderPrompt(storyboard)` | Flatten a storyboard into the single text prompt sent to the video model. |
| `parseStoryboard(value)` | Parse a jsonb storyboard column; returns `null` on missing or malformed data. |
| `storyboardDurationSeconds(storyboard)` | Sum of scene `durationSeconds` values. |
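To make the null-on-malformed contract concrete, here is a hand-rolled sketch of the parse and duration helpers. The real `parseStoryboard` presumably validates against `storyboardSchema`; the manual checks below are a stand-in so the example has no dependencies, and the `Sketch` suffix marks the parser as illustrative. The bounds (1 to 12 scenes, 1 to 30 seconds per scene) come from the schema comments in Key Concepts.

```typescript
interface StoryboardScene {
  visual: string;
  onScreenText: string;
  narration: string;
  durationSeconds: number;
}

interface Storyboard {
  hookOptions: string[];
  selectedHook: number;
  scenes: StoryboardScene[];
  callToAction: string;
  styleNotes: string;
}

// Type guard for one scene, including the 1 to 30 second bound.
function isScene(value: unknown): value is StoryboardScene {
  if (typeof value !== "object" || value === null) return false;
  const { visual, onScreenText, narration, durationSeconds } = value as Record<string, unknown>;
  return (
    typeof visual === "string" &&
    typeof onScreenText === "string" &&
    typeof narration === "string" &&
    typeof durationSeconds === "number" &&
    durationSeconds >= 1 &&
    durationSeconds <= 30
  );
}

// Sketch: returns null for missing or malformed jsonb instead of throwing.
function parseStoryboardSketch(value: unknown): Storyboard | null {
  if (typeof value !== "object" || value === null) return null;
  const { hookOptions, selectedHook, scenes, callToAction, styleNotes } =
    value as Record<string, unknown>;
  if (!Array.isArray(hookOptions) || !hookOptions.every((h) => typeof h === "string")) return null;
  if (!Array.isArray(scenes) || scenes.length < 1 || scenes.length > 12) return null;
  if (!scenes.every(isScene)) return null;
  if (typeof selectedHook !== "number") return null;
  if (typeof callToAction !== "string" || typeof styleNotes !== "string") return null;
  return { hookOptions, selectedHook, scenes, callToAction, styleNotes };
}

// Sum of scene durations in seconds.
function storyboardDurationSeconds(board: Storyboard): number {
  return board.scenes.reduce((total, scene) => total + scene.durationSeconds, 0);
}
```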
### Model Catalog (`features/videos/lib/video-models.ts`)
| Export | Description |
|--------|-------------|
| `VIDEO_MODEL_CATALOG` | Full list of supported video models. |
| `DEFAULT_VIDEO_MODEL_ID` | Gateway slug of the recommended default model. |
| `getVideoModel(id)` | Look up a catalog entry by gateway slug. Returns `undefined` when not found. |
### Client Hook (`features/videos/components/use-video-renders.ts`)
`useVideoRenders()` is a React Query hook. It fetches the tenant's renders list and sets `refetchInterval` to 3,000 ms when any render has an active status, otherwise `false`. It also provides a stable `invalidate()` helper for post-action cache busting.
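The interval decision reduces to a pure function over the current list. The sketch below is framework-free so it runs on its own. `videoRendersRefetchInterval` is a hypothetical name, and how the hook wires this value into React Query is not shown here.

```typescript
type VideoRenderStatus = "draft" | "queued" | "generating" | "complete" | "failed";

const ACTIVE_VIDEO_RENDER_STATUSES: readonly VideoRenderStatus[] = ["queued", "generating"];

// Poll every 3 seconds while any render is queued or generating; otherwise stop.
function videoRendersRefetchInterval(
  renders: readonly { status: VideoRenderStatus }[],
): number | false {
  const anyActive = renders.some((render) =>
    ACTIVE_VIDEO_RENDER_STATUSES.includes(render.status),
  );
  return anyActive ? 3000 : false;
}
```

Returning `false` once every render is terminal means an idle Studio tab issues no background requests.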
### Inngest Worker (`features/inngest/functions/video-render.ts`)
The `videoRender` Inngest function handles the full async generation lifecycle: `queued → generating → complete | failed`. It is registered on the documents domain app (same worker as document processing). Concurrency is capped at 3 per tenant to prevent runaway spend.
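The lifecycle can be sketched as plain async code with every side effect injected. This is not the Inngest function itself: `RenderWorkerDeps` and `runVideoRenderSketch` are hypothetical, `generateVideo` stands in for the `experimental_generateVideo` call, and retries and concurrency limits are left to the worker runtime.

```typescript
// Hypothetical side effects; each maps to one documented stage.
interface RenderWorkerDeps {
  markGenerating(renderId: string): Promise<void>;
  generateVideo(renderId: string): Promise<Uint8Array>; // stand-in for the model call
  uploadClip(renderId: string, bytes: Uint8Array): Promise<string>; // resolves to the output URL
  markComplete(renderId: string, outputUrl: string): Promise<void>;
  markFailed(renderId: string, errorMessage: string): Promise<void>;
}

// Sketch of queued -> generating -> complete | failed.
async function runVideoRenderSketch(
  deps: RenderWorkerDeps,
  renderId: string,
): Promise<"complete" | "failed"> {
  try {
    await deps.markGenerating(renderId);
    const bytes = await deps.generateVideo(renderId);
    const outputUrl = await deps.uploadClip(renderId, bytes);
    await deps.markComplete(renderId, outputUrl);
    return "complete";
  } catch (error) {
    // Record a readable message so the row carries error_message.
    const message = error instanceof Error ? error.message : String(error);
    await deps.markFailed(renderId, message);
    return "failed";
  }
}
```

In an Inngest function, each stage would typically run inside its own `step.run` call so that a retry does not repeat work that has already succeeded. The sketch omits that to stay independent of the SDK.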
## For Agents
### Available Tools
| Tool | Input | Output | Permission |
|------|-------|--------|------------|
| `planVideoStoryboard` | `title`, `brief`, `context?`, `entityId?`, `targetDurationSeconds?`, `aspectRatio?` | Storyboard plan + render ID | `entities.team.create` |
| `generateVideo` | `renderId` (preferred) or `prompt` + `title` + `modelId?` + `durationSeconds?` + `aspectRatio?` | Queued render ID | `entities.team.create` |
| `getVideoRenderStatus` | `renderId` | Status, output URL when complete | `entities.team.read` |
All three tools are in the `media` tool group.
### Plan-Before-Generate Rule
Always call `planVideoStoryboard` before `generateVideo`. The storyboard step is cheap (one LLM call); the generation step is expensive (1–6 min, billed per second). By planning first, the agent and user can review the scene breakdown and hook options before committing to generation.
```
planVideoStoryboard(brief, ...) → renderId
↓ (optional: surface storyboard to user for review)
generateVideo(renderId) → queued render id
↓ (poll)
getVideoRenderStatus(renderId) → { status: "complete", outputUrl: "..." }
```
`generateVideo` also accepts a direct `prompt` for quick clips that do not need a storyboard (e.g., a simple motion graphic). Use this path sparingly.
### Polling Etiquette
Video generation takes 1–6 minutes. After calling `generateVideo`, poll `getVideoRenderStatus` no more than once every 15 seconds. Stop polling once status is `complete` or `failed`. Do not call `generateVideo` again on the same render — doing so will fail (the row is no longer in `draft`).
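A polling loop that follows these rules might look like the sketch below. `pollUntilTerminal`, the `RenderStatusResult` shape, and the 10-minute ceiling are assumptions. Only the 15-second minimum interval and the two terminal statuses come from this section.

```typescript
type VideoRenderStatus = "draft" | "queued" | "generating" | "complete" | "failed";

// Assumed shape of a status check result.
interface RenderStatusResult {
  status: VideoRenderStatus;
  outputUrl?: string;
  errorMessage?: string;
}

interface PollOptions {
  intervalMs?: number; // clamped to at least 15 000 ms
  maxWaitMs?: number;  // assumed ceiling; defaults to 10 minutes
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const MIN_POLL_INTERVAL_MS = 15000;

// Polls until the render is complete or failed, never faster than every 15 s.
async function pollUntilTerminal(
  getStatus: (renderId: string) => Promise<RenderStatusResult>,
  renderId: string,
  options: PollOptions = {},
): Promise<RenderStatusResult> {
  const intervalMs = Math.max(options.intervalMs ?? MIN_POLL_INTERVAL_MS, MIN_POLL_INTERVAL_MS);
  const maxWaitMs = options.maxWaitMs ?? 10 * 60 * 1000;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let waitedMs = 0;
  for (;;) {
    const result = await getStatus(renderId);
    if (result.status === "complete" || result.status === "failed") {
      return result; // terminal: stop polling
    }
    if (waitedMs >= maxWaitMs) {
      throw new Error(`Render ${renderId} still ${result.status} after ${waitedMs} ms`);
    }
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}
```

Clamping the interval inside the helper means a caller cannot poll faster than the documented limit by accident.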
## Design Decisions
**Dedicated `video_renders` table rather than sessions or entities.** Sessions are agent execution records with append-only event logs; they are not artifact stores. Entities are DB-driven and per-tenant; platform code must not depend on a "video" entity type existing (Critical Rule 2). The `video_renders` table follows the same precedent as `documents`: a purpose-built media artifact table with its own status machine. See ADR-0049.
**Gateway-only transport in Phase 1.** The AI Gateway provides multi-model access under one credential, usage metering, and rate-limit management — ideal while the catalog is new and usage patterns are unknown. Direct-provider transports require per-tenant key management and provider-specific SDKs. Phase 1 opts for operational simplicity; BYOK is a catalogued follow-up (see `documents/work/2026-06-11-video-studio/followups.md`).
**Storyboard-first pipeline.** Short-form video succeeds or fails on its opening hook and scene pacing. An LLM-planned storyboard externalizes that judgment into an inspectable, editable artifact before any costly generation step. The schema models scenes at the beat level so per-scene generation and multi-clip stitching can be added without changing the data shape.
**Per-second cost attribution, not per-model flat rate.** Video generation is billed by output seconds, not by request. `recordCostEvent` is extended with `videoSeconds` and `videoCostPerSecondCents` so the existing cost system can aggregate spend accurately across models with different per-second rates. No parallel cost table is introduced.
**Inngest for generation, not inline in tool/action.** AI video generation blocks for 1–6 minutes. Running it inline in a tool call would exhaust the serverless function timeout and prevent the agent from doing other work. The `queueVideoRender` seam decouples the trigger from the work; the Inngest worker handles retries, concurrency limits, and observability.
## Related Modules
- **Tool System** (`features/tools/`, `content/docs/features/tool-system.mdx`) — The three video tools are registered here; permission gating follows the same `requiredPermission` pattern as other media tools.
- **Inngest** (`features/inngest/`, `content/docs/integrations/inngest.mdx`) — The `videoRender` function runs on the documents domain app alongside document processing.
- **Cost / Analytics** (`features/cost/`, `content/docs/features/analytics-cost.mdx`) — `recordCostEvent` is extended for video; cost rows appear in the same tenant cost dashboard.
- **Document Processing** (`features/documents/`, `content/docs/features/document-processing.mdx`) — Architectural sibling: same async pipeline pattern (Inngest worker, dedicated table, status machine, Supabase Storage output).
- **Agent System** (`features/agents/`, `content/docs/features/agent-system.mdx`) — Agents discover video tools via the `media` tool group; permission gating uses the standard `entities.team.create` / `entities.team.read` permissions.