Analytics and Cost Tracking
Fire-and-forget event recording for analytics, plus per-call AI cost tracking and runtime telemetry for prompt caching, reasoning, and context-management visibility.
# Analytics and Cost Tracking
The platform tracks two categories of operational data: general analytics events (user actions, feature usage, system events) and AI cost events (token counts, model pricing, per-agent spend). Both systems use a fire-and-forget pattern that never blocks the calling operation.
## Overview
Analytics and cost tracking are split across two lightweight modules:
- **Analytics** (`features/analytics/`) -- `recordAnalyticsEvent()` validates events against a Zod registry, then forwards to PostHog. The `analytics_events` Postgres table is deprecated (migration `20260416100000`); PostHog is the single authoritative sink.
- **Cost** (`features/cost/`) -- records AI usage costs with model-aware pricing, queries aggregated data, and renders a summary dashboard in the Admin panel.
Both modules use fire-and-forget patterns that never block the caller.
## Key Concepts
### Analytics
**recordAnalyticsEvent<T>(type, payload, tenantId, userId?)** -- Validates `payload` against the Zod schema in `features/analytics/events.ts`, then calls `posthog.capture()` via `posthog-node`. Invalid payloads are dropped with a dev-mode warning; PostHog failures are silent.
The event registry (`features/analytics/events.ts`) defines 29 v1 events using the `resource.verb` convention:
- **Auth** -- `auth.signed_in`, `auth.signed_up`, `auth.signed_out`, `auth.tenant_switched`, `auth.tenant_created`
- **Entity** -- `entity.created`, `entity.updated`, `entity.deleted`, `entity.viewed`
- **Response** -- `response.submitted`, `response.promoted`
- **Chat** -- `chat.started`, `chat.message_sent`, `chat.agent_selected`
- **Tool** -- `tool.invoked`, `tool.completed`, `tool.failed`, `tool.session_shared`, `tool.embed_opened`, `tool.embed_submitted`
- **Task / Session** -- `task.created`, `task.session_started`, `task.session_completed`, `task.session_failed`, `task.human_completed`
- **Agent** -- `agent.dispatched`, `agent.heartbeat_run`
- **Runtime telemetry** -- `ai.runtime.completed`
- **Activation** -- `feature.adopted` (8 first-use features; fires once per user/tenant)
Every capture includes `groups: { tenant: tenantId }` for Group Analytics aggregation.
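The validate-then-forward flow described above can be sketched without any dependencies. This is only an illustration: the hand-rolled `EVENT_VALIDATORS` map stands in for the Zod schemas, the injected `capture` function stands in for the `posthog-node` client, and the payload fields and the `distinctId` fallback are assumptions rather than the module's actual behavior.

```typescript
// Sketch only: validators replace Zod schemas, `capture` replaces posthog-node.
type Validator = (payload: unknown) => boolean;

interface CaptureCall {
  distinctId: string;
  event: string;
  properties: Record<string, unknown>;
  groups: { tenant: string };
}

// Hypothetical two-entry registry; the real registry lives in events.ts.
const EVENT_VALIDATORS: Record<string, Validator> = {
  "entity.created": (p) =>
    typeof p === "object" &&
    p !== null &&
    typeof (p as { entityId?: unknown }).entityId === "string",
  "auth.signed_out": (p) => typeof p === "object" && p !== null,
};

function recordEventSketch(
  capture: (call: CaptureCall) => void,
  type: string,
  payload: Record<string, unknown>,
  tenantId: string,
  userId?: string,
): void {
  const validate = EVENT_VALIDATORS[type];
  if (!validate || !validate(payload)) {
    return; // unknown or invalid events are dropped, never thrown
  }
  try {
    capture({
      distinctId: userId ?? tenantId, // assumption: fall back to the tenant ID
      event: type,
      properties: payload,
      groups: { tenant: tenantId }, // every capture carries the tenant group
    });
  } catch {
    // sink failures are swallowed so the caller is never affected
  }
}
```

Because the function returns `void` and swallows both validation and sink failures, a call site can invoke it without wrapping it in its own error handling.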
**Client-side identification:** `features/analytics/identify.ts` provides `identifyUser`, `identifyTenantGroup`, `identifyEmbedSession`, `resetIdentity`, and `registerSuperProperties`. `AnalyticsIdentitySync` in the app shell layout wires identity + super-properties on navigation.
**Activation tracking:** `features/analytics/feature-adoption.ts` provides `trackFeatureAdoption(userId, tenantId, feature)`. It inserts into `user_feature_adoptions` with `INSERT ... ON CONFLICT DO NOTHING`, and only a first-ever insert emits `feature.adopted`. It is wired at 8 sites: entity create, response submit, chat, tool run, task create, agent dispatch, view save, share create.
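The emit-once behavior can be illustrated with an in-memory stand-in for the table. In this sketch a `Set` plays the role of the unique constraint behind `ON CONFLICT DO NOTHING`; the composite key shape and the `emit` callback are assumptions made for the example.

```typescript
// In-memory stand-in for user_feature_adoptions (illustrative only).
const adoptionRows = new Set<string>();

// Mirrors INSERT ... ON CONFLICT DO NOTHING: true only for a new row.
function insertAdoptionIfAbsent(
  userId: string,
  tenantId: string,
  feature: string,
): boolean {
  const key = `${userId}:${tenantId}:${feature}`; // assumed uniqueness key
  if (adoptionRows.has(key)) return false;
  adoptionRows.add(key);
  return true;
}

function trackFeatureAdoptionSketch(
  userId: string,
  tenantId: string,
  feature: string,
  emit: (event: string, payload: { feature: string }) => void,
): void {
  // Only the first-ever insert results in an analytics event.
  if (insertAdoptionIfAbsent(userId, tenantId, feature)) {
    emit("feature.adopted", { feature });
  }
}
```

Calling the function repeatedly for the same user, tenant, and feature is therefore safe at every call site: later calls are no-ops.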
**Server flush:** `lib/analytics/posthog-server.ts` exports `flushPostHog()` and `shutdownPostHog()`. Key API routes wrap completion with `after(() => flushPostHog())`; Inngest steps `await flushPostHog()` before returning.
**DNT honoring:** `lib/analytics/posthog-provider.tsx` initializes PostHog with `opt_out_capturing_by_default` set from `navigator.doNotTrack === "1"`.
**Adding a new event:**
1. Add a Zod schema to `EVENT_SCHEMAS` in `features/analytics/events.ts`.
2. Add a test case to `features/analytics/events.test.ts`.
3. Fire with `void recordAnalyticsEvent("event.name", payload, tenantId, userId)` at the source.
4. For E2E assertions, use `attachPostHogCapture(page)` from `e2e/helpers/posthog.ts`.
### Cost Tracking
**CostEvent** -- A single AI usage record:
- `provider` -- AI provider name (e.g., "anthropic", "openai")
- `model` -- model identifier
- `input_tokens`, `output_tokens` -- token counts from the API response
- `cost_cents` -- computed cost in cents using model-specific pricing
- `agent_id` -- which agent made the call (nullable)
- `chat_id` -- which chat session (nullable)
- `source` -- context (e.g., "chat", "extraction", "heartbeat")
**CostSummary** -- Aggregated metrics over a time window:
- `totalCostCents` -- total spend
- `totalInputTokens`, `totalOutputTokens` -- total tokens
- `dailyBurnRate` -- average cost per day
- `eventCount` -- number of API calls
**CostTelemetrySummary** -- Runtime telemetry aggregated from `ai.runtime.completed` analytics events:
- `totalCacheReadTokens`, `totalCacheWriteTokens`, `totalReasoningTokens`
- `promptCacheHitCount`, `promptCacheWriteCount`
- `promptCachingConfiguredCount`, `contextManagementConfiguredCount`
- `appliedContextEditCount`
**CostByAgent** -- Spend grouped by agent ID.
**CostByModel** -- Spend grouped by model name, enriched with cache-read, cache-write, and reasoning-token totals.
## How It Works
### Recording Costs
The `recordCostEvent()` function in `features/cost/server/record-cost.ts`:
1. Looks up the model in the `ai_models` registry via `getModelEntry()` to get per-million-token pricing
2. Calculates cost as `(inputTokens / 1M * inputRate) + (outputTokens / 1M * outputRate)`, expressed in cents and rounded to 2 decimal places
3. Falls back to default rates ($3/1M input, $15/1M output) only when a model is not in the registry, for example external-agent or DB-unavailable fallback paths
4. Inserts into `cost_events` table via admin client
5. Reports insert failures through the non-fatal error path without blocking the user-facing AI response
This function is called after every tracked `streamText()` / `generateText()` / `generateObject()` / embedding call. Image generation currently records rows with zero tokens, so the calls stay visible, until image pricing units are added.
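As a worked example of the pricing arithmetic, the sketch below assumes rates are expressed in US dollars per million tokens (matching the $3 / $15 defaults) and converts the result to cents. The names `ModelRates` and `computeCostCents` are hypothetical and not taken from `record-cost.ts`.

```typescript
interface ModelRates {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Fallback rates quoted above: $3 per 1M input tokens, $15 per 1M output tokens.
const DEFAULT_RATES: ModelRates = {
  inputPerMillionUsd: 3,
  outputPerMillionUsd: 15,
};

function computeCostCents(
  inputTokens: number,
  outputTokens: number,
  rates: ModelRates = DEFAULT_RATES,
): number {
  const usd =
    (inputTokens / 1e6) * rates.inputPerMillionUsd +
    (outputTokens / 1e6) * rates.outputPerMillionUsd;
  // Convert dollars to cents, then round to 2 decimal places.
  return Math.round(usd * 100 * 100) / 100;
}
```

With the default rates, a call using 12,000 input tokens and 800 output tokens costs $0.036 + $0.012 = $0.048, so `computeCostCents(12000, 800)` yields 4.8 cents.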
### Recording Runtime Telemetry
The shared agent runtime also emits an analytics event after every completed run:
- Event type: `ai.runtime.completed`
- Metadata: `model`, `source`, `agentId`, `chatId`, `cacheReadTokens`, `cacheWriteTokens`, `reasoningTokens`, `promptCachingConfigured`, `contextManagementConfigured`, `appliedContextEdits`
This event is additive to `cost_events`: spend and raw token accounting remain in the cost table, while provider-specific runtime signals live in analytics.
### Querying Cost Data
The `getCostData(tenantId, days?, agentId?)` function performs a single-pass aggregation:
1. Paginates through `cost_events` for spend and token totals.
2. Paginates through `analytics_events` for `ai.runtime.completed` telemetry.
3. Aggregates:
- Total cost, input tokens, output tokens
- Workspace or agent-scoped cache-read, cache-write, reasoning, and context-management metrics
- Per-agent spend
- Per-model spend plus cache/reasoning totals
4. Returns `{ summary, telemetry, byAgent, byModel }`
A thin `getCostSummary()` wrapper delegates to `getCostData()` for callers that only need the top-line totals. The previous `getCostByAgent()` / `getCostByModel()` wrappers were deleted — they re-ran the full scan (cost events + runtime telemetry + agent-name join) just to return one property. Call `getCostData()` once and read the relevant breakdown instead.
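The benefit of one call with several breakdowns is easier to see in code. The following is a simplified, in-memory sketch of accumulating the summary and both breakdowns in one loop over rows that have already been fetched. Telemetry and pagination are omitted, and the burn-rate formula (total spend divided by the window length) and the handling of null agent IDs are assumptions.

```typescript
interface CostRow {
  cost_cents: number;
  input_tokens: number;
  output_tokens: number;
  agent_id: string | null;
  model: string;
}

interface Bucket {
  costCents: number;
  calls: number;
}

function addToBucket(buckets: Record<string, Bucket>, key: string, cents: number): void {
  const bucket = buckets[key] ?? { costCents: 0, calls: 0 };
  bucket.costCents += cents;
  bucket.calls += 1;
  buckets[key] = bucket;
}

function aggregateCostRows(rows: CostRow[], days: number) {
  const summary = {
    totalCostCents: 0,
    totalInputTokens: 0,
    totalOutputTokens: 0,
    dailyBurnRate: 0,
    eventCount: 0,
  };
  const byAgent: Record<string, Bucket> = {};
  const byModel: Record<string, Bucket> = {};

  for (const row of rows) {
    summary.totalCostCents += row.cost_cents;
    summary.totalInputTokens += row.input_tokens;
    summary.totalOutputTokens += row.output_tokens;
    summary.eventCount += 1;
    // Assumption: rows without an agent count toward totals only.
    if (row.agent_id !== null) addToBucket(byAgent, row.agent_id, row.cost_cents);
    addToBucket(byModel, row.model, row.cost_cents);
  }

  // Assumption: burn rate is total spend averaged over the window length.
  summary.dailyBurnRate = days > 0 ? summary.totalCostCents / days : 0;
  return { summary, byAgent, byModel };
}
```

A caller that needs only the per-model view reads `byModel` from the single result instead of triggering a second scan.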
### Admin Dashboard
The `CostSummary` component in `features/cost/components/cost-summary.tsx` renders:
1. **Five KPI cards** -- Total Spend (30d), Daily Burn Rate, Total Tokens, Cache Reuse, Reasoning Tokens
2. **Runtime Telemetry** -- cache writes, prompt-caching configuration count, context-management configuration count, applied context edits, cache-write tokens
3. **Cost by Agent** -- list of agents with call counts and spend
4. **Cost by Model** -- list of models with call counts, spend, cache-read, cache-write, and reasoning totals
The component fetches data from `GET /api/costs/summary` on mount. Empty state shows a message indicating costs are tracked automatically. Values are formatted with smart suffixes (cents for small amounts, dollars for larger; K/M for tokens).
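A possible shape for the suffix formatting is sketched below. The thresholds (switching to dollars at 100 cents, to K at 1,000 tokens, and to M at 1,000,000 tokens) and the function names are assumptions chosen for illustration; the component's actual cut-offs may differ.

```typescript
// Hypothetical formatters; thresholds are illustrative assumptions.
function formatCostCents(cents: number): string {
  return cents < 100
    ? `${cents.toFixed(1)}¢` // small amounts stay in cents
    : `$${(cents / 100).toFixed(2)}`; // larger amounts are shown in dollars
}

function formatTokens(count: number): string {
  if (count >= 1000000) return `${(count / 1000000).toFixed(1)}M`;
  if (count >= 1000) return `${(count / 1000).toFixed(1)}K`;
  return String(count);
}
```

Under these assumptions, 4.8 cents renders as `4.8¢`, 1,234 cents as `$12.34`, and 2,500,000 tokens as `2.5M`.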
## API Reference
### Analytics
| Function | Location | Purpose |
| ------------------------------------------------ | ------------------------------ | ------------------------------- |
| `recordAnalyticsEvent(type, payload, tenantId, userId?)` | `features/analytics/record.ts` | Fire-and-forget event recording |
### Cost Recording
| Function | Location | Purpose |
| ------------------------ | ------------------------------------- | --------------------------------- |
| `recordCostEvent(input)` | `features/cost/server/record-cost.ts` | Record AI usage with auto-pricing |
### Cost Queries
| Function | Location | Purpose |
| --------------------------------- | --------------------------------- | --------------------------------------------------------- |
| `getCostData(tenantId, days?, agentId?)` | `features/cost/server/queries.ts` | Full cost data: summary + telemetry + by-agent + by-model + by-driver |
| `getCostSummary(tenantId, days?)` | Same | Summary only (wrapper) |
### Types
| Type | Location | Purpose |
| ---------------------- | ------------------------ | ------------------------------------------------------- |
| `CostEvent` | `features/cost/types.ts` | Single AI usage record |
| `CostSummary` | Same | Aggregated cost metrics |
| `CostTelemetrySummary` | Same | Aggregated prompt-caching and runtime telemetry metrics |
| `CostByAgent` | Same | Per-agent spend |
| `CostByModel` | Same | Per-model spend plus cache/reasoning totals |
### Components
| Component | Location | Purpose |
| ------------- | ------------------------------------------- | --------------------------------------------- |
| `CostSummary` | `features/cost/components/cost-summary.tsx` | Admin cost dashboard with KPIs and breakdowns |
## AI Limits and Cost Caps
In addition to tracking costs after the fact, the platform enforces configurable limits before AI calls are made. These limits are resolved per-tenant from `tenant_settings` (key `"ai_limits"`) and fall back to environment variable defaults when no tenant setting exists.
### AiLimits
```typescript
interface AiLimits {
/** Daily cost cap in cents. 0 = no cap. Default: 5000 ($50). */
dailyCapCents: number;
/** Chat messages per minute per user. 0 = no limit. Default: 30. */
chatRateLimit: number;
/** Max output tokens per model response. 0 = model default. Default: 0. */
maxTokensPerResponse: number;
}
```
### Resolving Limits
`getAiLimits(tenantId)` in `features/cost/server/ai-limits.ts` reads from the settings cascade:
1. **Default tenant setting** — a system-wide baseline in `tenant_settings` for the default tenant
2. **Active tenant override** — a per-tenant value in `tenant_settings` (higher priority, overwrites defaults field-by-field)
3. **Environment variable fallback** — `AI_DAILY_CAP_CENTS`, `CHAT_RATE_LIMIT`, `AI_MAX_TOKENS` when no setting row exists
The result is cached across requests via `unstable_cache` with tag-based invalidation, so a change saved through `saveTenantSetting()` takes effect on the next request.
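The cascade can be pictured as a field-by-field merge. This is a minimal sketch, not the code in `ai-limits.ts`: it takes the two setting rows and an environment map as plain arguments, leaves caching out, and assumes that environment values also fill any field a setting row leaves unset.

```typescript
interface AiLimits {
  dailyCapCents: number;
  chatRateLimit: number;
  maxTokensPerResponse: number;
}

type EnvMap = Record<string, string | undefined>;

function parseNonNegativeInt(raw: string | undefined, fallback: number): number {
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  return isNaN(parsed) || parsed < 0 ? fallback : parsed;
}

// Environment fallback with the documented defaults (5000, 30, 0).
function limitsFromEnv(env: EnvMap): AiLimits {
  return {
    dailyCapCents: parseNonNegativeInt(env["AI_DAILY_CAP_CENTS"], 5000),
    chatRateLimit: parseNonNegativeInt(env["CHAT_RATE_LIMIT"], 30),
    maxTokensPerResponse: parseNonNegativeInt(env["AI_MAX_TOKENS"], 0),
  };
}

function resolveAiLimitsSketch(
  defaultTenantRow: Partial<AiLimits> | null,
  activeTenantRow: Partial<AiLimits> | null,
  env: EnvMap,
): AiLimits {
  // Later spreads win: env baseline, then default tenant, then active tenant.
  return {
    ...limitsFromEnv(env),
    ...(defaultTenantRow ?? {}),
    ...(activeTenantRow ?? {}),
  };
}
```

Passing the environment in as an argument keeps the function pure, which makes the precedence order easy to test.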
### Daily Cost Cap
`checkTenantCostCap(tenantId, capCents)` in `features/cost/server/limits.ts` enforces the daily spend limit:
1. If `capCents <= 0`, returns `{ allowed: true }` immediately (no cap configured)
2. Queries `cost_events` for today's total spend (UTC midnight boundary)
3. Returns `{ allowed, currentCents, capCents }`
4. Results are cached in-process for **5 minutes** per tenant to avoid a DB query on every chat message
The check runs inside `executeAgent()` before any `streamText()` call. When the cap is exceeded, `executeAgent()` throws an error with a user-readable message including the current and cap amounts in dollars.
**Fail-open policy:** If the DB query errors (transient failure, connection issue), `checkTenantCostCap` returns `{ allowed: true }` rather than blocking AI. This prevents a DB outage from silently disabling all AI features.
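The combination of short-circuit, 5-minute cache, and fail-open can be sketched as follows. To stay self-contained the spend lookup is an injected synchronous function (a real database query would be asynchronous), and two details are assumptions: a call is treated as allowed while spend is strictly below the cap, and fail-open results are not cached.

```typescript
interface CapResult {
  allowed: boolean;
  currentCents?: number;
  capCents?: number;
}

const CAP_CACHE_TTL_MS = 5 * 60 * 1000;
const capCache: Record<string, { result: CapResult; expiresAt: number }> = {};

// Returns null when the lookup throws, so the caller can fail open.
function trySpendLookup(
  lookup: (tenantId: string) => number,
  tenantId: string,
): number | null {
  try {
    return lookup(tenantId);
  } catch {
    return null;
  }
}

function checkCostCapSketch(
  tenantId: string,
  capCents: number,
  lookupTodaySpendCents: (tenantId: string) => number,
  now: number = Date.now(),
): CapResult {
  if (capCents <= 0) return { allowed: true }; // no cap configured

  const cached = capCache[tenantId];
  if (cached && cached.expiresAt > now) return cached.result;

  const currentCents = trySpendLookup(lookupTodaySpendCents, tenantId);
  if (currentCents === null) return { allowed: true }; // fail open on errors

  const result: CapResult = {
    allowed: currentCents < capCents, // assumption: strict comparison
    currentCents,
    capCents,
  };
  capCache[tenantId] = { result, expiresAt: now + CAP_CACHE_TTL_MS };
  return result;
}
```

One consequence of caching per tenant is that spend can overshoot the cap by whatever is consumed during a cache window before the next lookup sees it.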
### Chat Rate Limiting
The chat route (`POST /api/chat`) checks `chatRateLimit` before processing each message:
- The limit is per user, per minute (sliding window keyed as `chat:{userId}`)
- When exceeded, the route returns HTTP 429 with a `Retry-After`-style message: `"Rate limit exceeded. Try again in Xs."`
- A value of 0 disables rate limiting entirely
The rate limiter is in-process only — it resets on server restart and is not shared across replicas. For production deployments with multiple server instances, use a Redis-backed rate limiter.
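An in-process sliding window of this kind can be sketched in a few lines. The version below is illustrative only and assumes the window is tracked as a list of timestamps per `chat:{userId}` key; the function name and return shape are hypothetical.

```typescript
const RATE_WINDOW_MS = 60 * 1000;

// Per-process state: lost on restart and not shared across replicas.
const recentHits: Record<string, number[]> = {};

function checkChatRateLimitSketch(
  userId: string,
  limitPerMinute: number,
  now: number = Date.now(),
): { allowed: boolean; retryAfterSeconds: number } {
  if (limitPerMinute <= 0) return { allowed: true, retryAfterSeconds: 0 }; // disabled

  const key = `chat:${userId}`;
  // Keep only timestamps that still fall inside the sliding window.
  const recent = (recentHits[key] ?? []).filter((t) => now - t < RATE_WINDOW_MS);

  if (recent.length >= limitPerMinute) {
    recentHits[key] = recent;
    const oldest = recent[0] ?? now;
    // The caller may retry once the oldest hit leaves the window.
    const retryAfterSeconds = Math.ceil((oldest + RATE_WINDOW_MS - now) / 1000);
    return { allowed: false, retryAfterSeconds };
  }

  recent.push(now);
  recentHits[key] = recent;
  return { allowed: true, retryAfterSeconds: 0 };
}
```

The `retryAfterSeconds` value is what a route could interpolate into the "Try again in Xs" message. Because the state lives in a module-level object, each replica enforces its own independent limit, which is the limitation noted above.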
### Token Cap
When `maxTokensPerResponse > 0`, `executeAgent()` passes `maxTokens` to `streamText()`. This limits output token spend per response and prevents runaway generation. A value of 0 defers to the model's own default.
### Configuration
Set limits via `tenant_settings` (key `"ai_limits"`) or environment variables:
| Env var | Default | Purpose |
| -------------------- | -------------- | --------------------------------- |
| `AI_DAILY_CAP_CENTS` | `5000` ($50) | Daily total spend cap |
| `CHAT_RATE_LIMIT` | `30` | Chat messages per user per minute |
| `AI_MAX_TOKENS` | `0` (model default) | Max output tokens per response |
To override for a specific tenant, insert a row into `tenant_settings`:
```sql
INSERT INTO tenant_settings (tenant_id, key, value)
VALUES (
'your-tenant-id',
'ai_limits',
'{"dailyCapCents": 10000, "chatRateLimit": 60, "maxTokensPerResponse": 4096}'
);
```
### API Reference — Limits
| Function | Location | Purpose |
| ---------------------------------------- | ----------------------------------- | -------------------------------------------------------- |
| `getAiLimits(tenantId)` | `features/cost/server/ai-limits.ts` | Resolve per-tenant AI limits with cascade + env fallback |
| `checkTenantCostCap(tenantId, capCents)` | `features/cost/server/limits.ts` | Check today's spend against cap; cached 5 min |
| `__clearCostCapCache()` | Same | Clear in-process cache (test use only) |
## AI Gateway & BYOK
The platform supports two optional routing modes — the Vercel AI Gateway and per-tenant Bring Your Own Key (BYOK) provider credentials — both opt-in and default OFF. Every call still records a `cost_event`; the new attribution columns tell you which credential paid and over which transport.
### How routing precedence works
`resolveCredentialPlan` in `lib/ai/credential-resolver.ts` is the single decision point. It is pure (no IO) and runs before every `streamText` / `generateText` call:
| Priority | Credential source | `credential_source` | `routed_via` | BYOK? |
| -------- | ----------------- | ------------------- | ------------ | ----- |
| 1 | Tenant/workspace BYOK key for the model's provider | `tenant_direct` | `direct` | yes |
| 2 | System Gateway (enabled + `AI_GATEWAY_API_KEY` / OIDC present) | `gateway_system` | `gateway` | no |
| 3 | System env key — today's default | `system_env` | `direct` | no |
| — | `blockSystemDefaults` set, no BYOK key | `blocked` — `allowModelFallback` governs reroute vs. hard error | — | — |
A tenant BYOK key **always calls the provider SDK directly** — it never routes through the Vercel Gateway, even when the Gateway is enabled. Two reasons: a request-scoped Gateway BYOK call that fails silently retries on the platform's system credits (documented Vercel behavior), which would break the `blockSystemDefaults` billing-isolation guarantee; and routing direct keeps the raw key baked into a keyed provider client instead of travelling inside `providerOptions` (where it could reach the AI SDK's telemetry serialization). The Gateway is therefore the transport for the **system** credential path only (priority 2). BYOK spend is still fully attributed on `cost_events` via `credential_source = 'tenant_direct'` plus the `credential_id`.
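The precedence table can be read as a small pure function. The sketch below is not the code in `lib/ai/credential-resolver.ts`; the input and output shapes are hypothetical, and it assumes the Gateway path also requires a non-null `gateway_slug` for the model, as described in the Schema section.

```typescript
type CredentialPlan =
  | { credentialSource: "tenant_direct"; routedVia: "direct"; isByok: true; credentialId: string }
  | { credentialSource: "gateway_system"; routedVia: "gateway"; isByok: false }
  | { credentialSource: "system_env"; routedVia: "direct"; isByok: false }
  | { credentialSource: "blocked"; allowModelFallback: boolean };

interface PlanInput {
  byokCredentialId: string | null; // active BYOK key for the model's provider
  gatewayEnabled: boolean;
  gatewayAuthPresent: boolean; // AI_GATEWAY_API_KEY or OIDC available
  gatewaySlug: string | null; // ai_models.gateway_slug for the model
  blockSystemDefaults: boolean;
  allowModelFallback: boolean;
}

function resolvePlanSketch(input: PlanInput): CredentialPlan {
  // Priority 1: a tenant key always goes direct, never through the Gateway.
  if (input.byokCredentialId !== null) {
    return {
      credentialSource: "tenant_direct",
      routedVia: "direct",
      isByok: true,
      credentialId: input.byokCredentialId,
    };
  }
  // No BYOK key and system credentials are blocked: nothing may be spent.
  if (input.blockSystemDefaults) {
    return { credentialSource: "blocked", allowModelFallback: input.allowModelFallback };
  }
  // Priority 2: system Gateway, only when fully configured for this model.
  if (input.gatewayEnabled && input.gatewayAuthPresent && input.gatewaySlug !== null) {
    return { credentialSource: "gateway_system", routedVia: "gateway", isByok: false };
  }
  // Priority 3: system environment key, called directly.
  return { credentialSource: "system_env", routedVia: "direct", isByok: false };
}
```

Keeping the decision free of IO means the whole precedence table can be covered with plain unit tests, one per row.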
### Configuring the Gateway and BYOK
Gateway settings live in `tenant_settings` under the key `ai_gateway`:
```typescript
interface GatewaySettings {
enabled: boolean;
blockSystemDefaults: boolean; // prevent spend on platform keys
allowModelFallback: boolean; // soft-block: try next model vs. hard error
budgetWarnCents?: number;
budgetHardCents?: number;
}
```
Tenant and workspace admins manage keys in **Admin > AI Keys & Budgets** (`/admin/ai-keys`):
1. Add a key for a provider (`anthropic`, `openai`, `google`, `xai`, `deepseek`). The raw API key is encrypted with AES-256-GCM and stored in `tenant_provider_credentials`; only a masked prefix is shown after save.
2. Optionally scope the key to a workspace (workspace key overrides the tenant key for calls within that workspace).
3. Toggle **Block system defaults** to prevent any AI call from spending platform credentials. Enabling this without adding a BYOK key for the relevant provider will cause calls for that provider to fail (or fall back if `allowModelFallback` is on).
4. Use **Test key** to verify connectivity before enabling.
### Credential attribution on cost_events
Every `cost_event` row now carries four attribution columns:
| Column | Values | Purpose |
| ------ | ------ | ------- |
| `credential_source` | `system_env` \| `gateway_system` \| `tenant_direct` (`gateway_byok` reserved, not currently emitted — BYOK routes direct) | Which credential paid |
| `credential_id` | uuid or null | Soft pointer to `tenant_provider_credentials.id` when BYOK was used |
| `is_byok` | boolean | Shorthand filter for tenant-funded calls |
| `routed_via` | `direct` \| `gateway` | Transport used |
The pointer is soft (no FK) so a revoked key never blocks a cost write — the same pattern as `driver_id` and `session_id` on `cost_events`.
### Schema
`tenant_provider_credentials` — per-tenant/workspace BYOK keys. One active row per `(tenant_id, workspace_id, provider)` enforced by a partial unique index (`WHERE revoked_at IS NULL`). The `encrypted_credentials` column uses the AES-256-GCM helper from `lib/crypto/credentials.ts` and is never returned to the client; server actions project safe metadata columns only.
`ai_models.gateway_slug` — optional dotted Gateway slug (e.g. `anthropic/claude-sonnet-4.6`). When null the resolver falls through to direct. Updated manually (scheduled sync is PR2).
## For Agents
Agents generate cost events automatically through their AI API calls. Every `streamText()` call in chat, extraction, and heartbeat records a cost event attributed to the agent's ID.
Agents can query their own cost data through the `getUsageStats` context tool, which now returns token totals, cost totals, prompt-caching signals, reasoning usage, and top-model breakdowns. This supports self-awareness about both spend and context-efficiency.
There are no agent tools for analytics event recording -- that happens automatically at the platform level.
## Related Modules
- **Agent System** (`features/agents/`) -- agent IDs are recorded on cost events; `executeAgent()` enforces cost cap and token limits
- **Chat** (`features/chat/`) -- records cost events after each AI response; `POST /api/chat` enforces chat rate limit
- **Extraction** (`features/entities/extraction/`) -- records cost events per field extraction
- **Inngest** (`features/inngest/`) -- heartbeat and background jobs record costs and respect daily cap
- **Settings** (`features/settings/`) -- stores `"ai_limits"` key in `tenant_settings`; provides the settings cascade consumed by `getAiLimits()`
- **Admin** -- cost dashboard is rendered in the Admin > Costs tab