Rate Limiting

How the platform throttles unauthenticated surfaces, with operational notes on the in-memory limiter's cold-start tradeoff and the Upstash upgrade path

# Rate Limiting

The platform uses a single in-memory sliding-window limiter at `lib/rate-limit.ts` for all currently-throttled surfaces. This page documents what that buys us, where it falls short, and when to upgrade.

## Where rate-limiting is applied

| Surface | Gate | Limit | Window | Source |
| --- | --- | --- | --- | --- |
| `/share/[token]` per-IP | Brute-force enumeration cap | 10 | 1 min | `app/share/[token]/share-rate-limit.ts` |
| `/share/[token]` per-token failures | Token-keyspace walk denial | 5 | 1 hour | same |
| `/api/sessions/[id]/events` | Event-flood throttle | 30 | 1 min | `app/api/sessions/[id]/events/route.ts` |
| `/api/chat` and friends | Per-user request rate | varies | varies | per-route via `checkRateLimit()` |

All four flow through `checkRateLimit(key, { limit, windowMs })`. The `peekRateLimit()` variant (added in PR #1637) reads the same map without consuming a slot — used by `generateMetadata` so the share-page metadata path can short-circuit without burning the budget the actual GET would consume.
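
A minimal sketch of the calling pattern, under stated assumptions: the key formats (`events:…`, `share:ip:…`), the `SimpleResponse` shape, and the function names `handleEventsPost` and `canRenderShareMetadata` are hypothetical, and the limiter functions are passed in as parameters only to keep the example self-contained. The limits and windows come from the table above.

```ts
// Hypothetical caller-side sketch; not the actual route code.
type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };
type Limiter = (
  key: string,
  opts: { limit: number; windowMs: number },
) => RateLimitResult;

// Plain object standing in for the framework's response type.
type SimpleResponse = {
  status: number;
  headers: Record<string, string>;
  body: string;
};

// Consuming check: each accepted call spends one slot of the 30/min budget.
function handleEventsPost(
  sessionId: string,
  clientIp: string,
  checkRateLimit: Limiter,
): SimpleResponse {
  const result = checkRateLimit(`events:${sessionId}:${clientIp}`, {
    limit: 30,
    windowMs: 60_000,
  });
  if (!result.allowed) {
    // Retry-After is expressed in whole seconds.
    const seconds = Math.ceil(result.retryAfterMs / 1000);
    return {
      status: 429,
      headers: { "Retry-After": String(seconds) },
      body: "Too Many Requests",
    };
  }
  return { status: 202, headers: {}, body: "accepted" };
}

// Non-consuming check: the metadata path only asks whether the GET would pass.
function canRenderShareMetadata(clientIp: string, peekRateLimit: Limiter): boolean {
  return peekRateLimit(`share:ip:${clientIp}`, { limit: 10, windowMs: 60_000 }).allowed;
}
```

The split matters because metadata generation and the page GET both run for one visit; if both consumed a slot, a single page view would cost two requests of budget.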

## How the limiter works

`lib/rate-limit.ts` keeps a `Map<key, timestamps[]>` per server instance. Each call:

1. Drops timestamps older than `now - windowMs`.
2. If the remaining count `>= limit`, returns `{ allowed: false, retryAfterMs }`.
3. Otherwise appends `now` and returns `{ allowed: true }`.

A `MAX_TRACKED_KEYS = 5_000` ceiling caps total memory and evicts the oldest insertion-ordered key when full.
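
The three steps and the eviction rule can be sketched as follows. This is an illustration written from the description above, not a copy of `lib/rate-limit.ts`: the helper names `liveTimestamps` and `retryAfter`, the injectable `now` parameter, and the exact boundary handling (a timestamp exactly `windowMs` old is treated as expired) are assumptions.

```ts
// Illustrative sketch only; the real lib/rate-limit.ts may differ.
type RateLimitOptions = { limit: number; windowMs: number };
type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };

const MAX_TRACKED_KEYS = 5_000;
const windows = new Map<string, number[]>();

// Step 1: keep only timestamps newer than `now - windowMs`.
function liveTimestamps(key: string, windowMs: number, now: number): number[] {
  const cutoff = now - windowMs;
  return (windows.get(key) ?? []).filter((t) => t > cutoff);
}

// Time until the oldest live timestamp leaves the window.
function retryAfter(live: number[], windowMs: number, now: number): number {
  const oldest = live[0] ?? now;
  return oldest + windowMs - now;
}

function checkRateLimit(
  key: string,
  { limit, windowMs }: RateLimitOptions,
  now: number = Date.now(), // injectable clock (assumption, for testing)
): RateLimitResult {
  const live = liveTimestamps(key, windowMs, now);
  if (live.length >= limit) {
    // Step 2: over budget, so report when the caller may retry.
    return { allowed: false, retryAfterMs: retryAfter(live, windowMs, now) };
  }
  // At capacity, evict the oldest-inserted key before tracking a new one.
  // A Map iterates in insertion order, so the first key is the oldest.
  if (!windows.has(key) && windows.size >= MAX_TRACKED_KEYS) {
    const oldestKey = windows.keys().next().value;
    if (oldestKey !== undefined) windows.delete(oldestKey);
  }
  // Step 3: record this call and allow it.
  live.push(now);
  windows.set(key, live);
  return { allowed: true };
}

// Same decision as checkRateLimit, but never writes to the map.
function peekRateLimit(
  key: string,
  { limit, windowMs }: RateLimitOptions,
  now: number = Date.now(),
): RateLimitResult {
  const live = liveTimestamps(key, windowMs, now);
  if (live.length < limit) return { allowed: true };
  return { allowed: false, retryAfterMs: retryAfter(live, windowMs, now) };
}
```

Storing raw timestamps rather than a counter is what makes this a sliding window: the budget frees up one slot at a time as each old request ages out, instead of resetting all at once on a fixed boundary.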

## The cold-start tradeoff (read before changing limits)

The limiter is **per-instance, in-process**. Vercel Fluid Compute reuses function instances across concurrent requests, so within a warm instance the limiter behaves as advertised. But:

- **A new instance starts with an empty map.** When traffic scales out to a new instance, that instance's `windows` map is empty, so an attacker who lands on it gets a full fresh budget. Deliberately steering requests onto fresh instances is harder than hammering a single warm one, but not impossible.
- **Instance death drops state.** When Vercel reclaims an idle instance, its rate-limit history is gone. A burst attacker who can trigger instance churn (e.g., by spreading requests across regions) can wipe their slate clean.
- **Multi-region deploys multiply the budget.** Each region's instances have their own maps. A determined attacker can multiply the effective budget by roughly `regions × instances/region`, and again each time an instance is evicted and replaced within the window.
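
To make the multiplier concrete, here is a rough worst-case model. It assumes the attacker can reach every warm instance and that each eviction hands them a fresh budget; the function name `worstCaseBudget` and the `resetsPerWindow` parameter are hypothetical, and the region and instance counts in the example are made up.

```ts
// Upper bound on requests that get through per window when every instance
// keeps its own map. Illustrative model, not a measurement.
function worstCaseBudget(
  limit: number,
  regions: number,
  instancesPerRegion: number,
  resetsPerWindow: number = 0, // how often each instance is replaced per window
): number {
  const instances = regions * instancesPerRegion;
  return limit * instances * (1 + resetsPerWindow);
}

// Share per-IP gate (10/min) with 3 regions and 2 warm instances each:
console.log(worstCaseBudget(10, 3, 2)); // 60
// Same deployment if every instance is also recycled once per window:
console.log(worstCaseBudget(10, 3, 2, 1)); // 120
```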

For `/share/[token]`, this is an acceptable tradeoff:

- The share surface is not high-traffic — there aren't enough concurrent visitors to keep many instances warm in parallel.
- Token entropy is high enough that even with a few amplified budgets, brute-force enumeration is computationally infeasible.
- The per-token failure gate (5/hour, keyed on `(token-prefix, ip)`) limits how many failed guesses an attacker can make against a given token prefix, even while they stay under the per-IP budget.
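
One way the two share gates could compose is sketched below. Everything beyond the limits and the `(token-prefix, ip)` keying is an assumption: the key strings, the 8-character prefix length, the `guardShareRequest` name, and the order of checks are hypothetical and may not match `share-rate-limit.ts`.

```ts
// Hypothetical composition of the per-IP gate and the per-token failure gate.
type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };
type Limiter = (
  key: string,
  opts: { limit: number; windowMs: number },
) => RateLimitResult;

const PER_IP = { limit: 10, windowMs: 60_000 };
const PER_TOKEN_FAILURE = { limit: 5, windowMs: 60 * 60_000 };
const TOKEN_PREFIX_LEN = 8; // assumption: the real prefix length is not documented here

function shareGateKeys(token: string, ip: string) {
  return {
    perIp: `share:ip:${ip}`,
    perTokenFailure: `share:fail:${token.slice(0, TOKEN_PREFIX_LEN)}:${ip}`,
  };
}

function guardShareRequest(
  token: string,
  ip: string,
  tokenExists: (token: string) => boolean,
  check: Limiter, // consuming check
  peek: Limiter, // non-consuming check
): "ok" | "not_found" | "rate_limited" {
  const keys = shareGateKeys(token, ip);
  // Every request spends one slot of the per-IP budget.
  if (!check(keys.perIp, PER_IP).allowed) return "rate_limited";
  // Refuse before the lookup if earlier failures already used up the gate.
  if (!peek(keys.perTokenFailure, PER_TOKEN_FAILURE).allowed) return "rate_limited";
  if (!tokenExists(token)) {
    // Only failed lookups spend the failure budget.
    check(keys.perTokenFailure, PER_TOKEN_FAILURE);
    return "not_found";
  }
  return "ok";
}
```

With this ordering, legitimate visitors who open a valid link never touch the failure budget, while repeated misses against the same prefix are cut off after five attempts per hour.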

For `/api/sessions/[id]/events`, the same tradeoff is acceptable: the gate is there for cost containment, not security.

## When to upgrade to distributed (Upstash)

Upgrade when **any** of these become true:

1. **A new surface is added where the attacker model assumes a determined adversary** — e.g., login, password reset, API-key creation, payment endpoints. The brute-force prevention guarantee of the in-memory limiter is "good enough" for low-stakes surfaces; high-stakes surfaces want hard guarantees.
2. **Traffic patterns push us past one or two warm instances** during peak hours. Once Vercel routinely keeps three or more instances warm for our traffic, the per-instance budget multiplier becomes a real ceiling.
3. **A `connect-src` audit reveals our rate limiter is the only thing protecting a real-money path** — e.g., LLM token spend, third-party API costs, document generation costs. In-memory isn't a defense-in-depth layer; it's an only-defense layer, which is wrong for cost-bearing flows.
4. **An incident demonstrates limit-washing** in production logs (look for the same IP exhausting its per-window budget on each of N instances).
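
For the fourth trigger, a log scan along these lines could flag candidates. The `RequestLog` shape (including an `instanceId` field) and the function name are hypothetical; whether production logs expose an instance identifier is an assumption that would need checking.

```ts
// Hypothetical log record; real log fields may be named differently.
type RequestLog = { ip: string; instanceId: string; status: number };

// Returns IPs that were answered with 429 by at least `minInstances`
// distinct instances, i.e. that exhausted their budget more than once.
function findLimitWashingSuspects(
  logs: RequestLog[],
  minInstances: number = 2,
): string[] {
  const instancesByIp = new Map<string, Set<string>>();
  logs.forEach(({ ip, instanceId, status }) => {
    if (status !== 429) return;
    const seen = instancesByIp.get(ip) ?? new Set<string>();
    seen.add(instanceId);
    instancesByIp.set(ip, seen);
  });
  return Array.from(instancesByIp.entries())
    .filter(([, seen]) => seen.size >= minInstances)
    .map(([ip]) => ip);
}
```

A single instance returning 429 is the limiter working as intended; the same IP hitting the ceiling on several instances inside one window is the pattern worth investigating.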

## How to swap in Upstash

The four-step migration:

1. **Add the deps:** `pnpm add @upstash/redis @upstash/ratelimit`.
2. **Replace `lib/rate-limit.ts` internals.** Keep the `checkRateLimit` / `peekRateLimit` / `getClientIp` exports and signatures, and replace the in-memory Map with the Upstash sliding-window primitive. Callers are shielded by the interface, so `share-rate-limit.ts`, `sessions/events`, the chat routes, etc. don't change, provided `checkRateLimit` already returns a Promise; Upstash calls are network requests, so a synchronous signature would have to become `async`.
3. **Set env:** `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` in the Vercel environments (production + preview). Local dev can keep the in-memory fallback by leaving them empty; the new implementation should start with a guard such as `if (!process.env.UPSTASH_REDIS_REST_URL) return checkRateLimitInMemory(...)`.
4. **Verify with a focused test** in `lib/rate-limit.test.ts` that hits the Upstash mock for one window and proves cross-instance budget enforcement.
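
The shape of steps 2 to 4 can be sketched without pulling in the Upstash packages. The `DistributedLimiter` type below is a hand-written subset of what `@upstash/ratelimit`'s `limit()` resolves to (`success` and `reset`, a Unix timestamp in milliseconds); `makeCheckRateLimit` and its parameters are hypothetical names introduced for illustration, and the limiter is injected so the fallback and the cross-instance behaviour can be exercised with a fake.

```ts
type RateLimitOptions = { limit: number; windowMs: number };
type RateLimitResult =
  | { allowed: true }
  | { allowed: false; retryAfterMs: number };

// Structural subset of the @upstash/ratelimit `limit()` response.
type DistributedLimiter = {
  limit(key: string): Promise<{ success: boolean; reset: number }>;
};

// In production, `getDistributed` would return null when
// process.env.UPSTASH_REDIS_REST_URL is unset, and otherwise something like
//   new Ratelimit({
//     redis: Redis.fromEnv(),
//     limiter: Ratelimit.slidingWindow(opts.limit, `${opts.windowMs} ms`),
//   })
// cached per limit/window pair (an assumption about how it would be wired).
function makeCheckRateLimit(
  getDistributed: (opts: RateLimitOptions) => DistributedLimiter | null,
  checkInMemory: (key: string, opts: RateLimitOptions) => RateLimitResult,
  now: () => number = () => Date.now(),
) {
  return async function checkRateLimit(
    key: string,
    opts: RateLimitOptions,
  ): Promise<RateLimitResult> {
    const distributed = getDistributed(opts);
    // Env vars unset (local dev): keep using the in-memory limiter.
    if (!distributed) return checkInMemory(key, opts);
    const { success, reset } = await distributed.limit(key);
    if (success) return { allowed: true };
    // `reset` is an absolute timestamp; callers expect a relative delay.
    return { allowed: false, retryAfterMs: Math.max(0, reset - now()) };
  };
}
```

A test for step 4 can then build two `checkRateLimit` functions over one shared fake store, standing in for two server instances, and assert that the budget spent through the first is visible to the second.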

Estimated effort: half a day for a swap with no regressions; full day if we want to add a config-driven per-route limit-and-window table while we're in there.

## Operational notes

- **Local dev** runs entirely in-memory. There is no Redis dependency for `pnpm dev`. The limiter survives HMR (the Map is on a module-level binding); restart the dev server to clear state.
- **Tests** rely on test isolation, not on the production limiter behavior. `lib/rate-limit.test.ts` resets between cases. Any new test that exercises a rate-limited route should mock `checkRateLimit` to return `{ allowed: true }` unless the test is specifically about throttling.
- **Sentry breadcrumbs** are added on every 429 throw — see `share-rate-limit.ts` for the pattern. Never include the full token in breadcrumb data; truncate to 8 chars.
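
The truncation rule in the last note can be sketched as a small builder whose result is handed to `Sentry.addBreadcrumb(...)`. The function name, the `category` and `message` values, and the `data` field names are hypothetical; `share-rate-limit.ts` remains the reference for the actual pattern.

```ts
// Breadcrumb fields used here; a subset of what Sentry.addBreadcrumb accepts.
type RateLimitBreadcrumb = {
  category: string;
  message: string;
  level: "warning";
  data: { tokenPrefix: string; retryAfterMs: number };
};

const BREADCRUMB_TOKEN_CHARS = 8;

function buildRateLimitBreadcrumb(
  surface: string,
  token: string,
  retryAfterMs: number,
): RateLimitBreadcrumb {
  return {
    category: "rate-limit",
    message: `429 on ${surface}`,
    level: "warning",
    // Only the first 8 characters of the token ever leave the process.
    data: { tokenPrefix: token.slice(0, BREADCRUMB_TOKEN_CHARS), retryAfterMs },
  };
}

// Usage at the throw site (assumes the Sentry SDK is imported there):
//   Sentry.addBreadcrumb(buildRateLimitBreadcrumb("/share/[token]", token, retryAfterMs));
```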

## Why not Vercel KV / Redis on Vercel directly?

Vercel KV / Redis are no longer offered as first-party products; Vercel's documentation states that existing Vercel KV stores were moved to Upstash Redis in December 2024. The recommended path on Vercel for distributed rate-limiting is Upstash via the Vercel Marketplace integration: it's a first-party-supported option and matches the platform's "fewer primitives" philosophy.

## Related

- `lib/rate-limit.ts` — the primitive.
- `app/share/[token]/share-rate-limit.ts` — share-surface composition.
- `documents/CHANGELOG.md` PR #1630 — initial share rate-limit ship.
- `documents/reviews/2026-05-19-pr-1637-dev-to-main.md` — review that surfaced the cold-start documentation gap.
