# PDF Viewer
Virtualized, full-featured PDF viewer with text search, annotations, and AI integration.
## Overview
`features/pdf/` is a self-contained PDF rendering module built on [`react-pdf`](https://github.com/wojtekmaj/react-pdf) (PDF.js) and [`@tanstack/react-virtual`](https://tanstack.com/virtual). It renders arbitrarily large documents without mounting all pages at once, and exposes a composable API so it can be embedded directly or wrapped in a block.
The companion block at `features/blocks/components/pdf-viewer/` extends the core viewer with multi-document navigation, persistent highlight annotations stored as entities, and an AI action bar.
**Primary use case (MortgageQ):** Loan files, appraisals, and title reports displayed inline on entity detail pages with one-click "Send to AI" context injection.
---
## Key Concepts
### PdfHighlight
A highlight overlay drawn on top of a rendered page. Coordinates are **normalized (0–1)** relative to the page viewport so they remain stable as the user changes zoom level.
```ts
interface PdfHighlight {
  id: string;
  page: number;    // 0-based page index
  x: number;       // 0-1 normalized left
  y: number;       // 0-1 normalized top
  width: number;   // 0-1 normalized width
  height: number;  // 0-1 normalized height
  color?: string;  // CSS color value
  text?: string;   // captured text (optional)
}
```
Search matches and persistent annotations both arrive as `PdfHighlight[]`. The viewer merges them before rendering. The active search match is drawn with a brighter `activeHighlightColor`, blended onto the page with `mixBlendMode: "multiply"`.
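To illustrate why normalized coordinates survive zoom changes, here is a minimal sketch that scales a highlight to a rendered page size. The helper name `toPixelRect` and the `PixelRect` type are hypothetical and not part of the module; the viewer's actual overlay code may differ.

```typescript
interface PdfHighlight {
  id: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
  color?: string;
  text?: string;
}

// Hypothetical output shape for an absolutely positioned overlay.
interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper: scale a normalized highlight to the page's rendered size.
function toPixelRect(h: PdfHighlight, pageWidthPx: number, pageHeightPx: number): PixelRect {
  return {
    left: h.x * pageWidthPx,
    top: h.y * pageHeightPx,
    width: h.width * pageWidthPx,
    height: h.height * pageHeightPx,
  };
}

const highlight: PdfHighlight = { id: "h1", page: 0, x: 0.25, y: 0.5, width: 0.5, height: 0.125 };

// The same highlight at two zoom levels: only the page size changes.
const at100 = toPixelRect(highlight, 800, 1000);  // { left: 200, top: 500, width: 400, height: 125 }
const at200 = toPixelRect(highlight, 1600, 2000); // { left: 400, top: 1000, width: 800, height: 250 }
```

The stored highlight is never rewritten when zoom changes; only the multiplication inputs do.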
### SelectionAction
An extension point on the text-selection popup menu. Pass custom actions via `PdfViewerProps.selectionActions[]` to add product-specific buttons (e.g., "Send to AI", "Add to brief").
```ts
interface SelectionAction {
  id: string;
  label: string;
  icon?: React.ReactNode;
  onAction: (context: SelectionContext) => void;
}

interface SelectionContext {
  text: string;
  pageIndex: number; // 0-based
  coordinates: { x: number; y: number; width: number; height: number };
  rect: DOMRect;
}
```
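As a rough illustration, a custom action might look like the sketch below. The interfaces are trimmed local copies (the `icon` and `rect` fields are omitted so the snippet runs outside a browser), and the `briefItems` store and the "p.N" label format are assumptions made for the example.

```typescript
// Trimmed local copies of the documented interfaces (icon and rect omitted).
interface SelectionContext {
  text: string;
  pageIndex: number; // 0-based
  coordinates: { x: number; y: number; width: number; height: number };
}

interface SelectionAction {
  id: string;
  label: string;
  onAction: (context: SelectionContext) => void;
}

// Hypothetical in-memory store standing in for real product state.
const briefItems: string[] = [];

const addToBrief: SelectionAction = {
  id: "add-to-brief",
  label: "Add to brief",
  onAction: ({ text, pageIndex }) => {
    // Show a 1-based page number to the user; the context itself is 0-based.
    briefItems.push(`p.${pageIndex + 1}: ${text.trim()}`);
  },
};

// Simulate the popup invoking the action with a selection on the third page.
addToBrief.onAction({
  text: "  Appraised value: $410,000 ",
  pageIndex: 2,
  coordinates: { x: 0.1, y: 0.2, width: 0.3, height: 0.02 },
});
```

An array of such objects would then be passed as `selectionActions` to the viewer.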
### PdfViewerBlockConfig
Configuration stored in a `BlockConfig.config` JSON field for the `pdf-viewer` block type.
| Field | Type | Default | Purpose |
|---|---|---|---|
| `documentIds` | `string[]` | — | Explicit document IDs (standalone mode) |
| `entityId` | `string` | from block context | Override entity for document lookup |
| `showAiActions` | `boolean` | `true` | Show AI action bar below the viewer |
| `showAnnotations` | `boolean` | `true` | Enable highlight annotation creation/display |
| `annotationEntityTypeSlug` | `string` | — | Entity type slug used to store annotations as entities |
| `showThumbnails` | `boolean` | `false` | Show thumbnail sidebar inside the viewer |
| `defaultZoom` | `number` | — | Initial zoom % (e.g., `100`). Falls back to fit-page. |
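
The defaults in the table can be sketched as a small resolver. This assumes every field is optional in the stored JSON; the `withDefaults` helper is hypothetical and only mirrors the Default column above.

```typescript
// Local type derived from the table; optionality of each field is an assumption.
interface PdfViewerBlockConfig {
  documentIds?: string[];
  entityId?: string;
  showAiActions?: boolean;
  showAnnotations?: boolean;
  annotationEntityTypeSlug?: string;
  showThumbnails?: boolean;
  defaultZoom?: number;
}

// Hypothetical helper: fill in the boolean defaults listed in the table.
function withDefaults(config: PdfViewerBlockConfig) {
  return {
    ...config,
    showAiActions: config.showAiActions ?? true,
    showAnnotations: config.showAnnotations ?? true,
    showThumbnails: config.showThumbnails ?? false,
  };
}

// Example of what a stored block config could look like.
const stored: PdfViewerBlockConfig = {
  annotationEntityTypeSlug: "annotation",
  defaultZoom: 100,
};

const resolved = withDefaults(stored);
// resolved.showAiActions === true, resolved.showThumbnails === false
```

Fields without a default (`documentIds`, `defaultZoom`, and so on) are passed through untouched, so the viewer can still fall back to fit-page when `defaultZoom` is absent.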
---
## How It Works
### Rendering pipeline
```
PdfProvider — initializes PDF.js worker once per render tree
└─ PdfErrorBoundary — React error boundary with fallback UI
   └─ PdfViewer — entry component (lazy-loaded in the block layer)
      ├─ usePdfViewer() — all viewer state + document lifecycle
      ├─ PdfToolbar — navigation, zoom, search toggle, print, download
      ├─ PdfSearch — search input with match counter
      ├─ PdfTextSelection — selection popup wrapper
      └─ PdfScrollArea — @tanstack/react-virtual list
         └─ PdfPage × n — react-pdf <Page/> + highlight overlays
```
### Virtual scrolling
`PdfScrollArea` uses `useVirtualizer` to render only the pages visible inside the scroll container plus `overscan: 3` pages on each side. Page height is estimated from the first page's viewport metadata, then refined by `measureElement` as pages are rendered. A `ResizeObserver` re-measures on container width change (debounced 250 ms).
Scroll-to-page is wired via `scrollToPageRef` — a mutable ref populated by `PdfScrollArea` and consumed by `usePdfViewer` navigation methods and search navigation. This avoids prop-drilling a callback through the virtualizer boundary.
During freeform scrolling, the current page is **derived** from the scroll position (midpoint detection) rather than set imperatively, so the page indicator stays accurate wherever the user scrolls.
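Midpoint detection can be sketched as a pure function: the current page is the last page whose top edge sits at or above the vertical middle of the scroll container. The function name and the `pageTops` input are assumptions for illustration; the real implementation reads positions from the virtualizer.

```typescript
// pageTops[i] is the scroll offset (px) of page i's top edge, sorted ascending.
function currentPageFromScroll(pageTops: number[], scrollTop: number, viewportHeight: number): number {
  const midpoint = scrollTop + viewportHeight / 2;
  let current = 0;
  for (let i = 0; i < pageTops.length; i++) {
    const top = pageTops[i];
    // Stop at the first page that starts below the midpoint.
    if (top === undefined || top > midpoint) break;
    current = i;
  }
  return current;
}

// Three pages of 1000 px separated by a 16 px gap (the gap value mirrors pageGap: 16).
const pageTops = [0, 1016, 2032];

const atTop = currentPageFromScroll(pageTops, 0, 800);        // midpoint 400  -> page 0
const midScroll = currentPageFromScroll(pageTops, 700, 800);  // midpoint 1100 -> page 1
const nearEnd = currentPageFromScroll(pageTops, 1700, 800);   // midpoint 2100 -> page 2
```

Using the midpoint rather than the top edge means the indicator flips once a page fills most of the viewport, not the moment its first pixel scrolls in.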
### Text search
`createSearchController()` in `features/pdf/lib/text-search.ts` returns a stateful controller with three methods: `search()`, `cancel()`, and `clearCache()`.
- Pages are searched in **batches of 10** (`SEARCH_CONFIG.batchSize`).
- Between batches the controller yields to the main thread via `requestIdleCallback` (or `setTimeout(0)` fallback).
- Extracted text positions are cached in a `Map<pageIndex, TextPosition[]>` so repeated queries on the same document skip PDF.js API calls.
- Each `search()` call issues a new `AbortController`, automatically cancelling the previous search.
- Progress highlights are reported after each batch that yields new matches, making results appear incrementally for large documents.
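
The batching and cancellation behaviour listed above can be approximated as follows. This is a simplified sketch, not the controller's code: it searches pre-extracted page strings instead of calling PDF.js, matches case-insensitively (an assumption), returns page indices rather than `PdfHighlight[]`, and resolves with partial results on abort.

```typescript
const BATCH_SIZE = 10; // mirrors SEARCH_CONFIG.batchSize

// Yield to the event loop; a browser build could prefer requestIdleCallback.
const yieldToMainThread = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

async function searchInBatches(
  pageTexts: string[], // stand-in for text extracted per page
  query: string,
  signal: AbortSignal,
  onProgress: (matchedPages: number[]) => void,
): Promise<number[]> {
  const needle = query.toLowerCase();
  const matches: number[] = [];

  for (let start = 0; start < pageTexts.length; start += BATCH_SIZE) {
    // A newer search aborted this one: stop before touching the next batch.
    if (signal.aborted) return matches;

    const before = matches.length;
    pageTexts.slice(start, start + BATCH_SIZE).forEach((text, offset) => {
      if (text.toLowerCase().includes(needle)) matches.push(start + offset);
    });

    // Report progress only when this batch produced new matches.
    if (matches.length > before) onProgress([...matches]);

    await yieldToMainThread();
  }
  return matches;
}

// Hypothetical wrapper: each call aborts the previous search before starting a new one.
let activeController: AbortController | null = null;

function startSearch(pageTexts: string[], query: string, onProgress: (pages: number[]) => void) {
  activeController?.abort();
  activeController = new AbortController();
  return searchInBatches(pageTexts, query, activeController.signal, onProgress);
}
```

Checking `signal.aborted` at the top of each batch bounds the wasted work after a new keystroke to at most one batch.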
`extractTextPositions()` converts PDF.js transform matrices (PDF coordinate space, origin bottom-left) to normalized top-left CSS coordinates.
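A simplified version of that conversion is sketched below. It assumes unrotated text, treats `transform[4]` and `transform[5]` as the left edge and baseline of the item in PDF units, and ignores font descent; `TextItemLike` and `toNormalizedRect` are hypothetical names, and the real function may handle more cases.

```typescript
// Minimal subset of a PDF.js text item used by this sketch.
interface TextItemLike {
  str: string;
  transform: [number, number, number, number, number, number]; // [a, b, c, d, e, f]
  width: number;
  height: number;
}

interface NormalizedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert bottom-left PDF coordinates to normalized top-left coordinates.
function toNormalizedRect(item: TextItemLike, pageWidth: number, pageHeight: number): NormalizedRect {
  const left = item.transform[4];
  const baseline = item.transform[5];
  return {
    x: left / pageWidth,
    // Flip the y axis: the top of the text box is (baseline + height) above the page bottom.
    y: 1 - (baseline + item.height) / pageHeight,
    width: item.width / pageWidth,
    height: item.height / pageHeight,
  };
}

// A 120 x 20 text run whose baseline sits 700 units above the bottom of a 600 x 800 page.
const rect = toNormalizedRect(
  { str: "Borrower", transform: [1, 0, 0, 1, 60, 700], width: 120, height: 20 },
  600,
  800,
);
// rect is approximately { x: 0.1, y: 0.1, width: 0.2, height: 0.025 }
```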
### Annotations (block layer)
`useAnnotations` in the block component manages highlight annotations as regular platform entities:
1. **Read:** `GET /api/entities?typeSlug=&relatedTo=` fetches annotation entities for the current document entity, cached by React Query (60 s stale time).
2. **Create:** `POST /api/entities` creates an annotation entity with `content: { page_number, coordinates, selected_text, color, annotation_type }`. Page numbers are stored 1-based and converted to 0-based only at render time.
3. **Delete:** `DELETE /api/entities/:id` removes the annotation entity; the query is invalidated on success.
`toHighlights()` and `fromSelectionContext()` convert between the entity `content` shape and `PdfHighlight`.
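The 1-based to 0-based conversion could look roughly like this. The `AnnotationEntity` shape (an `id` plus the `content` fields named above) is an assumption, and `toHighlight` is a single-item stand-in for the real `toHighlights()`.

```typescript
interface PdfHighlight {
  id: string;
  page: number; // 0-based
  x: number;
  y: number;
  width: number;
  height: number;
  color?: string;
  text?: string;
}

// Assumed entity shape, based on the content fields listed in the Create step.
interface AnnotationEntity {
  id: string;
  content: {
    page_number: number; // stored 1-based
    coordinates: { x: number; y: number; width: number; height: number };
    selected_text: string;
    color: string;
    annotation_type: string;
  };
}

// Hypothetical single-entity converter.
function toHighlight(entity: AnnotationEntity): PdfHighlight {
  const { page_number, coordinates, selected_text, color } = entity.content;
  return {
    id: `annotation-${entity.id}`, // prefixed form accepted by deleteAnnotation
    page: page_number - 1,          // 1-based storage -> 0-based render index
    ...coordinates,
    color,
    text: selected_text,
  };
}

const converted = toHighlight({
  id: "e42",
  content: {
    page_number: 3,
    coordinates: { x: 0.1, y: 0.2, width: 0.4, height: 0.03 },
    selected_text: "Title vested in",
    color: "yellow",
    annotation_type: "highlight",
  },
});
// converted.page === 2, converted.id === "annotation-e42"
```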
### AI integration
The block dispatches a `CustomEvent` named `amble:send-to-ai` on `window`, with the payload in `detail`, for three actions:
| Action | Event `type` | Payload extras |
|---|---|---|
| Send page | `pdf-page` | `documentId`, `documentTitle`, `page` |
| Send selection | `pdf-selection` | `documentId`, `documentTitle`, `text`, `page` |
| Extract insights | `pdf-extract` | `documentId`, `documentTitle` |
The chat dock listens for this event. Neither the viewer nor the block imports from `features/chat/`, keeping the coupling one-directional.
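One way to type these payloads is a discriminated union on `type`, which lets a listener narrow safely. The union below is inferred from the table, and `describeDetail` is a hypothetical listener-side helper; whether `page` is 0- or 1-based in the payload is not specified here, so the sketch passes it through unchanged.

```typescript
const SEND_TO_AI_EVENT = "amble:send-to-ai";

// Payload shapes inferred from the table above.
type SendToAiDetail =
  | { type: "pdf-page"; documentId: string; documentTitle: string; page: number }
  | { type: "pdf-selection"; documentId: string; documentTitle: string; text: string; page: number }
  | { type: "pdf-extract"; documentId: string; documentTitle: string };

// Hypothetical listener-side helper that turns a payload into prompt context.
function describeDetail(detail: SendToAiDetail): string {
  switch (detail.type) {
    case "pdf-page":
      return `Page ${detail.page} of "${detail.documentTitle}"`;
    case "pdf-selection":
      return `Selection on page ${detail.page} of "${detail.documentTitle}": ${detail.text}`;
    case "pdf-extract":
      return `Extract insights from "${detail.documentTitle}"`;
  }
}

const selectionDetail: SendToAiDetail = {
  type: "pdf-selection",
  documentId: "doc-1",
  documentTitle: "Appraisal Report",
  text: "Estimated market value",
  page: 4,
};

const summary = describeDetail(selectionDetail);

// In a browser, the sender side would be:
//   window.dispatchEvent(new CustomEvent(SEND_TO_AI_EVENT, { detail: selectionDetail }));
// and the listener side:
//   window.addEventListener(SEND_TO_AI_EVENT, (e) => describeDetail((e as CustomEvent<SendToAiDetail>).detail));
```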
### Worker initialization
`PdfInitializer` (mounted by `PdfProvider`) configures `pdfjs.GlobalWorkerOptions.workerSrc` to `/pdf.worker.min.mjs` from `/public/` on first mount, avoiding external CDN requests. It also suppresses expected cancellation errors (`AbortException`, `TextLayer task cancelled`, etc.) from the console and the global `unhandledrejection` handler to keep the DevTools noise-free.
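The suppression logic presumably hinges on a predicate that recognizes expected cancellations. A minimal sketch, assuming matching on the error name or message and covering only the two patterns named above (the real list is longer):

```typescript
// Only the two patterns named in the text; the actual list includes others.
const EXPECTED_CANCELLATION_PATTERNS = ["AbortException", "TextLayer task cancelled"];

// Hypothetical predicate: true when a rejection reason looks like an expected cancellation.
function isExpectedCancellation(reason: unknown): boolean {
  const name = reason instanceof Error ? reason.name : "";
  const message = reason instanceof Error ? reason.message : String(reason);
  return EXPECTED_CANCELLATION_PATTERNS.some(
    (pattern) => name.includes(pattern) || message.includes(pattern),
  );
}

const abortLike = new Error("Rendering cancelled");
abortLike.name = "AbortException";

const expectedByName = isExpectedCancellation(abortLike);
const expectedByMessage = isExpectedCancellation(new Error("TextLayer task cancelled"));
const genuineFailure = isExpectedCancellation(new Error("Network failure"));

// In a browser, a handler could use it like this:
//   window.addEventListener("unhandledrejection", (event) => {
//     if (isExpectedCancellation(event.reason)) event.preventDefault();
//   });
```

Anything the predicate does not recognize still reaches the console, so genuine failures remain visible.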
---
## API Reference
### `<PdfViewer>`
```tsx
import { PdfViewer } from "@/features/pdf"

<PdfViewer
  fileUrl={signedUrl}
  initialPage={0}
  initialZoom={100}
  fitOnLoad="page" // "page" | "width" | "none"
  enableSearch={true}
  enableTextSelection={true}
  enableAnnotationLayer={true}
  showToolbar={true}
  showThumbnails={false}
  selectionActions={[...]} // SelectionAction[]
  highlights={[...]} // PdfHighlight[] — merged with search matches
  onPageChange={(page) => {}}
  onDocumentLoad={(numPages, doc) => {}}
  onError={(err) => {}}
/>
```
Always wrap in `<PdfProvider>` and `<PdfErrorBoundary>`. Lazy-load via `next/dynamic({ ssr: false })` — the module imports `react-pdf` which is ~400 KB and browser-only.
### `usePdfViewer(options)`
Core hook. Manages all viewer state. Consumed internally by `PdfViewer` but exported for custom viewer implementations.
**Options:**
| Param | Type | Default |
|---|---|---|
| `initialPage` | `number` | `0` |
| `initialZoom` | `number` | `100` |
| `fitToPageOnLoad` | `boolean` | `true` |
| `onPageChange` | `(page: number) => void` | — |
| `onZoomChange` | `(zoom: number) => void` | — |
| `onDocumentLoad` | `(doc: PdfDocumentProxy) => void` | — |
| `onError` | `(err: Error) => void` | — |
**Key return values:**
| Name | Type | Description |
|---|---|---|
| `document` | `PdfDocumentProxy \| null` | Loaded PDF.js document proxy |
| `numPages` | `number` | Total page count |
| `currentPage` | `number` | 0-based current page |
| `zoom` | `number` | Current zoom % |
| `highlights` | `PdfHighlight[]` | Search result highlights |
| `scrollToPageRef` | `MutableRefObject<((i: number) => void) \| null>` | Wire to virtualizer for programmatic scroll |
| `goToPage(page)` | `(page: number) => void` | Navigate to 0-based page |
| `search(query)` | `(q: string) => void` | Run full-text search (async, cancellable) |
| `fitToWidth()` | `() => void` | Fit zoom to container width |
| `fitToPage()` | `() => void` | Fit zoom to full page in container |
| `download(url, filename?)` | — | Trigger file download |
### `createSearchController()`
```ts
import { createSearchController } from "@/features/pdf/lib/text-search"
const controller = createSearchController()
// Returns Promise<PdfHighlight[]>. Calls onProgress after each batch.
await controller.search(doc, numPages, query, onProgress)
controller.cancel() // abort current search
controller.clearCache() // clear text position cache (e.g., on document change)
```
### `useAnnotations(options)`
```ts
import { useAnnotations } from "@/features/blocks/components/pdf-viewer/use-annotations"
const { highlights, createAnnotation, deleteAnnotation, isLoading } = useAnnotations({
  entityTypeSlug: "annotation", // entity type slug for annotation records
  documentEntityId: doc.document_entity_id,
  enabled: true,
})
// Creates annotation entity from a SelectionContext
createAnnotation(selectionContext, "yellow")
// Deletes by entity ID or "annotation-{entityId}" prefixed ID
deleteAnnotation("annotation-uuid")
```
### Constants
```ts
import {
  ZOOM_CONFIG,        // { default: 100, min: 25, max: 500, step: 25 }
  SEARCH_CONFIG,      // { minQueryLength: 2, debounceDelay: 300, batchSize: 10, ... }
  PERFORMANCE_CONFIG, // { overscan: 3, resizeDebounce: 250, pageGap: 16, ... }
  ANNOTATION_COLORS,  // { yellow, blue, green, pink, orange } → rgba strings
} from "@/features/pdf"
```
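As an example of how such constants are typically consumed, the sketch below steps and clamps zoom using a local copy of the `ZOOM_CONFIG` values shown above. The helper names are hypothetical; the toolbar's actual zoom handlers may behave differently.

```typescript
// Local copy of the documented values so the snippet is self-contained.
const ZOOM_CONFIG = { default: 100, min: 25, max: 500, step: 25 } as const;

// Hypothetical helpers: keep zoom within [min, max] while stepping.
function clampZoom(zoom: number): number {
  return Math.min(ZOOM_CONFIG.max, Math.max(ZOOM_CONFIG.min, zoom));
}

function zoomIn(zoom: number): number {
  return clampZoom(zoom + ZOOM_CONFIG.step);
}

function zoomOut(zoom: number): number {
  return clampZoom(zoom - ZOOM_CONFIG.step);
}

const steppedUp = zoomIn(ZOOM_CONFIG.default); // 125
const cappedAtMax = zoomIn(500);               // 500
const cappedAtMin = zoomOut(25);               // 25
```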
---
## For Agents
The PDF viewer is a **read-only UI component** — there are no AI tools in this module directly. Agents interact with PDF content through the surrounding entity system and documents module.
**How agents access PDF content:**
- Use `getEntity` to retrieve a document entity and its `signed_url` field for direct access.
- Use `searchEntities` with the annotation entity type slug to find existing highlights on a document.
- Use `createEntity` with the annotation entity type to programmatically create highlight annotations.
**Custom event integration:**
The block fires `amble:send-to-ai` custom events when users click AI action bar buttons. The chat agent receives the document context (title, page number, selected text) as part of the user message. Agents should handle these payloads to provide targeted document analysis.
---
## Design Decisions
**Normalized highlight coordinates.** Highlights use 0-1 values relative to the page viewport rather than pixel values. This means the same `PdfHighlight` renders correctly at any zoom level without recalculation.
**Annotations as entities.** Rather than a dedicated `pdf_annotations` table, annotations are stored as regular platform entities related to the document entity. This keeps the schema domain-agnostic — any product can configure a different entity type for annotations — and lets agents query and create annotations through the standard entity API.
**Lazy-loading `PdfViewer`.** `next/dynamic({ ssr: false })` is used at the block boundary. `react-pdf` and PDF.js depend on browser-only APIs and a large worker script, so the viewer cannot be rendered in Node.js. Lazy-loading keeps it out of the server bundle and avoids a Next.js build error.
**`scrollToPageRef` over prop callbacks.** The virtualizer's `scrollToIndex` must be called on the virtualizer instance, which lives inside `PdfScrollArea`. Rather than lifting the virtualizer out or prop-drilling a callback through multiple layers, a mutable ref is shared. `usePdfViewer` creates `scrollToPageRef` and exposes it as a hook return value; `PdfScrollArea` writes the actual implementation into it after the virtualizer is ready.
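The shared-ref handshake can be sketched without React. Here `createViewerState` stands in for `usePdfViewer` and `attachScrollArea` for `PdfScrollArea`; both names, and the clamping in `goToPage`, are assumptions made for the illustration.

```typescript
interface MutableRef<T> {
  current: T;
}

type ScrollToPage = (pageIndex: number) => void;

// Stand-in for usePdfViewer: owns the ref and the navigation method.
function createViewerState(numPages: number) {
  const scrollToPageRef: MutableRef<ScrollToPage | null> = { current: null };
  let currentPage = 0;
  return {
    scrollToPageRef,
    getCurrentPage: () => currentPage,
    goToPage(page: number) {
      const clamped = Math.min(numPages - 1, Math.max(0, page));
      currentPage = clamped;               // optimistic update
      scrollToPageRef.current?.(clamped);  // no-op until the scroll area attaches
    },
  };
}

// Stand-in for PdfScrollArea: writes the real scroll function into the ref.
function attachScrollArea(ref: MutableRef<ScrollToPage | null>, scrollToIndex: (index: number) => void) {
  ref.current = (pageIndex) => scrollToIndex(pageIndex);
  return () => {
    ref.current = null; // cleanup on unmount
  };
}

const scrolledTo: number[] = [];
const viewer = createViewerState(10);

viewer.goToPage(3); // safe before attach: nothing to scroll yet
const detach = attachScrollArea(viewer.scrollToPageRef, (index) => scrolledTo.push(index));
viewer.goToPage(5);
viewer.goToPage(99); // clamped to the last page (9)
detach();
```

Because the consumer only ever reads `ref.current` at call time, the owner and the scroll area can mount in either order.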
**Current page derived from scroll.** Programmatic navigation sets `currentPage` immediately (optimistic), but freeform scrolling updates it via a 50 ms debounced midpoint detection loop. This avoids fighting the virtualizer's scroll state and keeps the toolbar page indicator accurate during keyboard or mouse scrolling.
**Cancellable batch search.** Long documents (200+ pages) would freeze the UI if searched synchronously. Batching with `requestIdleCallback` yields between batches, and `AbortController` ensures a new keystroke immediately stops the previous search rather than letting stale results race in.
---
## Related Modules
- [Document Processing](/docs/features/document-processing) — upload, chunking, signed URL generation; `DocumentRecord` type consumed by the block layer
- [Block System](/docs/features/block-system) — `ResolvedBlock` type, how block data is resolved server-side before rendering
- [Entity System](/docs/features/entity-system) — annotation entities are stored and queried through the standard entity API
- [Chat](/docs/features/chat) — receives `amble:send-to-ai` custom events from the AI action bar