PDF Viewer
Virtualized, full-featured PDF viewer with text search, annotations, and AI integration.
Overview
features/pdf/ is a self-contained PDF rendering module built on react-pdf (PDF.js) and @tanstack/react-virtual. It renders arbitrarily large documents without mounting all pages at once, and exposes a composable API so it can be embedded directly or wrapped in a block.
The companion block at features/blocks/components/pdf-viewer/ extends the core viewer with multi-document navigation, persistent highlight annotations stored as entities, and an AI action bar.
Primary use case (MortgageQ): Loan files, appraisals, and title reports displayed inline on entity detail pages with one-click "Send to AI" context injection.
Key Concepts
PdfHighlight
A highlight overlay drawn on top of a rendered page. Coordinates are normalized (0–1) relative to the page viewport so they remain stable as the user changes zoom level.
interface PdfHighlight {
id: string;
page: number; // 0-based page index
x: number; // 0-1 normalized left
y: number; // 0-1 normalized top
width: number; // 0-1 normalized width
height: number; // 0-1 normalized height
color?: string; // CSS color value
text?: string; // captured text (optional)
}
Search matches and persistent annotations both arrive as PdfHighlight[]. The viewer merges them before rendering; the active search match gets a brighter activeHighlightColor via mixBlendMode: "multiply".
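To illustrate why normalized coordinates stay stable across zoom levels, here is a minimal sketch of mapping a highlight onto a rendered page. The helper name toPixelRect and the PixelRect type are hypothetical and not part of the module; the page sizes are made-up values.

```typescript
// Mirrors the positional fields of PdfHighlight.
interface NormalizedRect {
  x: number;      // 0-1 normalized left
  y: number;      // 0-1 normalized top
  width: number;  // 0-1 normalized width
  height: number; // 0-1 normalized height
}

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper: scale a normalized rect to the rendered page size in CSS pixels.
function toPixelRect(rect: NormalizedRect, pageWidthPx: number, pageHeightPx: number): PixelRect {
  return {
    left: rect.x * pageWidthPx,
    top: rect.y * pageHeightPx,
    width: rect.width * pageWidthPx,
    height: rect.height * pageHeightPx,
  };
}

const sampleHighlight: NormalizedRect = { x: 0.25, y: 0.5, width: 0.5, height: 0.125 };
const at100 = toPixelRect(sampleHighlight, 800, 1000);  // page rendered at 100%
const at200 = toPixelRect(sampleHighlight, 1600, 2000); // same page rendered at 200%
```

The stored highlight never changes; only the page size passed in at render time does.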
SelectionAction
An extension point on the text-selection popup menu. Pass custom actions via PdfViewerProps.selectionActions[] to add product-specific buttons (e.g., "Send to AI", "Add to brief").
interface SelectionAction {
id: string;
label: string;
icon?: React.ReactNode;
onAction: (context: SelectionContext) => void;
}
interface SelectionContext {
text: string;
pageIndex: number; // 0-based
coordinates: { x: number; y: number; width: number; height: number };
rect: DOMRect;
}
PdfViewerBlockConfig
Configuration stored in a BlockConfig.config JSON field for the pdf-viewer block type.
| Field | Type | Default | Purpose |
|---|---|---|---|
| documentIds | string[] | — | Explicit document IDs (standalone mode) |
| entityId | string | from block context | Override entity for document lookup |
| showAiActions | boolean | true | Show AI action bar below the viewer |
| showAnnotations | boolean | true | Enable highlight annotation creation/display |
| annotationEntityTypeSlug | string | — | Entity type slug used to store annotations as entities |
| showThumbnails | boolean | false | Show thumbnail sidebar inside the viewer |
| defaultZoom | number | — | Initial zoom % (e.g., 100). Falls back to fit-page. |
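As a rough illustration of the table, the sketch below types the config and fills in the documented boolean defaults. Treating every field as optional and the helper name resolveBlockConfig are assumptions; the source only lists the fields and their defaults.

```typescript
// Field names and defaults come from the table; optionality is assumed.
interface PdfViewerBlockConfig {
  documentIds?: string[];
  entityId?: string;
  showAiActions?: boolean;
  showAnnotations?: boolean;
  annotationEntityTypeSlug?: string;
  showThumbnails?: boolean;
  defaultZoom?: number;
}

// Hypothetical helper: apply the documented defaults for the boolean flags.
function resolveBlockConfig(raw: PdfViewerBlockConfig) {
  return {
    ...raw,
    showAiActions: raw.showAiActions ?? true,
    showAnnotations: raw.showAnnotations ?? true,
    showThumbnails: raw.showThumbnails ?? false,
  };
}

// Example config for a block that stores annotations and opens at 125% zoom.
const loanFileBlock = resolveBlockConfig({
  annotationEntityTypeSlug: "annotation",
  showThumbnails: true,
  defaultZoom: 125,
});
```

Leaving defaultZoom unset keeps it undefined here, which corresponds to the fit-page fallback described in the table.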
How It Works
Rendering pipeline
PdfProvider — initializes PDF.js worker once per render tree
└─ PdfErrorBoundary — React error boundary with fallback UI
   └─ PdfViewer — entry component (lazy-loaded in the block layer)
      ├─ usePdfViewer() — all viewer state + document lifecycle
      ├─ PdfToolbar — navigation, zoom, search toggle, print, download
      ├─ PdfSearch — search input with match counter
      ├─ PdfTextSelection — selection popup wrapper
      └─ PdfScrollArea — @tanstack/react-virtual list
         └─ PdfPage × n — react-pdf <Page/> + highlight overlays
Virtual scrolling
PdfScrollArea uses useVirtualizer to render only the pages visible inside the scroll container plus overscan: 3 pages on each side. Page height is estimated from the first page's viewport metadata, then refined by measureElement as pages are rendered. A ResizeObserver re-measures on container width change (debounced 250 ms).
Scroll-to-page is wired via scrollToPageRef — a mutable ref populated by PdfScrollArea and consumed by usePdfViewer navigation methods and search navigation. This avoids prop-drilling a callback through the virtualizer boundary.
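The shared-ref wiring can be sketched without React as follows. MutableRef stands in for React's MutableRefObject, and createNavigation is a hypothetical reduction of the hook; the clamping of out-of-range pages is an assumption made for illustration.

```typescript
// Minimal stand-in for React's MutableRefObject so the sketch runs without React.
interface MutableRef<T> {
  current: T;
}

type ScrollToPage = (pageIndex: number) => void;

// Hook side (hypothetical): create the ref empty and build navigation on top of it.
function createNavigation(numPages: number) {
  const scrollToPageRef: MutableRef<ScrollToPage | null> = { current: null };
  const goToPage = (pageIndex: number): void => {
    const clamped = Math.min(Math.max(pageIndex, 0), numPages - 1);
    // No-op until the scroll area has registered its implementation.
    scrollToPageRef.current?.(clamped);
  };
  return { scrollToPageRef, goToPage };
}

const scrolledTo: number[] = [];
const viewerNav = createNavigation(12);
viewerNav.goToPage(3); // ignored: nothing registered yet

// Scroll-area side: write the real implementation once the virtualizer exists.
viewerNav.scrollToPageRef.current = (pageIndex: number) => {
  scrolledTo.push(pageIndex); // a real implementation would call the virtualizer's scrollToIndex
};
viewerNav.goToPage(5);
viewerNav.goToPage(99); // clamped to the last page (index 11)
```

Because the consumer reads ref.current at call time, it always sees the latest implementation without a re-render.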
Current page is derived from the scroll position (midpoint detection) rather than set imperatively, so it stays accurate during freeform scrolling.
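A minimal sketch of midpoint detection, assuming the virtualizer exposes each page's start offset and size in pixels. The PageLayout type and pageAtMidpoint helper are hypothetical names, not the module's own.

```typescript
// Layout of one page inside the scroll content, in CSS pixels.
interface PageLayout {
  start: number; // offset of the page's top edge from the top of the scroll content
  size: number;  // rendered page height
}

// Hypothetical helper: 0-based index of the page under the viewport midpoint.
function pageAtMidpoint(pages: PageLayout[], scrollTop: number, viewportHeight: number): number {
  if (pages.length === 0) return 0;
  const midpoint = scrollTop + viewportHeight / 2;
  // First page whose bottom edge lies below the midpoint; a midpoint that falls
  // in the gap between two pages resolves to the lower page.
  const index = pages.findIndex((page) => midpoint < page.start + page.size);
  return index === -1 ? pages.length - 1 : index; // past the end: last page
}

// Three 1000px pages separated by a 16px gap.
const samplePages: PageLayout[] = [
  { start: 0, size: 1000 },
  { start: 1016, size: 1000 },
  { start: 2032, size: 1000 },
];
```

With an 800px viewport, a scroll offset of 700 puts the midpoint at 1100, which falls on the second page (index 1).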
Text search
createSearchController() in features/pdf/lib/text-search.ts returns a stateful controller with three methods: search(), cancel(), and clearCache().
- Pages are searched in batches of 10 (SEARCH_CONFIG.batchSize).
- Between batches the controller yields to the main thread via requestIdleCallback (or setTimeout(0) fallback).
- Extracted text positions are cached in a Map<pageIndex, TextPosition[]> so repeated queries on the same document skip PDF.js API calls.
- Each search() call issues a new AbortController, automatically cancelling the previous search.
- Progress highlights are reported after each batch that yields new matches, making results appear incrementally for large documents.
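The batching and cancellation behaviour above can be sketched as follows. This is not the module's controller: pages are plain strings rather than PDF.js text content, the names searchInBatches and startSearch are hypothetical, the yield uses only the setTimeout(0) fallback so it also runs outside a browser, and returning partial results on abort is an assumption.

```typescript
interface BatchSearchOptions {
  batchSize: number;
  signal: AbortSignal;
  onProgress?: (matchedPages: number[]) => void;
}

// Yield to the event loop between batches (setTimeout(0) fallback only).
function yieldToMainThread(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical stand-in for the real search: returns indices of matching pages.
async function searchInBatches(
  pageTexts: string[],
  query: string,
  options: BatchSearchOptions,
): Promise<number[]> {
  const needle = query.toLowerCase();
  const matchedPages: number[] = [];
  for (let start = 0; start < pageTexts.length; start += options.batchSize) {
    // A newer search superseded this one: stop and return what was found so far.
    if (options.signal.aborted) return matchedPages;
    const before = matchedPages.length;
    const batch = pageTexts.slice(start, start + options.batchSize);
    batch.forEach((text, offset) => {
      if (text.toLowerCase().includes(needle)) matchedPages.push(start + offset);
    });
    // Report progress only when this batch produced new matches.
    if (matchedPages.length > before) options.onProgress?.([...matchedPages]);
    await yieldToMainThread();
  }
  return matchedPages;
}

// Each new search aborts the previous one before starting.
let activeSearch: AbortController | null = null;
function startSearch(
  pageTexts: string[],
  query: string,
  onProgress?: (matchedPages: number[]) => void,
): Promise<number[]> {
  activeSearch?.abort();
  activeSearch = new AbortController();
  return searchInBatches(pageTexts, query, { batchSize: 10, signal: activeSearch.signal, onProgress });
}
```

Calling startSearch twice in a row lets the first call finish only its current batch; it sees the aborted signal at the top of the next iteration.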
extractTextPositions() converts PDF.js transform matrices (PDF coordinate space, origin bottom-left) to normalized top-left CSS coordinates.
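A minimal sketch of that conversion for the simplest case, assuming an unrotated page and unrotated text. TextItemLike mirrors the transform, width, and height fields of a PDF.js text item; toNormalizedBox is a hypothetical helper, and treating the baseline as the bottom of the box ignores descenders.

```typescript
// Subset of a PDF.js text item. transform is [a, b, c, d, e, f], where (e, f)
// is the text origin in PDF units measured from the bottom-left page corner.
interface TextItemLike {
  transform: number[];
  width: number;
  height: number;
}

interface NormalizedBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: PDF space (origin bottom-left) to normalized top-left coordinates.
function toNormalizedBox(item: TextItemLike, pageWidth: number, pageHeight: number): NormalizedBox {
  const left = item.transform[4] ?? 0;
  const baseline = item.transform[5] ?? 0;
  const topFromBottom = baseline + item.height; // top edge, still measured from the bottom
  return {
    x: left / pageWidth,
    y: (pageHeight - topFromBottom) / pageHeight, // flip the y axis to a top-left origin
    width: item.width / pageWidth,
    height: item.height / pageHeight,
  };
}

const sampleItem: TextItemLike = { transform: [1, 0, 0, 1, 128, 768], width: 256, height: 128 };
const sampleBox = toNormalizedBox(sampleItem, 512, 1024);
```

Rotated pages or skewed text would need the full matrix applied rather than only the translation components.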
Annotations (block layer)
useAnnotations in the block component manages highlight annotations as regular platform entities:
- Read: GET /api/entities?typeSlug=&relatedTo= fetches annotation entities for the current document entity, cached by React Query (60 s stale time).
- Create: POST /api/entities creates an annotation entity with content: { page_number, coordinates, selected_text, color, annotation_type }. Page numbers are stored 1-based and converted to 0-based only at render time.
- Delete: DELETE /api/entities/:id removes the annotation entity; the query is invalidated on success.
toHighlights() and fromSelectionContext() convert between the entity content shape and PdfHighlight.
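The two conversions might look roughly like the sketch below. Only the content keys and the 1-based page convention come from the text above; the entity wrapper shape, the "annotation-" id prefix on the highlight, the "highlight" annotation_type value, and the function names are assumptions.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed annotation entity shape; only the content keys are documented.
interface AnnotationEntityLike {
  id: string;
  content: {
    page_number: number; // stored 1-based
    coordinates: Box;
    selected_text?: string;
    color?: string;
    annotation_type: string;
  };
}

interface HighlightLike extends Box {
  id: string;
  page: number; // 0-based
  color?: string;
  text?: string;
}

// Trimmed-down SelectionContext (the DOMRect field is omitted here).
interface SelectionLike {
  text: string;
  pageIndex: number; // 0-based
  coordinates: Box;
}

// Entity to highlight: convert the stored 1-based page to a 0-based index.
function toHighlight(entity: AnnotationEntityLike): HighlightLike {
  const { page_number, coordinates, selected_text, color } = entity.content;
  return { id: `annotation-${entity.id}`, page: page_number - 1, ...coordinates, color, text: selected_text };
}

// Selection to entity content: convert the 0-based page index to 1-based for storage.
function fromSelection(selection: SelectionLike, color: string): AnnotationEntityLike["content"] {
  return {
    page_number: selection.pageIndex + 1,
    coordinates: { ...selection.coordinates },
    selected_text: selection.text,
    color,
    annotation_type: "highlight", // assumed value
  };
}
```

Keeping the off-by-one conversion inside these two functions means the rest of the viewer only ever sees 0-based pages.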
AI integration
The block dispatches a CustomEvent named amble:send-to-ai on window, carrying its payload in detail, for three actions:
| Action | Event type | Payload extras |
|---|---|---|
| Send page | pdf-page | documentId, documentTitle, page |
| Send selection | pdf-selection | documentId, documentTitle, text, page |
| Extract insights | pdf-extract | documentId, documentTitle |
The chat dock listens for this event. Neither the viewer nor the block imports from features/chat/, keeping the coupling one-directional.
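One way to model the three payloads is a discriminated union, sketched below. Using a type field as the discriminant is an assumption based on the "Event type" column; describeDetail is a hypothetical listener-side helper, not part of the chat dock.

```typescript
// Payload shapes from the table; the `type` discriminant is an assumed field name.
type SendToAiDetail =
  | { type: "pdf-page"; documentId: string; documentTitle: string; page: number }
  | { type: "pdf-selection"; documentId: string; documentTitle: string; text: string; page: number }
  | { type: "pdf-extract"; documentId: string; documentTitle: string };

// Hypothetical listener-side helper: turn a payload into a short context line.
function describeDetail(detail: SendToAiDetail): string {
  switch (detail.type) {
    case "pdf-page":
      return `${detail.documentTitle}, page ${detail.page}`;
    case "pdf-selection":
      return `${detail.documentTitle}, page ${detail.page}: "${detail.text}"`;
    case "pdf-extract":
      return `${detail.documentTitle} (extract insights)`;
  }
}
```

In the browser, the sending side would pass such a detail object to window.dispatchEvent(new CustomEvent("amble:send-to-ai", { detail })), and the listener would read it from event.detail.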
Worker initialization
PdfInitializer (mounted by PdfProvider) sets pdfjs.GlobalWorkerOptions.workerSrc to /pdf.worker.min.mjs, served from /public/, on first mount, which avoids external CDN requests. It also suppresses expected cancellation errors (AbortException, TextLayer task cancelled, etc.) in both the console and the global unhandledrejection handler to keep DevTools free of noise.
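The suppression logic presumably hinges on a predicate like the one sketched here. The function name and the two marker lists are assumptions: the source names AbortException and "TextLayer task cancelled" and leaves the rest of the list elided, so only those two are included.

```typescript
// Markers named in the text above; any others used by the module are not listed here.
const EXPECTED_ERROR_NAMES = ["AbortException"];
const EXPECTED_MESSAGE_PARTS = ["TextLayer task cancelled"];

// Hypothetical predicate: true when a rejection reason is an expected cancellation.
function isExpectedCancellation(reason: unknown): boolean {
  if (typeof reason !== "object" || reason === null) return false;
  const candidate = reason as { name?: unknown; message?: unknown };
  if (typeof candidate.name === "string" && EXPECTED_ERROR_NAMES.includes(candidate.name)) {
    return true;
  }
  if (typeof candidate.message !== "string") return false;
  const messageText: string = candidate.message;
  return EXPECTED_MESSAGE_PARTS.some((part) => messageText.includes(part));
}
```

In a browser, an unhandledrejection listener could call event.preventDefault() when this predicate returns true for event.reason, leaving every other rejection to surface normally.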
API Reference
<PdfViewer>
import { PdfViewer } from "@/features/pdf"
<PdfViewer
fileUrl={signedUrl}
initialPage={0}
initialZoom={100}
fitOnLoad="page" // "page" | "width" | "none"
enableSearch={true}
enableTextSelection={true}
enableAnnotationLayer={true}
showToolbar={true}
showThumbnails={false}
selectionActions={[...]} // SelectionAction[]
highlights={[...]} // PdfHighlight[] — merged with search matches
onPageChange={(page) => {}}
onDocumentLoad={(numPages, doc) => {}}
onError={(err) => {}}
/>
Always wrap in <PdfProvider> and <PdfErrorBoundary>. Lazy-load via next/dynamic({ ssr: false }): the module imports react-pdf, which is ~400 KB and browser-only.
usePdfViewer(options)
Core hook. Manages all viewer state. Consumed internally by PdfViewer but exported for custom viewer implementations.
Options:
| Param | Type | Default |
|---|---|---|
| initialPage | number | 0 |
| initialZoom | number | 100 |
| fitToPageOnLoad | boolean | true |
| onPageChange | (page: number) => void | — |
| onZoomChange | (zoom: number) => void | — |
| onDocumentLoad | (doc: PdfDocumentProxy) => void | — |
| onError | (err: Error) => void | — |
Key return values:
| Name | Type | Description |
|---|---|---|
| document | PdfDocumentProxy \| null | Loaded PDF.js document proxy |
| numPages | number | Total page count |
| currentPage | number | 0-based current page |
| zoom | number | Current zoom % |
| highlights | PdfHighlight[] | Search result highlights |
| scrollToPageRef | MutableRefObject<(i: number) => void \| null> | Wire to virtualizer for programmatic scroll |
| goToPage(page) | (page: number) => void | Navigate to 0-based page |
| search(query) | (q: string) => void | Run full-text search (async, cancellable) |
| fitToWidth() | () => void | Fit zoom to container width |
| fitToPage() | () => void | Fit zoom to full page in container |
| download(url, filename?) | — | Trigger file download |
createSearchController()
import { createSearchController } from "@/features/pdf/lib/text-search"
const controller = createSearchController()
// Returns Promise<PdfHighlight[]>. Calls onProgress after each batch.
await controller.search(doc, numPages, query, onProgress)
controller.cancel() // abort current search
controller.clearCache() // clear text position cache (e.g., on document change)
useAnnotations(options)
import { useAnnotations } from "@/features/blocks/components/pdf-viewer/use-annotations"
const { highlights, createAnnotation, deleteAnnotation, isLoading } = useAnnotations({
entityTypeSlug: "annotation", // entity type slug for annotation records
documentEntityId: doc.document_entity_id,
enabled: true,
})
// Creates annotation entity from a SelectionContext
createAnnotation(selectionContext, "yellow")
// Deletes by entity ID or "annotation-{entityId}" prefixed ID
deleteAnnotation("annotation-uuid")
Constants
import {
ZOOM_CONFIG, // { default: 100, min: 25, max: 500, step: 25 }
SEARCH_CONFIG, // { minQueryLength: 2, debounceDelay: 300, batchSize: 10, ... }
PERFORMANCE_CONFIG,// { overscan: 3, resizeDebounce: 250, pageGap: 16, ... }
ANNOTATION_COLORS, // { yellow, blue, green, pink, orange } → rgba strings
} from "@/features/pdf"
For Agents
The PDF viewer is a read-only UI component — there are no AI tools in this module directly. Agents interact with PDF content through the surrounding entity system and documents module.
How agents access PDF content:
- Use getEntity to retrieve a document entity and its signed_url field for direct access.
- Use searchEntities with the annotation entity type slug to find existing highlights on a document.
- Use createEntity with the annotation entity type to programmatically create highlight annotations.
Custom event integration:
The block fires amble:send-to-ai custom events when users click AI action bar buttons. The chat agent receives the document context (title, page number, selected text) as part of the user message. Agents should handle these payloads to provide targeted document analysis.
Design Decisions
Normalized highlight coordinates. Highlights use 0-1 values relative to the page viewport rather than pixel values. This means the same PdfHighlight renders correctly at any zoom level without recalculation.
Annotations as entities. Rather than a dedicated pdf_annotations table, annotations are stored as regular platform entities related to the document entity. This keeps the schema domain-agnostic — any product can configure a different entity type for annotations — and lets agents query and create annotations through the standard entity API.
Lazy-loading PdfViewer. next/dynamic({ ssr: false }) is used at the block boundary. react-pdf is browser-only, and PDF.js ships a large worker script (pdf.worker.min.mjs) that should not be evaluated during server rendering. Lazy-loading keeps both out of the server bundle and avoids a Next.js build error.
scrollToPageRef over prop callbacks. The virtualizer's scrollToIndex must be called on the virtualizer instance, which lives inside PdfScrollArea. Rather than lifting the virtualizer out or prop-drilling a callback through multiple layers, a mutable ref is shared. usePdfViewer creates scrollToPageRef and exposes it as a hook return value; PdfScrollArea writes the actual implementation into it after the virtualizer is ready.
Current page derived from scroll. Programmatic navigation sets currentPage immediately (optimistic), but freeform scrolling updates it via a 50 ms debounced midpoint detection loop. This avoids fighting the virtualizer's scroll state and keeps the toolbar page indicator accurate during keyboard or mouse scrolling.
Cancellable batch search. Long documents (200+ pages) would freeze the UI if searched synchronously. Batching with requestIdleCallback yields between batches, and AbortController ensures a new keystroke immediately stops the previous search rather than letting stale results race in.
Related Modules
- Document Processing: upload, chunking, signed URL generation; DocumentRecord type consumed by the block layer
- Block System: ResolvedBlock type, how block data is resolved server-side before rendering
- Entity System: annotation entities are stored and queried through the standard entity API
- Chat: receives amble:send-to-ai custom events from the AI action bar