PDF Viewer

Virtualized, full-featured PDF viewer with text search, annotations, and AI integration.

Overview

features/pdf/ is a self-contained PDF rendering module built on react-pdf (PDF.js) and @tanstack/react-virtual. It renders arbitrarily large documents without mounting all pages at once, and exposes a composable API so it can be embedded directly or wrapped in a block.

The companion block at features/blocks/components/pdf-viewer/ extends the core viewer with multi-document navigation, persistent highlight annotations stored as entities, and an AI action bar.

Primary use case (MortgageQ): Loan files, appraisals, and title reports displayed inline on entity detail pages with one-click "Send to AI" context injection.

Key Concepts

PdfHighlight

A highlight overlay drawn on top of a rendered page. Coordinates are normalized (0–1) relative to the page viewport so they remain stable as the user changes zoom level.

interface PdfHighlight {
  id: string;
  page: number;        // 0-based page index
  x: number;          // 0-1 normalized left
  y: number;          // 0-1 normalized top
  width: number;      // 0-1 normalized width
  height: number;     // 0-1 normalized height
  color?: string;     // CSS color value
  text?: string;      // captured text (optional)
}
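To illustrate why normalized coordinates survive zoom changes, here is a minimal sketch of the scaling step. The helper name toPixelRect and the page sizes are assumptions made for this example, not part of the module's API.

```typescript
// Normalized rectangle (0-1), as carried by a PdfHighlight.
interface NormalizedRect { x: number; y: number; width: number; height: number }

// Rectangle in CSS pixels, ready to position an overlay element.
interface PixelRect { left: number; top: number; width: number; height: number }

// Hypothetical helper: scale a normalized rect to the rendered page size.
function toPixelRect(rect: NormalizedRect, pageWidthPx: number, pageHeightPx: number): PixelRect {
  return {
    left: rect.x * pageWidthPx,
    top: rect.y * pageHeightPx,
    width: rect.width * pageWidthPx,
    height: rect.height * pageHeightPx,
  };
}

// The same highlight at two zoom levels (page sizes are made-up values).
const sample: NormalizedRect = { x: 0.25, y: 0.5, width: 0.5, height: 0.125 };
const at100 = toPixelRect(sample, 800, 1000);
const at200 = toPixelRect(sample, 1600, 2000);
```

Only the rendered page size changes between the two calls; the stored highlight is untouched.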

Search matches and persistent annotations both arrive as PdfHighlight[]. The viewer merges them before rendering; the active search match is drawn in the brighter activeHighlightColor, blended onto the page with mixBlendMode: "multiply".
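The merge step could look roughly like the sketch below. The source does not show how the viewer tracks the active match, so the mergeHighlights function and its activeMatchId parameter are assumptions for illustration only.

```typescript
interface PdfHighlight {
  id: string;
  page: number; // 0-based page index
  x: number;
  y: number;
  width: number;
  height: number;
  color?: string;
  text?: string;
}

// Hypothetical merge: annotations first, then search matches, with the
// active match recolored. Inputs are not mutated.
function mergeHighlights(
  annotations: PdfHighlight[],
  searchMatches: PdfHighlight[],
  activeMatchId: string | null,
  activeHighlightColor: string,
): PdfHighlight[] {
  const recolored = searchMatches.map((match) =>
    match.id === activeMatchId ? { ...match, color: activeHighlightColor } : match,
  );
  return [...annotations, ...recolored];
}
```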

SelectionAction

An extension point on the text-selection popup menu. Pass custom actions via PdfViewerProps.selectionActions[] to add product-specific buttons (e.g., "Send to AI", "Add to brief").

interface SelectionAction {
  id: string;
  label: string;
  icon?: React.ReactNode;
  onAction: (context: SelectionContext) => void;
}

interface SelectionContext {
  text: string;
  pageIndex: number;                                 // 0-based
  coordinates: { x: number; y: number; width: number; height: number };
  rect: DOMRect;
}
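As a rough illustration of a custom action, the sketch below defines a hypothetical "Add to brief" button that collects selected passages in memory. The types are simplified local copies (icon is omitted, and RectLike stands in for DOMRect so the example does not depend on browser typings).

```typescript
// Simplified stand-in for DOMRect and for the normalized coordinates object.
interface RectLike { x: number; y: number; width: number; height: number }

interface SelectionContext {
  text: string;
  pageIndex: number; // 0-based
  coordinates: RectLike; // normalized 0-1
  rect: RectLike; // viewport pixels
}

interface SelectionAction {
  id: string;
  label: string;
  onAction: (context: SelectionContext) => void;
}

// Hypothetical in-memory store for the collected passages.
const brief: string[] = [];

const addToBrief: SelectionAction = {
  id: "add-to-brief",
  label: "Add to brief",
  onAction: (context) => {
    // Show a 1-based page number to the user; the context itself is 0-based.
    brief.push(`p. ${context.pageIndex + 1}: ${context.text.trim()}`);
  },
};
```

An action like this would be passed as selectionActions={[addToBrief]} on the viewer.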

PdfViewerBlockConfig

Configuration stored in a BlockConfig.config JSON field for the pdf-viewer block type.

Field	Type	Default	Purpose
`documentIds`	`string[]`	—	Explicit document IDs (standalone mode)
`entityId`	`string`	from block context	Override entity for document lookup
`showAiActions`	`boolean`	`true`	Show AI action bar below the viewer
`showAnnotations`	`boolean`	`true`	Enable highlight annotation creation/display
`annotationEntityTypeSlug`	`string`	—	Entity type slug used to store annotations as entities
`showThumbnails`	`boolean`	`false`	Show thumbnail sidebar inside the viewer
`defaultZoom`	`number`	—	Initial zoom % (e.g., `100`). Falls back to fit-page.
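The defaults in the table can be applied with a small resolver like the one sketched here. The function resolveBlockConfig and its contextEntityId parameter are hypothetical; the block's actual resolution code is not shown in this document.

```typescript
interface PdfViewerBlockConfig {
  documentIds?: string[];
  entityId?: string;
  showAiActions?: boolean;
  showAnnotations?: boolean;
  annotationEntityTypeSlug?: string;
  showThumbnails?: boolean;
  defaultZoom?: number;
}

// Hypothetical resolver: fills in the documented defaults.
function resolveBlockConfig(config: PdfViewerBlockConfig, contextEntityId?: string) {
  return {
    documentIds: config.documentIds,
    entityId: config.entityId ?? contextEntityId, // falls back to the block context
    showAiActions: config.showAiActions ?? true,
    showAnnotations: config.showAnnotations ?? true,
    annotationEntityTypeSlug: config.annotationEntityTypeSlug,
    showThumbnails: config.showThumbnails ?? false,
    defaultZoom: config.defaultZoom, // undefined means the viewer fits the page
  };
}

const resolved = resolveBlockConfig({ showThumbnails: true }, "entity-123");
```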

How It Works

Rendering pipeline

PdfProvider           — initializes PDF.js worker once per render tree
  └─ PdfErrorBoundary — React error boundary with fallback UI
       └─ PdfViewer   — entry component (lazy-loaded in the block layer)
            ├─ usePdfViewer()          — all viewer state + document lifecycle
            ├─ PdfToolbar              — navigation, zoom, search toggle, print, download
            ├─ PdfSearch               — search input with match counter
            ├─ PdfTextSelection        — selection popup wrapper
            └─ PdfScrollArea           — @tanstack/react-virtual list
                 └─ PdfPage × n        — react-pdf <Page/> + highlight overlays

Virtual scrolling

PdfScrollArea uses useVirtualizer to render only the pages visible inside the scroll container plus overscan: 3 pages on each side. Page height is estimated from the first page's viewport metadata, then refined by measureElement as pages are rendered. A ResizeObserver re-measures on container width change (debounced 250 ms).
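useVirtualizer performs the range calculation internally using measured sizes. Purely to illustrate the idea of overscan, the sketch below computes the rendered range for uniform page heights; visibleRange is a hypothetical helper, and the numbers reuse overscan: 3 and pageGap: 16 from PERFORMANCE_CONFIG.

```typescript
interface PageRange { start: number; end: number } // inclusive 0-based indices

// Hypothetical helper: which pages to mount, assuming every page has the
// same estimated height. Gaps are counted as part of the preceding page.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number,
  pageGap: number,
  pageCount: number,
  overscan = 3,
): PageRange {
  const stride = pageHeight + pageGap;
  const first = Math.floor(scrollTop / stride);
  const last = Math.floor((scrollTop + viewportHeight - 1) / stride);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(pageCount - 1, last + overscan),
  };
}

// A 500-page document scrolled to the top of page index 5.
const range = visibleRange(5 * 1016, 800, 1000, 16, 500);
```

In this example 7 pages are mounted out of 500, which is what keeps large documents cheap to render.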

Scroll-to-page is wired via scrollToPageRef — a mutable ref populated by PdfScrollArea and consumed by usePdfViewer navigation methods and search navigation. This avoids prop-drilling a callback through the virtualizer boundary.
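The shared-ref pattern can be sketched without React as follows. createNavigation is a hypothetical stand-in for the part of usePdfViewer that owns the ref, and the plain object with a current field stands in for a React ref.

```typescript
type ScrollToPage = (pageIndex: number) => void;
interface MutableRef<T> { current: T }

// Owner side: creates the ref and navigates through it.
function createNavigation(numPages: number) {
  const scrollToPageRef: MutableRef<ScrollToPage | null> = { current: null };
  const goToPage = (page: number): void => {
    const clamped = Math.min(Math.max(page, 0), numPages - 1);
    // No-op until the scroll area has registered its implementation.
    scrollToPageRef.current?.(clamped);
  };
  return { scrollToPageRef, goToPage };
}

const scrolled: number[] = [];
const nav = createNavigation(10);

nav.goToPage(4); // ignored: nothing registered yet

// Scroll-area side: writes the real implementation once the virtualizer exists.
nav.scrollToPageRef.current = (pageIndex) => {
  scrolled.push(pageIndex); // a real implementation would call scrollToIndex here
};

nav.goToPage(4);
nav.goToPage(99); // clamped to the last page
```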

During freeform scrolling, the current page is derived from the scroll position (midpoint detection) rather than set imperatively, so the page indicator stays accurate.
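One way to implement midpoint detection is a binary search over page offsets, sketched below. The function pageAtMidpoint and the pageOffsets array are assumptions for this example; the module's own implementation is not shown here.

```typescript
// Hypothetical helper: returns the index of the page that contains the
// vertical midpoint of the viewport. pageOffsets[i] is the top of page i
// in pixels, in ascending order.
function pageAtMidpoint(scrollTop: number, viewportHeight: number, pageOffsets: number[]): number {
  const midpoint = scrollTop + viewportHeight / 2;
  let low = 0;
  let high = pageOffsets.length - 1;
  let result = 0;
  // Find the last page whose top edge is at or above the midpoint.
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (pageOffsets[mid] <= midpoint) {
      result = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return result;
}

const offsets = [0, 1016, 2032, 3048]; // four pages, 1000 px tall with a 16 px gap
const current = pageAtMidpoint(900, 800, offsets);
```

Using the midpoint rather than the top edge means the indicator switches when most of the next page is in view.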

Text search

createSearchController() in features/pdf/lib/text-search.ts returns a stateful controller with three methods: search(), cancel(), and clearCache().

Pages are searched in batches of 10 (SEARCH_CONFIG.batchSize).
Between batches the controller yields to the main thread via requestIdleCallback (or setTimeout(0) fallback).
Extracted text positions are cached in a Map<pageIndex, TextPosition[]> so repeated queries on the same document skip PDF.js API calls.
Each search() call issues a new AbortController, automatically cancelling the previous search.
Progress highlights are reported after each batch that yields new matches, making results appear incrementally for large documents.
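The steps above can be sketched in a simplified form. This is not the module's createSearchController: the sketch matches whole pages and returns page indices instead of PdfHighlight objects, and createBatchSearcher, PageTextLoader, and the empty-array result for a superseded search are all assumptions.

```typescript
type PageTextLoader = (pageIndex: number) => Promise<string>;

// Yield to the main thread: requestIdleCallback when present, else setTimeout(0).
function yieldToMainThread(): Promise<void> {
  return new Promise((resolve) => {
    const ric = (globalThis as any).requestIdleCallback;
    if (typeof ric === "function") ric.call(globalThis, () => resolve());
    else setTimeout(resolve, 0);
  });
}

function createBatchSearcher(loadPageText: PageTextLoader, batchSize = 10) {
  const cache = new Map<number, string>(); // page index -> extracted text
  let current: AbortController | null = null;

  async function search(
    numPages: number,
    query: string,
    onProgress?: (matchingPages: number[]) => void,
  ): Promise<number[]> {
    current?.abort(); // a new search cancels the previous one
    const controller = new AbortController();
    current = controller;
    const needle = query.toLowerCase();
    const matches: number[] = [];

    for (let start = 0; start < numPages; start += batchSize) {
      const end = Math.min(start + batchSize, numPages);
      let foundInBatch = false;
      for (let i = start; i < end; i++) {
        let text = cache.get(i);
        if (text === undefined) {
          text = await loadPageText(i);
          cache.set(i, text);
        }
        if (text.toLowerCase().includes(needle)) {
          matches.push(i);
          foundInBatch = true;
        }
      }
      if (controller.signal.aborted) return []; // superseded or cancelled
      if (foundInBatch) onProgress?.([...matches]);
      await yieldToMainThread();
    }
    return matches;
  }

  return {
    search,
    cancel: () => current?.abort(),
    clearCache: () => cache.clear(),
  };
}
```

Because the cache is keyed by page index, a second query on the same document only pays for string matching, not for text extraction.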

extractTextPositions() converts PDF.js transform matrices (PDF coordinate space, origin bottom-left) to normalized top-left CSS coordinates.
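The coordinate flip can be illustrated with a minimal sketch. It assumes an unrotated page at scale 1 and treats the text baseline as the bottom edge of the box, which ignores descenders; toNormalizedRect and TextItemLike are names invented for this example.

```typescript
// Subset of a PDF.js text item: transform is [a, b, c, d, e, f], where
// (e, f) is the item origin in PDF units measured from the bottom-left.
interface TextItemLike { transform: number[]; width: number; height: number }

interface NormalizedRect { x: number; y: number; width: number; height: number }

// Hypothetical conversion from PDF space (origin bottom-left, y up)
// to normalized CSS space (origin top-left, y down).
function toNormalizedRect(item: TextItemLike, pageWidth: number, pageHeight: number): NormalizedRect {
  const left = item.transform[4];
  const baseline = item.transform[5];
  const top = pageHeight - (baseline + item.height); // flip the y axis
  return {
    x: left / pageWidth,
    y: top / pageHeight,
    width: item.width / pageWidth,
    height: item.height / pageHeight,
  };
}

// A 300 x 40 text run whose baseline sits 600 units above the page bottom.
const rect = toNormalizedRect({ transform: [1, 0, 0, 1, 150, 600], width: 300, height: 40 }, 600, 800);
```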

Annotations (block layer)

useAnnotations in the block component manages highlight annotations as regular platform entities:

Read: GET /api/entities?typeSlug=&relatedTo= fetches annotation entities for the current document entity, cached by React Query (60 s stale time).
Create: POST /api/entities creates an annotation entity with content: { page_number, coordinates, selected_text, color, annotation_type }. Page numbers are stored 1-based and converted to 0-based only at render time.
Delete: DELETE /api/entities/:id removes the annotation entity; the query is invalidated on success.

toHighlights() and fromSelectionContext() convert between the entity content shape and PdfHighlight.
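A rough sketch of the two conversions follows, mainly to show where the 1-based to 0-based page shift happens. The entity shape ({ id, content }), the "highlight" value for annotation_type, and the Sketch-suffixed function names are assumptions; only the content field names and the "annotation-" ID prefix come from this document.

```typescript
interface Coordinates { x: number; y: number; width: number; height: number }

interface AnnotationContent {
  page_number: number; // stored 1-based
  coordinates: Coordinates;
  selected_text: string;
  color: string;
  annotation_type: string;
}

interface AnnotationEntity { id: string; content: AnnotationContent } // assumed shape

interface PdfHighlight extends Coordinates {
  id: string;
  page: number; // 0-based
  color?: string;
  text?: string;
}

// Entity content -> render-time highlight (1-based page becomes 0-based).
function toHighlightsSketch(entities: AnnotationEntity[]): PdfHighlight[] {
  return entities.map((entity) => ({
    id: `annotation-${entity.id}`,
    page: entity.content.page_number - 1,
    ...entity.content.coordinates,
    color: entity.content.color,
    text: entity.content.selected_text,
  }));
}

// Selection -> entity content (0-based page becomes 1-based).
function fromSelectionContextSketch(
  context: { text: string; pageIndex: number; coordinates: Coordinates },
  color: string,
): AnnotationContent {
  return {
    page_number: context.pageIndex + 1,
    coordinates: context.coordinates,
    selected_text: context.text,
    color,
    annotation_type: "highlight", // assumed value
  };
}
```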

AI integration

The block dispatches a CustomEvent named "amble:send-to-ai" on window, created as new CustomEvent("amble:send-to-ai", { detail }), for three actions:

Action	Event `type`	Payload extras
Send page	`pdf-page`	`documentId`, `documentTitle`, `page`
Send selection	`pdf-selection`	`documentId`, `documentTitle`, `text`, `page`
Extract insights	`pdf-extract`	`documentId`, `documentTitle`

The chat dock listens for this event. Neither the viewer nor the block imports from features/chat/, keeping the coupling one-directional.
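The payloads in the table map naturally onto a discriminated union. The sketch below shows one possible typing plus a dispatcher and a listener-side formatter; SendToAiDetail, dispatchSendToAi, and describeDetail are hypothetical names, and whether page is 0-based or 1-based in the payload is not stated here, so it is passed through unchanged.

```typescript
type SendToAiDetail =
  | { type: "pdf-page"; documentId: string; documentTitle: string; page: number }
  | { type: "pdf-selection"; documentId: string; documentTitle: string; text: string; page: number }
  | { type: "pdf-extract"; documentId: string; documentTitle: string };

const SEND_TO_AI_EVENT = "amble:send-to-ai";

// Dispatch side: returns false when not running in a browser.
function dispatchSendToAi(detail: SendToAiDetail): boolean {
  const g = globalThis as any;
  if (typeof g.window === "undefined" || typeof g.CustomEvent !== "function") return false;
  g.window.dispatchEvent(new g.CustomEvent(SEND_TO_AI_EVENT, { detail }));
  return true;
}

// Listener side (the role the chat dock plays): turn a payload into a context line.
function describeDetail(detail: SendToAiDetail): string {
  switch (detail.type) {
    case "pdf-page":
      return `Page ${detail.page} of "${detail.documentTitle}"`;
    case "pdf-selection":
      return `Selection on page ${detail.page} of "${detail.documentTitle}": ${detail.text}`;
    case "pdf-extract":
      return `Extract insights from "${detail.documentTitle}"`;
  }
}
```

A listener would register with window.addEventListener("amble:send-to-ai", handler) and read the payload from event.detail.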

Worker initialization

PdfInitializer (mounted by PdfProvider) sets pdfjs.GlobalWorkerOptions.workerSrc to /pdf.worker.min.mjs, served from /public/, on first mount, which avoids external CDN requests. It also filters expected cancellation errors (AbortException, TextLayer task cancelled, etc.) out of the console and the global unhandledrejection handler to keep the DevTools console free of noise.
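The filtering idea can be sketched as a predicate plus an unhandledrejection listener. The pattern list contains only the two names mentioned above, and isExpectedCancellation and installRejectionFilter are hypothetical names; the real initializer may match more cases or work differently.

```typescript
// Only the two cancellation markers named in this document.
const EXPECTED_CANCELLATION_PATTERNS = ["AbortException", "TextLayer task cancelled"];

// Hypothetical predicate: true when a rejection reason looks like an expected cancellation.
function isExpectedCancellation(reason: unknown): boolean {
  const name = reason instanceof Error ? reason.name : "";
  const message = reason instanceof Error ? reason.message : String(reason);
  return EXPECTED_CANCELLATION_PATTERNS.some(
    (pattern) => name.includes(pattern) || message.includes(pattern),
  );
}

interface RejectionTarget {
  addEventListener(type: string, listener: (event: any) => void): void;
}

// In a browser, pass window. Calling preventDefault() on an
// unhandledrejection event stops the default console report.
function installRejectionFilter(target: RejectionTarget): void {
  target.addEventListener("unhandledrejection", (event) => {
    if (isExpectedCancellation(event.reason)) event.preventDefault();
  });
}
```

Unrelated rejections are left alone, so genuine failures still surface in the console.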

API Reference

`<PdfViewer>`

import { PdfViewer } from "@/features/pdf"

<PdfViewer
  fileUrl={signedUrl}
  initialPage={0}
  initialZoom={100}
  fitOnLoad="page"           // "page" | "width" | "none"
  enableSearch={true}
  enableTextSelection={true}
  enableAnnotationLayer={true}
  showToolbar={true}
  showThumbnails={false}
  selectionActions={[...]}   // SelectionAction[]
  highlights={[...]}         // PdfHighlight[] — merged with search matches
  onPageChange={(page) => {}}
  onDocumentLoad={(numPages, doc) => {}}
  onError={(err) => {}}
/>

Always wrap in <PdfProvider> and <PdfErrorBoundary>. Lazy-load the viewer with next/dynamic and the { ssr: false } option, because the module imports react-pdf, which is ~400 KB and browser-only.

`usePdfViewer(options)`

Core hook. Manages all viewer state. Consumed internally by PdfViewer but exported for custom viewer implementations.

Options:

Param	Type	Default
`initialPage`	`number`	`0`
`initialZoom`	`number`	`100`
`fitToPageOnLoad`	`boolean`	`true`
`onPageChange`	`(page: number) => void`	—
`onZoomChange`	`(zoom: number) => void`	—
`onDocumentLoad`	`(doc: PdfDocumentProxy) => void`	—
`onError`	`(err: Error) => void`	—

Key return values:

Name	Type	Description
`document`	`PdfDocumentProxy \| null`	Loaded PDF.js document proxy
`numPages`	`number`	Total page count
`currentPage`	`number`	0-based current page
`zoom`	`number`	Current zoom %
`highlights`	`PdfHighlight[]`	Search result highlights
`scrollToPageRef`	`MutableRefObject<((i: number) => void) \| null>`	Wire to virtualizer for programmatic scroll
`goToPage(page)`	`(page: number) => void`	Navigate to 0-based page
`search(query)`	`(q: string) => void`	Run full-text search (async, cancellable)
`fitToWidth()`	`() => void`	Fit zoom to container width
`fitToPage()`	`() => void`	Fit zoom to full page in container
`download(url, filename?)`	—	Trigger file download

`createSearchController()`

import { createSearchController } from "@/features/pdf/lib/text-search"

const controller = createSearchController()

// Returns Promise<PdfHighlight[]>. Calls onProgress after each batch that yields new matches.
await controller.search(doc, numPages, query, onProgress)

controller.cancel()      // abort current search
controller.clearCache()  // clear text position cache (e.g., on document change)

`useAnnotations(options)`

import { useAnnotations } from "@/features/blocks/components/pdf-viewer/use-annotations"

const { highlights, createAnnotation, deleteAnnotation, isLoading } = useAnnotations({
  entityTypeSlug: "annotation",          // entity type slug for annotation records
  documentEntityId: doc.document_entity_id,
  enabled: true,
})

// Creates annotation entity from a SelectionContext
createAnnotation(selectionContext, "yellow")

// Deletes by entity ID or "annotation-{entityId}" prefixed ID
deleteAnnotation("annotation-uuid")

Constants

import {
  ZOOM_CONFIG,       // { default: 100, min: 25, max: 500, step: 25 }
  SEARCH_CONFIG,     // { minQueryLength: 2, debounceDelay: 300, batchSize: 10, ... }
  PERFORMANCE_CONFIG,// { overscan: 3, resizeDebounce: 250, pageGap: 16, ... }
  ANNOTATION_COLORS, // { yellow, blue, green, pink, orange } → rgba strings
} from "@/features/pdf"

For Agents

The PDF viewer is a read-only UI component — there are no AI tools in this module directly. Agents interact with PDF content through the surrounding entity system and documents module.

How agents access PDF content:

Use getEntity to retrieve a document entity and its signed_url field for direct access.
Use searchEntities with the annotation entity type slug to find existing highlights on a document.
Use createEntity with the annotation entity type to programmatically create highlight annotations.

Custom event integration:

The block fires amble:send-to-ai custom events when users click AI action bar buttons. The chat agent receives the document context (title, page number, selected text) as part of the user message. Agents should handle these payloads to provide targeted document analysis.

Design Decisions

Normalized highlight coordinates. Highlights use 0-1 values relative to the page viewport rather than pixel values. This means the same PdfHighlight renders correctly at any zoom level without recalculation.

Annotations as entities. Rather than a dedicated pdf_annotations table, annotations are stored as regular platform entities related to the document entity. This keeps the schema domain-agnostic — any product can configure a different entity type for annotations — and lets agents query and create annotations through the standard entity API.

Lazy-loading PdfViewer. next/dynamic with { ssr: false } is used at the block boundary. PDF.js relies on a large Web Worker script and on browser-only APIs that are not available in Node.js. Lazy-loading keeps it out of the server bundle and avoids a Next.js build error.

scrollToPageRef over prop callbacks. The virtualizer's scrollToIndex must be called on the virtualizer instance, which lives inside PdfScrollArea. Rather than lifting the virtualizer out or prop-drilling a callback through multiple layers, a mutable ref is shared. usePdfViewer creates scrollToPageRef and exposes it as a hook return value; PdfScrollArea writes the actual implementation into it after the virtualizer is ready.

Current page derived from scroll. Programmatic navigation sets currentPage immediately (optimistic), but freeform scrolling updates it via a 50 ms debounced midpoint detection loop. This avoids fighting the virtualizer's scroll state and keeps the toolbar page indicator accurate during keyboard or mouse scrolling.

Cancellable batch search. Long documents (200+ pages) would freeze the UI if searched synchronously. Batching with requestIdleCallback yields between batches, and AbortController ensures a new keystroke immediately stops the previous search rather than letting stale results race in.

Document Processing — upload, chunking, signed URL generation; DocumentRecord type consumed by the block layer
Block System — ResolvedBlock type, how block data is resolved server-side before rendering
Entity System — annotation entities are stored and queried through the standard entity API
Chat — receives amble:send-to-ai custom events from the AI action bar
