Research Library

Evidence library pattern for source cataloging, grading, and claim synthesis — with the DOC'S Research Protocol Builder as the worked example.

Overview

The Research Library pattern gives tenants a structured evidence pipeline: external sources are ingested, graded against a quality rubric, distilled into verifiable claims, and linked to existing domain records (protocols, conditions, guidelines). The pipeline is fully agent-driven with one human-in-the-loop approval gate.

This pattern was first implemented for the DOC'S tenant (a recovery and wellness center in Wall Township, NJ) to support Nolan's evidence-base work across six treatment modalities. It is built entirely on generic platform affordances — no DOC'S-specific slugs appear in platform code.

What the pattern provides:

  • A three-tier entity model: source feed → source-item article → evidence-claim assertion
  • Two criteria sets for grading articles and claims
  • Server-side aggregate recompute (source count + quality average) that keeps claim grades deterministic
  • Four agent roles with explicit capability scoping
  • A task pipeline from URL ingestion through HITL claim approval
  • Five views covering the full review and gap-analysis workflow

Entity Model

source (feed)
  └── source-item (article / study)
        ├── supports → evidence-claim (synthesized assertion)
        └── about    → modality (the treatment area)

evidence-claim
  ├── about      → modality
  ├── relevant-to → condition   (optional)
  └── justifies  → protocol    (optional — the Protocol Builder bridge)

Entity types

Type            Owner     Key fields
source          platform  source_type, url, scrape_strategy, schedule
source-item     platform  title, body, author, published_at, source_url, ingest_status
modality        tenant    description, device, claims_scope, contraindication_profile
evidence-claim  tenant    statement, status, source_count*, source_quality_avg*

*Server-computed via Inngest on every response.promoted event for a linked source-item.
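
As a rough sketch, the two core shapes translate to something like the following — illustrative TypeScript only; the field types and the ingest_status values are assumptions, not the platform's actual schema:

// Illustrative shapes — types and ingest_status values are assumptions.
interface SourceItem {
  title: string
  body: string
  author: string | null
  published_at: string | null // ISO date
  source_url: string
  ingest_status: "pending" | "extracted" | "failed" // assumed values
}

interface EvidenceClaim {
  statement: string
  status: "draft" | "under-review" | "approved" | "rejected" // Claims Queue states
  source_count: number       // server-computed via Inngest
  source_quality_avg: number // server-computed, 0–10
}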

Relation types

From            To              Type         Notes
source-item     modality        about        Which modality the article covers
source-item     evidence-claim  supports     Source-item cites / supports a claim
evidence-claim  modality        about        Which modality the claim concerns
evidence-claim  condition       relevant-to  Optional — clinical condition linkage
evidence-claim  protocol        justifies    Optional — Protocol Builder bridge

The source-item → evidence-claim direction (supports) is intentional. It matches how reviewers cite evidence ("this study supports this claim") and aligns with the platform's from → to idiom.
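
Concretely, the direction shows up in the createRelation call (IDs are placeholders; this mirrors the Claim Synthesizer step under For Agents below):

// The graded study is the `from` side; the claim it supports is the `to` side.
await tools.createRelation({
  fromEntityId: sourceItemId, // source-item (the study)
  toEntityId: claimId,        // evidence-claim
  type: "supports",
})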

Criteria Set Pattern

evidence-quality — grading a source-item

Six dimensions, each scored 0–10 by the Methodology Reviewer agent:

Dimension          What it measures
study_design       RCT / systematic review = 10; cohort = 7; case-series = 3; opinion = 0
sample_size        n > 500 = 10; n > 100 = 7; n > 30 = 5; n < 30 = 3
peer_review        Peer-reviewed journal = 10; conference = 6; preprint = 3; blog = 0
effect_size        Large + CI excludes null = 10; moderate = 7; small = 4; none = 0
methodology_rigor  Blinded + randomized + controls = 10; observational = 3
recency            < 2 years = 10; < 5 = 7; < 10 = 5; < 20 = 3; older = 1
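
The Methodology Reviewer applies these thresholds from its prompt, but the two purely numeric dimensions are mechanical enough to restate as code — a sketch, with the cut-offs taken directly from the table above:

// Rubric thresholds restated as code — same cut-offs as the table above.
function scoreSampleSize(n: number): number {
  if (n > 500) return 10
  if (n > 100) return 7
  if (n > 30) return 5
  return 3
}

function scoreRecency(ageYears: number): number {
  if (ageYears < 2) return 10
  if (ageYears < 5) return 7
  if (ageYears < 10) return 5
  if (ageYears < 20) return 3
  return 1
}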

body-of-evidence-strength — grading an evidence-claim

Five dimensions. Two are server-pre-computed; three are scored qualitatively by the Evidence Strength Assessor agent:

Dimension           Scored by  What it measures
source_count        Server     # of linked source-items (≥ 5 = 10; 3–4 = 7; 2 = 5; 1 = 3)
source_quality_avg  Server     Mean evidence-quality promoted score of supporting source-items (0–10)
consistency         Agent      Do sources agree on effect direction? Strong = 10; mixed = 5
clinical_relevance  Agent      Does the effect translate to this tenant's clinical practice?
safety_alignment    Agent      Does the evidence respect safety posture / contraindications?

Why server-computed aggregates? LLMs are unreliable when asked to average scores across a variable-length set of entities. The Inngest function recompute-claim-aggregates-on-response maintains these fields automatically whenever a source-item's evidence-quality response is promoted, so the agent only grades the three qualitative dimensions.

Platform Affordances

These four platform additions ship as part of this pattern and are available to any tenant:

1. extractUrlToArticle composer

features/source-sync/server/extract-url.ts — single-URL ingestion chain (Firecrawl → browser-agent → readable-HTML fallback). See Source Sync for the full API.
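
The chain tries each extractor in order and falls through on failure — roughly as below. This is a sketch only; the helper names are assumptions (see Source Sync for the real API):

// Sketch of the fallback order — helper names are assumptions.
interface Article { title: string; body: string }

declare function tryFirecrawl(url: string): Promise<Article | null>
declare function tryBrowserAgent(url: string): Promise<Article | null>
declare function readableHtmlFallback(url: string): Promise<Article | null>

export async function extractUrlToArticle(url: string): Promise<Article | null> {
  return (
    (await tryFirecrawl(url)) ??      // structured extraction first
    (await tryBrowserAgent(url)) ??   // JS-heavy pages
    (await readableHtmlFallback(url)) // last resort: readable-HTML parse
  )
}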

2. ingestUrl agent tool

Registered slug "ingestUrl" in features/tools/source-tools.ts. Takes url, optional sourceSlug, optional linkToEntities[]. Creates a source-item entity and attaches relations by slug. Required permission: entities.team.create.

// Agent call example
await tools.ingestUrl({
  url: "https://pubmed.ncbi.nlm.nih.gov/12345678/",
  linkToEntities: [{ slug: "hbot", relationType: "about" }],
})

3. getEntity with includeResponses

getEntity accepts an optional includeResponses: boolean flag (default false). When true, the returned entity includes its promoted responses keyed by criteria-set slug. Downstream agents use this to read a source-item's evidence-quality grade without a separate tool call.

// Agent call example
await tools.getEntity({
  entityId: sourceItemId,
  includeResponses: true,
})
// response.responses["evidence-quality"] contains dimension scores

Backed by features/responses/server/get-responses-for-entities.ts.

4. recompute-claim-aggregates-on-response Inngest function

features/inngest/functions/recompute-claim-aggregates-on-response.ts — listens for response.promoted. If the entity is a source-item, calls recomputeClaimAggregates for all linked claims.

recomputeClaimAggregates in features/responses/server/recompute-claim-aggregates.ts is generic — callers override supportsRelationType (default "supports") and evidenceQualityCriteriaSlug (default "evidence-quality") to match their domain.
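
A sketch of the recompute under those defaults — the three helpers (listSupportingSourceItems, getPromotedScore, updateEntityFields) are illustrative stand-ins, not the module's real internals:

// Sketch only — the declared helpers stand in for real platform internals.
declare function listSupportingSourceItems(
  claimId: string,
  relationType: string,
): Promise<{ id: string }[]>
declare function getPromotedScore(
  entityId: string,
  criteriaSlug: string,
): Promise<number | null>
declare function updateEntityFields(
  entityId: string,
  fields: Record<string, number>,
): Promise<void>

export async function recomputeClaimAggregates(
  claimId: string,
  opts: { supportsRelationType?: string; evidenceQualityCriteriaSlug?: string } = {},
) {
  const relationType = opts.supportsRelationType ?? "supports"
  const criteriaSlug = opts.evidenceQualityCriteriaSlug ?? "evidence-quality"

  // Every source-item linked to the claim via the supports relation
  const items = await listSupportingSourceItems(claimId, relationType)

  // Only promoted evidence-quality responses count toward the average
  const scores = (
    await Promise.all(items.map((i) => getPromotedScore(i.id, criteriaSlug)))
  ).filter((s): s is number => s !== null)

  await updateEntityFields(claimId, {
    source_count: items.length,
    source_quality_avg: scores.length
      ? scores.reduce((a, b) => a + b, 0) / scores.length
      : 0,
  })
}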

DOC'S Worked Example

Tenant configuration

  • 6 modality records seeded: full-body-cryo, localized-cryo, hbot, normatec, pbm-bed, infrared-sauna
  • Criteria sets: evidence-quality + body-of-evidence-strength (as above)
  • Navigation: "Research" group (Modalities, Research Library, Claims Queue, Evidence Gaps) inserted before "Clinical"

Agents

All four agents use explicit customTools lists — no toolGroups. This prevents capability bleed (the Methodology Reviewer must not be able to create entities).

Agent                       Tools                                                            Heartbeat
Research Scout              webSearch, fetchUrl, ingestUrl, searchEntities, createRelation   Optional
Methodology Reviewer        getEntity (with includeResponses), submitResponse                No
Claim Synthesizer           getEntity, searchEntities, createEntity, createRelation         No
Evidence Strength Assessor  getEntity (with includeResponses), submitResponse                No

Each agent's system prompt embeds the full scoring rubric verbatim. The Claim Synthesizer prompt includes an explicit deduplication step: before creating a new claim, call searchEntities scoped to the target modality and skip any semantically similar existing claim.

Tasks

Task slug                    Agent                       Trigger         Output
research-source-sweep        Research Scout              manual / cron   entities
grade-source-item            Methodology Reviewer        entity_created  response
ingest-url-to-source-item    Research Scout              manual          entity
extract-claims-for-modality  Claim Synthesizer           manual          entities
assess-claim-strength        Evidence Strength Assessor  entity_created  response
promote-claim-to-approved    Human (HITL)                manual          status

No double-grading: ingest-url-to-source-item does not enqueue grading. The entity_created trigger on grade-source-item fires automatically when any source-item is created — whether by the manual task or by the capture hook.
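
As a sketch of why that holds — defineTask and this config shape are assumptions, not the platform's real task API:

// Illustrative only — defineTask and this config shape are assumptions.
declare function defineTask(config: object): void

defineTask({
  slug: "grade-source-item",
  agent: "methodology-reviewer",
  // Fires once per new source-item, whatever created it (manual task,
  // research sweep, or capture hook) — so nothing else enqueues grading.
  trigger: { type: "entity_created", entityType: "source-item" },
  output: "response",
})

defineTask({
  slug: "ingest-url-to-source-item",
  agent: "research-scout",
  trigger: { type: "manual" }, // grading follows via the trigger above
  output: "entity",
})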

Capture → ingestion path: when Nolan pastes a URL into Quick Capture, the platform's capture URL routing hook fires, invoking Research Scout with ingestUrl. A draft source-item appears in the inbox within seconds; the evidence-quality grade follows asynchronously via the grade-source-item task (may take a few minutes for long PDFs).

Views

View              Surface type   Entity          Purpose
Modality Library  grid           modality        6 modality cards with source count, claim count, gap flag
Research Library  grid           source-item     All articles; columns: title, modality, grade, date, status
Claims Queue      kanban         evidence-claim  draft → under-review → approved → rejected workflow
Evidence Gaps     grid           modality        Modalities with < 5 approved claims or avg grade < 5
Modality Dossier  page (detail)  modality        Per-modality: sources, claims, gaps, linked protocols

agent_context workspace setting

The DOC'S tenant agent_context is merged idempotently using sentinel markers (<!-- docs-research-context-start --> ... <!-- docs-research-context-end -->). Re-running any seed script is safe — it replaces only the content between the markers.
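
A sketch of that merge — the marker strings come from above; storing the context as a single string is an assumption:

// Idempotent merge sketch — re-running replaces only the managed region.
const START = "<!-- docs-research-context-start -->"
const END = "<!-- docs-research-context-end -->"

function mergeAgentContext(existing: string, block: string): string {
  const managed = `${START}\n${block}\n${END}`
  const start = existing.indexOf(START)
  const end = existing.indexOf(END)
  if (start !== -1 && end !== -1) {
    // Markers present: swap out only the content between them
    return existing.slice(0, start) + managed + existing.slice(end + END.length)
  }
  // First run: append the managed block
  return existing ? `${existing}\n\n${managed}` : managed
}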

The context block instructs all agents to: prefer peer-reviewed primary sources, be conservative about claim strength, respect DOC'S absolute contraindications, and tie claims to modality parameters DOC'S actually uses (session duration, frequency, device settings).

For Agents

Research Scout — finding and ingesting sources

1. webSearch({ query: "HBOT traumatic brain injury RCT 2020-2025", numResults: 20 })
2. For each promising result URL:
   ingestUrl({ url, linkToEntities: [{ slug: "hbot", relationType: "about" }] })
3. Report: list of created source-item entityIds + titles

Methodology Reviewer — grading a source-item

1. getEntity({ entityId: sourceItemId, includeResponses: false })
   — read title, body, author, published_at, source_url
2. Score 6 dimensions against the evidence-quality rubric embedded in your system prompt
3. submitResponse({
     entityId: sourceItemId,
     criteriaSetSlug: "evidence-quality",
     values: { study_design: 8, sample_size: 7, ... },
     rationale: "..."
   })

Claim Synthesizer — extracting claims from graded sources

1. searchEntities({ typeSlug: "source-item", filters: { modality: "hbot" } })
   — find graded source-items for the target modality
2. For each source-item: getEntity({ entityId, includeResponses: true })
   — read body + evidence-quality scores
3. searchEntities({ typeSlug: "evidence-claim", query: "<proposed claim text>" })
   — dedup check BEFORE creating
4. If no near-duplicate: createEntity({ typeSlug: "evidence-claim", ... })
5. createRelation({ fromEntityId: sourceItemId, toEntityId: claimId, type: "supports" })

Evidence Strength Assessor — grading a claim

1. getEntity({ entityId: claimId, includeResponses: false })
   — read statement, source_count, source_quality_avg (server-computed fields)
2. For each supporting source-item: getEntity({ entityId, includeResponses: true })
   — read evidence-quality grades for consistency assessment
3. submitResponse({
     entityId: claimId,
     criteriaSetSlug: "body-of-evidence-strength",
     values: {
       source_count: <read from entity field, do not recalculate>,
       source_quality_avg: <read from entity field, do not recalculate>,
       consistency: 8,
       clinical_relevance: 7,
       safety_alignment: 9,
     },
     rationale: "..."
   })

Note: source_count and source_quality_avg are read from the entity's content fields — do not recompute them. The server maintains them via Inngest.

Design Decisions

customTools over toolGroups for research agents — toolGroups bundle broad tool sets (e.g., all entity tools). The Methodology Reviewer must not have createEntity; the Claim Synthesizer must not have submitResponse before creating a claim. Explicit per-agent tool lists are the only way to correctly scope capability. Caught and enforced during codex review.

Server-computed aggregates — source_count and source_quality_avg are maintained by Inngest, not computed by the Evidence Strength Assessor. LLMs are unreliable when asked to average a variable-length set of numbers; embedding arithmetic in the agent prompt drifts over model updates. Denormalized fields with a server trigger are deterministic and auditable.

source-item → evidence-claim relation direction — the relation goes from source-item to claim (supports), not from claim to source-item (supported-by). This matches reviewer citation behavior ("this study supports this claim") and the platform's from → to idiom where from is the describing entity and to is the target. The initial spec had it inverted; caught in codex review.

Rubric text duplicated in agent prompts — criteria dimension description fields store the threshold text, but agent prompts embed the full rubric verbatim. This prevents prompt drift if criteria dimensions are edited in the admin UI. The grading agents are the rubric's primary consumers; they should not need a separate getCriteriaSets lookup at inference time.

Claim deduplication is a prompt instruction, not a DB constraint — a unique constraint on claim statement text would prevent legitimate rewording. Instead, the Claim Synthesizer is instructed to searchEntities before creating, and acceptance criterion 5 verifies re-running the task does not create duplicates.

Capture hook fires event, not a direct session — the capture server action must complete quickly. URL ingestion (especially Firecrawl) can take 10–30 seconds. The hook fires an Inngest event (capture.url.routed) so ingestion is decoupled from the capture response time.
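
A sketch of the decoupling with Inngest's send/consume API — the client id and event payload shape are assumptions; the event name is from above:

import { Inngest } from "inngest"

// Client id and payload shape are assumptions; the event name is from above.
const inngest = new Inngest({ id: "sprinter" })

// In the capture server action: fire-and-forget keeps capture fast.
export async function onCaptureUrlRouted(url: string, tenantId: string) {
  await inngest.send({ name: "capture.url.routed", data: { url, tenantId } })
}

// The slow ingestion (Firecrawl can take 10–30 s) runs here instead.
export const ingestCapturedUrl = inngest.createFunction(
  { id: "ingest-captured-url" },
  { event: "capture.url.routed" },
  async ({ event }) => {
    // invoke Research Scout with ingestUrl({ url: event.data.url, ... })
  },
)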

Related

  • Source Sync — extraction primitives, extractUrlToArticle, ingestUrl tool
  • Entity System — entity types, relations, field schemas
  • Sessions — task execution, session events
  • Capture — URL routing hook
  • Response System — criteria sets, scoring, promotion
  • Inngest — recompute-claim-aggregates-on-response function
