# External Agent Management Suite
Follow-up tasks and improvements for full external agent support — OpenClaw, Claude Managed Agents, Hermes, A2A, and future providers.
## Context
PR #648 (`feat/managed-agent-providers`) shipped the SprinterAgent abstraction with three providers: Local, Claude Managed, and OpenClaw. It also added incremental streaming for managed agents and vault credential sync for MCP tool portability. This spec covers the follow-up work needed to build out the full external agent management suite.
### What Shipped (PR #648)
- `SprinterAgent` interface with `execute()` returning normalized `SprinterExecutionResult`
- `ClaudeManagedSprinterAgent` — full bridge to Anthropic's managed agents API (beta)
- `OpenClawSprinterAgent` — OpenAI-compatible agent execution via AI SDK
- `LocalSprinterAgent` — wraps AI SDK with HITL tool approval support
- Streaming callbacks (`onTextDelta`, `onToolCall`, `onToolResult`) on `SprinterExecutionContext`
- Chat route incremental streaming for managed agents
- Vault credential sync — tenant MCP connections auto-synced to Anthropic vaults
- Connection CRUD, testing, discovery endpoints
- Shared `loadConnection()` helper, comprehensive event mapping
---
## P0 — Immediate Follow-ups (from /simplify review)
These are concrete code improvements identified during review that should land before the next feature push.
### 1. Vault sync performance
**Problem:** `syncMcpCredentialsToVault` runs on every managed agent execution with no caching. For a tenant with 5 MCP connections, that's 6+ Anthropic API calls (list + creates/updates) blocking the response before streaming starts.
**Fix:** Add a short-lived per-tenant cache (module-level `Map<string, { vaultIds: string[]; expiresAt: number }>` with ~5 minute TTL). Invalidate on connection create/update/delete. This eliminates redundant syncs across repeated chat turns while still picking up fresh credentials within minutes.
**Files:** `features/agents/providers/claude-managed/vault-sync.ts`, `features/agents/providers/claude-managed/adapter.ts`
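
A minimal sketch of the per-tenant cache described in the fix, assuming the sync result can be reduced to a list of vault IDs. The helper names (`getCachedVaultIds`, `setCachedVaultIds`, `invalidateVaultCache`) and the injectable `now` parameter are illustrative and not part of the existing code.

```typescript
// Value shape taken from the proposed fix; everything else is a hypothetical sketch.
type VaultCacheEntry = { vaultIds: string[]; expiresAt: number };

const VAULT_SYNC_TTL_MS = 5 * 60 * 1000; // ~5 minute TTL, per the proposed fix

// Module-level cache keyed by tenant ID.
const vaultSyncCache = new Map<string, VaultCacheEntry>();

// Returns the cached vault IDs, or null on a miss or an expired entry.
function getCachedVaultIds(tenantId: string, now: number = Date.now()): string[] | null {
  const entry = vaultSyncCache.get(tenantId);
  if (!entry) return null;
  if (entry.expiresAt <= now) {
    vaultSyncCache.delete(tenantId); // drop stale entries eagerly
    return null;
  }
  return entry.vaultIds;
}

// Stores the result of a successful sync.
function setCachedVaultIds(tenantId: string, vaultIds: string[], now: number = Date.now()): void {
  vaultSyncCache.set(tenantId, { vaultIds, expiresAt: now + VAULT_SYNC_TTL_MS });
}

// Intended to be called from connection create/update/delete handlers.
function invalidateVaultCache(tenantId: string): void {
  vaultSyncCache.delete(tenantId);
}
```

Under this sketch, `syncMcpCredentialsToVault` would consult `getCachedVaultIds` first and fall through to the Anthropic calls only on a miss. A module-level `Map` is per-process, so in a multi-instance deployment each instance would warm its own cache.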
### 2. Deduplicate MCP connection query
**Problem:** `vault-sync.ts` re-queries `agent_connections` for active MCP connections — the same query that `features/mcp/resolve-configs.ts` runs with `unstable_cache`. The vault-sync version bypasses the cache.
**Fix:** Either accept the pre-loaded connection list as a parameter, or extract a shared cached `getActiveMcpConnections(tenantId)` helper that both consumers use.
**Files:** `features/agents/providers/claude-managed/vault-sync.ts`, `features/mcp/resolve-configs.ts`
### 3. Extract bearer token helper from resolve-headers
**Problem:** `extractBearerToken()` in vault-sync.ts re-implements the `token > apiKey > basicAuth` priority logic from `lib/connections/resolve-headers.ts`. Different output shapes (raw token vs HTTP header with `Bearer ` prefix) prevented direct reuse.
**Fix:** Export a lower-level `extractBearerCredential(credentials): string | null` from `resolve-headers.ts` that returns the raw token. Have both `resolveConnectionAuthHeaders` and `vault-sync` call it. Also usable by `openclaw/adapter.ts` which has the same `raw.token ?? raw.apiKey` pattern.
**Files:** `lib/connections/resolve-headers.ts`, `features/agents/providers/claude-managed/vault-sync.ts`, `features/agents/providers/openclaw/adapter.ts`
### 4. Eliminate redundant `loadConnectionType` DB call
**Problem:** Chat route makes a separate `SELECT connection_type FROM agent_connections` call at line 93. The adapter then makes another full `SELECT *` inside `execute()`. Two DB round-trips for the same connection.
**Fix:** Include `connection_type` in the `ResolvedAgent` interface, populated during agent resolution (where the connection ID is already known). This lets the chat route skip the pre-flight query entirely.
**Files:** `features/agents/agent-resolver.ts`, `features/agents/providers/resolve.ts`, `app/api/chat/route.ts`
### 5. Remove redundant event collection in service.ts
**Problem:** `runManagedSession` builds its own `allEvents[]` array and also calls `params.onEvent()` for each event. The adapter collects events through `onEvent` and never reads `result.events`, so every event is stored twice in memory.
**Fix:** Remove `allEvents` from `runManagedSession`. Keep `events` in the return type for consumers that don't use the callback pattern, but populate it from `onEvent` instead of maintaining a parallel array.
**Files:** `features/agents/providers/claude-managed/service.ts`
### 6. Internalize `eventId` in onCustomToolCall callback
**Problem:** The `onCustomToolCall` callback signature includes `eventId` (an Anthropic stream event ID) — an internal protocol detail that leaks through the abstraction. The adapter ignores it.
**Fix:** Remove `eventId` from the `onCustomToolCall` signature. The `executeCustomToolRelay` helper already has access to `eventId` from the stream event — pass it directly there.
**Files:** `features/agents/providers/claude-managed/service.ts`
---
## P1 — Provider Expansion
### 7. A2A (Agent-to-Agent) protocol support
**What exists:** Connection type `a2a` is defined, UI supports creating A2A connections with agent card URL, `testConnection` probes `/.well-known/agent-card.json`. But `resolveSprinterAgent()` throws "not yet supported."
**What to build:**
- `features/agents/providers/a2a/adapter.ts` — `A2ASprinterAgent` implementing the A2A protocol (task creation → polling → result)
- A2A session management (A2A tasks are async — need polling or webhook)
- Map A2A task statuses to `SprinterExecutionResult.status`
- Map A2A artifacts to Sprinter events
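- Streaming: A2A streaming is optional (agents advertise it through `capabilities.streaming` in the agent card); for agents without it, poll and emit `onTextDelta` updates as they become available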
**Key design question:** A2A is inherently async (create task, poll for completion). The `SprinterAgent.execute()` interface assumes a single awaitable call. Options:
- (a) Poll internally within `execute()`, calling `onTextDelta` with updates
- (b) Return `status: 'pending'` and add a `resume` pattern for async providers
- (c) Treat A2A as background-only (no chat streaming)
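
Option (a) can be sketched as a polling loop that diffs the accumulated task text and forwards only the new suffix to `onTextDelta`. The state names follow the A2A specification's `TaskState` values; `SprinterStatus`, `A2ATaskSnapshot`, `fetchTask`, and `sleep` are hypothetical stand-ins, and the real `SprinterExecutionResult.status` union may differ.

```typescript
// Task states as named by the A2A specification.
type A2ATaskState =
  | "submitted" | "working" | "input-required" | "completed"
  | "canceled" | "failed" | "rejected" | "auth-required" | "unknown";

// Hypothetical subset of SprinterExecutionResult.status.
type SprinterStatus = "completed" | "failed" | "pending";

// Hypothetical snapshot: the task state plus all text produced so far.
interface A2ATaskSnapshot { state: A2ATaskState; text: string }

// States in which another poll can still make progress.
const POLLABLE: ReadonlySet<A2ATaskState> = new Set<A2ATaskState>(["submitted", "working"]);

function mapA2AState(state: A2ATaskState): SprinterStatus {
  switch (state) {
    case "completed": return "completed";
    case "failed":
    case "canceled":
    case "rejected": return "failed";
    default: return "pending"; // still running, or waiting on input/auth
  }
}

interface PollOptions {
  fetchTask: () => Promise<A2ATaskSnapshot>; // assumed wrapper around the A2A task lookup
  sleep: (ms: number) => Promise<void>;      // injected so the sketch is timer-agnostic
  intervalMs: number;
  maxPolls: number;
  onTextDelta?: (delta: string) => void;
}

async function pollA2ATask(opts: PollOptions): Promise<{ status: SprinterStatus; text: string }> {
  let emitted = 0; // number of characters already forwarded
  let text = "";
  for (let i = 0; i < opts.maxPolls; i++) {
    const snap = await opts.fetchTask();
    text = snap.text;
    if (text.length > emitted) {
      opts.onTextDelta?.(text.slice(emitted)); // forward only the new suffix
      emitted = text.length;
    }
    if (!POLLABLE.has(snap.state)) return { status: mapA2AState(snap.state), text };
    await opts.sleep(opts.intervalMs);
  }
  return { status: "pending", text }; // poll budget exhausted
}
```

In this sketch, `input-required` and `auth-required` stop the loop and surface as `pending`, which is the point where a resume pattern like option (b) would take over.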
### 8. MCP gateway agents
**What exists:** Connection type `mcp` is defined and used for MCP server connections (tool providers). But MCP servers can also BE agents (via `sampling` capability).
**What to build:**
- `features/agents/providers/mcp-agent/adapter.ts` — `McpAgentSprinterAgent` that uses MCP sampling to execute agent-like workflows
- Distinguish MCP-as-tool-provider from MCP-as-agent in the connection config
- Reuse existing MCP connection infrastructure for auth/transport
### 9. Hermes agent protocol
**What exists:** Nothing yet. Hermes is a newer agent protocol that may emerge as a standard.
**What to build:**
- Research Hermes protocol spec when available
- `features/agents/providers/hermes/adapter.ts` — adapter implementation
- Connection type addition (or reuse `api` type with protocol detection)
- Map Hermes message format to `SprinterExecutionResult`
### 10. Provider-agnostic agent discovery
**What exists:** `/api/agent-connections/[id]/discover` endpoint with basic implementation for OpenClaw. Claude Managed lists agents via `listManagedAgents()`.
**What to build:**
- Unified discovery interface: `discoverAgents(connectionId): ExternalAgent[]`
- Per-provider discovery adapters:
  - OpenClaw: List available models/agents from `/v1/models`
  - Claude Managed: `listManagedAgents()` (exists)
  - A2A: Parse agent card capabilities
  - MCP: Query sampling capability
- UI: Agent discovery dialog that shows available external agents and lets users link them
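
One possible shape for the unified discovery interface is a registry of per-provider adapters behind a single `discoverAgents` function. This is a sketch under assumptions: the `ProviderType` strings, the `ExternalAgent` fields, the `DiscoverableConnection` shape, and the injected `loadConnection` parameter are all hypothetical. It also returns a `Promise`, since every listed adapter would need network access, whereas the signature above is written synchronously. Providers without a registered adapter fail with an explicit error rather than returning an empty list.

```typescript
// Hypothetical connection-type strings; the real values live in the codebase.
type ProviderType = "openclaw" | "claude_managed" | "a2a" | "mcp";

// Hypothetical minimal fields for a discovered agent.
interface ExternalAgent { id: string; name: string; provider: ProviderType }

// Hypothetical minimal view of a stored connection.
interface DiscoverableConnection { id: string; type: ProviderType }

// Each provider supplies one adapter; the dispatcher stamps the provider on the results.
type DiscoveryAdapter = (
  conn: DiscoverableConnection,
) => Promise<Array<Omit<ExternalAgent, "provider">>>;

function createDiscovery(
  loadConnection: (id: string) => Promise<DiscoverableConnection>,
  adapters: Partial<Record<ProviderType, DiscoveryAdapter>>,
): (connectionId: string) => Promise<ExternalAgent[]> {
  return async function discoverAgents(connectionId: string): Promise<ExternalAgent[]> {
    const conn = await loadConnection(connectionId);
    const adapter = adapters[conn.type];
    if (!adapter) {
      throw new Error(`Discovery is not supported for connection type "${conn.type}"`);
    }
    const found = await adapter(conn);
    return found.map((agent) => ({ ...agent, provider: conn.type }));
  };
}
```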
---
## P1 — Execution & Reliability
### 11. Tool approval (HITL) for managed agents
**What exists:** `LocalSprinterAgent` has full HITL support — pauses execution with `pausedState`, returns `awaiting_tool` status, resumes with approval/denial. Managed agents stream directly with no approval gate.
**What to build:**
- Extend `ClaudeManagedSprinterAgent` to check if a tool is gated (`always_ask`)
- When a gated tool is invoked, pause the managed session (don't send result)
- Return `awaiting_tool` with `pausedState` containing the Anthropic session ID + tool event
- On resume, send the tool result (or denial) back to the managed session
- Challenge: Anthropic sessions may time out while waiting for approval
### 12. Resume support for external providers
**What exists:** `SprinterExecutionContext.resume` field with `pausedState`, `decision`, `denialReason`. Only `LocalSprinterAgent` implements it.
**What to build:**
- `ClaudeManagedSprinterAgent.execute()` checks `context.resume`:
  - Reconnect to the paused Anthropic session
  - Send the tool result or denial
  - Continue streaming
- `OpenClawSprinterAgent.execute()` checks `context.resume`:
  - Re-inject the tool result into the message history
  - Call `generateText` with the extended history
### 13. Background execution for external agents
**What exists:** Heartbeat config on agents, `shouldRunNow()` cron check. Session executor dispatches tasks. But external agents are only used in chat.
**What to build:**
- Wire session executor to use `resolveSprinterAgent()` for agents with connections
- Background execution: no streaming callbacks needed (plan doc already notes "for background jobs we don't care")
- Map execution result to session status + events
- Cost tracking: external provider token usage → cost records
- Timeout handling: external agents may take longer than local
### 14. Error recovery and retries
**What exists:** Basic error handling — catch, log event, return `status: 'failed'`.
**What to build:**
- Transient error detection (rate limits, timeouts, 5xx responses)
- Configurable retry policy per connection (max retries, backoff)
- Circuit breaker per connection (after N failures, mark `status: 'error'`, auto-recover after cooldown)
- Dead letter queue for failed tool relay results
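
The first three items could be sketched as a transient-error classifier, an exponential backoff helper, and a small per-connection circuit breaker. All names and the `ProviderError` shape are hypothetical. The clock is passed in explicitly so the cooldown logic stays deterministic, and after the cooldown the breaker lets one trial call through, reopening if that call fails.

```typescript
// Hypothetical error shape: an optional HTTP status and a timeout flag.
interface ProviderError { status?: number; timedOut?: boolean }

// Rate limits (429), timeouts, and 5xx responses are treated as retryable.
function isTransientError(err: ProviderError): boolean {
  if (err.timedOut) return true;
  if (err.status === undefined) return false;
  return err.status === 429 || (err.status >= 500 && err.status <= 599);
}

// Exponential backoff: base * 2^attempt, capped at maxMs (attempt starts at 0).
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Closed: always allowed. Open: allowed again only once the cooldown has elapsed.
  canExecute(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Opens (or re-opens) the breaker once the failure count reaches the threshold.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```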
---
## P2 — Admin & Observability
### 15. Connection health monitoring
**What exists:** Manual "Test" button in admin UI. `last_health_check` and `last_error` fields on connections.
**What to build:**
- Scheduled health checks via Inngest cron (every 5 min for active connections)
- Connection status dashboard in admin with uptime history
- Alert on connection degradation (Sentry, webhook)
- Auto-disable connections after sustained failures
### 16. External agent cost tracking
**What exists:** `AccumulatedUsage` from Anthropic events (input/output tokens). `recordAnalyticsEvent("chat_started")` fires for all chats. Runtime telemetry records model + tokens.
**What to build:**
- Per-provider cost model (Anthropic pricing differs from OpenClaw, A2A may have no token concept)
- External agent cost caps per tenant (like existing `ai_limits`)
- Cost attribution: which connection, which agent, which user
- Dashboard widget for external agent spend
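
A per-provider cost model could be expressed as a small discriminated union, with a flat per-call variant for providers that have no token concept. This is a sketch only: the type names, the `TokenUsage` field names, and the cap check are assumptions, and any rates passed in are supplied by the caller rather than taken from real provider pricing.

```typescript
// Assumed field names for the input/output token counts carried by AccumulatedUsage.
interface TokenUsage { inputTokens: number; outputTokens: number }

// Hypothetical cost models: token-priced providers vs. providers billed per call.
type CostModel =
  | { kind: "per_token"; inputUsdPerMTok: number; outputUsdPerMTok: number }
  | { kind: "per_call"; usdPerCall: number };

// Attribution fields follow the "which connection, which agent, which user" item above.
interface CostRecord {
  tenantId: string;
  connectionId: string;
  agentId: string;
  userId: string;
  costUsd: number;
}

function estimateCostUsd(model: CostModel, usage: TokenUsage | null): number {
  if (model.kind === "per_call") return model.usdPerCall;
  if (usage === null) return 0; // token-priced provider that reported no usage
  return (
    (usage.inputTokens * model.inputUsdPerMTok) / 1_000_000 +
    (usage.outputTokens * model.outputUsdPerMTok) / 1_000_000
  );
}

// True when recording the next cost would push the tenant past its cap.
function wouldExceedCap(spentUsd: number, nextCostUsd: number, capUsd: number): boolean {
  return spentUsd + nextCostUsd > capUsd;
}
```

Keeping the model as data rather than code would let each connection carry its own pricing configuration alongside the retry and health settings.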
### 17. Provider-specific admin UI
**What exists:** Generic connection dialog with conditional fields per type. Managed agent fields (agent ID, environment ID, vault IDs).
**What to build:**
- Claude Managed: Sync status indicator, vault credential list, agent version history
- OpenClaw: Model selector from discovery, capability display
- A2A: Agent card viewer, supported skills display
- Per-provider config editors with validation
### 18. Audit trail for external executions
**What exists:** Session events log tool calls, results, errors. `ExternalSessionRef` stores provider session ID.
**What to build:**
- Admin view: "External Sessions" tab showing all managed/openclaw/a2a sessions
- Link from session → external provider's session (if provider has a UI)
- Replay capability: view the full event stream for debugging
- Diff view: compare tool relay inputs/outputs between local and external execution
---
## P3 — Advanced Capabilities
### 19. Multi-provider agent orchestration
Enable a single Sprinter agent to delegate to multiple external providers based on task type. E.g., use Claude Managed for reasoning-heavy tasks, OpenClaw for tool-heavy tasks, A2A for specialized domain agents.
### 20. External agent as tool
Expose external agents as tools that local agents can invoke via `createDelegateToAgentTool()`. Currently delegation goes through the Sprinter agent resolution system. Extend to support external agents as delegation targets.
### 21. Credential rotation
Auto-rotate API keys for external connections. Store rotation schedule, generate new keys, verify they work, swap atomically. Particularly important for vault credentials that may expire.
### 22. Provider capability negotiation
Before execution, query the provider's capabilities (streaming support, tool types, max context, supported modalities) and adapt the execution strategy accordingly. E.g., if a provider doesn't support streaming, fall back to polling. If it doesn't support tools, skip tool injection.
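
As an illustration of the fallback rules, the negotiation step can be reduced to a pure function from reported capabilities to an execution plan. The `ProviderCapabilities` and `ExecutionPlan` shapes and the `planExecution` name are hypothetical; how capabilities are actually queried is provider-specific and left out of this sketch.

```typescript
// Hypothetical capability report; an undefined maxContextTokens means "not reported".
interface ProviderCapabilities {
  streaming: boolean;
  tools: boolean;
  maxContextTokens?: number;
}

// Hypothetical plan consumed by the adapter before it starts execution.
interface ExecutionPlan {
  delivery: "stream" | "poll";
  tools: string[];      // tool names that will actually be injected
  fitsContext: boolean; // false when the prompt exceeds the reported limit
}

function planExecution(
  caps: ProviderCapabilities,
  requestedTools: string[],
  promptTokens: number,
): ExecutionPlan {
  return {
    // No streaming support: fall back to polling.
    delivery: caps.streaming ? "stream" : "poll",
    // No tool support: skip tool injection entirely.
    tools: caps.tools ? requestedTools : [],
    fitsContext:
      caps.maxContextTokens === undefined || promptTokens <= caps.maxContextTokens,
  };
}
```

Keeping the decision in a pure function makes it straightforward to cover in a provider test matrix without a mock server.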
### 23. External agent testing framework
Automated test suite that exercises each provider adapter against a mock server. Test matrix: streaming, tool relay, error handling, resumption, timeout, rate limiting. Run as part of CI with provider-specific mock fixtures.