External Agent Management Suite
Follow-up tasks and improvements for full external agent support — OpenClaw, Claude Managed Agents, Hermes, A2A, and future providers.
Context
PR #648 (feat/managed-agent-providers) shipped the SprinterAgent abstraction with three providers: Local, Claude Managed, and OpenClaw. It also added incremental streaming for managed agents and vault credential sync for MCP tool portability. This spec covers the follow-up work needed to build out the full external agent management suite.
What Shipped (PR #648)
- SprinterAgent interface with execute() returning a normalized SprinterExecutionResult
- ClaudeManagedSprinterAgent — full bridge to Anthropic's managed agents API (beta)
- OpenClawSprinterAgent — OpenAI-compatible agent execution via AI SDK
- LocalSprinterAgent — wraps AI SDK with HITL tool approval support
- Streaming callbacks (onTextDelta, onToolCall, onToolResult) on SprinterExecutionContext
- Chat route incremental streaming for managed agents
- Vault credential sync — tenant MCP connections auto-synced to Anthropic vaults
- Connection CRUD, testing, discovery endpoints
- Shared loadConnection() helper, comprehensive event mapping
P0 — Immediate Follow-ups (from /simplify review)
These are concrete code improvements identified during review that should land before the next feature push.
1. Vault sync performance
Problem: syncMcpCredentialsToVault runs on every managed agent execution with no caching. For a tenant with 5 MCP connections, that's 6+ Anthropic API calls (list + creates/updates) blocking the response before streaming starts.
Fix: Add a short-lived per-tenant cache (module-level Map<string, { vaultIds: string[]; expiresAt: number }> with ~5 minute TTL). Invalidate on connection create/update/delete. This eliminates redundant syncs across repeated chat turns while still picking up fresh credentials within minutes.
Files: features/agents/providers/claude-managed/vault-sync.ts, features/agents/providers/claude-managed/adapter.ts
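A minimal sketch of the proposed cache; the helper names (getCachedVaultIds, setCachedVaultIds, invalidateVaultSyncCache) are illustrative, not existing code:

```typescript
// Hypothetical module-level per-tenant cache for vault sync results.
type VaultCacheEntry = { vaultIds: string[]; expiresAt: number };

const VAULT_CACHE_TTL_MS = 5 * 60 * 1000; // ~5 minute TTL
const vaultSyncCache = new Map<string, VaultCacheEntry>();

// Returns cached vault IDs for a tenant, or null when missing/expired.
export function getCachedVaultIds(tenantId: string, now = Date.now()): string[] | null {
  const entry = vaultSyncCache.get(tenantId);
  if (!entry || entry.expiresAt <= now) {
    vaultSyncCache.delete(tenantId);
    return null;
  }
  return entry.vaultIds;
}

export function setCachedVaultIds(tenantId: string, vaultIds: string[], now = Date.now()): void {
  vaultSyncCache.set(tenantId, { vaultIds, expiresAt: now + VAULT_CACHE_TTL_MS });
}

// Call from connection create/update/delete handlers so fresh credentials
// are picked up immediately instead of waiting out the TTL.
export function invalidateVaultSyncCache(tenantId: string): void {
  vaultSyncCache.delete(tenantId);
}
```

On a cache hit, syncMcpCredentialsToVault would return the cached vault IDs without touching the Anthropic API, so repeated chat turns skip the list + create/update round-trips entirely.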
2. Deduplicate MCP connection query
Problem: vault-sync.ts re-queries agent_connections for active MCP connections — the same query that features/mcp/resolve-configs.ts runs with unstable_cache. The vault-sync version bypasses the cache.
Fix: Either accept the pre-loaded connection list as a parameter, or extract a shared cached getActiveMcpConnections(tenantId) helper that both consumers use.
Files: features/agents/providers/claude-managed/vault-sync.ts, features/mcp/resolve-configs.ts
3. Extract bearer token helper from resolve-headers
Problem: extractBearerToken() in vault-sync.ts re-implements the token > apiKey > basicAuth priority logic from lib/connections/resolve-headers.ts. Different output shapes (raw token vs HTTP header with Bearer prefix) prevented direct reuse.
Fix: Export a lower-level extractBearerCredential(credentials): string | null from resolve-headers.ts that returns the raw token. Have both resolveConnectionAuthHeaders and vault-sync call it. Also usable by openclaw/adapter.ts which has the same raw.token ?? raw.apiKey pattern.
Files: lib/connections/resolve-headers.ts, features/agents/providers/claude-managed/vault-sync.ts, features/agents/providers/openclaw/adapter.ts
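A sketch of the shared helper split. The credential field names mirror the token > apiKey priority described above; basicAuth has no single raw-token form, so this sketch assumes it stays in the header-building path only:

```typescript
// Simplified credential shape (assumption, not the real type).
type ConnectionCredentials = { token?: string; apiKey?: string };

// Lower-level helper: returns the raw credential with no "Bearer " prefix,
// usable by vault-sync and the openclaw adapter's raw.token ?? raw.apiKey path.
export function extractBearerCredential(credentials: ConnectionCredentials): string | null {
  return credentials.token ?? credentials.apiKey ?? null;
}

// Header-shaped consumer (illustrative stand-in for resolveConnectionAuthHeaders):
// builds on the same raw token instead of re-implementing the priority logic.
export function resolveBearerHeader(credentials: ConnectionCredentials): Record<string, string> {
  const raw = extractBearerCredential(credentials);
  return raw ? { Authorization: `Bearer ${raw}` } : {};
}
```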
4. Eliminate redundant loadConnectionType DB call
Problem: Chat route makes a separate SELECT connection_type FROM agent_connections call at line 93. The adapter then makes another full SELECT * inside execute(). Two DB round-trips for the same connection.
Fix: Include connection_type in the ResolvedAgent interface, populated during agent resolution (where the connection ID is already known). This lets the chat route skip the pre-flight query entirely.
Files: features/agents/agent-resolver.ts, features/agents/providers/resolve.ts, app/api/chat/route.ts
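The extended resolver shape could look like this sketch; the fields beyond connectionType are assumptions about what ResolvedAgent carries:

```typescript
// Hypothetical ResolvedAgent shape with connection_type carried through
// resolution, so the chat route never issues the pre-flight SELECT.
interface ResolvedAgent {
  agentId: string;
  connectionId: string | null;
  connectionType: string | null; // e.g. "claude-managed", populated during resolution
}

// The chat route can branch on the resolved type directly.
export function needsManagedStreaming(agent: ResolvedAgent): boolean {
  return agent.connectionType === "claude-managed";
}
```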
5. Remove redundant event collection in service.ts
Problem: runManagedSession builds its own allEvents[] array AND calls params.onEvent() for each event. The adapter uses onEvent to collect events, never reads result.events. Every event is stored twice in memory.
Fix: Remove allEvents from runManagedSession. Keep events in the return type for consumers that don't use the callback pattern, but populate it from onEvent instead of maintaining a parallel array.
Files: features/agents/providers/claude-managed/service.ts
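The single-collection-point fix could be sketched like this; the event and params types are simplified stand-ins for the real session shapes:

```typescript
// Simplified event shape (assumption).
type SessionEvent = { type: string; data?: unknown };

type RunParams = { onEvent?: (event: SessionEvent) => void };

// One emit path: each event is stored once and handed to the callback,
// instead of maintaining a parallel allEvents[] array.
export function runManagedSession(
  stream: SessionEvent[],
  params: RunParams,
): { events: SessionEvent[] } {
  const events: SessionEvent[] = [];
  const emit = (event: SessionEvent) => {
    events.push(event); // single source of truth for the return value
    params.onEvent?.(event); // callback consumers see the same object
  };
  for (const event of stream) emit(event);
  return { events };
}
```

Consumers that ignore the callback still get result.events; callback consumers like the adapter receive the same object references, so nothing is held in memory twice.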
6. Internalize eventId in onCustomToolCall callback
Problem: The onCustomToolCall callback signature includes eventId (an Anthropic stream event ID) — an internal protocol detail that leaks through the abstraction. The adapter ignores it.
Fix: Remove eventId from the onCustomToolCall signature. The executeCustomToolRelay helper already has access to eventId from the stream event — pass it directly there.
Files: features/agents/providers/claude-managed/service.ts
P1 — Provider Expansion
7. A2A (Agent-to-Agent) protocol support
What exists: Connection type a2a is defined, UI supports creating A2A connections with agent card URL, testConnection probes /.well-known/agent-card.json. But resolveSprinterAgent() throws "not yet supported."
What to build:
- features/agents/providers/a2a/adapter.ts — A2ASprinterAgent implementing the A2A protocol (task creation → polling → result)
- A2A session management (A2A tasks are async — need polling or webhook)
- Map A2A task statuses to SprinterExecutionResult.status
- Map A2A artifacts to Sprinter events
- Streaming: A2A doesn't natively stream; use polling with onTextDelta updates as available
Key design question: A2A is inherently async (create task, poll for completion). The SprinterAgent.execute() interface assumes a single awaitable call. Options:
- (a) Poll internally within execute(), calling onTextDelta with updates
- (b) Return status: 'pending' and add a resume pattern for async providers
- (c) Treat A2A as background-only (no chat streaming)
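Option (a) could be sketched as follows. The client and task shapes here are simplified assumptions, not the actual A2A protocol types:

```typescript
// Simplified A2A task lifecycle (assumed shapes).
type A2ATaskStatus = "submitted" | "working" | "completed" | "failed";
type A2ATask = { id: string; status: A2ATaskStatus; text?: string };

interface A2AClient {
  createTask(prompt: string): Promise<A2ATask>;
  getTask(id: string): Promise<A2ATask>;
}

// Poll inside execute(): emit only the new text suffix on each poll so the
// caller sees incremental onTextDelta updates despite no native streaming.
export async function executeA2ATask(
  client: A2AClient,
  prompt: string,
  onTextDelta?: (delta: string) => void,
  pollMs = 50,
): Promise<{ status: "completed" | "failed"; text: string }> {
  let task = await client.createTask(prompt);
  let seen = "";
  while (task.status === "submitted" || task.status === "working") {
    await new Promise((r) => setTimeout(r, pollMs));
    task = await client.getTask(task.id);
    const text = task.text ?? "";
    if (text.length > seen.length) {
      onTextDelta?.(text.slice(seen.length)); // new suffix only
      seen = text;
    }
  }
  return { status: task.status === "completed" ? "completed" : "failed", text: seen };
}
```

This keeps the single-awaitable execute() contract intact at the cost of holding the request open for the task's lifetime, which is why option (b) or (c) may still be needed for long-running tasks.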
8. MCP gateway agents
What exists: Connection type mcp is defined and used for MCP server connections (tool providers). But MCP servers can also BE agents (via sampling capability).
What to build:
- features/agents/providers/mcp-agent/adapter.ts — McpAgentSprinterAgent that uses MCP sampling to execute agent-like workflows
- Distinguish MCP-as-tool-provider from MCP-as-agent in the connection config
- Reuse existing MCP connection infrastructure for auth/transport
9. Hermes agent protocol
What exists: Nothing yet. Hermes is a newer agent protocol that may emerge as a standard.
What to build:
- Research Hermes protocol spec when available
- features/agents/providers/hermes/adapter.ts — adapter implementation
- Connection type addition (or reuse the api type with protocol detection)
- Map Hermes message format to SprinterExecutionResult
10. Provider-agnostic agent discovery
What exists: /api/agent-connections/[id]/discover endpoint with basic implementation for OpenClaw. Claude Managed lists agents via listManagedAgents().
What to build:
- Unified discovery interface: discoverAgents(connectionId): ExternalAgent[]
- Per-provider discovery adapters:
  - OpenClaw: List available models/agents from /v1/models
  - Claude Managed: listManagedAgents() (exists)
  - A2A: Parse agent card capabilities
  - MCP: Query sampling capability
- UI: Agent discovery dialog that shows available external agents and lets users link them
P1 — Execution & Reliability
11. Tool approval (HITL) for managed agents
What exists: LocalSprinterAgent has full HITL support — pauses execution with pausedState, returns awaiting_tool status, resumes with approval/denial. Managed agents stream directly with no approval gate.
What to build:
- Extend ClaudeManagedSprinterAgent to check if a tool is gated (always_ask)
- When a gated tool is invoked, pause the managed session (don't send the result)
- Return awaiting_tool with pausedState containing the Anthropic session ID + tool event
- On resume, send the tool result (or denial) back to the managed session
- Challenge: Anthropic sessions may time out while waiting for approval
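The gating check and pausedState shape might look like this sketch; the field names follow the local agent's pattern but are assumptions:

```typescript
// Simplified tool gating policy and paused-state shapes (assumed names).
type ToolPolicy = "always_ask" | "always_allow";

type PausedState = {
  provider: "claude-managed";
  providerSessionId: string; // Anthropic session to reconnect to on resume
  toolEvent: { toolName: string; input: unknown };
};

type ExecutionResult =
  | { status: "completed"; text: string }
  | { status: "awaiting_tool"; pausedState: PausedState };

// Returns null when the tool is not gated (execution proceeds normally);
// otherwise returns the awaiting_tool result without sending a tool result
// back to the managed session.
export function gateToolCall(
  policies: Record<string, ToolPolicy>,
  sessionId: string,
  toolName: string,
  input: unknown,
): ExecutionResult | null {
  if (policies[toolName] !== "always_ask") return null;
  return {
    status: "awaiting_tool",
    pausedState: {
      provider: "claude-managed",
      providerSessionId: sessionId,
      toolEvent: { toolName, input },
    },
  };
}
```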
12. Resume support for external providers
What exists: SprinterExecutionContext.resume field with pausedState, decision, denialReason. Only LocalSprinterAgent implements it.
What to build:
- ClaudeManagedSprinterAgent.execute() checks context.resume:
  - Reconnect to the paused Anthropic session
  - Send the tool result or denial
  - Continue streaming
- OpenClawSprinterAgent.execute() checks context.resume:
  - Re-inject the tool result into the message history
  - Call generateText with the extended history
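The OpenClaw half of the resume path, as a sketch; the message and resume shapes are simplified assumptions, not the AI SDK's real types:

```typescript
// Simplified message and resume shapes (assumptions).
type Message = { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string };
type Resume = {
  decision: "approve" | "deny";
  toolCallId: string;
  result?: string;
  denialReason?: string;
};

// Build the extended history to pass to generateText: the approved tool
// result (or a denial note) is appended as a tool message.
export function extendHistoryForResume(history: Message[], resume: Resume): Message[] {
  const content =
    resume.decision === "approve"
      ? resume.result ?? ""
      : `Tool call denied: ${resume.denialReason ?? "no reason given"}`;
  return [...history, { role: "tool", content, toolCallId: resume.toolCallId }];
}
```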
13. Background execution for external agents
What exists: Heartbeat config on agents, shouldRunNow() cron check. Session executor dispatches tasks. But external agents are only used in chat.
What to build:
- Wire session executor to use resolveSprinterAgent() for agents with connections
- Background execution: no streaming callbacks needed (plan doc already notes "for background jobs we don't care")
- Map execution result to session status + events
- Cost tracking: external provider token usage → cost records
- Timeout handling: external agents may take longer than local
14. Error recovery and retries
What exists: Basic error handling — catch, log event, return status: 'failed'.
What to build:
- Transient error detection (rate limits, timeouts, 5xx responses)
- Configurable retry policy per connection (max retries, backoff)
- Circuit breaker per connection (after N failures, mark status: 'error', auto-recover after cooldown)
- Dead letter queue for failed tool relay results
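Transient detection plus bounded exponential backoff could be sketched as follows; the policy shape mirrors the proposed per-connection config, and all names are assumptions:

```typescript
// Hypothetical per-connection retry policy.
type RetryPolicy = { maxRetries: number; baseDelayMs: number };

// Rate limits, request timeouts, and 5xx responses are worth retrying;
// everything else fails fast.
export function isTransient(status: number): boolean {
  return status === 429 || status === 408 || (status >= 500 && status < 600);
}

export async function withRetries<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy,
  getStatus: (err: unknown) => number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= policy.maxRetries || !isTransient(getStatus(err))) throw err;
      await sleep(policy.baseDelayMs * Math.pow(2, attempt)); // exponential backoff
      attempt++;
    }
  }
}
```

The per-connection circuit breaker would layer on top of this: track consecutive exhausted retries and flip the connection to status: 'error' past a threshold.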
P2 — Admin & Observability
15. Connection health monitoring
What exists: Manual "Test" button in admin UI. last_health_check and last_error fields on connections.
What to build:
- Scheduled health checks via Inngest cron (every 5 min for active connections)
- Connection status dashboard in admin with uptime history
- Alert on connection degradation (Sentry, webhook)
- Auto-disable connections after sustained failures
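The auto-disable rule can be modeled as a small state transition for the scheduled check to apply; the threshold and the reset-on-success behavior are assumptions:

```typescript
// Per-connection health state (assumed shape).
type ConnectionHealth = { consecutiveFailures: number; disabled: boolean };

// Apply one health-check outcome: a success resets the failure streak and
// re-enables the connection; sustained failures trip the disable flag.
export function applyHealthCheck(
  health: ConnectionHealth,
  ok: boolean,
  disableAfter = 3,
): ConnectionHealth {
  if (ok) return { consecutiveFailures: 0, disabled: false };
  const consecutiveFailures = health.consecutiveFailures + 1;
  return { consecutiveFailures, disabled: consecutiveFailures >= disableAfter };
}
```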
16. External agent cost tracking
What exists: AccumulatedUsage from Anthropic events (input/output tokens). recordAnalyticsEvent("chat_started") fires for all chats. Runtime telemetry records model + tokens.
What to build:
- Per-provider cost model (Anthropic pricing differs from OpenClaw, A2A may have no token concept)
- External agent cost caps per tenant (like existing ai_limits)
- Cost attribution: which connection, which agent, which user
- Dashboard widget for external agent spend
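A per-provider cost model could start as a simple pricing table, sketched below. The prices are placeholders, not real rates, and the null return covers providers with no token concept:

```typescript
// Token usage and per-million-token pricing (assumed shapes; prices are
// illustrative placeholders only).
type Usage = { inputTokens: number; outputTokens: number };
type Pricing = { inputPerMTok: number; outputPerMTok: number };

const pricing: Record<string, Pricing> = {
  "claude-managed": { inputPerMTok: 3, outputPerMTok: 15 },
  openclaw: { inputPerMTok: 1, outputPerMTok: 2 },
};

// Returns the cost in USD, or null when the provider has no token-based
// pricing (e.g. an A2A agent billed per task, or not at all).
export function costUsd(provider: string, usage: Usage): number | null {
  const p = pricing[provider];
  if (!p) return null;
  return (usage.inputTokens * p.inputPerMTok + usage.outputTokens * p.outputPerMTok) / 1_000_000;
}
```

Providers returning null would need a separate cost path (flat per-execution rates or no tracking), which is why the cap check should operate on dollars rather than tokens.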
17. Provider-specific admin UI
What exists: Generic connection dialog with conditional fields per type, including managed agent fields (agent ID, environment ID, vault IDs).
What to build:
- Claude Managed: Sync status indicator, vault credential list, agent version history
- OpenClaw: Model selector from discovery, capability display
- A2A: Agent card viewer, supported skills display
- Per-provider config editors with validation
18. Audit trail for external executions
What exists: Session events log tool calls, results, errors. ExternalSessionRef stores provider session ID.
What to build:
- Admin view: "External Sessions" tab showing all managed/openclaw/a2a sessions
- Link from session → external provider's session (if provider has a UI)
- Replay capability: view the full event stream for debugging
- Diff view: compare tool relay inputs/outputs between local and external execution
P3 — Advanced Capabilities
19. Multi-provider agent orchestration
Enable a single Sprinter agent to delegate to multiple external providers based on task type. E.g., use Claude Managed for reasoning-heavy tasks, OpenClaw for tool-heavy tasks, A2A for specialized domain agents.
20. External agent as tool
Expose external agents as tools that local agents can invoke via createDelegateToAgentTool(). Currently delegation goes through the Sprinter agent resolution system. Extend to support external agents as delegation targets.
21. Credential rotation
Auto-rotate API keys for external connections. Store rotation schedule, generate new keys, verify they work, swap atomically. Particularly important for vault credentials that may expire.
22. Provider capability negotiation
Before execution, query the provider's capabilities (streaming support, tool types, max context, supported modalities) and adapt the execution strategy accordingly. E.g., if a provider doesn't support streaming, fall back to polling. If it doesn't support tools, skip tool injection.
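Capability negotiation can reduce to a small strategy-selection step, sketched here with assumed capability field names:

```typescript
// Hypothetical capability descriptor returned by a provider query.
type ProviderCapabilities = {
  streaming: boolean;
  tools: boolean;
  maxContextTokens: number;
};

type ExecutionStrategy = { delivery: "stream" | "poll"; injectTools: boolean };

// Pick an execution strategy from the negotiated capabilities:
// no streaming -> fall back to polling; no tools -> skip tool injection.
export function planExecution(caps: ProviderCapabilities): ExecutionStrategy {
  return {
    delivery: caps.streaming ? "stream" : "poll",
    injectTools: caps.tools,
  };
}
```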
23. External agent testing framework
Automated test suite that exercises each provider adapter against a mock server. Test matrix: streaming, tool relay, error handling, resumption, timeout, rate limiting. Run as part of CI with provider-specific mock fixtures.