External Agent Management Suite
Follow-up tasks and improvements for full external agent support — OpenClaw, Claude Managed Agents, Hermes, A2A, and future providers.
Context
PR #648 (feat/managed-agent-providers) shipped the SprinterAgent abstraction with three providers: Local, Claude Managed, and OpenClaw. It also added incremental streaming for managed agents and vault credential sync for MCP tool portability. This spec covers the follow-up work needed to build out the full external agent management suite.
What Shipped (PR #648)
- SprinterAgent interface with execute() returning a normalized SprinterExecutionResult
- ClaudeManagedSprinterAgent — full bridge to Anthropic's managed agents API (beta)
- OpenClawSprinterAgent — OpenAI-compatible agent execution via AI SDK
- LocalSprinterAgent — wraps AI SDK with HITL tool approval support
- Streaming callbacks (onTextDelta, onToolCall, onToolResult) on SprinterExecutionContext
- Chat route incremental streaming for managed agents
- Vault credential sync — tenant MCP connections auto-synced to Anthropic vaults
- Connection CRUD, testing, discovery endpoints
- Shared loadConnection() helper, comprehensive event mapping
P0 — Immediate Follow-ups (from /simplify review)
These are concrete code improvements identified during review that should land before the next feature push.
1. Vault sync performance
Problem: syncMcpCredentialsToVault runs on every managed agent execution with no caching. For a tenant with 5 MCP connections, that's 6+ Anthropic API calls (list + creates/updates) blocking the response before streaming starts.
Fix: Add a short-lived per-tenant cache (module-level Map<string, { vaultIds: string[]; expiresAt: number }> with ~5 minute TTL). Invalidate on connection create/update/delete. This eliminates redundant syncs across repeated chat turns while still picking up fresh credentials within minutes.
Files: features/agents/providers/claude-managed/vault-sync.ts, features/agents/providers/claude-managed/adapter.ts
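A minimal sketch of the proposed cache; the helper names (getCachedVaultIds, setCachedVaultIds, invalidateVaultSyncCache) are illustrative, not existing code:

```typescript
// Hypothetical module-level per-tenant cache for vault sync results.
type VaultCacheEntry = { vaultIds: string[]; expiresAt: number };

const VAULT_CACHE_TTL_MS = 5 * 60 * 1000; // ~5 minute TTL
const vaultSyncCache = new Map<string, VaultCacheEntry>();

// Returns cached vault IDs for a tenant, or null when missing/expired.
export function getCachedVaultIds(tenantId: string, now = Date.now()): string[] | null {
  const entry = vaultSyncCache.get(tenantId);
  if (!entry || entry.expiresAt <= now) {
    vaultSyncCache.delete(tenantId);
    return null;
  }
  return entry.vaultIds;
}

export function setCachedVaultIds(tenantId: string, vaultIds: string[], now = Date.now()): void {
  vaultSyncCache.set(tenantId, { vaultIds, expiresAt: now + VAULT_CACHE_TTL_MS });
}

// Call from connection create/update/delete handlers so fresh credentials
// are picked up immediately instead of waiting out the TTL.
export function invalidateVaultSyncCache(tenantId: string): void {
  vaultSyncCache.delete(tenantId);
}
```

On a cache hit, syncMcpCredentialsToVault would return the cached vault IDs without touching the Anthropic API, so repeated chat turns skip the list + create/update round-trips entirely.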
2. Deduplicate MCP connection query
Problem: vault-sync.ts re-queries agent_connections for active MCP connections — the same query that features/mcp/resolve-configs.ts runs with unstable_cache. The vault-sync version bypasses the cache.
Fix: Either accept the pre-loaded connection list as a parameter, or extract a shared cached getActiveMcpConnections(tenantId) helper that both consumers use.
Files: features/agents/providers/claude-managed/vault-sync.ts, features/mcp/resolve-configs.ts
3. Extract bearer token helper from resolve-headers
Problem: extractBearerToken() in vault-sync.ts re-implements the token > apiKey > basicAuth priority logic from lib/connections/resolve-headers.ts. Different output shapes (raw token vs HTTP header with Bearer prefix) prevented direct reuse.
Fix: Export a lower-level extractBearerCredential(credentials): string | null from resolve-headers.ts that returns the raw token. Have both resolveConnectionAuthHeaders and vault-sync call it. Also usable by openclaw/adapter.ts which has the same raw.token ?? raw.apiKey pattern.
Files: lib/connections/resolve-headers.ts, features/agents/providers/claude-managed/vault-sync.ts, features/agents/providers/openclaw/adapter.ts
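A sketch of the shared helper split. The credential field names mirror the token > apiKey priority described above; basicAuth has no single raw-token form, so this sketch assumes it stays in the header-building path only:

```typescript
// Simplified credential shape (assumption, not the real type).
type ConnectionCredentials = { token?: string; apiKey?: string };

// Lower-level helper: returns the raw credential with no "Bearer " prefix,
// usable by vault-sync and the openclaw adapter's raw.token ?? raw.apiKey path.
export function extractBearerCredential(credentials: ConnectionCredentials): string | null {
  return credentials.token ?? credentials.apiKey ?? null;
}

// Header-shaped consumer (illustrative stand-in for resolveConnectionAuthHeaders):
// builds on the same raw token instead of re-implementing the priority logic.
export function resolveBearerHeader(credentials: ConnectionCredentials): Record<string, string> {
  const raw = extractBearerCredential(credentials);
  return raw ? { Authorization: `Bearer ${raw}` } : {};
}
```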
4. Eliminate redundant loadConnectionType DB call
Problem: Chat route makes a separate SELECT connection_type FROM agent_connections call at line 93. The adapter then makes another full SELECT * inside execute(). Two DB round-trips for the same connection.
Fix: Include connection_type in the ResolvedAgent interface, populated during agent resolution (where the connection ID is already known). This lets the chat route skip the pre-flight query entirely.
Files: features/agents/agent-resolver.ts, features/agents/providers/resolve.ts, app/api/chat/route.ts
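The extended resolver shape could look like this sketch; the fields beyond connectionType are assumptions about what ResolvedAgent carries:

```typescript
// Hypothetical ResolvedAgent shape with connection_type carried through
// resolution, so the chat route never issues the pre-flight SELECT.
interface ResolvedAgent {
  agentId: string;
  connectionId: string | null;
  connectionType: string | null; // e.g. "claude-managed", populated during resolution
}

// The chat route can branch on the resolved type directly.
export function needsManagedStreaming(agent: ResolvedAgent): boolean {
  return agent.connectionType === "claude-managed";
}
```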
5. Remove redundant event collection in service.ts
Problem: runManagedSession builds its own allEvents[] array AND calls params.onEvent() for each event. The adapter uses onEvent to collect events, never reads result.events. Every event is stored twice in memory.
Fix: Remove allEvents from runManagedSession. Keep events in the return type for consumers that don't use the callback pattern, but populate it from onEvent instead of maintaining a parallel array.
Files: features/agents/providers/claude-managed/service.ts
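The single-collection-point fix could be sketched like this; the event and params types are simplified stand-ins for the real session shapes:

```typescript
// Simplified event shape (assumption).
type SessionEvent = { type: string; data?: unknown };

type RunParams = { onEvent?: (event: SessionEvent) => void };

// One emit path: each event is stored once and handed to the callback,
// instead of maintaining a parallel allEvents[] array.
export function runManagedSession(
  stream: SessionEvent[],
  params: RunParams,
): { events: SessionEvent[] } {
  const events: SessionEvent[] = [];
  const emit = (event: SessionEvent) => {
    events.push(event); // single source of truth for the return value
    params.onEvent?.(event); // callback consumers see the same object
  };
  for (const event of stream) emit(event);
  return { events };
}
```

Consumers that ignore the callback still get result.events; callback consumers like the adapter receive the same object references, so nothing is held in memory twice.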
6. Internalize eventId in onCustomToolCall callback
Problem: The onCustomToolCall callback signature includes eventId (an Anthropic stream event ID) — an internal protocol detail that leaks through the abstraction. The adapter ignores it.
Fix: Remove eventId from the onCustomToolCall signature. The executeCustomToolRelay helper already has access to eventId from the stream event — pass it directly there.
Files: features/agents/providers/claude-managed/service.ts
P1 — Provider Expansion
7. A2A (Agent-to-Agent) protocol support
What exists: Connection type a2a is defined, UI supports creating A2A connections with agent card URL, testConnection probes /.well-known/agent-card.json. But resolveSprinterAgent() throws "not yet supported."
What to build:
- features/agents/providers/a2a/adapter.ts — A2ASprinterAgent implementing the A2A protocol (task creation → polling → result)
- A2A session management (A2A tasks are async — need polling or webhook)
- Map A2A task statuses to SprinterExecutionResult.status
- Map A2A artifacts to Sprinter events
- Streaming: A2A doesn't natively stream; use polling with onTextDelta updates as available
Key design question: A2A is inherently async (create task, poll for completion). The SprinterAgent.execute() interface assumes a single awaitable call. Options:
- (a) Poll internally within execute(), calling onTextDelta with updates
- (b) Return status: 'pending' and add a resume pattern for async providers
- (c) Treat A2A as background-only (no chat streaming)
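Option (a) could be sketched as follows. The client and task shapes here are simplified assumptions, not the actual A2A protocol types:

```typescript
// Simplified A2A task lifecycle (assumed shapes).
type A2ATaskStatus = "submitted" | "working" | "completed" | "failed";
type A2ATask = { id: string; status: A2ATaskStatus; text?: string };

interface A2AClient {
  createTask(prompt: string): Promise<A2ATask>;
  getTask(id: string): Promise<A2ATask>;
}

// Poll inside execute(): emit only the new text suffix on each poll so the
// caller sees incremental onTextDelta updates despite no native streaming.
export async function executeA2ATask(
  client: A2AClient,
  prompt: string,
  onTextDelta?: (delta: string) => void,
  pollMs = 50,
): Promise<{ status: "completed" | "failed"; text: string }> {
  let task = await client.createTask(prompt);
  let seen = "";
  while (task.status === "submitted" || task.status === "working") {
    await new Promise((r) => setTimeout(r, pollMs));
    task = await client.getTask(task.id);
    const text = task.text ?? "";
    if (text.length > seen.length) {
      onTextDelta?.(text.slice(seen.length)); // new suffix only
      seen = text;
    }
  }
  return { status: task.status === "completed" ? "completed" : "failed", text: seen };
}
```

This keeps the single-awaitable execute() contract intact at the cost of holding the request open for the task's lifetime, which is why option (b) or (c) may still be needed for long-running tasks.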
8. MCP gateway agents
What exists: Connection type mcp is defined and used for MCP server connections (tool providers). But MCP servers can also BE agents (via sampling capability).
What to build:
- features/agents/providers/mcp-agent/adapter.ts — McpAgentSprinterAgent that uses MCP sampling to execute agent-like workflows
- Distinguish MCP-as-tool-provider from MCP-as-agent in the connection config
- Reuse existing MCP connection infrastructure for auth/transport
9. Hermes agent protocol
What exists: Nothing yet. Hermes is a newer agent protocol that may emerge as a standard.
What to build:
- Research Hermes protocol spec when available
- features/agents/providers/hermes/adapter.ts — adapter implementation
- Connection type addition (or reuse the api type with protocol detection)
- Map Hermes message format to SprinterExecutionResult
10. Provider-agnostic agent discovery
What exists: /api/agent-connections/[id]/discover endpoint with basic implementation for OpenClaw. Claude Managed lists agents via listManagedAgents().
What to build:
- Unified discovery interface: discoverAgents(connectionId): ExternalAgent[]
- Per-provider discovery adapters:
  - OpenClaw: List available models/agents from /v1/models
  - Claude Managed: listManagedAgents() (exists)
  - A2A: Parse agent card capabilities
  - MCP: Query sampling capability
- UI: Agent discovery dialog that shows available external agents and lets users link them
P1 — Execution & Reliability
11. Tool approval (HITL) for managed agents
What exists: LocalSprinterAgent has full HITL support — pauses execution with pausedState, returns awaiting_tool status, resumes with approval/denial. Managed agents stream directly with no approval gate.
What to build:
- Extend ClaudeManagedSprinterAgent to check if a tool is gated (always_ask)
- When a gated tool is invoked, pause the managed session (don't send the result)
- Return awaiting_tool with pausedState containing the Anthropic session ID + tool event
- On resume, send the tool result (or denial) back to the managed session
- Challenge: Anthropic sessions may time out while waiting for approval
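The gating check and pausedState shape might look like this sketch; the field names follow the local agent's pattern but are assumptions:

```typescript
// Simplified tool gating policy and paused-state shapes (assumed names).
type ToolPolicy = "always_ask" | "always_allow";

type PausedState = {
  provider: "claude-managed";
  providerSessionId: string; // Anthropic session to reconnect to on resume
  toolEvent: { toolName: string; input: unknown };
};

type ExecutionResult =
  | { status: "completed"; text: string }
  | { status: "awaiting_tool"; pausedState: PausedState };

// Returns null when the tool is not gated (execution proceeds normally);
// otherwise returns the awaiting_tool result without sending a tool result
// back to the managed session.
export function gateToolCall(
  policies: Record<string, ToolPolicy>,
  sessionId: string,
  toolName: string,
  input: unknown,
): ExecutionResult | null {
  if (policies[toolName] !== "always_ask") return null;
  return {
    status: "awaiting_tool",
    pausedState: {
      provider: "claude-managed",
      providerSessionId: sessionId,
      toolEvent: { toolName, input },
    },
  };
}
```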
12. Resume support for external providers
What exists: SprinterExecutionContext.resume field with pausedState, decision, denialReason. Only LocalSprinterAgent implements it.
What to build:
- ClaudeManagedSprinterAgent.execute() checks context.resume:
  - Reconnect to the paused Anthropic session
  - Send the tool result or denial
  - Continue streaming
- OpenClawSprinterAgent.execute() checks context.resume:
  - Re-inject the tool result into the message history
  - Call generateText with the extended history
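The OpenClaw half of the resume path, as a sketch; the message and resume shapes are simplified assumptions, not the AI SDK's real types:

```typescript
// Simplified message and resume shapes (assumptions).
type Message = { role: "user" | "assistant" | "tool"; content: string; toolCallId?: string };
type Resume = {
  decision: "approve" | "deny";
  toolCallId: string;
  result?: string;
  denialReason?: string;
};

// Build the extended history to pass to generateText: the approved tool
// result (or a denial note) is appended as a tool message.
export function extendHistoryForResume(history: Message[], resume: Resume): Message[] {
  const content =
    resume.decision === "approve"
      ? resume.result ?? ""
      : `Tool call denied: ${resume.denialReason ?? "no reason given"}`;
  return [...history, { role: "tool", content, toolCallId: resume.toolCallId }];
}
```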
13. Background execution for external agents
What exists: Heartbeat config on agents, shouldRunNow() cron check. Session executor dispatches tasks. But external agents are only used in chat.
What to build:
- Wire session executor to use resolveSprinterAgent() for agents with connections
- Background execution: no streaming callbacks needed (plan doc already notes "for background jobs we don't care")
- Map execution result to session status + events
- Cost tracking: external provider token usage → cost records
- Timeout handling: external agents may take longer than local
14. Error recovery and retries
What exists: Basic error handling — catch, log event, return status: 'failed'.
What to build:
- Transient error detection (rate limits, timeouts, 5xx responses)
- Configurable retry policy per connection (max retries, backoff)
- Circuit breaker per connection (after N failures, mark status: 'error', auto-recover after cooldown)
- Dead letter queue for failed tool relay results
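Transient detection plus bounded exponential backoff could be sketched as follows; the policy shape mirrors the proposed per-connection config, and all names are assumptions:

```typescript
// Hypothetical per-connection retry policy.
type RetryPolicy = { maxRetries: number; baseDelayMs: number };

// Rate limits, request timeouts, and 5xx responses are worth retrying;
// everything else fails fast.
export function isTransient(status: number): boolean {
  return status === 429 || status === 408 || (status >= 500 && status < 600);
}

export async function withRetries<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy,
  getStatus: (err: unknown) => number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= policy.maxRetries || !isTransient(getStatus(err))) throw err;
      await sleep(policy.baseDelayMs * Math.pow(2, attempt)); // exponential backoff
      attempt++;
    }
  }
}
```

The per-connection circuit breaker would layer on top of this: track consecutive exhausted retries and flip the connection to status: 'error' past a threshold.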
P2 — Admin & Observability
15. Connection health monitoring
What exists: Manual "Test" button in admin UI. last_health_check and last_error fields on connections.
What to build:
- Scheduled health checks via Inngest cron (every 5 min for active connections)
- Connection status dashboard in admin with uptime history
- Alert on connection degradation (Sentry, webhook)
- Auto-disable connections after sustained failures
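The auto-disable rule can be modeled as a small state transition for the scheduled check to apply; the threshold and the reset-on-success behavior are assumptions:

```typescript
// Per-connection health state (assumed shape).
type ConnectionHealth = { consecutiveFailures: number; disabled: boolean };

// Apply one health-check outcome: a success resets the failure streak and
// re-enables the connection; sustained failures trip the disable flag.
export function applyHealthCheck(
  health: ConnectionHealth,
  ok: boolean,
  disableAfter = 3,
): ConnectionHealth {
  if (ok) return { consecutiveFailures: 0, disabled: false };
  const consecutiveFailures = health.consecutiveFailures + 1;
  return { consecutiveFailures, disabled: consecutiveFailures >= disableAfter };
}
```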
16. External agent cost tracking
What exists: AccumulatedUsage from Anthropic events (input/output tokens). recordAnalyticsEvent("chat_started") fires for all chats. Runtime telemetry records model + tokens.
What to build:
- Per-provider cost model (Anthropic pricing differs from OpenClaw, A2A may have no token concept)
- External agent cost caps per tenant (like existing ai_limits)
- Cost attribution: which connection, which agent, which user
- Dashboard widget for external agent spend
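A per-provider cost model could start as a simple pricing table, sketched below. The prices are placeholders, not real rates, and the null return covers providers with no token concept:

```typescript
// Token usage and per-million-token pricing (assumed shapes; prices are
// illustrative placeholders only).
type Usage = { inputTokens: number; outputTokens: number };
type Pricing = { inputPerMTok: number; outputPerMTok: number };

const pricing: Record<string, Pricing> = {
  "claude-managed": { inputPerMTok: 3, outputPerMTok: 15 },
  openclaw: { inputPerMTok: 1, outputPerMTok: 2 },
};

// Returns the cost in USD, or null when the provider has no token-based
// pricing (e.g. an A2A agent billed per task, or not at all).
export function costUsd(provider: string, usage: Usage): number | null {
  const p = pricing[provider];
  if (!p) return null;
  return (usage.inputTokens * p.inputPerMTok + usage.outputTokens * p.outputPerMTok) / 1_000_000;
}
```

Providers returning null would need a separate cost path (flat per-execution rates or no tracking), which is why the cap check should operate on dollars rather than tokens.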
17. Provider-specific admin UI
What exists: Generic connection dialog with conditional fields per type, including managed agent fields (agent ID, environment ID, vault IDs).
What to build:
- Claude Managed: Sync status indicator, vault credential list, agent version history
- OpenClaw: Model selector from discovery, capability display
- A2A: Agent card viewer, supported skills display
- Per-provider config editors with validation
18. Audit trail for external executions
What exists: Session events log tool calls, results, errors. ExternalSessionRef stores provider session ID.
What to build:
- Admin view: "External Sessions" tab showing all managed/openclaw/a2a sessions
- Link from session → external provider's session (if provider has a UI)
- Replay capability: view the full event stream for debugging
- Diff view: compare tool relay inputs/outputs between local and external execution
P3 — Advanced Capabilities
19. Multi-provider agent orchestration
Enable a single Sprinter agent to delegate to multiple external providers based on task type. E.g., use Claude Managed for reasoning-heavy tasks, OpenClaw for tool-heavy tasks, A2A for specialized domain agents.
20. External agent as tool
Expose external agents as tools that local agents can invoke via createDelegateToAgentTool(). Currently delegation goes through the Sprinter agent resolution system. Extend to support external agents as delegation targets.
21. Credential rotation
Auto-rotate API keys for external connections. Store rotation schedule, generate new keys, verify they work, swap atomically. Particularly important for vault credentials that may expire.
22. Provider capability negotiation
Before execution, query the provider's capabilities (streaming support, tool types, max context, supported modalities) and adapt the execution strategy accordingly. E.g., if a provider doesn't support streaming, fall back to polling. If it doesn't support tools, skip tool injection.
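Capability negotiation can reduce to a small strategy-selection step, sketched here with assumed capability field names:

```typescript
// Hypothetical capability descriptor returned by a provider query.
type ProviderCapabilities = {
  streaming: boolean;
  tools: boolean;
  maxContextTokens: number;
};

type ExecutionStrategy = { delivery: "stream" | "poll"; injectTools: boolean };

// Pick an execution strategy from the negotiated capabilities:
// no streaming -> fall back to polling; no tools -> skip tool injection.
export function planExecution(caps: ProviderCapabilities): ExecutionStrategy {
  return {
    delivery: caps.streaming ? "stream" : "poll",
    injectTools: caps.tools,
  };
}
```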
23. External agent testing framework
Automated test suite that exercises each provider adapter against a mock server. Test matrix: streaming, tool relay, error handling, resumption, timeout, rate limiting. Run as part of CI with provider-specific mock fixtures.