Agent Harness for Multi-Source Financial Intelligence

A practical account of designing a quality loop, token budget strategy, and multi-data-source prompt architecture for long-running LangGraph agents on AWS Bedrock.

Anthropic recently published "Harness Primitives for Long-Running Claude Agents", a repo describing three quality-loop primitives: a default-FAIL contract (structural done-criteria), a fresh-context evaluator (separate grading agent), and an agent-maintained handoff. It is an excellent pattern reference, but it is designed for offline coding tasks in Claude Code. Shell hooks, file-read evidence gates, and git checkpoints don’t translate to a live HTTP API serving financial data to enterprise users.

What follows is how we solved the same underlying problems — robust done-criteria, token efficiency, multi-session continuity, and multi-data-source context — in a production system called MAO (Multi-Agent Orchestrator). MAO serves financial queries across NetSuite, Salesforce, and ServiceHub over a real-time SSE streaming API built on FastAPI, LangGraph, and AWS Bedrock.

The Problem Space

Enterprise financial workflows have properties that make naive agentic loops expensive and fragile:

Multiple data sources: NetSuite (ERP), Salesforce (CRM), and ServiceHub each have different query APIs, schema conventions, and authentication flows.
Large tool outputs: A single SuiteQL query can return thousands of rows. Naively passing all of that back to the LLM blows context budgets.
Write workflows need approval gates: Creating or updating ERP records requires human review before execution. The agent cannot simply fire the write tool.
Session continuity: A user may ask five follow-up questions in a conversation. Each question must carry forward entity context without re-injecting the full history every time.
Token cost is real: Generating every token of UI formatting with Sonnet is expensive, even though formatting needs far less capability than reasoning does.

Architecture Overview

The entry point is a FastAPI router. A request carries a client_type field (ns, sf, sh) alongside the user query. This single field drives all downstream branching: which skills load, which MCP backend connects, and which few-shot bank examples are retrieved from. JWT authentication middleware and a rate limiter sit in front of the router. Everything downstream is async and streams SSE events back to the caller.
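One way to make client_type the single branching point is to resolve it once into a profile object that every downstream component reads. The sketch below assumes that design; ClientType, SourceProfile, PROFILES, resolve_profile, and every directory, backend, and bank name in it are hypothetical, not MAO's actual identifiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ClientType(str, Enum):
    NETSUITE = "ns"
    SALESFORCE = "sf"
    SERVICEHUB = "sh"


@dataclass(frozen=True)
class SourceProfile:
    skills_dir: str    # hypothetical root of this source's skill directories
    mcp_backend: str   # hypothetical MCP server name
    fewshot_bank: str  # hypothetical few-shot example index


# Hypothetical registry: skills, MCP backend, and few-shot bank all come from one lookup.
PROFILES: Dict[ClientType, SourceProfile] = {
    ClientType.NETSUITE: SourceProfile("skills/netsuite", "mcp-netsuite", "fewshot-ns"),
    ClientType.SALESFORCE: SourceProfile("skills/salesforce", "mcp-salesforce", "fewshot-sf"),
    ClientType.SERVICEHUB: SourceProfile("skills/servicehub", "mcp-servicehub", "fewshot-sh"),
}


def resolve_profile(client_type: str) -> SourceProfile:
    """Map the request's client_type string to its downstream profile."""
    try:
        return PROFILES[ClientType(client_type)]
    except ValueError:
        # ClientType("xx") raises ValueError for unknown codes
        raise ValueError(f"Unsupported client_type: {client_type!r}") from None
```

With this shape, supporting another data source means adding one enum member and one profile entry rather than touching each branch.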

Primitive 1: Skill-Based System Prompt Composition

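Each agent role is configured with a set of skill directories, each containing a SKILL.md file. At request time, only the skills for that role and data source are read and concatenated into a single system message, and no skill is loaded twice within a request. Unlike a monolithic prompt, the system message therefore carries only the instructions the current role actually needs.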

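from pathlib import Path
from typing import List


def _load_skills(self, skill_paths: List[str]) -> str:
    """Read SKILL.md from each skill directory and join them into one system prompt."""
    sections: List[str] = []
    for sp in skill_paths:
        skill_md = Path(sp) / "SKILL.md"
        # Directories without a SKILL.md are skipped silently
        if skill_md.exists():
            sections.append(skill_md.read_text(encoding="utf-8").strip())
    # "---" separators mark the boundary between one skill and the next
    return "\n\n---\n\n".join(sections)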
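The loader above takes a list of skill paths; how that list is assembled is where the "no skill repeated" property can be enforced. A minimal sketch, assuming skills are split into role-level shared skills and per-source skills. SHARED_SKILLS, SOURCE_SKILLS, skill_paths_for, and all paths shown are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: skills shared by a role across sources...
SHARED_SKILLS: Dict[str, List[str]] = {
    "planner": ["skills/common/planning", "skills/common/safety"],
    "writer": ["skills/common/safety", "skills/common/approval-gates"],
}
# ...plus skills specific to one data source (keyed by client_type).
SOURCE_SKILLS: Dict[str, List[str]] = {
    "ns": ["skills/common/approval-gates", "skills/netsuite/suiteql"],
    "sf": ["skills/salesforce/soql"],
    "sh": ["skills/servicehub/tickets"],
}


def skill_paths_for(role: str, client_type: str) -> List[str]:
    """Combine role and source skills, dropping repeats in first-seen order."""
    paths = SHARED_SKILLS.get(role, []) + SOURCE_SKILLS.get(client_type, [])
    # dict.fromkeys preserves insertion order and keeps only the first occurrence
    return list(dict.fromkeys(paths))
```

For example, skill_paths_for("writer", "ns") lists skills/common/approval-gates once even though both the role and the source request it.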

Primitive 2: Structural Tool Budget Enforcement

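The tool call count is tracked in the graph state, updated on every agent node invocation, and checked in the routing function before control can pass to the tools node. Once the budget is spent, the graph ends the loop regardless of what the model requested.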

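import logging

from langgraph.graph import END

logger = logging.getLogger(__name__)


def _route(self, state: AgentState) -> str:  # AgentState: the graph's TypedDict, defined elsewhere
    """Conditional edge after the agent node: run tools or end the loop."""
    last = state["messages"][-1]
    # A final answer carries no tool calls, so there is nothing to budget
    if not getattr(last, "tool_calls", None):
        return END
    # Structural budget: block tool execution once the cap is reached
    if state.get("tool_call_count", 0) >= self.max_tool_calls:
        logger.warning("Tool call limit reached; ending loop")
        return END
    return "tools"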
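To see how the counter and the route interact, here is a framework-free sketch of the same agent, route, tools cycle. It is a stand-in, not the LangGraph graph itself: Msg, route, and run_agent_loop are hypothetical, END is a local string standing in for langgraph.graph.END, and the sketch assumes the count is recomputed on each agent step as the number of tool results in the history.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

END = "__end__"  # local stand-in for langgraph.graph.END


@dataclass
class Msg:  # hypothetical minimal message type
    role: str  # "user", "ai", or "tool"
    content: str
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def route(state: Dict[str, Any], max_tool_calls: int) -> str:
    """Same decision order as _route: no calls -> END, budget spent -> END."""
    last = state["messages"][-1]
    if not last.tool_calls:
        return END
    if state["tool_call_count"] >= max_tool_calls:
        return END
    return "tools"


def run_agent_loop(
    call_model: Callable[[List[Msg]], Msg],
    run_tool: Callable[[Dict[str, Any]], str],
    query: str,
    max_tool_calls: int,
) -> Dict[str, Any]:
    state: Dict[str, Any] = {"messages": [Msg("user", query)], "tool_call_count": 0}
    while True:
        # Agent node: call the model, then refresh the count of executed tool calls
        state["messages"].append(call_model(state["messages"]))
        state["tool_call_count"] = sum(m.role == "tool" for m in state["messages"])
        if route(state, max_tool_calls) == END:
            return state
        # Tools node: execute every call the model requested this turn
        for call in state["messages"][-1].tool_calls:
            state["messages"].append(Msg("tool", run_tool(call)))
```

With a model that always asks for another tool, this loop stops after max_tool_calls executions instead of spinning indefinitely. Because the check runs once per agent turn, a single turn that requests several parallel calls can overshoot the cap by up to that batch size.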

Primitive 4: Two-Phase Reasoning/Formatting Pipeline

This is our most impactful cost reduction. Reasoning and formatting have very different capability requirements: Sonnet handles multi-step reasoning and tool orchestration, while Haiku turns the finished result into the formatted response the UI renders. Offloading the formatting phase to Haiku cuts that phase's cost by roughly 80%.
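A minimal sketch of the two-phase split, assuming a hypothetical invoke(model_id, system_prompt, user_content) callable that wraps the Bedrock call. ModelConfig, FORMAT_SYSTEM, and answer are illustrative names, the model IDs are placeholders, and the tool loop of phase 1 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical model-call signature: (model_id, system_prompt, user_content) -> text.
# In production this would wrap a Bedrock invocation; injecting it keeps the sketch testable.
Invoke = Callable[[str, str, str], str]


@dataclass(frozen=True)
class ModelConfig:
    reasoning_model: str   # placeholder for a Sonnet model ID
    formatting_model: str  # placeholder for a Haiku model ID


# Illustrative prompt: the formatter may restyle the findings but must not alter them.
FORMAT_SYSTEM = (
    "Format the findings below for display to the user. "
    "Do not add, remove, or change any figures."
)


def answer(query: str, reasoning_system: str, invoke: Invoke, cfg: ModelConfig) -> Dict[str, str]:
    # Phase 1: the stronger model does the reasoning
    findings = invoke(cfg.reasoning_model, reasoning_system, query)
    # Phase 2: the cheaper model only reshapes phase-1 output for the UI
    formatted = invoke(cfg.formatting_model, FORMAT_SYSTEM, findings)
    return {"findings": findings, "formatted": formatted}
```

Because both model IDs live in ModelConfig, swapping either model is a configuration change rather than a code change, and the formatter never sees the raw tool outputs, only the reasoning model's findings.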

Conclusion

The hardest part isn’t implementing any one primitive. It’s keeping them composable: skills are files, context is variables, state is a DynamoDB row, model choice is a config flag. None of these need to know about the others. That composability is what makes the harness extensible to new data sources without rewriting the agent core.