Agent Harness for Multi-Source Financial Intelligence
A practical account of designing a quality loop, token budget strategy, and multi-data-source prompt architecture for long-running LangGraph agents on AWS Bedrock.
Anthropic recently published "Harness Primitives for Long-Running Claude Agents", a repo describing three quality-loop primitives: a default-FAIL contract (structural done-criteria), a fresh-context evaluator (separate grading agent), and an agent-maintained handoff. It is an excellent pattern reference — but it is designed for Claude Code offline coding tasks. Shell hooks, file-read evidence gates, and git checkpoints don’t translate to a live HTTP API serving financial data to enterprise users.
What follows is how we solved the same underlying problems — robust done-criteria, token efficiency, multi-session continuity, and multi-data-source context — in a production system called MAO (Multi-Agent Orchestrator). MAO serves financial queries across NetSuite, Salesforce, and ServiceHub over a real-time SSE streaming API built on FastAPI, LangGraph, and AWS Bedrock.
The Problem Space
Enterprise financial workflows have properties that make naive agentic loops expensive and fragile:
- Multiple data sources: NetSuite (ERP), Salesforce (CRM), and ServiceHub each have different query APIs, schema conventions, and authentication flows.
- Large tool outputs: A single SuiteQL query can return thousands of rows. Naively passing all of that back to the LLM blows context budgets.
- Write workflows need approval gates: Creating or updating ERP records requires human review before execution. The agent cannot simply fire the write tool.
- Session continuity: A user may ask five follow-up questions in a conversation. Each question must carry forward entity context without re-injecting the full history every time.
- Token cost is real: A Sonnet call for every token of UI formatting is expensive. Reasoning and formatting have very different capability requirements.
Architecture Overview
The entry point is a FastAPI router. A request carries a client_type field (ns, sf, sh) alongside the user query. This single field drives the entire downstream branching: which skills load, which MCP backend connects, which few-shot bank retrieves from. A JWT middleware and rate-limiter sit in front. Everything downstream is async and streams SSE events back to the caller.
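One way to picture the branching described above is a small registry keyed by client_type. This is a minimal sketch, not MAO's actual code: the ClientProfile class, the resolve_profile function, and every path and backend name in it are hypothetical, and the mapping of ns, sf, and sh to NetSuite, Salesforce, and ServiceHub is inferred from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientProfile:
    """Everything that varies per data source (all values below are hypothetical)."""
    skill_paths: tuple[str, ...]
    mcp_backend: str
    few_shot_bank: str


# Hypothetical registry: one entry per client_type value named in the text.
CLIENT_PROFILES: dict[str, ClientProfile] = {
    "ns": ClientProfile(("skills/common", "skills/netsuite"), "netsuite-mcp", "fewshot-ns"),
    "sf": ClientProfile(("skills/common", "skills/salesforce"), "salesforce-mcp", "fewshot-sf"),
    "sh": ClientProfile(("skills/common", "skills/servicehub"), "servicehub-mcp", "fewshot-sh"),
}


def resolve_profile(client_type: str) -> ClientProfile:
    """Look up the profile for a request, rejecting unknown client types early."""
    try:
        return CLIENT_PROFILES[client_type]
    except KeyError:
        raise ValueError(f"Unsupported client_type: {client_type!r}") from None
```

Under this assumption, adding a data source means adding one registry entry and its skill directory, and the router itself stays unchanged.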
Primitive 1: Skill-Based System Prompt Composition
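Each agent role has a set of skill directories, each containing a SKILL.md file. For every request, the skills that apply are read and concatenated into a single system message, and no skill is repeated within a request. This is fundamentally different from sending a monolithic prompt, because only the skills selected for that request are included.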
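from pathlib import Path
from typing import List

def _load_skills(self, skill_paths: List[str]) -> str:
    """Concatenate the SKILL.md file from each skill directory into one prompt."""
    sections: list[str] = []
    for sp in skill_paths:
        skill_md = Path(sp) / "SKILL.md"
        # Skip directories without a SKILL.md instead of failing the request.
        if skill_md.exists():
            content = skill_md.read_text(encoding="utf-8").strip()
            sections.append(content)
    # A horizontal rule between sections keeps skill boundaries visible.
    return "\n\n---\n\n".join(sections)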
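The loader above includes whatever paths it is given, so the guarantee that no skill is repeated has to be enforced somewhere. The standalone sketch below shows one way to do that, by deduplicating on the resolved file path while preserving order. The function name load_skills and the SEPARATOR constant are illustrative, and this is an assumption about how deduplication could work, not a description of how MAO does it.

```python
from pathlib import Path
from typing import Iterable

SEPARATOR = "\n\n---\n\n"


def load_skills(skill_paths: Iterable[str]) -> str:
    """Join each directory's SKILL.md once, in first-seen order."""
    seen: set[Path] = set()
    sections: list[str] = []
    for sp in skill_paths:
        # Resolve so that two spellings of the same directory count as one skill.
        skill_md = (Path(sp) / "SKILL.md").resolve()
        if skill_md in seen or not skill_md.is_file():
            continue
        seen.add(skill_md)
        sections.append(skill_md.read_text(encoding="utf-8").strip())
    return SEPARATOR.join(sections)
```

Calling it with a list such as ["skills/common", "skills/netsuite", "skills/common"] (hypothetical paths) would include the common skill only once.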
Primitive 2: Structural Tool Budget Enforcement
The tool call count is tracked in the graph state on every agent node invocation and checked structurally before any tool executes.
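from langgraph.graph import END

def _route(self, state: AgentState) -> str:
    """Decide whether the graph runs the tool node next or stops."""
    last = state["messages"][-1]
    has_calls = hasattr(last, "tool_calls") and last.tool_calls
    tc_count = state.get("tool_call_count", 0)
    over_limit = tc_count >= self.max_tool_calls
    # The budget check comes first, so pending tool calls never run past the limit.
    if over_limit:
        logger.warning("Tool call limit reached — ending loop")
        return END
    # No tool calls on the last message means the model has finished.
    if not has_calls:
        return END
    return "tools"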
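To make the three outcomes of the routing check concrete, here is a standalone restatement that runs without LangGraph. It is only an illustration: END is a local stand-in for the LangGraph constant, the messages are plain SimpleNamespace objects rather than real message classes, and the tool name run_suiteql is hypothetical.

```python
from types import SimpleNamespace
from typing import Any

END = "__end__"  # local stand-in for langgraph.graph.END, so the sketch has no dependencies


def route(state: dict[str, Any], max_tool_calls: int) -> str:
    """Same decision order as the method above: budget first, then pending calls."""
    last = state["messages"][-1]
    has_calls = bool(getattr(last, "tool_calls", None))
    if state.get("tool_call_count", 0) >= max_tool_calls:
        return END
    return "tools" if has_calls else END


# Stand-in messages: one requesting a (hypothetical) tool, one with a final answer.
wants_tool = SimpleNamespace(tool_calls=[{"name": "run_suiteql", "args": {}}])
final_answer = SimpleNamespace(tool_calls=[])

under_budget = route({"messages": [wants_tool], "tool_call_count": 1}, max_tool_calls=3)
at_budget = route({"messages": [wants_tool], "tool_call_count": 3}, max_tool_calls=3)
no_calls = route({"messages": [final_answer], "tool_call_count": 0}, max_tool_calls=3)
```

Here under_budget is "tools", while at_budget and no_calls both equal END. The middle case is the structural part: the model asked for another tool, but the router refuses regardless of what the prompt says.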
Primitive 4: Two-Phase Reasoning/Formatting Pipeline
This is our most impactful token reduction. Reasoning and formatting have very different capability requirements: Sonnet handles multi-step reasoning and tool orchestration, while Haiku handles UI formatting. Offloading formatting to Haiku cuts the cost of that portion by roughly 80%.
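The split can be sketched as two callables wired in sequence. This is a minimal illustration under stated assumptions: the TwoPhasePipeline class, its field names, and the formatting instruction are hypothetical, and the two model functions stand in for whatever Bedrock client calls the real system makes.

```python
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]  # prompt in, text out

FORMAT_INSTRUCTION = (
    "Format the findings below for display. Do not add or change any facts.\n\n"
)


@dataclass
class TwoPhasePipeline:
    reasoner: ModelFn   # larger model: multi-step reasoning and tool orchestration
    formatter: ModelFn  # smaller model: presentation only

    def run(self, query: str) -> str:
        # Phase 1: the expensive model produces the findings.
        findings = self.reasoner(query)
        # Phase 2: the cheap model only reshapes what phase 1 already decided.
        return self.formatter(FORMAT_INSTRUCTION + findings)
```

Because both phases are plain callables, which model serves each one can stay a configuration choice, and either can be replaced with a stub in tests.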
Conclusion
The hardest part isn’t implementing any one primitive. It’s keeping them composable: skills are files, context is variables, state is a DynamoDB row, model choice is a config flag. None of these need to know about the others. That composability is what makes the harness extensible to new data sources without rewriting the agent core.