This article is the direct sequel to Agentic systems with Claude: architecture and theory. There we covered the what and the why. Here we tackle the how.
You'll find two parallel paths:
- Path A: the Claude Agent SDK — Anthropic's official path, opinionated, fast to set up.
- Path B: raw API + MCP — custom orchestration, more control, more pedagogical for understanding the internal mechanics.
Both paths lead to the same practical case: a "market intelligence" system with an orchestrator and 3 collaborating sub-agents. At the end, a comparison table helps you choose based on your context.
0. Common setup for both paths
Before diving in, let's prepare the environment.
Recommended project structure
my-agent-system/
├── .env # ANTHROPIC_API_KEY=sk-ant-...
├── requirements.txt
├── path_a_sdk/
│ ├── orchestrator.py
│ ├── sub_agents.py
│ └── tools/
└── path_b_custom/
├── orchestrator.py
├── mcp_server.py
└── workers.pyInstallation
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
# Path A: official SDK
pip install claude-agent-sdk python-dotenv
# Path B: raw API + MCP
pip install anthropic mcp httpx python-dotenvThe .env file
ANTHROPIC_API_KEY=sk-ant-api03-...And loading it in your scripts:
from dotenv import load_dotenv
load_dotenv().env. Add it to .gitignore right after git init.1. Path A — Claude Agent SDK
Why this path
The Claude Agent SDK (formerly Claude Code SDK, renamed in 2025) is the infrastructure Anthropic uses internally for Claude Code, exposed as a library. You inherit their work on nO loop management, automatic context compaction, sub-agent spawning with isolated context, native MCP integration, lifecycle hooks, and per-session USD budget caps.
The SDK is not a pure API library: it bundles a Claude Code CLI binary that it launches as a subprocess. It weighs 270-340 MB per release, something to consider for Docker images and CI pipelines.
Your first agent in 15 lines
# path_a_sdk/first_agent.py
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions
from dotenv import load_dotenv
load_dotenv()
async def main():
options = ClaudeAgentOptions(
system_prompt="You are an assistant that answers precisely.",
max_budget_usd=0.50, # hard cap, mandatory
)
async for message in query(
prompt="What's the difference between a workflow and an agent?",
options=options,
):
print(message)
anyio.run(main)That's it. No manual loop management, no tool call parsing, no context tracking. The SDK does it all.
max_budget_usd is not optional. Without this cap, an agent can loop and burn dozens of dollars in minutes. It's the first parameter to set.Adding tools (built-in MCP)
The SDK accepts tools via the Model Context Protocol (MCP). You can point to an existing MCP server (Asana, GitHub, Slack…) or define your own tools in a few lines.
from claude_agent_sdk import query, ClaudeAgentOptions, tool
from dotenv import load_dotenv
import anyio
load_dotenv()
@tool(
name="get_weather",
description="Retrieves the current weather for a city",
)
async def get_weather(city: str) -> dict:
# Here you'd call a real weather API
return {"city": city, "temp": 18, "conditions": "cloudy"}
async def main():
options = ClaudeAgentOptions(
system_prompt="You are a weather assistant.",
tools=[get_weather],
max_budget_usd=0.50,
)
async for message in query(
prompt="What's the weather in Paris and Lyon?",
options=options,
):
print(message)
anyio.run(main)Claude will call get_weather twice (once per city), aggregate the results, and respond to you.
Spawning sub-agents with isolated context
This is where the SDK shines. To do orchestrator-workers, you declare sub-agents that the main agent can invoke as tools.
# path_a_sdk/orchestrator.py
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions, subagent
from dotenv import load_dotenv
load_dotenv()
# Definition of a specialized sub-agent
researcher = subagent(
name="web_researcher",
description="Performs deep web research on a specific topic and returns a structured summary.",
system_prompt="""You are an expert web researcher.
For each query, perform 3-5 targeted searches.
Return a structured JSON summary with: findings, sources, confidence_level.""",
tools=["web_search"],
max_turns=10,
)
synthesizer = subagent(
name="synthesizer",
description="Takes multiple research summaries and produces a coherent synthesis.",
system_prompt="You are an analyst who cross-references sources to produce neutral syntheses.",
max_turns=5,
)
async def main():
options = ClaudeAgentOptions(
system_prompt="""You are a research orchestrator.
Break the query into independent sub-researches.
Launch multiple web_researcher in parallel when relevant.
Use synthesizer to produce the final report.""",
subagents=[researcher, synthesizer],
max_budget_usd=5.00,
)
async for message in query(
prompt="Do market intelligence on French AI agent observability tools in 2026.",
options=options,
):
print(message)
anyio.run(main)What's happening under the hood:
- The orchestrator reads the request and decides to spawn multiple
web_researcherin parallel (one per angle). - Each
web_researcherruns in its own isolated context with its own token budget. - Condensed summaries flow back to the orchestrator.
- The orchestrator calls
synthesizerwith the summaries to produce the report. - The final report is returned to the user.
Hooks: observability and guardrails
The SDK exposes lifecycle hooks to intercept each step:
async def log_pre_tool(tool_name, tool_input):
print(f"[PRE] {tool_name} called with {tool_input}")
async def log_post_tool(tool_name, tool_result):
print(f"[POST] {tool_name} returned {len(str(tool_result))} chars")
options = ClaudeAgentOptions(
# ... config ...
pre_tool_use=log_pre_tool,
post_tool_use=log_post_tool,
)That's your minimum viable observability. In production you replace print with sends to Datadog, Sentry, or a custom tracing database.
2. Path B — Raw API + MCP
Why this path
The Agent SDK is elegant but hides things from you. If you want to understand precisely what happens in the loop, have total control over orchestration (custom business logic, exotic persistent state), not embed 300 MB of binary in your Docker image, or simply learn, then you code orchestration yourself on the raw API, using MCP as the protocol to expose your tools.
MCP's role in this path
MCP (Model Context Protocol) is an open standard that defines how an agent and a tool communicate. You write an MCP server that exposes capabilities (read a file, query a DB, call an API…), and any MCP-compatible agent can connect to it.
The benefits: your tools become reusable across agents and projects, you can version and test an MCP server independently, you decouple your business logic from the model.
Building a custom MCP server
Here's a minimal MCP server that exposes a web search tool:
# path_b_custom/mcp_server.py
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import httpx
import asyncio
app = Server("mcp-search")
@app.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="search_web",
description="Performs a web search and returns the top 5 results.",
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
},
"required": ["query"],
},
)
]
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
if name == "search_web":
async with httpx.AsyncClient() as client:
r = await client.get(
"https://api.tavily.com/search",
params={"query": arguments["query"], "max_results": 5},
)
results = r.json()
return [TextContent(type="text", text=str(results))]
raise ValueError(f"Unknown tool: {name}")
async def main():
async with stdio_server() as (read, write):
await app.run(read, write, app.create_initialization_options())
if __name__ == "__main__":
asyncio.run(main())You launch this server with python mcp_server.py. It communicates via stdin/stdout per the MCP standard.
Coding your own orchestrator
Now the most instructive part: rebuilding the nO loop and the orchestrator-workers pattern by hand.
# path_b_custom/orchestrator.py
import os, json, asyncio
from anthropic import AsyncAnthropic
from dotenv import load_dotenv
load_dotenv()
client = AsyncAnthropic()
MODEL_LEAD = "claude-opus-4-7"
MODEL_WORKER = "claude-sonnet-4-6"
# --- Worker definition ---
async def run_worker(worker_id: str, mission: str, tools: list) -> dict:
"""A worker runs in its own isolated context."""
messages = [{"role": "user", "content": mission}]
iterations = 0
MAX_ITER = 10
while iterations < MAX_ITER:
iterations += 1
response = await client.messages.create(
model=MODEL_WORKER,
max_tokens=4096,
system=f"You are worker {worker_id}. Single mission: {mission}. Be concise.",
tools=tools,
messages=messages,
)
if response.stop_reason == "end_turn":
final = next((b.text for b in response.content if hasattr(b, "text")), "")
return {"worker_id": worker_id, "summary": final, "iterations": iterations}
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = await execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(result),
})
messages.append({"role": "user", "content": tool_results})
return {"worker_id": worker_id, "summary": "MAX_ITER reached", "iterations": iterations}
async def execute_tool(name: str, args: dict):
"""Tool dispatcher — here you'd call your MCP server or your functions."""
if name == "search_web":
return {"results": ["..."]} # placeholder
raise ValueError(f"Unknown tool: {name}")
# --- The orchestrator ---
async def orchestrate(user_request: str) -> str:
print(f"🎯 Request: {user_request}\n")
# STEP 1: the lead breaks the request into missions
plan_response = await client.messages.create(
model=MODEL_LEAD,
max_tokens=2048,
system="""You are an orchestrator. Break the user's request into 2-4
INDEPENDENT and PARALLELIZABLE missions. Return strict JSON:
{"missions": [{"id": "w1", "mission": "..."}, ...]}""",
messages=[{"role": "user", "content": user_request}],
)
plan_text = plan_response.content[0].text
plan = json.loads(plan_text[plan_text.find("{"):plan_text.rfind("}")+1])
print(f"📋 Plan: {len(plan['missions'])} missions\n")
# STEP 2: launch workers IN PARALLEL
tools = [{
"name": "search_web",
"description": "Web search.",
"input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
}]
worker_tasks = [
run_worker(m["id"], m["mission"], tools)
for m in plan["missions"]
]
worker_results = await asyncio.gather(*worker_tasks)
print(f"✅ {len(worker_results)} workers finished\n")
# STEP 3: the lead synthesizes
synthesis_input = "\n\n".join([
f"### Worker {r['worker_id']} ({r['iterations']} iterations)\n{r['summary']}"
for r in worker_results
])
final = await client.messages.create(
model=MODEL_LEAD,
max_tokens=4096,
system="You are an orchestrator. Synthesize the workers' results into a coherent response.",
messages=[{
"role": "user",
"content": f"Initial request: {user_request}\n\nWorker results:\n{synthesis_input}",
}],
)
return final.content[0].text
if __name__ == "__main__":
request = "Market intelligence on French AI agent observability tools in 2026."
result = asyncio.run(orchestrate(request))
print("\n" + "="*60)
print(result)while iterations < MAX_ITER), star communication (the orchestrator formulates missions, workers return summaries), parallelization (asyncio.gather), context isolation (each worker has its own messages = []), and final synthesis by the lead.It's ~150 lines to reproduce what the SDK does on its own. But now you know exactly what's happening.
3. Practical case: a "market intelligence" system
Let's put both paths to work on the same concrete problem.
The objective
A marketing team wants weekly intelligence on its 3 competitors. They don't want to read 50 articles, they want a 1-page brief per competitor, delivered Monday morning.
The architecture
Each worker does: web search (5 recent articles) → extraction of announcements / products / pricing → summary in 5 bullets. The orchestrator aggregates the 3 summaries and produces the final formatted brief.
SDK implementation (Path A)
options = ClaudeAgentOptions(
system_prompt="You drive weekly intelligence on 3 competitors listed by the user. "
"Launch one competitor_watcher per competitor in parallel, then call brief_writer.",
subagents=[competitor_watcher, brief_writer],
max_budget_usd=3.00,
)
async for msg in query(prompt="Competitors: Acme, Globex, Initech", options=options):
print(msg)The SDK orchestrates everything, you manage nothing. ~30 lines total with sub-agent definitions.
Custom implementation (Path B)
You reuse the orchestrate() you saw earlier, specifying the missions:
plan = {"missions": [
{"id": "wAcme", "mission": "Weekly intel on Acme: 5 recent articles, announcements, pricing, 5-bullet summary"},
{"id": "wGlobex", "mission": "Same for Globex"},
{"id": "wInitech", "mission": "Same for Initech"},
]}~200 lines total, but you control every model call, every tool, every return format.
4. Final comparison: SDK vs Custom
| Criterion | Claude Agent SDK | Raw API + MCP | |---|---|---| | Setup time | 1-2 days | 1-2 weeks | | Lines of code (intel case) | ~30 | ~200 | | Fine loop control | Limited (hooks) | Total | | Auto context compaction | ✅ Yes | ❌ Code it yourself | | Native parallel sub-agents | ✅ Yes | Code with asyncio.gather | | Built-in budget cap | ✅ max_budget_usd | ❌ Custom counter | | Deployment weight | 270-340 MB (CLI binary) | API library alone | | Debug & observability | Hooks provided | You instrument what you want | | Tool reusability | Via MCP | Via MCP | | Token cost | ~Identical | ~Identical | | Learning curve | Low | High | | Maintenance | You follow SDK versions | You maintain everything |
Choose your path at a glance
| Agent SDK (Path A) | API + MCP (Path B) | |
|---|---|---|
| Setup | 1-2 days | 1-2 weeks |
| Lines of code (intel case) | ~30 | ~200 |
| Loop control | Limited (hooks) | Total |
| Context compaction | Automatic | Code it yourself |
| Deployment weight | 270-340 MB | Library only |
| Multi-provider | No (Anthropic) | Yes |
| Learning curve | Low | High |
When to choose the SDK
- You're starting out and want an MVP in 48 hours.
- Your use case fits default patterns (research, generation, classic automation).
- You accept strong dependency on Anthropic.
- Binary weight isn't a blocker.
When to choose Custom
- You have exotic business logic (multi-model orchestration, persistent DB state, deep integration with an existing workflow).
- You want to switch providers (Claude, GPT, Gemini, Mistral) without rewriting your orchestration.
- You deploy in constrained environments (Lambda, edge, minimal containers).
- You're building a platform where orchestration is the core value.
5. Production deployment: what really breaks
You have a working prototype. Here's what will hit you in production.
💸 Capped budget, always — On the SDK: max_budget_usd on each session. On custom: token counter incremented on each call, kill switch on overage. Beyond: quota systems per user, per project, per day.
📊 Observability from day 1 — You must be able to answer: "Why did this agent make this choice on March 14 at 11:47?". So complete tracing: each message sent/received, each tool call with arguments, each result, each error. Store in a queryable database (not just text logs).
🛂 Permissions and safety hooks — Whitelist of tools usable per sub-agent. Validation of tool arguments before execution (an agent can generate SQL injection if you don't validate). Sandbox for code execution (if you allow the agent to execute).
🧪 Continuous evaluation — Build a suite of 20-50 reference scenarios that you replay on every prompt or model change. Measure: success rate, average cost, average duration. Without it, you break behaviors without knowing it.
🔁 Compaction and external memory — For long sessions (> 50 turns), implement external memory (file, database, vector store) where the orchestrator saves the plan and findings. When context saturates, compact to a condensed summary + reload from external memory.
⏱️ Timeouts everywhere — Timeout per model call (30s default), timeout per tool call (varies by tool), global session timeout (typically max 10 min in production). Without timeouts, a network bug leaves an open session consuming indefinitely.
6. Test your understanding
You want a multi-agent MVP in 48h, without reinventing the loop. Which path?
📚Technical glossary (expand)
Claude Agent SDK — Anthropic's official library (formerly Claude Code SDK) providing the nO loop, context compaction, sub-agents, and hooks; ships a CLI binary (270-340 MB).
Raw API — Direct calls to Anthropic's messages endpoint, with no orchestration layer: you code the loop yourself.
MCP (Model Context Protocol) — Open standard describing how an agent and a tool communicate. Lets you expose reusable tools.
MCP server — A service exposing tools/resources via MCP, often communicating over stdio (stdin/stdout).
stdio — Standard input/output channel (stdin/stdout) used by an MCP server to talk to the agent.
nO loop — An agent's execution loop: assemble context → call model → execute tools → loop until end_turn.
stop_reason — Field returned by the API: end_turn = done; tool_use = a tool is requested.
Sub-agent / worker — Secondary agent with its own isolated context, to which the orchestrator delegates a mission.
Orchestrator (Lead Researcher) — Central agent that decomposes the request, spawns workers, aggregates results.
Hooks (pre/post tool use) — Lifecycle interception points to log, validate, or block before/after a tool.
max_budget_usd — Per-session spend cap on the SDK side; a guardrail against costly loops.
asyncio.gather — Python primitive that launches multiple coroutines in parallel and awaits their results — used to parallelize workers.
Context compaction — Condensed summary of context when it saturates, to continue without losing the thread (auto in the SDK, hand-coded in custom).
External memory — Persistent storage (file, database, vector store) where the orchestrator saves plan and findings to survive long sessions.
Going further
- 📘 Official docs: Claude Agent SDK on
docs.claude.com - 📘 MCP Protocol:
modelcontextprotocol.io - 📘 Anthropic's reference post: "How we built our multi-agent research system" (June 2025)
- 📘 Code examples:
github.com/anthropics/claude-agent-sdk-demos
Summary
- Two viable paths: official SDK (fast, opinionated) or raw API + MCP (total control, more code).
- The SDK gives you the nO loop, compaction, parallel sub-agents, hooks, and capped budget for free.
- In custom, you rebuild these bricks (~200 lines for a functional orchestrator) but you master every detail.
- MCP is the common standard: your MCP-written tools are portable between SDK and custom.
- In production, the real work begins: budget, observability, permissions, continuous evaluation, external memory, timeouts.
- Recommended pattern: prototype with the SDK, then migrate to custom when business needs become specific.