Advanced⚙️

Building a multi-agent system with Claude: practical guide (SDK + MCP)

Complete tutorial for deploying Claude agents that collaborate. Compares the Agent SDK with the raw API + MCP. Python code, a real case, production deployment.

38 min read · Published May 29, 2026

This article is the direct sequel to Agentic systems with Claude: architecture and theory. There we covered the what and the why. Here we tackle the how.

You'll find two parallel paths:

  • Path A: the Claude Agent SDK — Anthropic's official path, opinionated, fast to set up.
  • Path B: raw API + MCP — custom orchestration, more control, more pedagogical for understanding the internal mechanics.

Both paths lead to the same practical case: a "market intelligence" system with an orchestrator and 3 collaborating sub-agents. At the end, a comparison table helps you choose based on your context.

Prerequisites: Python 3.10+, an active Anthropic API key, and at least one API call already under your belt. If you've never used a tool call, first read Anthropic's official documentation on tools.

0. Common setup for both paths

Before diving in, let's prepare the environment.

my-agent-system/
├── .env                    # ANTHROPIC_API_KEY=sk-ant-...
├── requirements.txt
├── path_a_sdk/
│   ├── orchestrator.py
│   ├── sub_agents.py
│   └── tools/
└── path_b_custom/
    ├── orchestrator.py
    ├── mcp_server.py
    └── workers.py

Installation

python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Path A: official SDK
pip install claude-agent-sdk python-dotenv

# Path B: raw API + MCP
pip install anthropic mcp httpx python-dotenv

The .env file

ANTHROPIC_API_KEY=sk-ant-api03-...

And loading it in your scripts:

from dotenv import load_dotenv
load_dotenv()
Security: never commit your .env. Add it to .gitignore right after git init.
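
One way to make a missing key fail loudly at startup, rather than on the first API call, is a small guard like the sketch below. The helper name require_api_key is hypothetical, and the prefix check simply assumes keys look like the sk-ant-... value shown in the .env example above.

```python
import os

def require_api_key(env=None) -> str:
    """Return the Anthropic API key or raise a clear error if it is absent."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; check your .env file")
    if not key.startswith("sk-ant-"):
        # Assumption: keys follow the sk-ant-... format shown in the .env example
        raise RuntimeError("ANTHROPIC_API_KEY does not look like an Anthropic key")
    return key
```

Calling require_api_key() right after load_dotenv() would be a natural place for it.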

Path A (SDK) vs Path B (API + MCP)

Path A · Claude Agent SDK (official · opinionated · fast)
  ✓ nO loop handled
  ✓ Auto context compaction
  ✓ Native parallel sub-agents
  ✓ Built-in max_budget_usd
  ✗ 270-340 MB binary
  ✗ Strong Anthropic coupling

Path B · Raw API + MCP (custom · full control · pedagogical)
  ✓ Control of every call
  ✓ Multi-provider possible
  ✓ Lightweight deployment
  ✓ Portable tools via MCP
  ✗ Loop + budget to code
  ✗ Steep learning curve

The SDK hands you the loop, compaction, and sub-agents out of the box, at the cost of a heavy binary and strong coupling. The raw API gives back full control, at the cost of code to write.

1. Path A — Claude Agent SDK

Why this path

The Claude Agent SDK (formerly Claude Code SDK, renamed in 2025) is the infrastructure Anthropic uses internally for Claude Code, exposed as a library. You inherit their work on nO loop management, automatic context compaction, sub-agent spawning with isolated context, native MCP integration, lifecycle hooks, and per-session USD budget caps.

The SDK is not a pure API library: it bundles a Claude Code CLI binary that it launches as a subprocess. It weighs 270-340 MB per release, something to consider for Docker images and CI pipelines.

Your first agent in 15 lines

# path_a_sdk/first_agent.py
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions
from dotenv import load_dotenv

load_dotenv()

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are an assistant that answers precisely.",
        max_budget_usd=0.50,  # hard cap, mandatory
    )
    async for message in query(
        prompt="What's the difference between a workflow and an agent?",
        options=options,
    ):
        print(message)

anyio.run(main)

That's it. No manual loop management, no tool call parsing, no context tracking. The SDK does it all.

max_budget_usd is not optional. Without this cap, an agent can loop and burn dozens of dollars in minutes. It's the first parameter to set.

Adding tools (built-in MCP)

The SDK accepts tools via the Model Context Protocol (MCP). You can point to an existing MCP server (Asana, GitHub, Slack…) or define your own tools in a few lines.

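from claude_agent_sdk import (
    ClaudeAgentOptions,
    ClaudeSDKClient,
    create_sdk_mcp_server,
    tool,
)
from dotenv import load_dotenv
import anyio

load_dotenv()

# @tool takes the tool name, its description, and the input schema
@tool("get_weather", "Retrieves the current weather for a city", {"city": str})
async def get_weather(args: dict) -> dict:
    # Here you'd call a real weather API
    weather = {"city": args["city"], "temp": 18, "conditions": "cloudy"}
    # Tool results go back to the model as MCP content blocks
    return {"content": [{"type": "text", "text": str(weather)}]}

# Tools defined in Python are served through an in-process MCP server
weather_server = create_sdk_mcp_server(
    name="weather", version="1.0.0", tools=[get_weather]
)

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a weather assistant.",
        mcp_servers={"weather": weather_server},
        # MCP tools are addressed as mcp__<server>__<tool>
        allowed_tools=["mcp__weather__get_weather"],
        max_budget_usd=0.50,
    )
    # SDK-defined tools are used through ClaudeSDKClient (streaming mode)
    async with ClaudeSDKClient(options=options) as client:
        await client.query("What's the weather in Paris and Lyon?")
        async for message in client.receive_response():
            print(message)

anyio.run(main)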

Claude will call get_weather twice (once per city), aggregate the results, and respond to you.

Spawning sub-agents with isolated context

This is where the SDK shines. To implement the orchestrator-workers pattern, you declare sub-agents that the main agent can invoke as tools.

# path_a_sdk/orchestrator.py
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition
from dotenv import load_dotenv

load_dotenv()

# Definition of a specialized sub-agent
researcher = AgentDefinition(
    description="Performs deep web research on a specific topic and returns a structured summary.",
    prompt="""You are an expert web researcher.
    For each query, perform 3-5 targeted searches.
    Return a structured JSON summary with: findings, sources, confidence_level.""",
    tools=["WebSearch"],  # built-in web search tool
)

synthesizer = AgentDefinition(
    description="Takes multiple research summaries and produces a coherent synthesis.",
    prompt="You are an analyst who cross-references sources to produce neutral syntheses.",
)

async def main():
    options = ClaudeAgentOptions(
        system_prompt="""You are a research orchestrator.
        Break the query into independent sub-researches.
        Launch multiple web_researcher in parallel when relevant.
        Use synthesizer to produce the final report.""",
        # Sub-agents are registered by name; the main agent delegates to them via the Task tool
        agents={"web_researcher": researcher, "synthesizer": synthesizer},
        allowed_tools=["Task"],
        max_budget_usd=5.00,
    )

    async for message in query(
        prompt="Do market intelligence on French AI agent observability tools in 2026.",
        options=options,
    ):
        print(message)

anyio.run(main)

What's happening under the hood:

  1. The orchestrator reads the request and decides to spawn multiple web_researcher in parallel (one per angle).
  2. Each web_researcher runs in its own isolated context with its own token budget.
  3. Condensed summaries flow back to the orchestrator.
  4. The orchestrator calls synthesizer with the summaries to produce the report.
  5. The final report is returned to the user.

Hooks: observability and guardrails

The SDK exposes lifecycle hooks to intercept each step:

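from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

# Hook callbacks receive the event payload, the tool use id, and a context object
async def log_pre_tool(input_data, tool_use_id, context):
    print(f"[PRE] {input_data['tool_name']} called with {input_data['tool_input']}")
    return {}  # empty dict: let the call proceed unchanged

async def log_post_tool(input_data, tool_use_id, context):
    result = input_data.get("tool_response")
    print(f"[POST] {input_data['tool_name']} returned {len(str(result))} chars")
    return {}

options = ClaudeAgentOptions(
    # ... config ...
    hooks={
        # A HookMatcher without a matcher pattern applies to every tool
        "PreToolUse": [HookMatcher(hooks=[log_pre_tool])],
        "PostToolUse": [HookMatcher(hooks=[log_post_tool])],
    },
)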

That's your minimum viable observability. In production you replace print with sends to Datadog, Sentry, or a custom tracing database.

2. Path B — Raw API + MCP

Why this path

The Agent SDK is elegant but hides things from you. If you want to understand precisely what happens in the loop, have total control over orchestration (custom business logic, exotic persistent state), not embed 300 MB of binary in your Docker image, or simply learn, then you code orchestration yourself on the raw API, using MCP as the protocol to expose your tools.

MCP's role in this path

MCP (Model Context Protocol) is an open standard that defines how an agent and a tool communicate. You write an MCP server that exposes capabilities (read a file, query a DB, call an API…), and any MCP-compatible agent can connect to it.

The benefits: your tools become reusable across agents and projects, you can version and test an MCP server independently, you decouple your business logic from the model.

MCP's role: one protocol, portable tools
[Diagram: Claude agent (SDK or custom) ⇄ stdio / MCP ⇄ MCP server, which exposes the tools search_web, query_db, and call_api]
The MCP server exposes capabilities (search, DB, API) via a standard protocol. Any MCP-compatible agent connects to it — your tools become reusable and independently testable.

Building a custom MCP server

Here's a minimal MCP server that exposes a web search tool:

# path_b_custom/mcp_server.py
import asyncio
import json
import os

import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

app = Server("mcp-search")

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="search_web",
            description="Performs a web search and returns the top 5 results.",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                },
                "required": ["query"],
            },
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "search_web":
        # Tavily's search endpoint expects an authenticated POST with a JSON body.
        # Assumption: TAVILY_API_KEY is set in the environment of this process.
        async with httpx.AsyncClient(timeout=30.0) as client:
            r = await client.post(
                "https://api.tavily.com/search",
                headers={"Authorization": f"Bearer {os.environ['TAVILY_API_KEY']}"},
                json={"query": arguments["query"], "max_results": 5},
            )
            r.raise_for_status()  # surface HTTP errors instead of parsing an error page
            results = r.json()
        # Serialize as JSON so the model receives well-formed structured text
        return [TextContent(type="text", text=json.dumps(results))]
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read, write):
        await app.run(read, write, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

You launch this server with python mcp_server.py. It communicates via stdin/stdout per the MCP standard.
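
To make "communicates via stdin/stdout" concrete: MCP messages are JSON-RPC 2.0 objects, and on the stdio transport each message travels as a single line of JSON. The sketch below builds the tools/call request a client would send for search_web and parses a matching reply. It illustrates the wire format only and is not a client implementation: the helper names and the sample reply are hypothetical, and a real client performs an initialize handshake first (the client classes in the mcp package take care of that).

```python
import json

def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as one line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so each message ends with one newline
    return json.dumps(message, separators=(",", ":")) + "\n"

def extract_text(response_line: str) -> str:
    """Concatenate the text blocks of a tools/call result, or raise on error."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown MCP error"))
    blocks = response["result"]["content"]
    return "".join(b["text"] for b in blocks if b.get("type") == "text")

line = build_tool_call(1, "search_web", {"query": "AI agent observability"})
# Hypothetical reply, shaped like the TextContent list returned by call_tool above
reply = '{"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"[...]"}]}}'
print(line, extract_text(reply))
```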

Coding your own orchestrator

Now the most instructive part: rebuilding the nO loop and the orchestrator-workers pattern by hand.

# path_b_custom/orchestrator.py
import os, json, asyncio
from anthropic import AsyncAnthropic
from dotenv import load_dotenv

load_dotenv()
client = AsyncAnthropic()

MODEL_LEAD = "claude-opus-4-7"
MODEL_WORKER = "claude-sonnet-4-6"

# --- Worker definition ---
async def run_worker(worker_id: str, mission: str, tools: list) -> dict:
    """A worker runs in its own isolated context."""
    messages = [{"role": "user", "content": mission}]
    iterations = 0
    MAX_ITER = 10

    while iterations < MAX_ITER:
        iterations += 1
        response = await client.messages.create(
            model=MODEL_WORKER,
            max_tokens=4096,
            system=f"You are worker {worker_id}. Single mission: {mission}. Be concise.",
            tools=tools,
            messages=messages,
        )

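        # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop:
        # continuing without tool results would send an empty user message
        if response.stop_reason != "tool_use":
            final = next((b.text for b in response.content if b.type == "text"), "")
            return {"worker_id": worker_id, "summary": final, "iterations": iterations}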

        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = await execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})

    return {"worker_id": worker_id, "summary": "MAX_ITER reached", "iterations": iterations}


async def execute_tool(name: str, args: dict):
    """Tool dispatcher — here you'd call your MCP server or your functions."""
    if name == "search_web":
        return {"results": ["..."]}  # placeholder
    raise ValueError(f"Unknown tool: {name}")


# --- The orchestrator ---
async def orchestrate(user_request: str) -> str:
    print(f"🎯 Request: {user_request}\n")

    # STEP 1: the lead breaks the request into missions
    plan_response = await client.messages.create(
        model=MODEL_LEAD,
        max_tokens=2048,
        system="""You are an orchestrator. Break the user's request into 2-4
        INDEPENDENT and PARALLELIZABLE missions. Return strict JSON:
        {"missions": [{"id": "w1", "mission": "..."}, ...]}""",
        messages=[{"role": "user", "content": user_request}],
    )
    plan_text = plan_response.content[0].text
    plan = json.loads(plan_text[plan_text.find("{"):plan_text.rfind("}")+1])
    print(f"📋 Plan: {len(plan['missions'])} missions\n")

    # STEP 2: launch workers IN PARALLEL
    tools = [{
        "name": "search_web",
        "description": "Web search.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
    }]

    worker_tasks = [
        run_worker(m["id"], m["mission"], tools)
        for m in plan["missions"]
    ]
    worker_results = await asyncio.gather(*worker_tasks)

    print(f"✅ {len(worker_results)} workers finished\n")

    # STEP 3: the lead synthesizes
    synthesis_input = "\n\n".join([
        f"### Worker {r['worker_id']} ({r['iterations']} iterations)\n{r['summary']}"
        for r in worker_results
    ])

    final = await client.messages.create(
        model=MODEL_LEAD,
        max_tokens=4096,
        system="You are an orchestrator. Synthesize the workers' results into a coherent response.",
        messages=[{
            "role": "user",
            "content": f"Initial request: {user_request}\n\nWorker results:\n{synthesis_input}",
        }],
    )
    return final.content[0].text


if __name__ == "__main__":
    request = "Market intelligence on French AI agent observability tools in 2026."
    result = asyncio.run(orchestrate(request))
    print("\n" + "="*60)
    print(result)

Read this code slowly. You can explicitly see the nO loop (while iterations < MAX_ITER), star communication (the orchestrator formulates missions, workers return summaries), concurrent execution of the workers (asyncio.gather), context isolation (each worker builds its own messages list, seeded only with its mission), and final synthesis by the lead.

It's ~150 lines to reproduce what the SDK does on its own. But now you know exactly what's happening.
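
One detail the two listings leave implicit: the MCP server describes its tool with an inputSchema key, while the Messages API expects input_schema. A small adapter, sketched below under a hypothetical helper name, would let the orchestrator build its tools list from whatever the MCP server advertises instead of duplicating the schema by hand. It works on plain dicts; with the mcp package you would first turn each Tool object into a dict.

```python
def mcp_tool_to_anthropic(tool: dict) -> dict:
    """Map an MCP tool description (dict form) to the Messages API tool format."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        # Same JSON Schema on both sides, only the key name differs
        "input_schema": tool.get("inputSchema", {"type": "object", "properties": {}}),
    }

# The search_web description from mcp_server.py, written as a plain dict
mcp_tools = [{
    "name": "search_web",
    "description": "Performs a web search and returns the top 5 results.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query"}},
        "required": ["query"],
    },
}]

tools = [mcp_tool_to_anthropic(t) for t in mcp_tools]
print(tools[0]["input_schema"]["required"])
```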

3. Practical case: a "market intelligence" system

Let's put both paths to work on the same concrete problem.

The objective

A marketing team wants weekly intelligence on its 3 competitors. They don't want to read 50 articles; they want a 1-page brief per competitor, delivered Monday morning.

The architecture

Market intelligence system architecture
[Diagram: Orchestrator (Lead Researcher) → Worker 1 (Competitor A), Worker 2 (Competitor B), Worker 3 (Competitor C) → Synthesizer → Final brief (1 page)]
The orchestrator splits into 3 parallel missions (one per competitor); each worker researches in an isolated context, then the synthesizer aggregates into a single brief.

Each worker does: web search (5 recent articles) → extraction of announcements / products / pricing → summary in 5 bullets. The orchestrator aggregates the 3 summaries and produces the final formatted brief.

SDK implementation (Path A)

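# competitor_watcher and brief_writer are AgentDefinition objects (definitions omitted here)
options = ClaudeAgentOptions(
    system_prompt="You drive weekly intelligence on 3 competitors listed by the user. "
                  "Launch one competitor_watcher per competitor in parallel, then call brief_writer.",
    agents={"competitor_watcher": competitor_watcher, "brief_writer": brief_writer},
    allowed_tools=["Task"],  # the main agent delegates to sub-agents via the Task tool
    max_budget_usd=3.00,
)
async for msg in query(prompt="Competitors: Acme, Globex, Initech", options=options):
    print(msg)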

The SDK orchestrates everything; you manage nothing. About 30 lines in total, sub-agent definitions included.

Custom implementation (Path B)

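You reuse the orchestrate() you saw earlier, replacing the planning call (STEP 1) with a fixed plan. Each mission is spelled out in full, because a worker's isolated context contains nothing but its own mission:

plan = {"missions": [
    {"id": "wAcme", "mission": "Weekly intel on Acme: 5 recent articles, announcements, pricing, 5-bullet summary"},
    {"id": "wGlobex", "mission": "Weekly intel on Globex: 5 recent articles, announcements, pricing, 5-bullet summary"},
    {"id": "wInitech", "mission": "Weekly intel on Initech: 5 recent articles, announcements, pricing, 5-bullet summary"},
]}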

~200 lines total, but you control every model call, every tool, every return format.
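
Writing the missions by hand stops scaling after a few competitors. A possible helper, hypothetical and not part of the orchestrator shown earlier, generates the same plan structure from a list of names. It repeats the full instructions in every mission, since each worker sees only its own mission text.

```python
MISSION_TEMPLATE = (
    "Weekly intel on {name}: 5 recent articles, announcements, pricing, "
    "5-bullet summary"
)

def build_plan(competitors: list[str]) -> dict:
    """Build a fixed plan with one self-contained mission per competitor."""
    if not competitors:
        raise ValueError("at least one competitor is required")
    return {
        "missions": [
            {"id": f"w{name}", "mission": MISSION_TEMPLATE.format(name=name)}
            for name in competitors
        ]
    }

plan = build_plan(["Acme", "Globex", "Initech"])
print([m["id"] for m in plan["missions"]])
```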

4. Final comparison: SDK vs Custom

| Criterion | Claude Agent SDK | Raw API + MCP |
|---|---|---|
| Setup time | 1-2 days | 1-2 weeks |
| Lines of code (intel case) | ~30 | ~200 |
| Fine loop control | Limited (hooks) | Total |
| Auto context compaction | ✅ Yes | ❌ Code it yourself |
| Native parallel sub-agents | ✅ Yes | Code with asyncio.gather |
| Built-in budget cap | ✅ max_budget_usd | ❌ Custom counter |
| Deployment weight | 270-340 MB (CLI binary) | API library alone |
| Debug & observability | Hooks provided | You instrument what you want |
| Tool reusability | Via MCP | Via MCP |
| Token cost | ~Identical | ~Identical |
| Learning curve | Low | High |
| Maintenance | You follow SDK versions | You maintain everything |

Choose your path at a glance

| | Agent SDK (Path A) | API + MCP (Path B) |
|---|---|---|
| Setup | 1-2 days | 1-2 weeks |
| Lines of code (intel case) | ~30 | ~200 |
| Loop control | Limited (hooks) | Total |
| Context compaction | Automatic | Code it yourself |
| Deployment weight | 270-340 MB | Library only |
| Multi-provider | No (Anthropic) | Yes |
| Learning curve | Low | High |

When to choose the SDK

  • You're starting out and want an MVP in 48 hours.
  • Your use case fits default patterns (research, generation, classic automation).
  • You accept strong dependency on Anthropic.
  • Binary weight isn't a blocker.

When to choose Custom

  • You have exotic business logic (multi-model orchestration, persistent DB state, deep integration with an existing workflow).
  • You want to switch providers (Claude, GPT, Gemini, Mistral) without rewriting your orchestration.
  • You deploy in constrained environments (Lambda, edge, minimal containers).
  • You're building a platform where orchestration is the core value.
Common hybrid pattern: prototype with the SDK to validate the concept, then migrate orchestration to custom once business needs become clear. Since MCP is standard, your tools migrate without rewriting.

5. Production deployment: what really breaks

You have a working prototype. Here's what will hit you in production.

💸 Capped budget, always — On the SDK: max_budget_usd on each session. On custom: token counter incremented on each call, kill switch on overage. Beyond: quota systems per user, per project, per day.
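
For the custom path, the token counter and kill switch could look like the sketch below. BudgetGuard and BudgetExceeded are hypothetical names, and the per-token prices are placeholders to replace with the real rates of the models you call.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when the session has spent more than its cap."""

@dataclass
class BudgetGuard:
    max_usd: float
    # Placeholder prices in USD per million tokens: replace with your model's real rates
    input_usd_per_mtok: float = 3.0
    output_usd_per_mtok: float = 15.0
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's usage to the running total; raise once the cap is passed."""
        self.spent_usd += (
            input_tokens * self.input_usd_per_mtok
            + output_tokens * self.output_usd_per_mtok
        ) / 1_000_000
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.4f}, cap ${self.max_usd:.2f}")
        return self.spent_usd

guard = BudgetGuard(max_usd=0.05)
guard.record(input_tokens=2_000, output_tokens=500)  # stays under the cap
```

In run_worker and orchestrate, one guard shared by all workers would be fed after each client.messages.create call with response.usage.input_tokens and response.usage.output_tokens.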

📊 Observability from day 1 — You must be able to answer: "Why did this agent make this choice on March 14 at 11:47?". So complete tracing: each message sent/received, each tool call with arguments, each result, each error. Store in a queryable database (not just text logs).

🛂 Permissions and safety hooks — Whitelist of tools usable per sub-agent. Validation of tool arguments before execution (an agent can generate SQL injection if you don't validate). Sandbox for code execution (if you allow the agent to execute).
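
A per-agent allowlist plus argument validation can be a single function called before any tool runs. In this sketch the agent names, the ALLOWED_TOOLS mapping, and the length limit are all hypothetical values chosen for illustration.

```python
ALLOWED_TOOLS = {
    # Hypothetical mapping: which tools each sub-agent may call
    "web_researcher": {"search_web"},
    "synthesizer": set(),
}

MAX_QUERY_CHARS = 300  # arbitrary limit for this sketch

def check_tool_call(agent: str, tool: str, args: dict) -> dict:
    """Validate a tool call before execution; return cleaned args or raise."""
    if tool not in ALLOWED_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    if tool == "search_web":
        query = args.get("query")
        if not isinstance(query, str) or not query.strip():
            raise ValueError("search_web requires a non-empty string 'query'")
        if len(query) > MAX_QUERY_CHARS:
            raise ValueError("query too long")
        return {"query": query.strip()}  # unexpected extra keys are dropped
    raise ValueError(f"no validator registered for {tool}")

print(check_tool_call("web_researcher", "search_web", {"query": " Acme pricing "}))
```

In the custom orchestrator this check would sit at the top of execute_tool. For tools that reach a database, validation complements parameterized queries rather than replacing them.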

🧪 Continuous evaluation — Build a suite of 20-50 reference scenarios that you replay on every prompt or model change. Measure: success rate, average cost, average duration. Without it, you break behaviors without knowing it.
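
A replay harness for such a suite can stay very small. The sketch below assumes the system under test is wrapped in an async function returning (answer, cost_usd); Scenario, run_suite, and the fake agent are hypothetical names used to keep the example runnable without API calls.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class Scenario:
    name: str
    request: str
    check: Callable[[str], bool]  # returns True when the answer is acceptable

async def run_suite(
    agent: Callable[[str], Awaitable[tuple[str, float]]],
    scenarios: list[Scenario],
) -> dict:
    """Replay every scenario and report success rate, average cost, average duration."""
    if not scenarios:
        raise ValueError("the suite is empty")
    passed, total_cost, total_seconds = 0, 0.0, 0.0
    for s in scenarios:
        start = time.perf_counter()
        try:
            answer, cost = await agent(s.request)
            ok = s.check(answer)
        except Exception:
            ok, cost = False, 0.0  # a crash counts as a failed scenario
        total_seconds += time.perf_counter() - start
        total_cost += cost
        passed += int(ok)
    n = len(scenarios)
    return {
        "success_rate": passed / n,
        "avg_cost_usd": total_cost / n,
        "avg_seconds": total_seconds / n,
    }

async def fake_agent(request: str) -> tuple[str, float]:
    """Stand-in for orchestrate(); replace with a wrapper around your real system."""
    return f"Brief about {request}", 0.02

suite = [
    Scenario("mentions competitor", "Acme", lambda a: "Acme" in a),
    Scenario("fits on one page", "Globex", lambda a: len(a) < 4000),
]
report = asyncio.run(run_suite(fake_agent, suite))
print(report)
```

Storing each report next to the prompt and model version that produced it is what makes regressions visible from one change to the next.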

🔁 Compaction and external memory — For long sessions (> 50 turns), implement external memory (file, database, vector store) where the orchestrator saves the plan and findings. When context saturates, compact to a condensed summary + reload from external memory.
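
The compaction step could be sketched as follows for the custom path. The function name, the thresholds, and the JSON Lines file are assumptions, and summarize stands for whatever produces the condensed summary (in practice, a model call). The cut is placed so the kept tail starts on an assistant turn, which avoids separating a tool_result from the tool_use it answers.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

def compact(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    memory_path: Path,
    max_messages: int = 40,
    keep_recent: int = 10,
) -> list[dict]:
    """If the history is too long, archive the old part and replace it with a summary."""
    if len(messages) <= max_messages:
        return messages
    cut = len(messages) - keep_recent
    # Start the kept tail on an assistant turn so no tool_result loses its tool_use
    while cut < len(messages) and messages[cut]["role"] != "assistant":
        cut += 1
    old, recent = messages[:cut], messages[cut:]
    # External memory: append the full old turns to a JSON Lines file before dropping them
    with memory_path.open("a", encoding="utf-8") as f:
        for m in old:
            f.write(json.dumps(m, default=str) + "\n")
    summary = {"role": "user", "content": f"Summary of earlier work:\n{summarize(old)}"}
    return [summary] + recent

history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(50)
]
memory_file = Path(tempfile.mkdtemp()) / "agent_memory.jsonl"
compacted = compact(history, lambda old: f"{len(old)} earlier turns", memory_file)
print(len(compacted), compacted[0]["content"])
```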

⏱️ Timeouts everywhere — Timeout per model call (30s default), timeout per tool call (varies by tool), global session timeout (typically max 10 min in production). Without timeouts, a network bug leaves an open session consuming indefinitely.
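
On the custom path, one wrapper around asyncio.wait_for covers all three levels. The helper name with_timeout and the durations below are illustrative; slow_tool merely simulates a hung network call.

```python
import asyncio

async def with_timeout(coro, seconds: float, label: str):
    """Await coro, raising TimeoutError with a readable label if it runs too long."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"{label} exceeded {seconds}s") from None

async def slow_tool() -> str:
    await asyncio.sleep(1)  # stands in for a hung network call
    return "done"

async def demo() -> str:
    try:
        return await with_timeout(slow_tool(), seconds=0.05, label="search_web")
    except TimeoutError as exc:
        # Report the failure to the caller instead of leaving the session open
        return f"tool failed: {exc}"

print(asyncio.run(demo()))  # prints: tool failed: search_web exceeded 0.05s
```

The same wrapper can surround each client.messages.create call, each execute_tool call, and the whole orchestrate() session, each with its own limit.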

Technical glossary

Claude Agent SDK — Anthropic's official library (formerly Claude Code SDK) providing the nO loop, context compaction, sub-agents, and hooks; ships a CLI binary (270-340 MB).

Raw API — Direct calls to Anthropic's messages endpoint, with no orchestration layer: you code the loop yourself.

MCP (Model Context Protocol) — Open standard describing how an agent and a tool communicate. Lets you expose reusable tools.

MCP server — A service exposing tools/resources via MCP, often communicating over stdio (stdin/stdout).

stdio — Standard input/output channel (stdin/stdout) used by an MCP server to talk to the agent.

nO loop — An agent's execution loop: assemble context → call model → execute tools → loop until end_turn.

stop_reason — Field returned by the API: end_turn = done; tool_use = a tool is requested.

Sub-agent / worker — Secondary agent with its own isolated context, to which the orchestrator delegates a mission.

Orchestrator (Lead Researcher) — Central agent that decomposes the request, spawns workers, aggregates results.

Hooks (pre/post tool use) — Lifecycle interception points to log, validate, or block before/after a tool.

max_budget_usd — Per-session spend cap on the SDK side; a guardrail against costly loops.

asyncio.gather — Python primitive that runs multiple coroutines concurrently and awaits all their results; used here to run the workers side by side.

Context compaction — Condensed summary of context when it saturates, to continue without losing the thread (auto in the SDK, hand-coded in custom).

External memory — Persistent storage (file, database, vector store) where the orchestrator saves plan and findings to survive long sessions.

Going further

  • 📘 Official docs: Claude Agent SDK on docs.claude.com
  • 📘 MCP Protocol: modelcontextprotocol.io
  • 📘 Anthropic's reference post: "How we built our multi-agent research system" (June 2025)
  • 📘 Code examples: github.com/anthropics/claude-agent-sdk-demos

Summary

  • Two viable paths: official SDK (fast, opinionated) or raw API + MCP (total control, more code).
  • The SDK gives you the nO loop, compaction, parallel sub-agents, hooks, and capped budget for free.
  • In custom, you rebuild these bricks (~200 lines for a functional orchestrator) but you master every detail.
  • MCP is the common standard: your MCP-written tools are portable between SDK and custom.
  • In production, the real work begins: budget, observability, permissions, continuous evaluation, external memory, timeouts.
  • Recommended pattern: prototype with the SDK, then migrate to custom when business needs become specific.