Danel Kurka · May 24, 2026 · 9 min read · PILLAR

What is a multi-agent system

For 80% of businesses, one AI agent is enough. For the other 20%, one agent becomes a 2,000-line prompt that breaks every Friday and that nobody can debug. That is when you need a multi-agent system. Here is what it actually is, how to tell when you have crossed the threshold, and the stack I use to build one.

Definition: what a multi-agent system actually is

A multi-agent system is two or more LLM-powered agents that hand work off to one another to complete a job that no single agent does well alone. Concretely, it has four parts:

Orchestrator agent — receives the user's request, decides which specialist to route to, holds the overall goal in memory.
Specialist agents — narrow scope, deep expertise. A sales-qualifier agent only qualifies leads. A research agent only searches and summarizes. A scheduler agent only books calendars.
Shared memory layer — usually Postgres or a vector DB. Lets agents pass state without re-explaining context at every handoff.
Tool registry — the catalog of functions each agent can call. Specialists have small, focused tool lists.
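To make the four parts concrete, here is a minimal sketch under loud assumptions: the specialists are plain functions rather than LLM calls, the tool names (`score_lead`, `propose_slots`) and the keyword check are hypothetical, and a dataclass stands in for the Postgres or vector-DB memory layer. The point is the shape: a registry that gives each specialist a short tool list, shared state the orchestrator writes to, and routing that reads that state.

```python
from dataclasses import dataclass, field
from typing import Callable

# Tool registry: each specialist gets only a short, focused tool list.
TOOL_REGISTRY: dict[str, dict[str, Callable[[str], str]]] = {
    "qualifier": {
        # Hypothetical keyword check standing in for an ICP-scoring LLM call.
        "score_lead": lambda text: "fit" if "saas" in text.lower() else "nurture",
    },
    "scheduler": {
        "propose_slots": lambda text: "Mon 10:00, Tue 14:00, Wed 09:30",
    },
}

@dataclass
class SharedMemory:
    """Stand-in for the Postgres / vector-DB layer: state any agent can read."""
    goal: str
    results: dict[str, str] = field(default_factory=dict)

def run_specialist(name: str, memory: SharedMemory) -> str:
    # A real specialist would let an LLM choose among its tools;
    # this sketch just calls its single registered tool on the goal.
    (tool,) = TOOL_REGISTRY[name].values()
    return tool(memory.goal)

def orchestrator(request: str) -> SharedMemory:
    """Holds the goal, decides which specialist runs next, records results."""
    memory = SharedMemory(goal=request)
    memory.results["qualifier"] = run_specialist("qualifier", memory)
    # Route on what the previous specialist wrote to shared memory.
    if memory.results["qualifier"] == "fit":
        memory.results["scheduler"] = run_specialist("scheduler", memory)
    return memory

print(orchestrator("Inbound lead: SaaS company, VP Sales").results)
# {'qualifier': 'fit', 'scheduler': 'Mon 10:00, Tue 14:00, Wed 09:30'}
print(orchestrator("Inbound lead: bakery owner").results)
# {'qualifier': 'nurture'}
```

The scheduler only runs when the qualifier's entry in shared memory says "fit", so the second agent never needs the original context re-explained.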

Think of it like an organization: a manager (orchestrator) decides what gets done, and specialists (analyst, salesperson, support) do the focused work.

When one agent is enough

Honest signals that you should NOT build multi-agent:

One scenario, one goal. "Answer FAQ in Telegram." One agent, done.
8 tools or fewer. A single agent can juggle that many without confusion. Above 15, tool-selection accuracy drops.
Linear flow. Each turn does one thing, no parallel work, no waiting on external events.
Single domain. The agent talks about one product, one process, one type of customer.

90% of my clients ship a single-agent version first, even when they will eventually need multi-agent. It is faster, cheaper, and proves the business case.

When you actually need multi-agent

Five signals that one agent is no longer enough:

Tool count crosses 15-20. One agent with 30 tools picks the wrong one about 25% of the time. Splitting it into specialists, each with 5-8 tools, brings that down to about 5%.
Parallel work is required. "While the research agent is gathering competitor data, the writer agent drafts the intro." One agent does these in sequence; a multi-agent system does them at once.
Different agents need different models. Reasoning on Claude Opus, fast tool use on GPT-5 Mini, sensitive data on a self-hosted Hermes model. One orchestrator coordinates all three.
Domains are too different. The sales agent needs a friendly, closer-style prompt. The compliance agent needs a strict, conservative one. Mix them in one prompt and neither works well.
Long-running workflows. "Monitor inbox, draft reply, get human approval, send." These run for hours or days. A multi-agent system with persisted state is the natural fit.
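The five signals can be read as a rough checklist. The sketch below is a hypothetical helper, not a formal rule: the `Workload` fields are assumptions, the tool threshold uses the low end of the 15-20 range, and the two example workloads are invented.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    tool_count: int
    needs_parallel_work: bool
    models_needed: int    # distinct models the job calls for
    domains: int          # distinct products, processes, or prompt tones
    long_running: bool    # spans hours or days, e.g. waits on human approval

def multi_agent_signals(w: Workload) -> list[str]:
    """Return the signals that fire; an empty list suggests one agent is enough."""
    checks = [
        (w.tool_count > 15, "tool count above 15"),
        (w.needs_parallel_work, "parallel work"),
        (w.models_needed > 1, "different models per task"),
        (w.domains > 1, "multiple domains"),
        (w.long_running, "long-running workflow"),
    ]
    return [label for fired, label in checks if fired]

faq_bot = Workload(tool_count=4, needs_parallel_work=False, models_needed=1, domains=1, long_running=False)
sales_pipeline = Workload(tool_count=22, needs_parallel_work=True, models_needed=2, domains=2, long_running=True)
print(multi_agent_signals(faq_bot))              # []
print(len(multi_agent_signals(sales_pipeline)))  # 5
```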

Real example: B2B sales pipeline

A $23,500, 10-week build for a Warsaw SaaS client. The system:

Inbound agent — receives a lead from the website or email and extracts company name, role, and intent.
Qualifier agent — pulls company data from Clearbit and LinkedIn, scores fit against the ICP, and decides between SQL and nurture.
Content agent — drafts a personalized follow-up referencing the prospect's recent posts and the SaaS's strongest case study for their industry.
Calendar agent — when the prospect replies positively, looks up the sales rep's availability and sends 3 slot options.
Orchestrator — decides which agent runs next and escalates edge cases to a human.
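The orchestrator's job in this pipeline is mostly routing. A minimal sketch of that decision follows; the `Lead` fields, step names, and the 0.7 fit-score cutoff are all hypothetical and do not come from the actual client build.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lead:
    # Handoff payload written by the inbound agent and enriched downstream.
    company: Optional[str]
    role: Optional[str]
    intent: Optional[str]
    fit_score: float = 0.0          # set by the qualifier
    replied_positive: bool = False  # set when the prospect answers

FIT_CUTOFF = 0.7  # hypothetical threshold for SQL vs nurture

def next_step(lead: Lead, last_step: str) -> str:
    """Orchestrator routing: name the next agent, or hand off to a human."""
    if last_step == "inbound":
        # Edge case: extraction failed, so a person takes over.
        if not lead.company or not lead.intent:
            return "human"
        return "qualifier"
    if last_step == "qualifier":
        return "content" if lead.fit_score >= FIT_CUTOFF else "nurture"
    if last_step == "content":
        return "calendar" if lead.replied_positive else "wait_for_reply"
    return "done"

print(next_step(Lead(company=None, role="CTO", intent="pricing"), "inbound"))  # human
print(next_step(Lead("Acme", "VP Sales", "demo", fit_score=0.82), "qualifier"))  # content
```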

Outcome: 3× higher MQL→SQL conversion and +$340,000 in quarterly revenue. This could not have shipped as one agent: the qualifier prompt alone is 900 tokens of ICP-specific rules.

Real example: research bot

Internal tool for a consulting firm. Input: "summarize the European market for X in the last 6 months". Output: a 4-page briefing with sources.

Planner agent — splits the request into 8-12 sub-questions.
4× search agents — run in parallel; each takes 2-3 sub-questions and queries the web, the internal DB, and paid research APIs.
Synthesizer agent — merges results, removes duplicates, ranks sources by trustworthiness.
Editor agent — writes the briefing in the firm's house style, inserts citations.
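The planner, search, and synthesizer fan-out can be sketched with Python's `asyncio`. Everything domain-specific here is a placeholder: the planner returns canned sub-questions, `asyncio.sleep` stands in for web, DB, and API latency, the URLs are fake, and the synthesizer only deduplicates (no trust ranking). With 10 sub-questions and 4 agents, round-robin chunking gives each agent 2 or 3, matching the description above.

```python
import asyncio

def plan(request: str) -> list[str]:
    # Hypothetical planner output; a real planner agent would ask an LLM.
    return [f"{request} / sub-question {i}" for i in range(1, 11)]

def split_round_robin(items: list[str], n: int) -> list[list[str]]:
    # items[i::n] takes every n-th item starting at i, so group sizes differ by at most 1.
    return [items[i::n] for i in range(n)]

async def search_agent(agent_id: int, questions: list[str]) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for web + internal DB + paid API calls
    # Every agent also "finds" the same overview page, so the synthesizer has a duplicate to drop.
    return ["https://example.com/overview"] + [
        f"https://example.com/agent{agent_id}/{i}" for i in range(len(questions))
    ]

def synthesize(batches: list[list[str]]) -> list[str]:
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(url for batch in batches for url in batch))

async def research(request: str, n_agents: int = 4) -> list[str]:
    groups = split_round_robin(plan(request), n_agents)
    # gather runs all search agents concurrently and returns results in input order.
    batches = await asyncio.gather(*(search_agent(i, g) for i, g in enumerate(groups)))
    return synthesize(batches)

sources = asyncio.run(research("European market for X, last 6 months"))
print(len(sources))  # 11: one shared overview page + 10 distinct hits
```

The wall-clock time of the search step is roughly that of the slowest agent rather than the sum of all four, which is where the speedup in this kind of build comes from.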

A single-agent version of this exists: it takes 25 minutes per brief and misses 30-40% of sources. The multi-agent version takes 4 minutes with near-complete coverage. Five agents, one orchestrator, one Postgres for shared state.

Real example: operations dashboard

Multi-agent reporting pipeline for a Berlin fintech. Every morning at 8 AM:

Collector agent × 4 — pulls from 4 databases in parallel (transactions, support tickets, ops events, finance ledger).
Anomaly agent — looks for outliers (refund spikes, latency, churn signals).
Narrator agent — writes a 1-page Slack summary with anomalies surfaced.
Router agent — decides who gets pinged for which anomaly (CFO for finance, CTO for latency, head of support for tickets).
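A simplified version of the anomaly and router steps: a plain z-score test (my substitution; the text does not say which outlier method the anomaly agent uses) flags today's value against recent history, and a lookup table maps each category to the person who gets pinged. The sample numbers are invented.

```python
from statistics import mean, stdev

# Who gets pinged for which anomaly category, mirroring the list above.
OWNERS = {"finance": "CFO", "latency": "CTO", "tickets": "Head of Support"}

def is_anomaly(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's value if it is more than z_threshold sample std devs from the mean."""
    mu, sigma = mean(history), stdev(history)  # stdev needs at least 2 points
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > z_threshold

def route(metrics: dict[str, tuple[list[float], float]]) -> list[tuple[str, str]]:
    """Return (owner, category) for every category whose value today is an outlier."""
    return [
        (OWNERS[category], category)
        for category, (history, today) in metrics.items()
        if is_anomaly(history, today)
    ]

pings = route({
    "finance": ([100, 102, 98, 101, 99], 140),    # refund spike
    "latency": ([200, 210, 190, 205, 195], 203),  # normal day
    "tickets": ([50, 50, 50, 50], 50),            # flat and unchanged
})
print(pings)  # [('CFO', 'finance')]
```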

Result: 40 fewer hours/week of analyst work, and anomalies caught 4 days earlier on average. Payback in 2.5 months on a $36,800 build.
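The payback figure implies a monthly saving you can back out directly. The first line is plain arithmetic on the numbers above; the hourly rate on the last line only holds under the assumption (mine, not stated in the project numbers) that all of the savings come from analyst time.

```latex
\begin{aligned}
\text{monthly savings} &\approx \frac{\$36{,}800}{2.5\ \text{months}} = \$14{,}720/\text{month} \\
\text{hours saved} &= 40\ \tfrac{\text{h}}{\text{week}} \times \tfrac{52}{12}\ \tfrac{\text{weeks}}{\text{month}} \approx 173.3\ \text{h/month} \\
\text{implied analyst rate} &\approx \frac{\$14{,}720}{173.3\ \text{h}} \approx \$85/\text{h}
\end{aligned}
```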

The stack: how I actually build this

Orchestration layer

LangGraph — my default in 2026. Explicit state graph, deterministic transitions, replayable runs. Good for production. Python or TS.
OpenAI Swarm — lighter, more declarative. Good for prototypes and OpenAI-only stacks. Less control over state.
Custom (FSM + handoffs in raw code) — when I need tight control or want to avoid a framework dependency. About 30% of my production builds. More code, fewer surprises.
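For the custom route, "FSM + handoffs in raw code" can be as small as a transition table plus an event log. This is a hypothetical sketch, not LangGraph's API: the states and events are illustrative names loosely based on the sales example, illegal transitions raise instead of guessing, and replaying the logged events reproduces the run, which is the property that makes these builds debuggable.

```python
# Allowed transitions: (current state, event) -> next state. Anything not listed is illegal.
TRANSITIONS = {
    ("new", "extracted"): "qualifying",
    ("qualifying", "fit"): "drafting",
    ("qualifying", "no_fit"): "nurture",
    ("drafting", "sent"): "awaiting_reply",
    ("awaiting_reply", "positive_reply"): "scheduling",
    ("scheduling", "booked"): "done",
}

class Workflow:
    def __init__(self) -> None:
        self.state = "new"
        self.events: list[str] = []  # append-only log; replaying it rebuilds the state

    def handle(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.events.append(event)
        return self.state

    @classmethod
    def replay(cls, events: list[str]) -> "Workflow":
        wf = cls()
        for event in events:
            wf.handle(event)
        return wf

wf = Workflow()
for event in ["extracted", "fit", "sent"]:
    wf.handle(event)
print(wf.state)                          # awaiting_reply
print(Workflow.replay(wf.events).state)  # awaiting_reply
```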

Memory layer

Postgres — for structured state (current step, who owns the task, results so far).
pgvector or Pinecone — for semantic memory (past conversations, embedded knowledge).
Redis — for ephemeral state between agent turns, rate limits, locking.
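The structured-state part of the memory layer boils down to one row per task. This sketch uses the stdlib `sqlite3` module as a stand-in for Postgres so it runs without a server; the table and column names are hypothetical. The `INSERT ... ON CONFLICT ... DO UPDATE` upsert is valid in both Postgres and SQLite (3.24+), though a Postgres driver such as psycopg uses `%s` placeholders instead of `?`.

```python
import json
import sqlite3
from typing import Optional

# In-memory SQLite stands in for Postgres so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_state (
        task_id      TEXT PRIMARY KEY,
        current_step TEXT NOT NULL,
        owner_agent  TEXT NOT NULL,
        results      TEXT NOT NULL  -- JSON text; Postgres would use a JSONB column
    )
""")

def save_step(task_id: str, step: str, owner: str, results: dict) -> None:
    # Upsert: the first handoff inserts the row, later handoffs overwrite it.
    conn.execute(
        """
        INSERT INTO task_state (task_id, current_step, owner_agent, results)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (task_id) DO UPDATE SET
            current_step = excluded.current_step,
            owner_agent  = excluded.owner_agent,
            results      = excluded.results
        """,
        (task_id, step, owner, json.dumps(results)),
    )
    conn.commit()

def load_state(task_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT current_step, owner_agent, results FROM task_state WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    if row is None:
        return None
    step, owner, results = row
    return {"step": step, "owner": owner, "results": json.loads(results)}

save_step("lead-42", "qualifying", "qualifier", {})
save_step("lead-42", "drafting", "content", {"fit_score": 0.82})
print(load_state("lead-42"))
# {'step': 'drafting', 'owner': 'content', 'results': {'fit_score': 0.82}}
```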

Model layer

Almost always heterogeneous: orchestrator on Claude Sonnet 4.5, latency-critical specialists on GPT-5 Mini, compliance specialist on a self-hosted Hermes model. See the model decision matrix →
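In code, a heterogeneous model layer often reduces to a role-to-model lookup plus a few overrides. The strings below are placeholder labels, not exact provider model IDs (check each provider's docs), and the fallback to the orchestrator's model for unknown roles is an arbitrary choice for the sketch.

```python
# Placeholder labels, not exact provider model IDs.
MODEL_FOR_ROLE = {
    "orchestrator": "claude-sonnet-4.5",
    "latency_critical": "gpt-5-mini",
    "compliance": "hermes-self-hosted",
}

def pick_model(role: str, handles_sensitive_data: bool = False) -> str:
    # Sensitive data stays on the self-hosted model regardless of role.
    if handles_sensitive_data:
        return MODEL_FOR_ROLE["compliance"]
    # Unknown roles fall back to the orchestrator's model.
    return MODEL_FOR_ROLE.get(role, MODEL_FOR_ROLE["orchestrator"])

print(pick_model("latency_critical"))                               # gpt-5-mini
print(pick_model("latency_critical", handles_sensitive_data=True))  # hermes-self-hosted
```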

Observability layer

Langfuse — traces every turn across every agent. Without it, debugging multi-agent is hell.
Helicone — an alternative; cost-focused, with less detailed traces.
Sentry — for the application-level errors around the agents.
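What a tracing tool records per agent turn can be approximated with a decorator. This is not the Langfuse or Helicone SDK; it is a stdlib-only stand-in that appends one span per call (trace ID, agent, status, latency) to an in-memory list. The key idea is the shared `trace_id`, which ties every agent's turn in one request together; the agent functions and the simulated timeout are hypothetical.

```python
import functools
import time
import uuid

TRACE: list[dict] = []  # in-memory span store; a real setup would ship spans to Langfuse or Helicone

def traced(agent_name: str):
    """Record one span per call: which agent ran, under which trace, how it ended, how long it took."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, trace_id: str, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                TRACE.append({
                    "trace_id": trace_id,
                    "agent": agent_name,
                    "status": status,
                    "ms": round((time.perf_counter() - start) * 1000, 2),
                })
        return wrapper
    return decorator

@traced("qualifier")
def qualify(lead: str) -> str:
    return "fit"

@traced("scheduler")
def schedule(lead: str) -> str:
    raise TimeoutError("calendar API timed out")  # simulated failure

trace_id = str(uuid.uuid4())  # one ID for the whole request, shared by every agent
qualify("lead-42", trace_id=trace_id)
try:
    schedule("lead-42", trace_id=trace_id)
except TimeoutError:
    pass
for span in TRACE:
    print(span["agent"], span["status"])
# qualifier ok
# scheduler error
```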

Cost reality

Build cost — $5,000-50,000+ depending on agent count and integrations. See full pricing →
Token cost — 3-7× higher than single-agent because every handoff adds context. Expect $200-1,500/mo for medium-volume systems.
Maintenance — multi-agent is harder to debug. Budget 20-30% of build cost per year for ongoing care.
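Where the 3-7× comes from is easiest to see with a back-of-the-envelope model: each handoff re-sends context on top of the base work. All inputs below (request volume, token counts, handoff count, and a blended $3 per million tokens) are hypothetical; real prices vary by model and by input versus output tokens.

```python
def tokens_per_request(base_tokens: int, handoffs: int, context_per_handoff: int) -> int:
    # Base work plus the context re-sent at every handoff between agents.
    return base_tokens + handoffs * context_per_handoff

def monthly_cost(requests_per_month: int, tokens: int, usd_per_million_tokens: float) -> float:
    return requests_per_month * tokens * usd_per_million_tokens / 1_000_000

single = tokens_per_request(base_tokens=3_000, handoffs=0, context_per_handoff=0)
multi = tokens_per_request(base_tokens=3_000, handoffs=4, context_per_handoff=3_000)
print(multi / single)                     # 5.0
print(monthly_cost(10_000, single, 3.0))  # 90.0
print(monthly_cost(10_000, multi, 3.0))   # 450.0
```

With these made-up inputs the multiplier lands at 5×, inside the 3-7× range, and the monthly bill inside the $200-1,500 band; shrinking the context passed at each handoff is the main lever.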

Pitfalls I have learned the hard way

Do not let agents talk to each other freely. Always go through the orchestrator. Free chatter between agents causes infinite loops and runaway token bills.
Specialist agents need narrow tool lists. Give each specialist only the tools it needs. Sharing tools across agents kills the accuracy gain.
State must be explicit. Implicit "the agents will figure it out" never works. Define every handoff payload.
Eval each agent independently, then eval the whole system. Track two pass rates: per-agent and end-to-end.
Start with 2 agents, not 6. Most multi-agent systems I see in the wild have 3-4 too many agents. Each agent adds latency and a failure mode.
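The first pitfall is the easiest to enforce in code: agents only ever return to the orchestrator, and the orchestrator loop has a hop budget. A minimal sketch, with hypothetical `writer` and `reviewer` agents and a `max_hops` default of 8 picked arbitrarily:

```python
from typing import Callable

Payload = dict
Agent = Callable[[Payload], Payload]

class HopBudgetExceeded(RuntimeError):
    """Raised when the orchestrator loop runs out of hops without finishing."""

def run(orchestrate: Callable[[Payload], str], agents: dict[str, Agent],
        payload: Payload, max_hops: int = 8) -> Payload:
    # Hub and spoke: every agent returns its result to this loop; agents never call each other.
    for _ in range(max_hops):
        target = orchestrate(payload)
        if target == "done":
            return payload
        payload = agents[target](payload)
    raise HopBudgetExceeded(f"no result after {max_hops} hops")

def orchestrate(payload: Payload) -> str:
    if payload.get("approved"):
        return "done"
    return "reviewer" if "draft" in payload else "writer"

agents: dict[str, Agent] = {
    "writer": lambda p: {**p, "draft": "v1"},
    "reviewer": lambda p: {**p, "approved": True},
}
print(run(orchestrate, agents, {}))  # {'draft': 'v1', 'approved': True}
```

If two agents start bouncing work back and forth, the loop stops with an exception after `max_hops` instead of running up the token bill.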

Should you build multi-agent?

Honest test: if a single agent with a 1,500-token prompt does not do your job, you might need multi-agent. If you have not tried that yet, build the single agent first and measure where it fails.

I have shipped both. I am quick to recommend the simpler one. Book a call and I will tell you which side of the threshold you are on — usually within 30 minutes.

Message @tribeofdanel →