When one agent isn't enough — orchestrator pattern for SMBs
Limits of a single LLM agent (context, bias, rate limits), concrete thresholds for switching to multi-agent, and the patterns that work.
TL;DR
- A single LLM that “does everything” hits a ceiling fast. Three hard limits: context size (~200k useful tokens on Claude, ~128k on GPT-4o), convergence bias (the model anchors on its first interpretation), and API rate limits (5-20 requests/sec on standard tiers).
- The multi-agent switch is justified when the input exceeds 50k tokens per task, when the decision needs orthogonal angles (drafting vs critique), or when parallelization divides wall time by 3 or more.
- Three patterns that work in SMBs: orchestrator + workers (plan/exec separation), sequential pipeline (extraction → validation → drafting), cross-review (drafter + reviewer + arbiter).
- Typical SMB cost (25 employees): single-agent stack runs ~50-200 euros/month of API. A well-cadenced multi-agent stack runs ~150-600 euros/month (2-4x more), for a measurable quality gain of 15-40 percent on complex tasks and a wall-time reduction of 2 to 5x.
- Anti-pattern #1: cloning the same agent five times with slightly different prompts. Costs five times as much, gains nothing. Roles must be truly orthogonal (different tools, different context, different objective function).
- Verdict: switch to multi-agent when you can measure it, not when you guess. Simple test: if a single LLM achieves 70 percent first-pass accuracy, aiming for 90 percent in multi-agent makes sense. If single-agent is already at 95 percent, change nothing.
The single-agent ceiling — three hard limits
Before we talk patterns, we need to understand why a single agent stops being enough past a certain threshold. Three concrete ceilings, observed on real SMB engagements.
1. The useful context window isn’t the advertised one
Claude Sonnet 4.6 advertises 200,000 tokens of context. GPT-4o advertises 128,000. In practice, “needle in a haystack” benchmarks (finding hidden info in a long text) show clear degradation past 60-80 percent fill. A prompt loading 150k tokens has a high chance of seeing the agent “forget” instructions placed in the middle.
Concrete consequence in an SMB: an agent processing a customer file with 15 supplier emails, 4 PDF specs and the history of the last 6 quotes starts losing the thread around the third document. Recall drops from over 95 percent on the first 30k tokens to 60-75 percent on tokens 100k-200k.
The fix isn’t a bigger context window — it’s splitting the work into agents that each load only their relevant slice.
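A minimal sketch of that split, assuming documents are tagged by kind before routing. The token budget, the 4-characters-per-token heuristic, and all names here are illustrative placeholders, not provider guidance:

```python
# Route each document to the agent that needs it, instead of loading
# everything into one prompt. Each agent sees only its own slice,
# kept well under the window where recall starts degrading.

TOKEN_BUDGET = 30_000  # illustrative working budget per agent

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for Latin-script text.
    return len(text) // 4

def slice_for_agent(docs: list[dict], wanted_kind: str) -> str:
    """Concatenate only the documents a given agent actually needs,
    stopping before the working budget is exceeded."""
    picked, used = [], 0
    for doc in docs:
        if doc["kind"] != wanted_kind:
            continue
        cost = approx_tokens(doc["text"])
        if used + cost > TOKEN_BUDGET:
            break  # keep headroom instead of degrading recall
        picked.append(doc["text"])
        used += cost
    return "\n---\n".join(picked)

docs = [
    {"kind": "email", "text": "Supplier confirms delivery week 12."},
    {"kind": "spec", "text": "Operating temperature: 850 C."},
    {"kind": "email", "text": "Quote v3 attached."},
]
email_context = slice_for_agent(docs, "email")  # only the two emails
spec_context = slice_for_agent(docs, "spec")    # only the spec
```

Each worker's prompt is then built from its own slice, so no single call ever approaches the degraded zone of the window.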
2. Convergence bias — the model anchors on its first interpretation
When an LLM is asked an ambiguous question, it picks an interpretation in the first generated tokens and sticks with it. Even if late information contradicts that initial interpretation, the model rarely backtracks autonomously.
Field example: agent receives an RFQ for “a stainless-steel structural part.” Initial interpretation: 304 stainless. Five paragraphs in, the spec mentions “operation at 850°C” — which forces 310 or 316Ti. The agent often misses the contradiction and quotes 304.
The fix: a second agent whose job is only to challenge the first one’s interpretations. We’ll come back to this in the cross-review pattern.
3. API rate limit and parallelization
Standard API tiers (Anthropic, OpenAI) cap at 5-20 requests/sec. For a long task that calls the LLM 100 times sequentially, that's a floor of 5-20 seconds, even if each individual call is fast. With multi-agent parallelization, you compress those 100 calls into a few seconds.
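The fan-out can be sketched with Python's standard `concurrent.futures`. Here `fake_llm_call` is a stub that just sleeps to simulate network latency, and the worker cap stands in for a provider rate limit:

```python
# Fan out independent LLM calls instead of looping sequentially.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_llm_call(prompt: str) -> str:
    time.sleep(0.05)  # simulated per-call latency
    return f"answer to: {prompt}"

prompts = [f"subtask {i}" for i in range(20)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:  # cap = your rate limit
    results = list(pool.map(fake_llm_call, prompts))
parallel_s = time.perf_counter() - start
# 20 calls at ~0.05 s each would take ~1 s sequentially;
# with 10 concurrent workers the wall time drops to roughly 0.1 s.
```

In production you would replace the stub with your provider's SDK call and keep `max_workers` at or below your tier's requests-per-second limit.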
The economic rule: each second saved on a customer-facing flow has a measurable value. Field examples:
- Quote drafting agent: from 8 minutes (single-agent sequential) to 1.5 minutes (5-agent parallel). 6.5 minutes saved per quote, 30 quotes per week: roughly 40 hours per quarter.
- Customer email triage: from 12 seconds per email to 3 seconds. 1500 emails per month, 4 hours per month saved on the salesperson.
The three patterns that work in SMBs
Pattern 1: Orchestrator + Workers
One main agent (the orchestrator) reads the task, decomposes it into sub-tasks, dispatches each sub-task to a specialized worker agent, then aggregates the results.
```mermaid
%%{init: {'theme':'base','themeVariables':{'primaryColor':'#F3EADE','primaryBorderColor':'#7DB5A5','primaryTextColor':'#2C3E42','tertiaryColor':'#E99971','lineColor':'#7DB5A5','fontFamily':'Inter, system-ui, sans-serif','fontSize':'13px'}}}%%
flowchart TB
user["Task input"] --> orch["Orchestrator agent<br/>(decomposes, dispatches, aggregates)"]
orch --> w1["Worker 1<br/>extraction"]
orch --> w2["Worker 2<br/>cross-check"]
orch --> w3["Worker 3<br/>drafting"]
w1 --> agg["Aggregated result"]
w2 --> agg
w3 --> agg
agg --> out["Final output"]
classDef orch fill:#E99971,stroke:#C97A55,color:#FFFFFF
classDef worker fill:#7DB5A5,stroke:#7DB5A5,color:#FDFBF8
classDef io fill:#F3EADE,stroke:#7DB5A5,color:#2C3E42
class user,out,agg io
class orch orch
class w1,w2,w3 worker
```
When to use it: the task is decomposable into independent sub-tasks, each with a well-defined contract. Typical SMB applications: drafting an industrial bid where one worker handles the technical part, another the commercial part, a third checks regulatory compliance.
Common trap: an over-talkative orchestrator that re-injects all the worker output into its own context. You’re back to the single-agent ceiling problem. The orchestrator should aggregate summaries, not raw outputs.
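A minimal sketch of the pattern, with the summary-only aggregation rule made explicit. The worker functions are stubs standing in for specialized agents; all names are illustrative:

```python
# Orchestrator + workers sketch. The key point: only short summaries enter
# the orchestrator's context; raw worker output is stored elsewhere
# (database, files) and referenced by id if needed.

def worker_extract(task: str) -> dict:
    return {"summary": "3 line items extracted", "raw": "…full JSON…"}

def worker_check(task: str) -> dict:
    return {"summary": "totals consistent", "raw": "…full report…"}

def worker_draft(task: str) -> dict:
    return {"summary": "draft ready (420 words)", "raw": "…full draft…"}

def orchestrate(task: str) -> str:
    workers = [worker_extract, worker_check, worker_draft]
    # Aggregate summaries, never raw outputs, so the orchestrator's
    # own context stays small regardless of how much the workers produce.
    summaries = [w(task)["summary"] for w in workers]
    return " | ".join(summaries)

report = orchestrate("process supplier quote #1042")
```

In a real stack each worker would call a model with its own slice of context (see the window discussion above), but the aggregation contract stays the same.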
Pattern 2: Sequential pipeline
Each agent receives the previous one’s output, processes it, hands off to the next. No parallelization, but each agent operates on a context narrowed to its specialty.
Example for supplier invoice processing:
- Extraction agent: PDF in, raw structured JSON out (header, lines, totals). Vision model.
- Validation agent: JSON in, JSON-with-checks out (dates consistent? tax rate matches the country? supplier known?). Reasoning model.
- Mapping agent: validated JSON in, accounting-entry-ready output out (debit/credit accounts mapped to your chart of accounts). RAG model with the chart of accounts.
- Reconciliation agent: entry in, status (matched / suggested / blocked) out. Domain model.
Each step is auditable, replayable, replaceable. If the extraction agent regresses on a new PDF format, you replace just it without breaking the pipeline.
When to use it: the workflow is naturally sequential and each step has measurable value. Typical SMB applications: any inbox-to-ERP flow (invoices, orders, returns).
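The pipeline above can be sketched as composable steps. Each function is a stub for one agent; the contract between steps is a plain dict, which is what makes each step replayable and replaceable in isolation. Account numbers and check rules are illustrative:

```python
# Sequential pipeline sketch for supplier invoice processing.

def extract(pdf_bytes: bytes) -> dict:
    # Stub for the vision model: PDF in, structured JSON out.
    return {"supplier": "ACME", "total": 1200.0, "tax_rate": 0.20}

def validate(invoice: dict) -> dict:
    # Stub for the reasoning model: attach pass/fail checks.
    checks = {"tax_rate_ok": invoice["tax_rate"] in (0.055, 0.10, 0.20)}
    return {**invoice, "checks": checks}

def map_accounts(invoice: dict) -> dict:
    # Stub for the RAG step: map to debit/credit accounts.
    return {**invoice, "debit": "607000", "credit": "401000"}

def reconcile(entry: dict) -> dict:
    status = "matched" if all(entry["checks"].values()) else "blocked"
    return {**entry, "status": status}

PIPELINE = [extract, validate, map_accounts, reconcile]

def run(pdf_bytes: bytes) -> dict:
    result = pdf_bytes
    for step in PIPELINE:
        result = step(result)  # each step is auditable and replayable
    return result
```

Swapping the extraction agent for a new PDF format means replacing one function in `PIPELINE`; the dict contract shields the downstream steps.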
Pattern 3: Cross-review
Two (or more) agents play opposing roles: drafter vs critic. A third agent arbitrates if they disagree.
```mermaid
%%{init: {'theme':'base','themeVariables':{'primaryColor':'#F3EADE','primaryBorderColor':'#7DB5A5','tertiaryColor':'#E99971','lineColor':'#7DB5A5','fontFamily':'Inter, system-ui, sans-serif','fontSize':'13px'}}}%%
flowchart LR
input["Input"] --> drafter["Drafter agent"]
drafter --> reviewer["Reviewer agent"]
reviewer -- "OK" --> output["Validated output"]
reviewer -- "challenge" --> arbiter["Arbiter agent"]
arbiter --> output
classDef agent fill:#E99971,stroke:#C97A55,color:#FFFFFF
classDef io fill:#F3EADE,stroke:#7DB5A5,color:#2C3E42
class input,output io
class drafter,reviewer,arbiter agent
```
When to use it: the task is sensitive (contractual, financial, regulatory) and an error costs more than the marginal cost of a second agent. Typical SMB applications: industrial bid response, sensitive customer email reply, important contract draft.
Common trap: drafter and reviewer share the same prompt and the same model. Then they agree on everything. The reviewer must have different instructions (e.g. “you are a paranoid lawyer reading the bid drafted by an over-enthusiastic engineer”) and ideally a different model.
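A control-flow sketch of the pattern with stubbed agents. The prompts, the toy challenge heuristic, and the stainless-steel example (echoing the RFQ case above) are illustrative; a real reviewer would return structured objections from a different model:

```python
# Cross-review sketch: drafter and reviewer get deliberately different
# instructions; the arbiter only runs when the reviewer raises a challenge.

DRAFTER_PROMPT = "You are an engineer. Draft the best possible answer."
REVIEWER_PROMPT = "You are a paranoid lawyer. Find any flaw in this draft."

def drafter(task: str) -> str:
    return f"draft for: {task}"

def reviewer(draft: str) -> tuple[bool, str]:
    # Toy heuristic standing in for an LLM objection: flag a material
    # grade that was never checked against a temperature spec.
    challenge = "stainless" in draft and "temperature" not in draft
    return challenge, "material grade unverified against temperature spec"

def arbiter(draft: str, objection: str) -> str:
    return f"{draft} [revised after: {objection}]"

def cross_review(task: str) -> str:
    draft = drafter(task)
    challenged, objection = reviewer(draft)
    return arbiter(draft, objection) if challenged else draft
```

The structural point is that the reviewer's verdict gates the arbiter: on the happy path you pay for two calls, not three.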
When NOT to switch to multi-agent
Three signs that single-agent is enough:
- First-pass accuracy already greater than 95 percent. Adding agents won’t increase precision much, and you pay 2-4x.
- Wall time already sub-second. If the user gets their answer in less than a second, you have nothing to gain from parallelization.
- No orthogonality in the workflow. If all your sub-tasks are “rephrase this paragraph” with slight variations, splitting into 5 agents won’t help. They all do the same thing.
Anti-pattern observed often: a consultant deploys a 7-agent stack on a problem a smart prompt would have solved with one. The TCO bloats, debugging becomes hellish, and accuracy doesn’t improve. Multi-agent is a tool — not an end goal.
SMB cost — what we observe in 2026
For a 25-employee industrial SMB, three internalized agentic flows:
| Stack | Monthly LLM API cost | Yearly TCO (with maint.) |
|---|---|---|
| Single-agent (sequential, 1 model) | 50 - 200 euros/mo | 8 - 18k euros/year |
| Sequential multi-agent pipeline (3-4 agents) | 150 - 400 euros/mo | 14 - 30k euros/year |
| Orchestrator + workers (5-8 agents) | 300 - 800 euros/mo | 25 - 50k euros/year |
| Cross-review with arbiter (3 agents, sensitive flows) | 200 - 500 euros/mo | 18 - 35k euros/year |
The LLM cost is rarely the deciding factor. Maintenance (consultant days/year to maintain prompts and evaluations, and to replace deprecated models) is the dominant component from year 2 onwards. Plan for 4-8 days/year minimum for an active multi-agent stack.
How to validate the switch — the simple test
Before deploying a multi-agent stack, do this measurement on 50-100 representative cases:
- Run the single-agent stack. Score first-pass accuracy.
- Run a 2-agent stack (drafter + reviewer for example). Score accuracy.
- Compute the gain. If less than 10 points of precision gained, give up on multi-agent: the cost isn't worth it.
- If greater than 10 points, study which interactions produce the gain and dimension your final stack accordingly.
The trap to avoid: validating multi-agent on cherry-picked anecdotal cases. Run on a representative sample, with hard scoring (not “feels better”).
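The measurement itself fits in a few lines. Here `run_single` and `run_multi` are toy stand-ins for your two stacks, and exact-match scoring stands in for your ground-truth check; both are assumptions to replace with your own:

```python
# A/B measurement sketch: score both stacks on the same representative
# sample with hard binary scoring, then apply the 10-point rule.

def score(stack, cases: list[dict]) -> float:
    hits = sum(1 for c in cases if stack(c["input"]) == c["expected"])
    return hits / len(cases)

def run_single(x):  # stand-in single-agent stack
    return x.upper()

def run_multi(x):   # stand-in 2-agent stack (normalizes before answering)
    return x.strip().upper()

cases = [
    {"input": " a ", "expected": "A"},
    {"input": "b", "expected": "B"},
]

single_acc = score(run_single, cases)
multi_acc = score(run_multi, cases)
gain_points = (multi_acc - single_acc) * 100
worth_switching = gain_points >= 10  # the decision rule from the test above
```

The same harness reruns after any prompt or model change, which is also what keeps the maintenance budget mentioned earlier honest.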
What to remember
- Single-agent hits a ceiling on three things: context size, convergence bias, rate limit.
- Multi-agent makes sense when the gain is measurable (greater than 10 points of precision) and the wall time matters.
- Three robust patterns: orchestrator + workers, sequential pipeline, cross-review.
- Cost: 2-4x more than single-agent. Justifiable only on flows where the marginal value is real.
- Validate the switch on a representative sample before deploying. Don’t fall for the multi-agent toy.
If you’re thinking about this for a project, the right starting question is: what’s my current single-agent first-pass accuracy, and what would 10 extra points be worth? If you can’t quantify the answer, you don’t need multi-agent yet.