We run our orchestrator and researcher in parallel. Here's what that bought us.

Most multi-agent systems route sequentially. Classify the user's intent first, run a knowledge base lookup, then hand off to a specialized agent to actually respond. It's the default shape a LangGraph tutorial will hand you, and it's the shape we started with too.
The problem shows up in latency. On a sales conversation, every turn runs three LLM calls back to back. Our p50 sat above 3 seconds, and the tail was meaningfully worse. The biggest single contributor was the KB search, and the orchestrator was sitting idle waiting for it to finish before it could even decide who should answer.
So we stopped waiting. The orchestrator and the researcher now run in parallel on the entry node. The KB search no longer sits on the critical path. We moved a second off p50, we changed what the orchestrator is actually allowed to know when it makes its decision, and we discovered a failure mode we did not know we had.
Here is how the pattern works, what it cost us to get there, and when it is the wrong call.
Where the seconds actually go
In a sales conversation, latency has a much harder ceiling than most product work. Our internal rule is simple: 1 to 2 seconds per turn is fine, 10 seconds kills the conversation. The buyer does not sit and wait. They bounce, and the next time they come back, your agent has to earn that trust back from zero.
That budget is tighter than it sounds when you add up a realistic turn. Orchestrator decision: 400 to 800 ms. Knowledge base retrieval with a reranker on top: 300 to 700 ms. Specialized agent response, streamed: 1 to 2 seconds depending on the model. If you run them in order, the floor is already 2 seconds before any of the slower tail cases hit.
The sequential shape also means your tail is purely additive. One slow embedding lookup does not just slow the retriever, it slows everything downstream of it, and a 10-second turn becomes normal during a busy hour.
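To make the arithmetic concrete, here is the critical path under both shapes, using midpoints of the ranges above (illustrative numbers, not our actual traces):

```python
# Midpoints of the per-stage ranges above (illustrative, not measured values).
orchestrator_ms = 600   # routing decision
kb_ms = 500             # retrieval + rerank
agent_ms = 1500         # specialized agent response, streamed

sequential_ms = orchestrator_ms + kb_ms + agent_ms    # 2600 -- every stage adds up
parallel_ms = max(orchestrator_ms, kb_ms) + agent_ms  # 2100 -- retrieval overlaps routing

# The overlap saves min(orchestrator_ms, kb_ms) per turn. In production the
# specialized agent also stops re-running its own retrieval, which is where
# the rest of the p50 win comes from.
```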
The sequential design we started with
The natural way to write a multi-agent graph is as a pipeline. Input arrives. The orchestrator node fires first, classifies the user's intent and the conversation phase, and writes a next_agent field into state. The graph's conditional edge reads that field and routes to the right specialized agent. The specialized agent does its own KB call, responds, and edges to the end.
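For reference, that sequential shape reduces to a few lines of LangGraph wiring. This is a minimal sketch, not our production code: node bodies are stubbed and only one specialized agent is shown.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: list
    next_agent: str

def orchestrator_node(state: AgentState) -> dict:
    # An LLM call in production; hardcoded here to keep the sketch runnable.
    return {"next_agent": "discovery"}

def discovery_node(state: AgentState) -> dict:
    # The specialized agent runs its own KB retrieval in here --
    # that is the sequential cost the rest of this post is about.
    return {"messages": state["messages"] + ["discovery response"]}

builder = StateGraph(AgentState)
builder.add_node("orchestrator", orchestrator_node)
builder.add_node("discovery", discovery_node)

builder.set_entry_point("orchestrator")
builder.add_conditional_edges("orchestrator", lambda s: s["next_agent"])
builder.add_edge("discovery", END)

graph = builder.compile()
```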
That design has two things going for it. It's easy to reason about, and the orchestrator never sees information it doesn't need. Every agent owns its own retrieval, which is clean.
It also has two problems we ran into immediately. The first is the latency I already described. The second is subtler: if the orchestrator doesn't know what's in the knowledge base for this query, it routes partly on guesswork. When the KB is thin on a topic, the technical consultant agent gets handed a turn it can't really answer, and we don't find out until it has already opened its mouth.
The fix both problems point at is the same. Let the retrieval happen earlier, and let the orchestrator see the result before it decides.
The parallel-entry pattern
The production shape is small. At the entry point, two nodes run in parallel as a RunnableParallel: the orchestrator and the researcher. Both read the current state. The researcher does a vector search over the customer's knowledge base and writes its hits into research_results. The orchestrator independently classifies the phase and writes next_agent into state. Results merge before the graph's conditional edge fires.
The routing logic after that is boring by design. A single conditional edge reads next_agent and sends the conversation to one of the specialized agents: discovery, support, technical consultant, adaptive response, content generator. Every specialized agent edges directly to END. No loops back to the orchestrator. The orchestrator runs exactly once per turn.
Two state-shape choices make this work without ugly merges. research_results and agent_responses are separate fields, not a shared blob — the researcher and orchestrator write to different keys, so there is nothing to reconcile. And the structured memory we carry between turns, which we call conversation_contextual_memory, is owned by the specialized agents that run later. The orchestrator reads it, it does not write to it, which eliminates the other class of merge conflict.
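Here is a minimal sketch of the parallel entry. Our production code uses a RunnableParallel inside the entry node; the fan-out edges below are the equivalent idiomatic LangGraph wiring, with node bodies stubbed and only one of the five specialized agents shown.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    research_results: dict                 # written only by the researcher
    next_agent: str                        # written only by the orchestrator
    agent_responses: list                  # written by the specialized agents
    conversation_contextual_memory: dict   # orchestrator reads, never writes

def researcher(state: AgentState) -> dict:
    # Vector search over the customer's KB; stubbed for the sketch.
    return {"research_results": {"hits": [], "kb_information_level": "PARTIAL"}}

def orchestrator(state: AgentState) -> dict:
    # Phase/intent classification; an LLM call in production.
    return {"next_agent": "technical_consultant"}

def technical_consultant(state: AgentState) -> dict:
    # Reads state["research_results"] instead of re-running retrieval.
    return {"agent_responses": ["..."]}

builder = StateGraph(AgentState)
builder.add_node("researcher", researcher)
builder.add_node("orchestrator", orchestrator)
builder.add_node("router", lambda state: {})   # pure join point, writes nothing
builder.add_node("technical_consultant", technical_consultant)

# Fan out: both entry nodes fire in the same superstep.
builder.add_edge(START, "researcher")
builder.add_edge(START, "orchestrator")

# Join: the router waits for BOTH writes to merge into state.
builder.add_edge(["researcher", "orchestrator"], "router")

# Boring by design: one conditional edge, straight to a specialized agent.
builder.add_conditional_edges("router", lambda s: s["next_agent"])
builder.add_edge("technical_consultant", END)

graph = builder.compile()
```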
What this buys us, specifically:
- The KB search no longer blocks the orchestrator. On turns where the orchestrator's decision is obvious — a clear qualification signal, an off-topic deflection, a straight "user wants to talk to a human" — the researcher's work still happens, but does not lengthen the turn.
- When the orchestrator does need the KB context to route well — "does the knowledge base actually cover this question, or should we offer an email follow-up instead?" — the research result is already there by the time the merge completes. The orchestrator now gates routing partly on kb_information_level (ACCURATE, PARTIAL, or MISSING), and that field only exists because the researcher has already run. Concretely, the gate looks like the sketch just after this list.
- The specialized agent downstream does not need to re-run its own retrieval. The researcher's output is already in state. One less LLM call on the critical path.
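The gate itself is a few lines. A sketch that extends the wiring above; the specific fallback agent is our illustrative choice, not a prescription:

```python
def route(state: AgentState) -> str:
    # Conditional-edge function: trust the orchestrator's pick unless the
    # KB cannot back it up.
    level = state["research_results"]["kb_information_level"]  # ACCURATE | PARTIAL | MISSING
    if state["next_agent"] == "technical_consultant" and level == "MISSING":
        return "adaptive_response"   # offer the email follow-up path instead
    return state["next_agent"]
```

Swapping this in for the bare lambda on the router's conditional edge is the whole change.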
The most honest thing I can say is that the latency win mattered less than the second effect. Making the orchestrator knowledge-aware before it routes fixed a bug class we had been patching for months.
What it costs
This pattern is not free, and a lot of writeups on parallel agent orchestration skip the honest part. The tradeoffs we actually made:
Every turn runs a KB search, even when the orchestrator would have short-circuited. If a user's message is small talk, an off-topic deflection, or an immediate escalation request, the researcher still burns a vector search and a reranker call. At our current scale this is cheap. At ten times our current scale, it will matter, and the right answer at that point is probably a lightweight intent classifier that decides whether to fire the researcher at all. We will get there. We are not there yet.
Your state schema becomes the hard part. When two nodes write to the same state object in parallel, any field either of them might touch has to be conflict-free. You cannot have both writing to messages or to a single context blob. We ended up splitting state into sharply scoped fields — research_results, agent_responses, qualification_status, conversation_contextual_memory — partly for readability and partly because the parallel merge forced our hand. In the long run it made the system cleaner, but the first refactor was painful.
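LangGraph enforces this at runtime: two parallel writes to a plain key raise an InvalidUpdateError unless the key declares a reducer. Roughly:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    # Plain keys get exactly one writer per superstep. Two parallel nodes
    # writing the same plain key fail with InvalidUpdateError when the graph runs.
    research_results: dict
    next_agent: str
    qualification_status: str
    # A key that several nodes legitimately append to needs an explicit reducer:
    messages: Annotated[list, operator.add]
```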
You have to think about cancellation. If the orchestrator decides mid-flight that the turn needs a human handoff and the researcher is still mid-embedding, you either wait for the researcher to finish (wasted) or cancel it (now you have a dangling callback and a confused KB trace). We chose to always let the researcher finish. It is the simpler path and the observability is cleaner. A team with a more aggressive latency budget would make the opposite call.
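Stripped of the graph machinery, the choice reduces to a familiar asyncio pattern. A generic sketch of the tradeoff, not our LangGraph internals:

```python
import asyncio

async def run_researcher(state: dict) -> dict:
    await asyncio.sleep(0.5)   # stand-in for embed + search + rerank
    return {"hits": []}

async def run_orchestrator(state: dict) -> str:
    await asyncio.sleep(0.3)   # stand-in for the routing LLM call
    return "human_handoff"

async def handle_turn(state: dict) -> None:
    research = asyncio.create_task(run_researcher(state))
    decision = await run_orchestrator(state)
    if decision == "human_handoff":
        # Our choice: always let the in-flight search land; traces stay coherent.
        await research
        # The aggressive alternative: research.cancel() -- shaves the wait,
        # but leaves the dangling callback and confused KB trace described above.

asyncio.run(handle_turn({}))
```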
Debugging a parallel node is harder than debugging a sequential one. Your traces have to show that both nodes ran, what they wrote to state, and what order the merge happened in. We had to adjust our LangSmith setup before the parallel pattern was legible at 2am.
When not to parallelize
Two cases where the sequential shape is still the right call.
When your KB is expensive per call. If a single KB lookup runs a large Cohere rerank or fires a chain of retrieval calls, doing it on every turn, including the ones that do not need it, starts to dominate your cost. Measure before you optimize. If KB cost is a line item on your bill, put an intent gate in front of the researcher.
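The cheapest version of that gate keeps the graph topology identical and lives inside the researcher node itself. A hypothetical sketch: needs_research and run_kb_search are illustrative names, and the heuristic is a toy stand-in for a real classifier.

```python
def run_kb_search(state: dict) -> dict:
    # Stand-in for the real vector search + rerank.
    return {"hits": ["..."], "kb_information_level": "ACCURATE"}

def needs_research(last_message: str) -> bool:
    # In production this would be a small, fast classifier.
    # A toy heuristic keeps the sketch runnable.
    return not last_message.lower().startswith(("thanks", "bye", "hello"))

def researcher(state: dict) -> dict:
    if not needs_research(state["messages"][-1]):
        # Skip the expensive search entirely on turns that cannot use it.
        return {"research_results": {"hits": [], "skipped": True}}
    return {"research_results": run_kb_search(state)}
```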
When the orchestrator's decision depends almost entirely on the KB result. If 95% of your turns go to the same specialized agent and the interesting routing decisions only happen when retrieval returns something weird, running the orchestrator in parallel with the researcher is pointless — you are just making a second LLM call that will always say "route to the default agent." A sequential KB-first design is simpler and equivalent in outcome.
Most production agent graphs are neither of those cases. They have a real orchestrator that does real routing work, and a KB that is cheap enough per call that firing it on every turn is fine. For those, parallel is the right default.
The one question to ask yourself
Look at the last 100 turns your agent handled. For each one, which pipeline stage was the user waiting on? In our case it was the KB search, sitting idle behind a classification decision the orchestrator could have made without it. Yours might be different. But something is on the critical path that doesn't have to be.
What's the one stage you could move off of it this week?


