Chat interfaces look simple from the outside, but once multiple agents and tools are involved, the architecture becomes much more interesting. A good multi-agent system depends on clear intent detection, reliable routing, safe tool execution, and enough structure to keep the experience natural for the user.
Core concepts
- Intent: The underlying goal behind a user’s message.
- Utterance: The literal text the user types into the chat. Many different utterances can express the same intent.
- Slot: A specific piece of extracted data needed to complete the request.
- Agent tool: A callable capability an agent uses to read data, trigger an action, or interact with another system.
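To make the vocabulary concrete, the four terms can be modeled as plain Ruby value objects. This is only an illustrative sketch: the struct names (`Slot`, `Intent`, `AgentTool`), the `check_stock` intent, the `run_tool` helper, and the in-memory `STOCK_LEVELS` table are assumptions made for the example, not part of any particular framework.

```ruby
# Hypothetical value objects; all names here are assumptions for illustration.
Slot = Struct.new(:name, :value, keyword_init: true)
Intent = Struct.new(:name, :utterance, :slots, keyword_init: true)
AgentTool = Struct.new(:name, :read_only, :handler, keyword_init: true)

# Toy data standing in for a real inventory system.
STOCK_LEVELS = { %w[SKU-123 Berlin] => 42 }.freeze

# Agent tool: a callable capability, here a read-only stock lookup.
STOCK_LOOKUP = AgentTool.new(
  name: "stock_lookup",
  read_only: true,
  handler: ->(sku:, warehouse:) { STOCK_LEVELS.fetch([sku, warehouse], 0) }
)

# Utterance: the literal text. Intent: the goal behind it. Slots: the extracted data.
CHECK_STOCK = Intent.new(
  name: "check_stock",
  utterance: "How many units of SKU-123 are in the Berlin warehouse?",
  slots: [
    Slot.new(name: :sku, value: "SKU-123"),
    Slot.new(name: :warehouse, value: "Berlin")
  ]
)

# The slots become the tool's keyword arguments.
def run_tool(tool, intent)
  arguments = intent.slots.to_h { |slot| [slot.name, slot.value] }
  tool.handler.call(**arguments)
end

puts run_tool(STOCK_LOOKUP, CHECK_STOCK) # prints 42
```

In a real system the slots would be extracted from the utterance by the router or the agent rather than written by hand.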
Basic chat workflow
A simple multi-agent chat flow can be represented like this:
user input from chat interface
-> platform discovers user intent
-> platform calls the right agent
-> agent processes request
-> agent calls the tool
-> tool executes and returns its response
-> agent takes the tool result and replies back to the user
This flow looks straightforward, but one step carries most of the architectural risk: discovering the user’s intent. If the platform gets that part wrong, the wrong agent receives the request, the wrong tool may be called, and the final answer can become irrelevant or even dangerous.
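The flow above can be sketched end to end in a few small classes. Everything here is a toy stand-in chosen for illustration: `ChatPlatform`, `ToyInventoryAgent`, `ToyFallbackAgent`, and `StockTool` are hypothetical names, and the keyword check in `detect_intent` is deliberately naive so that the weak point of the pipeline is easy to see.

```ruby
# Hypothetical toy pipeline; every class name here is an assumption for illustration.
class StockTool
  LEVELS = { "widget" => 12, "gadget" => 0 }.freeze

  # Tool executes and returns its response.
  def call(product)
    LEVELS.fetch(product, 0)
  end
end

class ToyInventoryAgent
  def initialize(tool: StockTool.new)
    @tool = tool
  end

  # Agent processes the request, calls the tool, and phrases the reply.
  def handle(message)
    product = message[/\b(widget|gadget)\b/i, 1]
    return "Which product do you mean?" unless product

    count = @tool.call(product.downcase)
    "We have #{count} #{product.downcase}(s) in stock."
  end
end

class ToyFallbackAgent
  def handle(_message)
    "Sorry, I am not sure how to help with that yet."
  end
end

class ChatPlatform
  def initialize
    @agents = { inventory: ToyInventoryAgent.new }
    @fallback = ToyFallbackAgent.new
  end

  # Platform discovers the intent. A naive keyword check stands in for a real router.
  def detect_intent(message)
    message.match?(/stock|inventory/i) ? :inventory : :unknown
  end

  # Platform calls the right agent and returns the agent's reply to the user.
  def reply_to(message)
    @agents.fetch(detect_intent(message), @fallback).handle(message)
  end
end

platform = ChatPlatform.new
puts platform.reply_to("How many widget units are in stock?")
puts platform.reply_to("How do I reset my password?")
```

Every step after `detect_intent` simply trusts its result, which is why a routing mistake propagates all the way to the final reply.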
The intent problem
In a multi-agent system, intent discovery is not a nice-to-have feature; it is the routing layer that determines whether the entire interaction succeeds. A natural language interface should understand what the user means without forcing them to manually pick an agent, because making users choose an internal system component breaks the conversational model.
From the developer’s perspective, this raises practical questions such as:
- How should free-text messages be classified into valid agent domains?
- What should happen when a message could belong to more than one agent?
- How should the system behave when the confidence score is low?
- Should routing rely on rules, embeddings, an LLM, or a hybrid approach?
- How can routing decisions be logged, evaluated, and improved over time?
- When should the platform ask a clarification question instead of guessing?
- How can the platform prevent an agent from using the wrong tool after a bad routing decision?
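Several of these questions (low confidence, near ties between agents, clarification, and logging) can be explored with a small decision policy that sits between the classifier and the agents. The sketch below is one possible shape, not a recommended standard: `RoutingPolicy`, `RoutingDecision`, and the threshold values `0.75` and `0.10` are assumptions that would need tuning against real traffic.

```ruby
require "json"
require "time"

# Hypothetical decision object and policy; names and thresholds are assumptions.
RoutingDecision = Struct.new(:action, :agent, :question, keyword_init: true)

class RoutingPolicy
  attr_reader :log

  def initialize(min_confidence: 0.75, min_margin: 0.10, log: [])
    @min_confidence = min_confidence
    @min_margin = min_margin
    @log = log
  end

  # scores maps agent names to confidence values in the range 0.0..1.0.
  def decide(message, scores)
    ranked = scores.sort_by { |_agent, score| -score }
    best_agent, best_score = ranked[0]
    runner_up, second_score = ranked[1] || [nil, 0.0]

    decision =
      if best_agent.nil? || best_score < @min_confidence
        # Low confidence: do not guess, hand over to a safe fallback.
        RoutingDecision.new(action: :fallback, agent: "fallback")
      elsif best_score - second_score < @min_margin
        # Two agents are nearly tied: ask the user instead of guessing.
        RoutingDecision.new(
          action: :clarify,
          question: "Is this about #{best_agent} or #{runner_up}?"
        )
      else
        RoutingDecision.new(action: :route, agent: best_agent)
      end

    record(message, scores, decision)
    decision
  end

  private

  # One JSON line per decision makes later evaluation straightforward.
  def record(message, scores, decision)
    entry = {
      at: Time.now.utc.iso8601,
      message: message,
      scores: scores,
      action: decision.action,
      agent: decision.agent
    }
    @log << JSON.generate(entry)
  end
end

policy = RoutingPolicy.new
decision = policy.decide("I was charged twice", { "billing" => 0.82, "support" => 0.80 })
puts decision.action
puts decision.question
```

Keeping the policy separate from the classifier means the thresholds can be adjusted, and the logged decisions replayed, without touching the model or the agents.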
From static routing to semantic routing
A common first implementation is a static router with hardcoded assumptions. That approach often works for a demo, but it quickly breaks once users start asking for different kinds of help in plain language.
For example, if every request falls back to an inventory-related agent, a message like “How do I reset my password?” could be interpreted as a warehouse or product lookup. The root problem is not the downstream agent; it is the routing strategy.
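The failure mode is easy to reproduce with a deliberately simple keyword router. `StaticRouter`, its keyword table, and the hardcoded default are hypothetical and exist only to show how a static fallback misroutes an unrelated request.

```ruby
# Hypothetical static router: a keyword table with a hardcoded default agent.
class StaticRouter
  KEYWORDS = {
    "inventory" => %w[stock warehouse product],
    "billing" => %w[invoice payment]
  }.freeze
  DEFAULT_AGENT = "inventory"

  def call(message)
    words = message.downcase.scan(/[a-z]+/)
    match = KEYWORDS.find { |_agent, keywords| (keywords & words).any? }
    # Anything the table does not recognize silently becomes an inventory request.
    match ? match.first : DEFAULT_AGENT
  end
end

router = StaticRouter.new
puts router.call("Where is the stock for product A?") # prints "inventory"
puts router.call("How do I reset my password?")       # prints "inventory" (misrouted)
```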
The better approach is to move from a static router to a dynamic semantic router. Two common patterns are especially useful: the LLM Supervisor and the Semantic Vector Router.
Pattern 1: The LLM Supervisor
The LLM Supervisor pattern uses a small, fast language model whose only responsibility is to classify the user’s message into one of the available agent categories. Instead of hardcoding routing logic in application code, the platform delegates intent detection to a constrained model call that is asked to return one of a fixed set of routing keys. Because the model can still return something unexpected, the platform should treat any other value as a fallback.
This pattern is especially helpful when the agent domains are semantically close and require reasoning to distinguish. It also works well when the router needs to handle messy real-world phrasing, indirect requests, or sentences with missing context.
Example: anonymized Ruby router
    require "json"

    # Routes a chat message to a specialized agent based on an LLM-classified intent.
    class AgentRouter
      def initialize(message:)
        @message = message.to_s
      end

      def call
        agent_key = discover_intent(@message)

        case agent_key
        when "inventory"
          InventoryAgent.handle(message: @message).generate_now!
        when "support"
          SupportAgent.handle(message: @message).generate_now!
        when "billing"
          BillingAgent.handle(message: @message).generate_now!
        else
          # Unknown, missing, or failed classifications all end up here.
          GeneralFallbackAgent.handle(message: @message).generate_now!
        end
      end

      private

      # Asks a small model for a routing key and returns it as a String.
      def discover_intent(message)
        response = LLMClient.new.chat(
          parameters: {
            model: "small-fast-model",
            response_format: { type: "json_object" },
            messages: [
              {
                role: "system",
                content: "You are a routing assistant. Classify the user message into one of these categories: 'inventory', 'support', or 'billing'. Respond only with JSON like {\"category\":\"inventory\"}."
              },
              {
                role: "user",
                content: message
              }
            ]
          }
        )

        JSON.parse(response.dig("choices", 0, "message", "content"))["category"]
      rescue StandardError
        # Network errors, malformed JSON, or an unexpected response shape.
        "fallback"
      end
    end
In this pattern, the model acts like a semantic dispatcher. It does not solve the user’s problem directly; it only decides which specialized agent should handle the next step.
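Because the dispatcher's output is free text until proven otherwise, it can help to normalize and validate the returned key before it reaches the `case` statement. The following is a minimal sketch under that assumption; the `RoutingKey` module and its `ALLOWED` list are hypothetical names introduced for the example.

```ruby
require "json"

# Hypothetical helper that turns raw model output into a trusted routing key.
module RoutingKey
  ALLOWED = %w[inventory support billing].freeze
  FALLBACK = "fallback"

  def self.parse(raw_content)
    parsed = JSON.parse(raw_content.to_s)
    # Only a JSON object can carry a "category" field.
    category = parsed.is_a?(Hash) ? parsed["category"] : nil
    normalized = category.to_s.strip.downcase

    ALLOWED.include?(normalized) ? normalized : FALLBACK
  rescue JSON::ParserError
    FALLBACK
  end
end

puts RoutingKey.parse('{"category":"Billing "}')     # prints "billing"
puts RoutingKey.parse('{"category":"delete_all"}')   # prints "fallback"
puts RoutingKey.parse("Sure! The category is billing") # prints "fallback"
```

An explicit allowlist also keeps the set of reachable agents in application code, so a prompt change alone cannot introduce a new route.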
Pattern 2: The Semantic Vector Router
The Semantic Vector Router avoids a chat-completion call for routing and instead relies on embeddings and similarity search. It still needs one embedding call per incoming message, but no text generation, so it is often faster and cheaper, especially when the available agent domains are clearly different from each other.
What are embeddings?
Embeddings are numeric vector representations of text that encode semantic meaning, so phrases with similar meaning tend to be located close to each other in vector space. In practice, this means the system can compare a user’s message to pre-defined example phrases and choose the nearest semantic match.
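A tiny worked example shows the idea of "close in vector space". The three-dimensional vectors below are invented by hand for illustration and do not come from any embedding model; real models return vectors with hundreds or thousands of dimensions. The helper names `cosine_similarity` and `nearest_phrase` are likewise assumptions.

```ruby
# Toy 3-dimensional vectors invented for illustration, not real embeddings.
TOY_EMBEDDINGS = {
  "reset my password" => [0.9, 0.1, 0.0],
  "login issue" => [0.8, 0.2, 0.1],
  "show my invoice" => [0.1, 0.9, 0.2]
}.freeze

# Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns the stored phrase whose vector is closest to the query vector.
def nearest_phrase(query_vector, candidates)
  candidates.max_by { |_phrase, vector| cosine_similarity(query_vector, vector) }.first
end

query = TOY_EMBEDDINGS["reset my password"]
others = TOY_EMBEDDINGS.reject { |phrase, _vector| phrase == "reset my password" }
puts nearest_phrase(query, others) # prints "login issue"
```

The password phrase lands next to the login phrase because their vectors point in nearly the same direction, while the invoice vector points elsewhere.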
Example: anonymized routing approach
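    # Routes a message to the agent whose example phrases are semantically closest.
    class SemanticRouter
      # Assumed placeholder cutoff; tune it for the embedding model and traffic in use.
      MIN_SIMILARITY = 0.75

      def initialize(embedding_client:)
        @embedding_client = embedding_client
      end

      def call(message)
        message_vector = embed(message)

        scores = {
          inventory: max_similarity(message_vector, inventory_examples),
          support: max_similarity(message_vector, support_examples),
          billing: max_similarity(message_vector, billing_examples)
        }

        agent, score = scores.max_by { |_, value| value }
        # Without a cutoff the router would always pick an agent, however weak the match.
        score >= MIN_SIMILARITY ? agent.to_s : "fallback"
      end

      private

      def embed(text)
        @embedding_client.embed(text)
      end

      # Example vectors are memoized so each phrase is embedded once per router instance.
      def inventory_examples
        @inventory_examples ||= [
          embed("where is the stock"),
          embed("create a warehouse"),
          embed("list available products")
        ]
      end

      def support_examples
        @support_examples ||= [
          embed("login issue"),
          embed("error on screen"),
          embed("reset my password")
        ]
      end

      def billing_examples
        @billing_examples ||= [
          embed("show my invoice"),
          embed("payment failed"),
          embed("what is my current balance")
        ]
      end

      # Best score across all example phrases for one agent.
      def max_similarity(message_vector, example_vectors)
        example_vectors.map { |vector| cosine_similarity(message_vector, vector) }.max
      end

      def cosine_similarity(a, b)
        denominator = magnitude(a) * magnitude(b)
        # Guard against zero-length vectors, which would otherwise yield NaN.
        return 0.0 if denominator.zero?

        dot_product(a, b) / denominator
      end

      def dot_product(a, b)
        a.zip(b).sum { |x, y| x * y }
      end

      def magnitude(vector)
        Math.sqrt(vector.sum { |value| value**2 })
      end
    end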
This pattern works best when the semantic boundaries between agents are distinct enough that nearest-neighbor matching is reliable. It is also a strong option when routing speed matters and the platform needs to operate with minimal latency.
When to use each pattern
The LLM Supervisor is usually the better choice when the system needs reasoning, contextual understanding, or cleaner handling of ambiguous requests. The Semantic Vector Router is a better fit when the domains are clearly separated and the main goal is low-cost, low-latency intent classification.
In practice, many production systems combine both. For example, a vector router can handle obvious cases quickly, while low-confidence matches can be escalated to an LLM supervisor for a second opinion.
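One way to sketch that combination is a router that trusts the vector match only above a threshold and otherwise escalates. The `HybridRouter` class, the `0.80` threshold, and the two injected callables are assumptions for the example; the stubs below stand in for a real embedding search and a real model call.

```ruby
# Hypothetical hybrid router. Both collaborators are injected callables:
#   vector_scorer.call(message)  -> Hash of agent name => similarity score
#   llm_classifier.call(message) -> agent name as a String
class HybridRouter
  Result = Struct.new(:agent, :via, keyword_init: true)

  def initialize(vector_scorer:, llm_classifier:, threshold: 0.80)
    @vector_scorer = vector_scorer
    @llm_classifier = llm_classifier
    @threshold = threshold
  end

  def call(message)
    agent, score = @vector_scorer.call(message).max_by { |_agent, value| value }
    # Obvious cases are settled by the cheap vector match.
    return Result.new(agent: agent, via: :vector) if score && score >= @threshold

    # Low-confidence cases are escalated to the slower, more capable supervisor.
    Result.new(agent: @llm_classifier.call(message), via: :llm)
  end
end

# Stubbed scores and a stubbed classifier, invented for the demonstration.
stub_scores = {
  "where is my invoice" => { "billing" => 0.93, "support" => 0.41 },
  "it charged me but the screen froze" => { "billing" => 0.62, "support" => 0.60 }
}
router = HybridRouter.new(
  vector_scorer: ->(message) { stub_scores.fetch(message, {}) },
  llm_classifier: ->(_message) { "support" }
)

clear = router.call("where is my invoice")
unclear = router.call("it charged me but the screen froze")
puts "#{clear.agent} via #{clear.via}"     # prints "billing via vector"
puts "#{unclear.agent} via #{unclear.via}" # prints "support via llm"
```

Recording the `via` field alongside each decision makes it possible to measure how often the expensive path is actually taken.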
Multi-agent complexity in the real world
Multi-agent development becomes complex very quickly because each agent is not only responsible for reasoning, but also for selecting tools, managing state, and deciding when to act. Once several agents and tools are connected to the same chat interface, the architecture must handle intent routing, permission boundaries, retries, tool failures, observability, and conversational continuity at the same time.
The difficulty grows even more when tools are mixed across read-only and update-capable operations. Reading data is usually safe, but writing data changes system state, which means the platform must be much more careful about validation and execution.
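A common way to encode that extra care is to label every tool as read or write and to put a single executor in front of all tool calls. The sketch below assumes hypothetical names (`GuardedTool`, `ToolExecutor`, the `confirmed:` flag, and a per-agent permissions hash); how confirmation is actually collected from the user is left out.

```ruby
# Hypothetical tool wrapper: `mode` marks whether the tool mutates state.
GuardedTool = Struct.new(:name, :mode, :handler, keyword_init: true)

class ToolExecutor
  ConfirmationRequired = Class.new(StandardError)
  NotPermitted = Class.new(StandardError)

  # permissions maps an agent name to the tool modes it may use.
  def initialize(permissions:)
    @permissions = permissions
  end

  def run(agent:, tool:, args:, confirmed: false)
    allowed_modes = @permissions.fetch(agent, [])
    unless allowed_modes.include?(tool.mode)
      raise NotPermitted, "#{agent} may not #{tool.mode} via #{tool.name}"
    end

    # Reads run automatically; writes need an explicit confirmation flag.
    if tool.mode == :write && !confirmed
      raise ConfirmationRequired, "#{tool.name} changes state"
    end

    tool.handler.call(**args)
  end
end

stock = { "widget" => 5 }
read_stock = GuardedTool.new(name: "read_stock", mode: :read, handler: ->(sku:) { stock.fetch(sku) })
set_stock = GuardedTool.new(name: "set_stock", mode: :write, handler: ->(sku:, quantity:) { stock[sku] = quantity })

executor = ToolExecutor.new(permissions: { "inventory" => %i[read write], "support" => %i[read] })
puts executor.run(agent: "support", tool: read_stock, args: { sku: "widget" }) # prints 5

begin
  executor.run(agent: "inventory", tool: set_stock, args: { sku: "widget", quantity: 9 })
rescue ToolExecutor::ConfirmationRequired => e
  puts "Blocked: #{e.message}"
end
```

With this shape, a misrouted request can at worst reach a read tool or stop at a confirmation prompt, rather than silently changing state.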
Complexities of mixing read-only and update tools
- A misclassified intent can trigger a write action when the user only wanted information.
- A read-only tool may be safe to run automatically, but an update tool may require confirmation first.
- Two agents can read the same state and then perform conflicting updates based on stale data.
- Retries become dangerous when a failed request may have partially completed a write operation.
- Audit logs become essential because the system must explain not only what it answered, but also what it changed.
- Access control becomes more granular because some agents may be allowed to read records but not modify them.
- Rollback strategies become necessary when one tool in a multi-step chain succeeds and another fails afterward.
- Human approval may be needed for sensitive operations such as payments, deletions, account changes, or inventory adjustments.
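Three of the points above (stale reads, unsafe retries, and audit logs) can be illustrated with one small in-memory store. This is a single-process sketch with hypothetical names (`InventoryStore`, `expected_version`, `idempotency_key`); a production system would typically enforce the same rules with database transactions and constraints.

```ruby
# Hypothetical in-memory store showing version checks, idempotent retries, and auditing.
class InventoryStore
  StaleWrite = Class.new(StandardError)

  attr_reader :audit_log

  def initialize
    @records = Hash.new { |hash, sku| hash[sku] = { quantity: 0, version: 0 } }
    @completed = {} # idempotency key => result of the first successful write
    @audit_log = []
  end

  # Returns a copy so callers cannot mutate stored state by accident.
  def read(sku)
    @records[sku].dup
  end

  # expected_version is the version the caller saw when it last read the record.
  def adjust(sku:, delta:, expected_version:, idempotency_key:, actor:)
    # A retried request returns the earlier result instead of writing twice.
    return @completed[idempotency_key] if @completed.key?(idempotency_key)

    record = @records[sku]
    unless record[:version] == expected_version
      raise StaleWrite, "#{sku} changed since version #{expected_version}"
    end

    record[:quantity] += delta
    record[:version] += 1
    @audit_log << { actor: actor, sku: sku, delta: delta, version: record[:version] }
    @completed[idempotency_key] = record.dup
  end
end

store = InventoryStore.new
seen = store.read("widget")
request = { sku: "widget", delta: 5, expected_version: seen[:version], idempotency_key: "req-1", actor: "inventory_agent" }

store.adjust(**request)
store.adjust(**request) # retry of the same request, applied only once
puts store.read("widget")[:quantity] # prints 5
```

A second agent that still holds version 0 would get a `StaleWrite` error here, which forces it to re-read before updating instead of overwriting the first agent's change.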
Final thoughts
A strong multi-agent chat system is not defined only by the intelligence of its agents, but by the quality of its orchestration. Intent discovery, routing strategy, and safe tool usage are what make the difference between a chat interface that feels smart and one that feels unpredictable.