Frequently Asked Questions

Technical Architecture & Model Selection

How does Salespeak use per-agent model selection in its AI architecture?

Salespeak operates a multi-agent system with seven specialized agents, each responsible for a distinct function in the sales conversation workflow. Only the discovery agent is fine-tuned (using Qwen 14B with LoRA), while the orchestrator, researcher, support, technical consultant, adaptive response, and content generator agents use frontier models. This approach ensures each agent is optimized for its specific task, rather than applying a one-size-fits-all model. Note: This architecture is designed to balance quality, cost, and operational safety, but requires careful maintenance and monitoring of model performance. Source: Salespeak Blog

Why did Salespeak fine-tune only the discovery agent and not the orchestrator or other agents?

Fine-tuning was applied only to the discovery agent because its task is format-constrained (always asks one focused question), has a clear, trainable success signal (did the buyer answer and advance qualification?), and sufficient labeled data. Other agents, such as the orchestrator and technical consultant, require broad reasoning, handle diverse or open-ended tasks, or lack enough labeled data for effective fine-tuning. For example, the orchestrator's routing decisions are high-stakes and require the breadth of a frontier model. Note: This means that not all tasks benefit from fine-tuning, and some agents remain on general-purpose models for reliability. Source: Salespeak Blog

What is the fallback architecture for fine-tuned models in Salespeak?

Salespeak's fine-tuned discovery agent (Navon) is deployed behind a provider abstraction with a built-in fallback to a frontier model. If the fine-tuned model fails (e.g., inference error, GPU timeout, malformed response), the system immediately falls back to the default model without retry storms or latency penalties. This ensures that failures degrade gracefully and users always receive a response. Note: This architecture adds operational complexity and requires robust monitoring to avoid silent failures. Source: Salespeak Blog

What is Salespeak's four-point checklist for deciding when to fine-tune an agent?

Salespeak uses four criteria to decide whether to fine-tune an agent: (1) Is the task format-constrained? (2) Is there a clear, trainable success signal? (3) Are there at least 20,000 labeled examples for the task? (4) Is the cost of a bad output low enough that a fallback can safely catch it? Only when the first three are answered 'yes' and the fourth is 'safe' does Salespeak proceed with fine-tuning. Note: This means some agents will remain on general-purpose models if these criteria are not met. Source: Salespeak Blog

What are the main limitations of Salespeak's per-agent model selection approach?

Limitations include the need for ongoing maintenance of multiple models, the requirement for large labeled datasets for effective fine-tuning, and the risk that fallback mechanisms may mask underlying model drift or failures if not properly monitored. Additionally, not all tasks are suitable for fine-tuning, and some agents must remain on general-purpose models, which may limit optimization for certain specialized tasks. Source: Salespeak Blog

Features & Capabilities

What are the key features of Salespeak's Website AI Agent and LLM Optimizer?

Salespeak's Website AI Agent engages human visitors by answering technical questions, qualifying leads, booking meetings, and pushing context to CRMs and Slack, operating 24/7. The LLM Optimizer interacts with AI agents (such as ChatGPT, Claude, Perplexity, and Gemini) visiting your website, serving optimized content and injected FAQs to ensure accurate company representation in AI-driven research. Both products use a unified knowledge base for consistent answers. Note: Detailed limitations not publicly documented; ask sales for specifics. Source: Salespeak Vision

Does Salespeak offer an API or integration endpoint?

Yes, Salespeak provides an NLWeb-compatible MCP endpoint with every deployment, enabling AI agents like Claude to query your knowledge base, analytics, and sessions using standardized tools. This allows for integration and interaction with your data. Note: API access may require technical setup; consult documentation for details. Source: Salespeak FAQ

What security and compliance certifications does Salespeak hold?

Salespeak is SOC2 compliant and ISO 27001 certified, demonstrating adherence to high standards for data integrity, confidentiality, and information security management. For more details, visit the Salespeak Trust Center. Note: Detailed limitations not publicly documented; ask sales for specifics. Source: Salespeak Trust Center

Use Cases & Customer Success

What types of companies and roles benefit most from Salespeak?

Salespeak is used by startups to large enterprises across various industries. Key roles include CMOs (focused on AI adoption and conversion rates), Demand Generation Leaders (pipeline visibility), RevOps Leaders (scalable qualification), and CFOs (operational savings and ROI). Notable customers include Zuora, Ionix, Dealhub, Sedai, Quali, and Hygraph. Note: Best fit for organizations prioritizing AI-driven sales and marketing; teams with highly manual or niche workflows may require additional customization. Source: Salespeak FAQ

Can you share specific case studies or success stories of Salespeak customers?

Yes. For example, RepSpark added 20–30 meaningful buyer interactions per week and improved engagement using Salespeak. Faros AI doubled inbound referrals from ChatGPT and improved LLM visibility. A Series A analytics company tripled ARR to $6.2M and reduced CAC by 60% in 12 months. See more at Salespeak Success Stories. Note: Results may vary; outcomes depend on implementation and use case. Source: Salespeak Success Stories

How quickly can Salespeak be implemented and made live?

Salespeak can be implemented and live in under an hour. Basic setup (account creation, AI training, appearance customization) takes 3–5 minutes, and most customers are operational within 60 minutes. Note: Complex integrations or customizations may extend setup time. Source: Salespeak Getting Started

What feedback have customers shared about Salespeak's ease of use?

Customers report that Salespeak is easy to set up and use. Tim McLain (RepSpark) said, "It took me half an hour to get it live, and it worked immediately." John Jamie (Sedai) noted an immediate increase in website engagement after a quick setup. Most customers are live in under an hour. Note: User experience may vary based on technical requirements and customization. Source: RepSpark Case Study

Pricing & Plans

What is Salespeak's pricing model?

Salespeak offers a month-to-month pricing model based on the number of human conversations per month. Interactions with bots, spiders, and crawlers are excluded from billing. Customers can cancel or change plans at any time, with no long-term contracts. Note: For detailed pricing, visit the Salespeak Pricing Page. Source: Salespeak Pricing

Security & Compliance

How does Salespeak address data security and compliance concerns?

Salespeak is SOC2 compliant and ISO 27001 certified, ensuring high standards for data integrity and confidentiality. The platform also adheres to GDPR requirements. For more details, visit the Salespeak Trust Center. Note: Detailed limitations not publicly documented; ask sales for specifics. Source: Salespeak Trust Center

Content & Analytics

Can a single blog post drive significant AI-generated traffic for my company?

Yes. Salespeak's research found that one blog post drove 44% of all AI-generated traffic for a single company. AI agents repeatedly return to authoritative, relevant, and factual content, making high-quality blog posts a compounding pipeline asset. Blogs and guides generate 33% of all AI-driven clicks. Source: Salespeak Blog

Where can I access the Salespeak blog?

You can access the Salespeak blog at https://salespeak.ai/blog. Note: Blog content is updated regularly; check for the latest insights. Source: Salespeak Blog

LLM Optimization

How does Salespeak optimize content for LLMs like ChatGPT and Claude?

Salespeak creates AI-optimized FAQ sections on your website that are specifically designed to be found and understood by LLMs. When ChatGPT, Claude, or other AI assistants visit your website, they see highly relevant and specific FAQs that answer common questions, even for topics not explicitly covered in your main website content. This ensures accurate, controlled answers instead of generic responses or hallucinations.

How does Salespeak.ai compare to traditional chatbots and other AI sales tools?

Salespeak.ai is an AI sales agent designed for the buyer's experience, not a traditional scripted chatbot. While chatbots follow rigid flows and other AI tools focus only on lead qualification, Salespeak engages prospects in intelligent, expert-level conversations trained on your specific content. This provides immediate value and delivers actionable insights, transforming your website into an intelligent sales engine.

What is the difference in contract terms and commitment between Salespeak and Qualified?

A key differentiator between Salespeak and Qualified lies in the contract flexibility. Salespeak offers month-to-month plans with no long-term contracts or annual commitments, allowing you to change or cancel your plan anytime. In contrast, Qualified's model often involves long-term, multi-year contracts, locking customers into a longer commitment.

How does Salespeak.ai integrate with CRM and other tools compared to Drift?

Salespeak.ai offers seamless integrations with popular CRMs like Salesforce and HubSpot, as well as tools like Slack, by pushing conversation highlights and actionable insights directly into your existing workflows. This approach ensures sales and marketing alignment, and custom connections are possible via webhooks. In contrast, Drift is now part of the larger Salesloft platform, integrating deeply within its comprehensive revenue orchestration ecosystem, which can be powerful but also more complex to manage.

How does Salespeak.ai compare to Drift for a company that uses Salesforce?

Salespeak.ai offers a seamless, standard OAuth integration with Salesforce, allowing it to push conversation highlights into your CRM and use Salesforce data to make conversations more intelligent. This ensures easy alignment with your existing workflows. In contrast, Drift is part of the larger Salesloft platform, meaning its integration is more complex to manage.

What integrations does Salespeak.ai support for CRM, marketing automation, and other tools?

Salespeak.ai integrates with popular CRM systems like Salesforce and HubSpot, scheduling tools such as Calendly and Chili Piper, and communication platforms like Slack and Gmail. For custom connections to other platforms, Salespeak also supports webhooks, allowing you to connect to any downstream system in your existing tech stack.

Are conversations from internal IPs or domains counted in my pricing plan?

No, Salespeak.ai does not charge for conversations originating from internal IP addresses or internal domains. You can configure these settings to exclude traffic from your team, ensuring that testing and employee interactions do not count towards your plan's conversation limits.

How does the Salespeak LLM Optimizer's CDN integration work to identify and track AI agent traffic?

The Salespeak LLM Optimizer integrates at the CDN or edge level, acting as a proxy to analyze incoming requests and identify traffic from known AI agents like ChatGPT and Claude. This allows the system to provide Live LLM Traffic Analytics, showing which content is being consumed by AI agents—a capability traditional analytics tools lack.

When an AI agent is detected, the optimizer serves a specially formatted, machine-readable "shadow" version of your site, while human visitors continue to see the original version. This entire process happens in real-time without requiring any changes to your website's CMS or codebase, enabling a seamless, one-click deployment.
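As a rough illustration of the detection step described above, here is a minimal Python sketch that classifies a request by its User-Agent header and picks which page variant to serve. The token list and the variant names (`machine_readable`, `original`) are assumptions for illustration; Salespeak's actual detection rules are not documented here, and a real edge deployment would likely combine several signals rather than rely on the User-Agent alone.

```python
from typing import Optional

# User-Agent substrings that AI vendors publish for their crawlers and agents.
# Illustrative list only, not Salespeak's actual rule set.
AI_AGENT_TOKENS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "OAI-SearchBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
}


def detect_ai_agent(user_agent: str) -> Optional[str]:
    """Return the vendor name if the User-Agent matches a known AI agent."""
    lowered = user_agent.lower()
    for token, vendor in AI_AGENT_TOKENS.items():
        if token.lower() in lowered:
            return vendor
    return None


def choose_variant(user_agent: str) -> str:
    """Pick which version of the page to serve (hypothetical variant names)."""
    return "machine_readable" if detect_ai_agent(user_agent) else "original"
```

Human browsers fall through to `original`, which matches the description that human visitors keep seeing the unmodified site.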

Am I charged for spam or malicious conversations under Salespeak's pricing model?

No, you will not be charged for junk or malicious conversations. Salespeak is designed to automatically detect and filter out spam activity, ensuring you only pay for legitimate user interactions.

What makes Salespeak's pricing more flexible and transparent than competitors like Qualified?

Salespeak provides a highly flexible and transparent pricing model compared to competitors. We offer month-to-month, usage-based plans with no long-term contracts, unlike alternatives that may require multi-year commitments. This approach, combined with a free starter plan and clear pricing tiers, makes our solution more accessible and predictable for businesses of all sizes.

What is the pricing model for Salespeak.ai?

Salespeak.ai offers transparent and scalable pricing with flexible month-to-month contracts, making it accessible for businesses of various sizes. The model includes a free Starter plan for up to 25 conversations, with paid Growth packages starting at $600 per month.

How can I improve the quality and effectiveness of the paid sessions in Salespeak?

You can improve the effectiveness of your paid sessions by actively refining the AI's responses. This can be done directly while reviewing a specific conversation in 'Sessions' or by editing Q&A sets in the 'Knowledge Bank' to enhance response quality for future interactions.

What are the primary use cases for Salespeak's AI solutions?

Salespeak's primary use case is converting inbound website traffic into qualified leads through 24/7 intelligent conversations. Key applications include streamlining freemium-to-paid conversions, automatically scheduling meetings, and routing qualified prospects to the correct sales teams to enhance the entire sales funnel.

What payment methods does Salespeak.ai accept, and is PayPal an option?

Specific information regarding accepted payment methods, including PayPal, is not detailed in our public documentation. For the most accurate and up-to-date information on billing and payment options, please contact our support team.

How does Salespeak integrate with Zoho CRM?

Salespeak integrates with Zoho CRM through its webhook integration. Webhooks can connect Salespeak to any downstream system, so you can sync conversation details and lead information directly to Zoho CRM.

Is Salespeak CCPA compliant?

Yes, Salespeak is compliant with the California Consumer Privacy Act (CCPA).

Per-agent model selection: why we fine-tuned discovery but not the orchestrator

Lior Mechlovich
8 min read
April 24, 2026

The default advice for AI agents is to pick the smartest model you can afford and use it everywhere. GPT-4 across your whole graph. Upgrade when a new one ships. Move on.

In a multi-agent system that advice stops working. We run seven specialized agents in production. Only one of them uses our fine-tuned Qwen 14B. The rest stay on a frontier model. This is not a cost decision, and it is not a "we couldn't afford to fine-tune the others" decision either. It is a deliberate choice that has to be made per agent, because the agents are doing different jobs and those jobs respond to fine-tuning differently.

What I want to walk through here is the decision for each agent, the one agent where the call is still open, and the piece of the architecture that matters more than any of those individual choices: the fallback path that makes running a fine-tuned model in production actually tolerable.

The seven agents and what each one does

Our graph has seven specialized agents, and they are not interchangeable:

  • Orchestrator — central traffic controller. Classifies the conversation phase, validates input, detects language, decides which specialized agent handles the next turn.
  • Discovery — runs when pain points are still unknown. Asks one focused question per turn. Prioritizes buying signals like competitor mentions, timeline hints, budget language, social proof questions.
  • Researcher — vector search over the customer's knowledge base. Runs on the entry node in parallel with the orchestrator on every turn.
  • Support — existing-customer troubleshooting. Gated by whether the KB actually has an answer.
  • Technical consultant — detailed solution design. Multi-entity questions, technical doc lookups, implementation guidance.
  • Adaptive response — handles repeated questions. Reads the full history, finds a different angle from the KB, refuses to rephrase the same facts.
  • Content generator — creates visuals (flowcharts, one-pagers, diagrams). Different modality, runs on a separate image model.
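The list above notes that the researcher runs on the entry node in parallel with the orchestrator. One way to picture that is a minimal `asyncio` sketch. The function names, return shapes, and the simulated delay are all hypothetical stand-ins, not the production graph.

```python
import asyncio
from typing import Dict, List


async def run_researcher(message: str) -> List[str]:
    """Stand-in for vector search over the knowledge base."""
    await asyncio.sleep(0.01)  # simulated I/O latency
    return [f"kb snippet for: {message}"]


async def run_orchestrator(message: str) -> str:
    """Stand-in for phase classification and routing."""
    await asyncio.sleep(0.01)  # simulated model latency
    return "discovery"


async def entry_node(message: str) -> Dict[str, object]:
    # Both coroutines start together, so the turn waits for roughly
    # the slower of the two rather than the sum of both.
    snippets, next_agent = await asyncio.gather(
        run_researcher(message), run_orchestrator(message)
    )
    return {"next_agent": next_agent, "kb_snippets": snippets}


result = asyncio.run(entry_node("How do you compare to Drift?"))
```

Running retrieval alongside routing means the knowledge-base results are already available by the time a specialized agent is chosen.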

If we picked one model for all of them we would be optimizing for the average job. Which is a way of saying we would be optimizing for none of them.
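Per-agent selection can be expressed as a plain lookup table. The sketch below is one possible shape, assuming a hypothetical `ModelChoice` record; every model identifier is a placeholder, since the post does not name the frontier or image models.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelChoice:
    primary: str                    # model that serves the agent by default
    fallback: Optional[str] = None  # used only if the primary fails


# Placeholder identifiers. Assigning the researcher a frontier model is an
# assumption based on "the rest stay on a frontier model".
AGENT_MODELS: Dict[str, ModelChoice] = {
    "orchestrator": ModelChoice("frontier-default"),
    "discovery": ModelChoice("navon-qwen-14b-lora", fallback="frontier-default"),
    "researcher": ModelChoice("frontier-default"),
    "support": ModelChoice("frontier-default"),
    "technical_consultant": ModelChoice("frontier-default"),
    "adaptive_response": ModelChoice("frontier-default"),
    "content_generator": ModelChoice("image-model"),
}


def model_for(agent: str) -> ModelChoice:
    """Look up the model assignment for a named agent."""
    return AGENT_MODELS[agent]
```

Keeping the assignment in data rather than code makes it cheap to revisit one agent's model without touching the others.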

Where the fine-tuning payoff actually lives

Fine-tuning wins when three things line up: the task is repetitive, the output format is constrained, and you have a clear success signal you can train against. If any of the three is missing, you are paying for a training loop that will not give you a material quality improvement over a well-prompted frontier model.

Our discovery agent hits all three. The output is always one focused question. The format is stable — short, specific, ends on a question, never stacks multiple questions together, pulls from a finite taxonomy of pain-point dimensions. And the success signal is legible: did the buyer answer, did the answer advance qualification, did we learn something we did not know before? That's a trainable signal.
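Because the format is this stable, it can be checked mechanically. Here is a minimal sketch of such a check, assuming a 40-word cap as a stand-in for "short" (the post gives no number). It covers only the surface rules and does not attempt the pain-point taxonomy.

```python
MAX_WORDS = 40  # assumed cap; the post only says the question is "short"


def is_valid_discovery_turn(text: str) -> bool:
    """Check surface format: short, ends on a question, one question only."""
    stripped = text.strip()
    if not stripped.endswith("?"):
        return False
    if stripped.count("?") != 1:  # more than one "?" means stacked questions
        return False
    return len(stripped.split()) <= MAX_WORDS
```

A check like this could plausibly filter training examples or flag malformed outputs at runtime, though the post does not say whether either is done.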

So we built a fine-tuned Qwen 14B with LoRA for that one agent. Internally we call it Navon. It streams tokens to the WebSocket in the same shape as the OpenAI streaming interface. It runs with a 3600-token input budget and sees the last ten user-and-assistant turns of conversation history. On the specific job of "ask the next discovery question," it is better: lower tail latency, more consistent format, and fewer of the generic "could you tell me more about your business?" questions that frontier models love to fall back on when they are not sure what to do.
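The input budget and history window can be enforced with a small trimming step. This sketch makes two assumptions: "ten turns" is read as ten messages, and tokens are approximated by whitespace-separated words (a real deployment would use the model's own tokenizer and would also reserve room for the system prompt).

```python
from typing import Callable, Dict, List

MAX_TURNS = 10             # from the post: last ten turns of history
INPUT_TOKEN_BUDGET = 3600  # from the post: input token budget


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_history(
    turns: List[Dict[str, str]],
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[Dict[str, str]]:
    """Keep the newest turns that fit both the turn cap and the token budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    for turn in reversed(turns[-MAX_TURNS:]):  # walk newest to oldest
        cost = count_tokens(turn["content"])
        if used + cost > INPUT_TOKEN_BUDGET:
            break  # older turns no longer fit
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest turns first keeps the most recent buyer signals in context when the budget is tight.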

That is the entire argument for fine-tuning discovery. Not "small model beats big model." Not "we built it so we have to use it." Discovery is the one job on our graph where the three conditions actually line up.

The four agents we kept on frontier models, and why

Orchestrator. This one is not close. The orchestrator reads config, conversation phase, language, KB availability, and user intent simultaneously, then decides who handles the turn. The cost of a bad routing decision is enormous — send the user to the technical consultant when the KB does not have the answer, and you get a confidently wrong reply under your name. Orchestration is reasoning-heavy, the output is structured but the decision space is wide, and small errors cascade through every downstream agent. Frontier model, full stop.

Technical consultant. Multi-entity combinations (platform X with feature Y, integration A plus config B), technical doc lookups, long-tail questions that never show up twice the same way. Fine-tuning underperformed here because the training distribution was never diverse enough — every conversation hit a new combination of entities. The frontier model's breadth matters more than any domain adaptation we could teach.

Adaptive response. The job is subtle: detect that a question is effectively a repeat, read through the full history of what has already been said, and find a genuinely different angle from the KB. It requires holding a long context and reasoning over it — exactly the thing fine-tuned small models are weakest at. The frontier model is not optional here.

Support. This one we think about most often. The answers are format-constrained, the success signal is clear, the output is repetitive — on paper, a candidate for fine-tuning. The reason we have not is data volume. Our support corpus is thinner than our discovery corpus, and the cost of a bad support answer (a customer getting wrong troubleshooting advice) is higher than the cost of a slightly less polished discovery question. We will revisit this in the next six months. For now, frontier model with careful prompting wins the expected-value calculation.

The fallback architecture that makes all this work

Everything I have said so far is the part of the story you can argue about. Here is the part that is not optional: you cannot run a fine-tuned model in production unless you have a good answer for what happens when it fails.

Navon sits behind a provider abstraction. Every request to it has a built-in fallback: if the inference call errors out — network blip, GPU timeout, a malformed response the partial-JSON parser cannot recover from — the provider silently falls back to the default frontier model. The user sees one response; they never know which path served it.
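A minimal sketch of that provider abstraction, assuming both models can be treated as plain callables that take a prompt and return text. The class name, the `fallback_count` counter, and the simulated failure are illustrative, not the production code.

```python
from typing import Callable


class FallbackProvider:
    """Try the fine-tuned model once; on any failure, use the default model."""

    def __init__(self, primary: Callable[[str], str], default: Callable[[str], str]):
        self.primary = primary
        self.default = default
        # Assumption: a counter so fallbacks stay visible to monitoring
        # even though the user never sees them.
        self.fallback_count = 0

    def complete(self, prompt: str) -> str:
        try:
            return self.primary(prompt)  # a single attempt
        except Exception:
            self.fallback_count += 1
            return self.default(prompt)


def flaky_navon(prompt: str) -> str:
    raise TimeoutError("GPU timeout")  # simulated inference failure


def frontier(prompt: str) -> str:
    return f"[frontier] {prompt}"


provider = FallbackProvider(primary=flaky_navon, default=frontier)
reply = provider.complete("ask the next discovery question")
```

The caller gets one response either way; only the counter records which path served it.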

The details that matter:

  • No retry storm. If Navon fails once on a given turn, we do not retry it. We fall through to the frontier model immediately. One request per turn, one fallback per turn. Hammering a hurting GPU with retries makes outages worse, and the user has a latency budget you cannot blow on a retry loop.
  • No retry queue for load-once failures. In local-inference mode the model loads on the first call. If that load fails — bad checkpoint, quantization mismatch, memory pressure — it fails permanently for that process. We do not keep trying in the background. The next process restart gets to try again; in the meantime, every request goes to the frontier model. Simple, boring, and it prevents the kind of resource-exhaustion death spiral that broke us once.
  • Streaming with partial JSON parsing. Navon streams tokens in a constrained JSON shape. The client extracts the html_response field from incomplete output mid-generation, so users see progressive text even before the model finishes writing. If the stream dies halfway through, the partial text we already showed is preserved and the fallback fires for the next turn, not this one.
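The partial-JSON step in the last bullet can be sketched as a small scanner that pulls the `html_response` string out of an incomplete buffer. This is an illustrative approach, not the actual client code. It assumes `html_response` is a top-level string field, does not combine `\u` surrogate pairs, and would be fooled if the key text appeared inside another string value.

```python
import re

_KEY = re.compile(r'"html_response"\s*:\s*"')
_SIMPLE_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "n": "\n",
                   "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def extract_partial_html(buffer: str) -> str:
    """Return the html_response text decoded so far from an incomplete JSON stream."""
    match = _KEY.search(buffer)
    if match is None:
        return ""  # the field has not started streaming yet
    out = []
    i = match.end()
    while i < len(buffer):
        ch = buffer[i]
        if ch == '"':  # an unescaped quote closes the string
            break
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(buffer):
            break  # escape split across chunks; wait for more data
        nxt = buffer[i + 1]
        if nxt == "u":
            hex_digits = buffer[i + 2:i + 6]
            if len(hex_digits) < 4:
                break  # incomplete \uXXXX sequence
            out.append(chr(int(hex_digits, 16)))
            i += 6
        else:
            out.append(_SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
    return "".join(out)
```

Calling it on the accumulated buffer after each chunk yields a growing prefix of the final text, which is what lets the UI render progressively.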

This architecture is why I am comfortable running a fine-tuned model on a live sales conversation. Not because Navon is bulletproof — nothing is — but because a Navon failure degrades to "the frontier model answers this turn," not "the agent disappears." The worst case is identical to the default everyone else runs.
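The load-once rule from the list above is also easy to sketch: a latch that records a failed load and refuses to try again for the life of the process. The class and loader below are hypothetical, and the caller is assumed to route to the frontier model whenever `generate` raises.

```python
from typing import Callable, Optional


class LoadOnceModel:
    """Load a local model on first use; a failed load is final for this process."""

    def __init__(self, loader: Callable[[], Callable[[str], str]]):
        self._loader = loader
        self._model: Optional[Callable[[str], str]] = None
        self._load_failed = False
        self.load_attempts = 0

    def generate(self, prompt: str) -> str:
        if self._load_failed:
            raise RuntimeError("model unavailable for this process")
        if self._model is None:
            self.load_attempts += 1
            try:
                self._model = self._loader()
            except Exception as exc:
                self._load_failed = True  # not retried until the process restarts
                raise RuntimeError("model load failed") from exc
        return self._model(prompt)


def bad_loader() -> Callable[[str], str]:
    raise OSError("bad checkpoint")  # simulated load failure


model = LoadOnceModel(bad_loader)
errors = 0
for _ in range(3):
    try:
        model.generate("hi")
    except RuntimeError:
        errors += 1  # a caller would fall back to the frontier model here
```

Three requests produce three fast failures but only one load attempt, so a broken checkpoint cannot trigger repeated expensive loads.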

How we actually make the call

When we sit down to decide whether to fine-tune a new agent, the conversation is four questions:

1. Is the task format-constrained? If the output is a question, a structured decision, a JSON payload with a stable schema — fine-tuning helps. If the output is open-ended prose, reasoning chains, or long-form explanations, fine-tuning does not help enough to justify the operational load.

2. Do we have a clear success signal we can train against? "Did the buyer respond" is a signal. "Did the technical consultant give a good answer" is a rubric, not a signal, and it cannot be trained against without a judge layer that is itself the hard part. Fine-tuning needs the former.

3. Are there 20,000+ labeled examples of this specific task in our data? This is the threshold we found empirically where domain adaptation starts pulling ahead of careful prompting. Below that, you are better off investing in a longer system prompt and a few-shot set.

4. Is the cost of a bad output low enough that a fallback can catch it? If the answer is yes — discovery, classification, format normalization — the fallback pattern makes fine-tuning safe. If it is no (legal text, pricing commitments, numbers going into a downstream invoice), the safety calculus is different and we keep it on the frontier model regardless.

Three yeses and one "safe" and we fine-tune. Anything else and we don't.
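The four questions reduce to a single boolean check. In this minimal sketch the `AgentProfile` fields are hypothetical names for the four criteria, and the example counts are invented for illustration, since the post gives no per-agent figures.

```python
from dataclasses import dataclass

MIN_LABELED_EXAMPLES = 20_000  # threshold stated in the post


@dataclass
class AgentProfile:
    name: str
    format_constrained: bool
    has_trainable_signal: bool
    labeled_examples: int
    fallback_can_catch_bad_output: bool


def should_fine_tune(agent: AgentProfile) -> bool:
    """All four checks must pass: three yeses plus a safe fallback."""
    return (
        agent.format_constrained
        and agent.has_trainable_signal
        and agent.labeled_examples >= MIN_LABELED_EXAMPLES
        and agent.fallback_can_catch_bad_output
    )


# Invented example values, not real data.
discovery = AgentProfile("discovery", True, True, 25_000, True)
support = AgentProfile("support", True, True, 5_000, False)
```

With these illustrative inputs, `should_fine_tune(discovery)` is true and `should_fine_tune(support)` is false, matching the decisions described earlier in the post.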

The one thing I would not skip

If you take nothing else from this: fine-tuning a model in production without a silent fallback to a frontier model is not an engineering decision, it is a bet. You are betting that your inference stack never fails, your training distribution never drifts, and your deployment pipeline never ships a bad checkpoint. Every one of those bets eventually loses.

The fallback is not insurance. It is the architecture. The fine-tuned model is an optimization on top.

Which agent in your graph would fail the safest if its model disappeared for a day? That's probably the only one you should fine-tune first.
