Why Building a GTM AI Agent Is Harder Than You Think



Omer Gotlieb, Cofounder and CEO
7 min read
March 9, 2026

Everyone wants a GTM AI agent.

Few understand what it actually takes to build one that works.

LangChain recently shared how they built their GTM agent — and the details confirm what we've been seeing: this is a real engineering problem. Not a weekend hackathon. Not a wrapper around an LLM.

Their results? Lead-to-qualified-opportunity conversion up 250%. Reps reclaiming 40 hours per month each. 86% weekly active usage. But those numbers came from treating this as serious infrastructure.

Here's what makes it so hard.

The Research Problem Is Deceptively Complex

The pitch sounds simple: automate the 15 minutes a rep spends toggling between Salesforce, Gong, LinkedIn, and a company website before writing an email.

In practice? You're building a system that has to:

  • Pull from 6+ data sources with different APIs, rate limits, and data shapes
  • Reason across all of them to decide whether to reach out at all
  • Adapt its output based on the state of each relationship

LangChain found that inputs are "inherently spiky" — meeting data, CRM history, and web research vary wildly in size and structure. A single LLM call can't handle this. They needed multi-step orchestration with a virtual filesystem just to manage the data.
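The virtual-filesystem idea can be sketched in a few lines. This is an illustrative toy, not LangChain's actual implementation: each research step writes its output to an in-memory scratch space so that spiky, variable-size data never has to fit into a single prompt. All names (`VirtualFS`, `run_research`, the stub fetchers) are hypothetical.

```python
class VirtualFS:
    """In-memory scratch space: each research step writes a file,
    and later steps read only what they need instead of stuffing
    every data source into one LLM context window."""
    def __init__(self):
        self._files = {}

    def write(self, path, content):
        self._files[path] = content

    def read(self, path):
        return self._files.get(path, "")

    def ls(self):
        return sorted(self._files)


def run_research(lead_id, sources, fs):
    """Run each data-source fetcher as its own orchestration step;
    outputs land in the virtual FS, not a single growing prompt."""
    for name, fetch in sources.items():
        fs.write(f"/research/{lead_id}/{name}.md", fetch(lead_id))
    return fs.ls()


# Stub fetchers standing in for real Salesforce/Gong/web calls.
sources = {
    "crm": lambda lead: f"CRM history for {lead}",
    "meetings": lambda lead: f"Gong transcripts for {lead}",
    "web": lambda lead: f"Website research for {lead}",
}

fs = VirtualFS()
paths = run_research("lead-42", sources, fs)
```

A drafting step can then read only `crm.md` and `meetings.md`, leaving the rest on disk, which is the whole point of the pattern.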

Anyone who tells you "just connect GPT to Salesforce" is underselling the problem by an order of magnitude.

The "Do Not Send" Problem Is the Real Product

The hardest part isn't writing the email.

It's knowing when not to.

LangChain's agent checks whether someone already reached out. Whether the contact just filed a support ticket. Whether the timing is wrong. They describe the agent as "programmed to be cautious."

This is the part most teams skip — and the part that kills trust fastest. One bad automated email to a contact your colleague spoke to yesterday, and reps stop using the tool. Permanently.

The do-not-send logic is table stakes. Without it, you don't have a product. You have a liability.
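The checks above reduce to guard logic that returns *reasons to hold* rather than a bare boolean, so the rep can see why the agent stayed quiet. A minimal sketch, assuming a simple contact dict; the check names mirror the article, the data shapes do not come from it:

```python
from datetime import datetime, timedelta

def should_hold(contact, now=None):
    """Return a list of reasons NOT to email this contact.
    An empty list means the draft may proceed to rep approval."""
    now = now or datetime.utcnow()
    reasons = []
    last = contact.get("last_outreach")
    if last and now - last < timedelta(days=7):
        reasons.append("a colleague reached out within the last 7 days")
    if contact.get("open_support_ticket"):
        reasons.append("contact has an open support ticket")
    if contact.get("opted_out"):
        reasons.append("contact opted out of outreach")
    return reasons
```

Returning reasons instead of `True`/`False` doubles as the explainability surface: the same list can be posted to Slack alongside the held draft.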

Human-in-the-Loop Creates an Engineering Tax

LangChain was explicit: nothing sends without rep approval. Drafts route to Slack with send/edit/cancel buttons and full reasoning. One poorly timed email can undo months of relationship-building.

But human-in-the-loop adds real complexity:

  • You need an approval UX
  • You need SLA logic (they auto-send silver leads after 48 hours if no rep responds)
  • You need to track every rep action for feedback and measurement
  • You need explainability — reps have to see why the agent chose a particular angle

HITL isn't a checkbox. It's a full product surface with its own design, edge cases, and infrastructure.
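The SLA piece of that surface can be sketched as a small state resolver. The "silver" tier and 48-hour window come from the article; the draft shape and action names are assumptions:

```python
from datetime import datetime, timedelta

# Auto-send windows per lead tier; only silver is described publicly.
SLA = {"silver": timedelta(hours=48)}

def resolve_draft(draft, rep_action, now):
    """rep_action is 'send', 'edit', 'cancel', or None (no response).
    Silver-tier drafts auto-send once the SLA window lapses."""
    if rep_action in ("send", "edit"):
        return "sent"
    if rep_action == "cancel":
        return "cancelled"
    window = SLA.get(draft["tier"])
    if window and now - draft["created_at"] >= window:
        return "auto_sent"
    return "awaiting_rep"
```

Note that every branch here implies more infrastructure: "sent" needs an email integration, "auto_sent" needs a scheduler, and all four outcomes need to be logged for the feedback loop.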

Personalization Requires Memory — and Memory Is Its Own System

When a rep edits a draft, LangChain's system diffs the original against the revision. It extracts style preferences. Stores them per rep. Future runs read those preferences before drafting.

A weekly cron compacts memories to prevent bloat.

This is a separate system — storage, diffing, compaction, retrieval — bolted onto the agent. Without it, every draft feels generic. With it, the agent improves over time.

"Learning from rep feedback" sounds like a feature bullet point. It's actually a persistent memory system with its own data model and maintenance.
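The two halves of that system, diffing and compaction, can be sketched with the standard library. In a real pipeline an LLM would turn the raw diff into a style preference ("prefers 'Hey' over 'Hi there'"); the plain diff below just shows where that signal comes from. All function names are illustrative:

```python
import difflib

def extract_edits(original, revised):
    """Diff the agent's draft against the rep's revision and return
    (removed, added) line lists -- the raw material for preference
    extraction."""
    removed, added = [], []
    for line in difflib.unified_diff(
        original.splitlines(), revised.splitlines(), lineterm="", n=0
    ):
        if line.startswith("-") and not line.startswith("---"):
            removed.append(line[1:])
        elif line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:])
    return removed, added


def compact(memories, keep=50):
    """Weekly-cron-style compaction: keep only the most recent
    entries so per-rep memory does not grow without bound."""
    return memories[-keep:]
```

Even this toy version makes the article's point: storage, diffing, and compaction are separate moving parts with their own failure modes, before any model is involved.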

Evals Have to Come First, Not After

LangChain's most counterintuitive move: they define success criteria and build eval scenarios before writing production code.

Their eval suite includes:

  • Rule-based checks — right tools, right order, no duplicate drafts
  • LLM-as-judge scoring on tone and formatting
  • Rep action tracking tied directly to traces
  • CI integration so regressions get caught automatically

They mock external APIs for controlled testing. They treat "unexplained drift in agent behavior" as a bug.

Without evals from day one, you're flying blind. Every prompt change, model swap, or data source update can silently degrade quality. You won't know until reps stop trusting the drafts.
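The rule-based layer of such a suite is the cheapest to build and run in CI. A sketch, assuming agent runs are recorded as a trace of tool-call events (that format is an assumption, not LangChain's):

```python
def check_trace(trace, required_order=("research", "dedupe_check", "draft")):
    """Rule-based eval for one agent run: required tools present and
    in order, and no duplicate drafts to the same contact."""
    tools = [event["tool"] for event in trace]
    positions = [tools.index(t) if t in tools else -1 for t in required_order]
    ordered = all(p >= 0 for p in positions) and positions == sorted(positions)
    draft_targets = [
        e["args"]["contact"] for e in trace if e["tool"] == "draft"
    ]
    no_duplicates = len(draft_targets) == len(set(draft_targets))
    return {"tools_in_order": ordered, "no_duplicate_drafts": no_duplicates}
```

Because these checks are deterministic, they run on every commit against mocked APIs; the LLM-as-judge scoring layers on top for the fuzzier qualities like tone.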

Scaling Requires Subagent Architecture

For account intelligence — monitoring 50 to 100+ accounts per rep — LangChain uses compiled subagents. Lightweight, tool-constrained agents with structured output schemas. One per account, each isolated, each returning predictable data.

A single monolithic agent processing 100 accounts sequentially? Too slow. Too fragile.

The architecture that works for one lead breaks down at portfolio scale. Parallel subagent orchestration isn't a nice-to-have. It's a requirement.
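The fan-out pattern looks roughly like this. `monitor_account` stands in for a real compiled subagent, and the `AccountSignal` schema is an assumption; the point is one isolated call per account, run in parallel, each returning structured data instead of free-form text:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class AccountSignal:
    """Structured output schema every subagent must return."""
    account_id: str
    risk: str
    summary: str

def monitor_account(account_id):
    """One isolated, tool-constrained subagent per account. A real
    implementation would call an LLM with a restricted tool set."""
    return AccountSignal(account_id=account_id, risk="low",
                         summary=f"No notable changes for {account_id}")

def monitor_portfolio(account_ids, max_workers=8):
    """Fan out across accounts in parallel instead of looping one
    monolithic agent over the whole portfolio."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(monitor_account, account_ids))
```

Isolation buys more than speed: one account's failure or oversized research blob can't poison the context of the other 99.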

The Surprise: Organic Adoption You Didn't Plan For

LangChain built the agent for SDRs.

It spread to engineers checking product usage without SQL. Customer success pulling support history before renewals. AEs summarizing Gong transcripts before meetings.

None of those workflows were designed. People found the path of least resistance because the agent already had access to the data they needed.

Connect the agent to your systems of record from the start, and the value compounds in ways you can't predict. But it also means the agent needs to handle users you never designed for.

What This Means for GTM Teams

A GTM AI agent is not a chatbot with extra steps.

It's a distributed system. Memory. Orchestration. Evaluation. Human interaction layers. All of it has to work together — reliably, at scale, without embarrassing your brand.

The teams that win will treat this as the infrastructure challenge it is. Not ship a demo and call it done.


At SalesPeak, we've been building at this exact intersection — AI that engages buyers at peak interest, understands full conversation context, and knows when to act, when to wait, and when to stay quiet.

If you're thinking about how AI fits into your GTM motion, let's talk. We'll show you what we've built — and what we've learned the hard way.
