
First Block: Interview with Gabe Pereyra, Co-Founder and President of Harvey

Notion·
5 min read

Based on Notion's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Harvey’s scaling required shifting from document drafting to enterprise-grade orchestration: security, data scoping, and collaboration workflows.

Briefing

Harvey’s growth story hinges on a shift from early, high-value pilots to building the “broader machine” enterprises need: not just AI outputs, but the infrastructure, security, and collaboration workflows that let lawyers safely use models on real client matters. Gabe Pereyra, co-founder and president of Harvey, describes landing unusually large seven-figure deals early for an enterprise startup—then learning that winning contracts wasn’t the same as having the operational muscle to scale. The key lesson was moving from a product that drafts or assists with legal work to an enterprise system that can orchestrate tools, manage data scope, and support long-running tasks across a firm.

Harvey began with co-pilots for lawyers and evolved into a platform for large law firms and their clients to collaborate on complex matters like transactions and litigation. The company’s origin traces to Pereyra’s work on Meta’s large language model team during the release of GPT-3.5 and GPT-4-era capabilities, alongside co-founder Winston Weinberg’s background as a litigation associate at a top firm. Their early traction came from identifying “customer zero” and “customer one” inside the legal ecosystem: Winston’s network and a senior technology partner at a major firm who quickly rolled the system out firmwide to thousands of lawyers after a pilot.

A recurring theme is that Harvey didn’t start as a narrow automation tool. Early demos that could draft NDAs sounded simple to lawyers, but users needed a mental model closer to “talk with this system and figure out what to do,” especially once they hit limits like missing access to case law or internal firm data. As model reasoning improved—particularly with step-change unlocks from stronger reasoning models—Harvey shifted toward agents that can chain tool calls for tasks such as legal research, reviewing emails, and synthesizing findings for litigation. Accuracy expectations also differed from consumer legal software: large law firms operate with layered review processes, similar to software engineering, where outputs are checked by multiple associates before reaching production.

Instead of relying primarily on custom model training, Pereyra emphasizes that most value comes from infrastructure: an “IDE for lawyers” that connects the relevant tools and data for a specific matter (acquisition, litigation, fund formation) and constrains the agent to the correct scope. Domain improvements—post-training, retrieval-augmented generation, and other techniques—still matter, but the company’s emphasis is on orchestration and secure workflow design. He also argues that the next frontier is collaborative training between enterprises and their law firms, since client data can’t be reused broadly; the likely future is “inverted” training where legal models learn from the combined work between a specific enterprise and its counsel.

On go-to-market, the legal industry’s buyer-user split is compounded by an additional stakeholder: the law firm’s clients. Harvey’s early sales leaned on founder-led vision work with CIOs, partners, and clients to align on how teams would collaborate. Pereyra’s advice to founders is to use models immediately—because the obvious applications are only the first wave—and to build domain expertise by pairing technical founders with real industry mental models. If rebuilding from scratch, he would invest earlier in foundational security and retention architecture, which became necessary as enterprise deals scaled, but could now be designed with clearer abstractions and collaboration workflows in mind.

Cornell Notes

Harvey’s early success came from deploying LLM-powered legal assistance, but scaling required a bigger shift: building the enterprise “machine” around the models—security, data scoping, collaboration workflows, and tool orchestration. Gabe Pereyra credits traction to strong reasoning models that enabled agents to chain research and synthesis tasks, while also noting that large law firms tolerate imperfect drafts because multi-layer associate review functions like software testing. Most performance gains, he says, come from infrastructure that gives agents the right context (case law, emails, motions) and the right tools for each matter. Looking ahead, he expects model training to happen between an enterprise and its law firm, since client data can’t be reused generically across firms. The result is a platform approach rather than a narrow automation product.

Why did Harvey’s product need to evolve beyond “drafting NDAs” into something broader?

Early demos that could draft NDAs often landed with lawyers as “a tool to draft NDAs,” but real workflows required more than a single template. Users quickly ran into limits like missing access to case law, internal firm data, and the ability to share outputs with partners or clients. That feedback pushed the roadmap toward a general-purpose interaction model—“talk with this thing and figure out what to do”—and ultimately toward agents and workflows that operate within the correct matter context and tool access.

What changed as models improved, and how did that affect Harvey’s approach to legal work?

Stronger reasoning models created a step change for tasks that involve chaining multiple tool calls. In litigation, for example, associates must perform legal research, review emails, and synthesize findings—work that isn’t a single prompt-response. Harvey responded by building agents designed for long-running, multi-step tasks rather than only drafting documents.
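The multi-step pattern described above can be sketched as a simple agent loop. This is a minimal illustration, not Harvey’s actual system: the tool functions (`search_case_law`, `review_emails`, `synthesize`) and the matter ID format are hypothetical stand-ins for the research, email-review, and synthesis steps the interview mentions.

```python
# Hypothetical sketch: an agent that chains tool calls for a litigation
# task instead of answering in a single prompt-response turn.

def search_case_law(query: str) -> list[str]:
    # Stub: a real system would call a legal research service.
    return [f"case relevant to '{query}'"]

def review_emails(matter_id: str) -> list[str]:
    # Stub: a real system would pull emails scoped to this matter.
    return [f"email summary for matter {matter_id}"]

def synthesize(findings: list[str]) -> str:
    # Stub: a real system would ask a model to combine findings.
    return "Memo: " + "; ".join(findings)

def run_litigation_agent(matter_id: str, question: str) -> str:
    """Chain research, review, and synthesis into one long-running task."""
    findings: list[str] = []
    findings += search_case_law(question)      # step 1: legal research
    findings += review_emails(matter_id)       # step 2: review emails
    return synthesize(findings)                # step 3: synthesize output

print(run_litigation_agent("M-123", "breach of contract"))
```

The point of the structure is that each step feeds the next, which is why stronger reasoning models (better at planning such chains) were described as a step change.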

How does Harvey reconcile the need for legal accuracy with imperfect model outputs?

Large law firms operate more like software organizations than consumer settings. Junior work typically goes through multiple layers of associate review, with checks that prevent mistakes from reaching final production. Pereyra highlights that early traction came even when models weren’t perfect because the human review pipeline absorbed errors—so the product could focus on assisting associates and accelerating parts of the workflow.

Why does Pereyra emphasize infrastructure over custom model training?

He argues that most value comes from the surrounding system: connecting the relevant tools and data for a specific client matter, constraining the agent to the correct data scope, and enabling collaboration across the firm. He compares the goal to an IDE for lawyers—an environment where agents can operate with full matter context—while noting that base-model improvements now outpace the incremental gains from post-training, which previously demanded dedicated retraining cycles.
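One way to picture the “constrain the agent to the correct scope” idea is a per-matter tool registry. This sketch is illustrative only: the matter types, tool names, and `MatterScope` class are assumptions invented for the example, not Harvey’s design.

```python
# Hypothetical sketch: each matter type exposes only its own tools,
# so an agent on one client matter can never reach another's data.

MATTER_TOOLS = {
    "litigation":     {"case_law_search", "email_review", "motion_drafting"},
    "acquisition":    {"diligence_review", "contract_drafting"},
    "fund_formation": {"regulatory_search", "contract_drafting"},
}

class MatterScope:
    """Gatekeeper that limits an agent to one matter's tools and data."""

    def __init__(self, matter_id: str, matter_type: str):
        self.matter_id = matter_id
        self.allowed = MATTER_TOOLS[matter_type]

    def call(self, tool: str) -> str:
        # Refuse any tool outside this matter's scope.
        if tool not in self.allowed:
            raise PermissionError(f"{tool} not permitted for this matter")
        return f"{tool} ran for {self.matter_id}"

scope = MatterScope("M-123", "litigation")
print(scope.call("case_law_search"))  # allowed for litigation matters
```

The design choice here mirrors the interview’s framing: scoping is enforced by the infrastructure around the agent rather than by trusting the model to stay in bounds.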

What does “customer training” mean in a legal context, and why can’t it be generic?

Pereyra says enterprises can’t train on client data in a way that’s reusable across customers. Even within a single law firm, much of the data is actually client-specific, so a model can’t be broadly trained on “the firm’s data” without violating reuse constraints. The likely future is inverted: enterprises and their law firms build AI systems together using the combined history of work between them, creating a training signal that can’t be transferred elsewhere.

What go-to-market challenge is unique to enterprise legal, and how did Harvey address it?

In enterprise software, the buyer often isn’t the user. Legal adds another layer: the law firm’s clients are also end customers. Harvey’s early sales relied heavily on founder-led vision work with CIOs, partners, and clients to align on how collaboration would work across parties, not just how the tool would generate text.

Review Questions

  1. What specific workflow limitations from early demos pushed Harvey toward agents and matter-scoped infrastructure?
  2. How do multi-layer associate review processes change the accuracy requirements for AI-assisted legal drafting?
  3. Why does Pereyra believe model training will likely occur between an enterprise and its law firm rather than through generic customer fine-tuning?

Key Points

  1. Harvey’s scaling required shifting from document drafting to enterprise-grade orchestration: security, data scoping, and collaboration workflows.
  2. Early traction came from strong reasoning models enabling agents to chain tool calls for tasks like research, email review, and litigation synthesis.
  3. Large law firms can tolerate imperfect AI outputs because layered associate review acts like a quality-control pipeline.
  4. Most performance gains come from infrastructure that provides the right matter context and tool access, not from constant custom retraining.
  5. Legal “training” is constrained by client data ownership, making enterprise–law-firm co-training a more realistic path than generic fine-tuning.
  6. Go-to-market in legal is complicated by the buyer-user split plus the law firm’s client as an additional stakeholder, requiring founder-led alignment work.

Highlights

Harvey’s biggest leap wasn’t just better prompts—it was building the enterprise “IDE for lawyers” that connects tools and constrains agents to the correct matter data scope.
Stronger reasoning models enabled agents to handle long-running, multi-step legal tasks, turning research-and-synthesis workflows into something more automatable.
Pereyra frames legal accuracy differently: large firms function like software teams with layered review, so AI assistance can improve speed without requiring perfect outputs every time.
The future of training, in his view, is “inverted”: enterprises and their specific law firms build AI systems together using their joint work history, because client data can’t be reused generically.
