The History of every major Galactic Civilization tends to pass through three distinct and recognizable phases: Survival, Inquiry, and Sophistication. Otherwise known as the How, Why, and Where-do-we-eat phases. DSPy programs follow a similar trajectory.
— Douglas Adams (adapted)
In Chapter 1, you learned how to ask an LLM a single question with dspy.Predict. That's useful — like knowing how to boil water is useful. But nobody opens a restaurant to serve boiled water.
Real AI applications are pipelines. A lead scoring system doesn't just "analyze a lead" — it researches the company, classifies the prospect's intent, scores their fit, and generates personalized outreach. A content moderation system doesn't just "check for toxicity" — it classifies the content type, evaluates severity, checks context, and decides on an action. Each step feeds the next. Each step might need different reasoning strategies or even different models.
In this chapter, we're building a Lead Intelligence Engine — a multi-step pipeline that takes a company name and prospect information, then runs a research-classify-score-outreach sequence. Along the way, you'll learn how DSPy modules compose like LEGO bricks, how to use Pydantic models for type-safe outputs, how to assign different LMs to different steps, and why adapters matter more than you think.
You met dspy.Module briefly in Chapter 1 when we wrapped our Predict in a class. Now it's time to understand why that pattern is the foundation of everything in DSPy.
A Module is just a Python class with two responsibilities:
- `__init__`: declare the prediction steps (Predict, ChainOfThought, or other Modules)
- `forward`: wire them together with your logic

That's the entire contract. Here's the minimal version:
```python
class Summarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.Predict("text -> summary")

    def forward(self, text):
        return self.summarize(text=text)
```

And here's why this simple pattern is powerful: Modules can contain other Modules. This means you can build complex pipelines from simple, testable pieces:
```python
class ResearchAndSummarize(dspy.Module):
    def __init__(self):
        super().__init__()
        self.research = dspy.ChainOfThought("topic -> findings")
        self.summarize = dspy.Predict("findings -> summary")

    def forward(self, topic):
        findings = self.research(topic=topic)
        return self.summarize(findings=findings.findings)
```

Notice the pattern: the output of self.research (accessed via findings.findings) becomes the input to self.summarize. This is how you chain steps. It's just Python — function calls, attribute access, and control flow. No special DSPy syntax for chaining. No graph definitions. No YAML configuration. Just code.
Here's what we're building — a four-step pipeline that turns a company name and prospect info into actionable sales intelligence:

1. ResearchCompany (ChainOfThought)
2. ClassifyIntent (Predict)
3. ScoreFit (ChainOfThought)
4. GenerateOutreach (Predict)
Each step is its own Module with its own Signature. The outer pipeline Module wires them together. Let's build it step by step.
In Chapter 1, we used simple types — str, int, list[str]. For a real pipeline, you want structured data flowing between steps. This is where Pydantic models shine.
DSPy natively supports Pydantic BaseModel subclasses as output types. When you use them, DSPy automatically generates JSON schema instructions for the LLM and validates the response:
```python
from pydantic import BaseModel, Field
from typing import Literal

class CompanyIntel(BaseModel):
    """Structured intelligence about a company."""
    name: str = Field(description="Official company name")
    industry: str = Field(description="Primary industry or sector")
    size_estimate: Literal["startup", "smb", "mid-market", "enterprise"] = Field(
        description="Estimated company size category"
    )
    recent_developments: list[str] = Field(
        description="Notable recent news, launches, or changes"
    )
    potential_pain_points: list[str] = Field(
        description="Business challenges this company likely faces"
    )
    tech_stack_signals: list[str] = Field(
        description="Any signals about their technology choices"
    )
```

A few things to notice here:
Literal types act as constraints. When you declare size_estimate: Literal["startup", "smb", "mid-market", "enterprise"], DSPy tells the LLM it must pick from those exact values. Pydantic validates the response. If the LLM returns "small business" instead of "smb," Pydantic catches it and DSPy retries. This is how you get reliable, constrained outputs without writing prompt hacks like "ONLY respond with one of these values."
Field(description=...) feeds the LLM. Just like dspy.OutputField(desc=...), Pydantic's Field(description=...) becomes part of the prompt. Every description you write is an instruction to the model.
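Both mechanisms are plain Pydantic, so you can see them work without DSPy in the loop. A minimal sketch with two of the CompanyIntel fields: the Literal rejects out-of-vocabulary values, and the Field descriptions land in the JSON schema that DSPy turns into prompt instructions:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class CompanyIntel(BaseModel):
    name: str = Field(description="Official company name")
    size_estimate: Literal["startup", "smb", "mid-market", "enterprise"] = Field(
        description="Estimated company size category"
    )

# A Literal field rejects anything outside the allowed set.
try:
    CompanyIntel(name="Acme", size_estimate="small business")
except ValidationError as e:
    print(e.errors()[0]["type"])  # "literal_error" in Pydantic v2

# Field descriptions become part of the JSON schema, and thus the prompt.
schema = CompanyIntel.model_json_schema()
print(schema["properties"]["size_estimate"]["description"])
```

This is exactly the failure DSPy catches and retries on: the ValidationError surfaces before any bad value flows into the next pipeline step.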
🚨 Gotcha: If you've seen older DSPy tutorials mention dspy.Assert or dspy.Suggest for output constraints, note that these are not available in DSPy 3.1.x. They were part of an earlier API. The modern approach is to use Pydantic models with Literal types, Field constraints, and validators. It's actually cleaner — you get validation, type safety, and IDE autocomplete for free.
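As a sketch of what "validators" buys you here (the alias table below is an illustrative assumption, not part of the chapter's pipeline): a Pydantic field_validator in "before" mode can normalize near-miss values the LLM emits so they pass the Literal check, cutting down on retries:

```python
from pydantic import BaseModel, field_validator
from typing import Literal

# Hypothetical aliases an LLM might emit instead of the exact Literal values.
SIZE_ALIASES = {"small business": "smb", "mid market": "mid-market", "large": "enterprise"}

class CompanyIntel(BaseModel):
    name: str
    size_estimate: Literal["startup", "smb", "mid-market", "enterprise"]

    @field_validator("size_estimate", mode="before")
    @classmethod
    def normalize_size(cls, v):
        # Runs before the Literal check, so aliases get mapped first.
        if isinstance(v, str):
            v = v.strip().lower()
            return SIZE_ALIASES.get(v, v)
        return v

intel = CompanyIntel(name="Acme", size_estimate="Small Business")
print(intel.size_estimate)  # "smb"
```

Values the table doesn't cover still fail validation, so you keep the hard guarantee while tolerating cosmetic drift.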
Now let's define all four Signatures for our pipeline:
```python
import dspy
from pydantic import BaseModel, Field
from typing import Literal

# --- Pydantic models for structured data flow ---

class CompanyIntel(BaseModel):
    """Structured intelligence about a company."""
    name: str = Field(description="Official company name")
    industry: str = Field(description="Primary industry or sector")
    size_estimate: Literal["startup", "smb", "mid-market", "enterprise"] = Field(
        description="Estimated company size category"
    )
    recent_developments: list[str] = Field(
        description="Notable recent news, launches, or changes"
    )
    potential_pain_points: list[str] = Field(
        description="Business challenges this company likely faces"
    )
    tech_stack_signals: list[str] = Field(
        description="Any signals about their technology choices"
    )

class IntentClassification(BaseModel):
    """Classification of a prospect's buying intent."""
    intent_level: Literal["hot", "warm", "cold", "unknown"] = Field(
        description="Assessed buying intent level"
    )
    intent_signals: list[str] = Field(
        description="Specific signals that indicate this intent level"
    )
    buyer_persona: str = Field(
        description="Likely buyer persona (e.g., 'Technical Decision Maker', "
        "'Budget Holder', 'End User', 'Champion')"
    )
```