Setting Up
import dspy
# Configure a language model
lm = dspy.LM(
    "anthropic/claude-sonnet-4-6",  # provider/model format (via LiteLLM)
    api_key="sk-...",               # or set ANTHROPIC_API_KEY env var
    temperature=0.7,                # 0.0 = deterministic, 1.0+ = creative
    max_tokens=2048,                # max output tokens
)
# Set as global default
dspy.configure(lm=lm)
# Per-request override (thread-safe)
with dspy.context(lm=other_lm):
    result = module(inputs)

Common model strings: anthropic/claude-sonnet-4-6, anthropic/claude-haiku-4-5-20251001, anthropic/claude-opus-4-6, openai/gpt-5.4, openai/gpt-5.4-mini
Defining What You Want: Signatures
# Inline signature (quick and simple)
predictor = dspy.Predict("question -> answer")
predictor = dspy.ChainOfThought("context, question -> answer, confidence")
# Class-based signature (production-grade, with types and descriptions)
class AnalyzeReview(dspy.Signature):
    """Analyze a product review and extract structured insights."""
    review_text: str = dspy.InputField(desc="The product review to analyze")
    category: str = dspy.InputField(desc="Product category for context")
    sentiment: Literal["positive", "negative", "neutral", "mixed"] = dspy.OutputField()
    analysis: MyPydanticModel = dspy.OutputField(desc="Structured analysis")

Field names carry meaning: a descriptive name like sentiment_label gives better results than a generic one like output. Docstrings become system instructions. Use Pydantic models for complex structured outputs.

Prediction Modules
| Module | What It Does | When to Use |
|---|---|---|
| dspy.Predict | Direct inference, no reasoning | Simple extraction, classification |
| dspy.ChainOfThought | Adds a rationale field for step-by-step reasoning | Most tasks — the default workhorse |
| dspy.MultiChainComparison | Generates M attempts, compares, picks best | Nuanced tasks where quality matters more than cost |
| dspy.Reasoning | Captures extended thinking (str-like output field) | With reasoning models (o1, o3); use as output field type |
# Predict — straightforward
result = dspy.Predict("question -> answer")(question="What is DSPy?")
# ChainOfThought — with reasoning
result = dspy.ChainOfThought("question -> answer")(question="What is DSPy?")
print(result.rationale) # The model's reasoning steps
# MultiChainComparison — ensemble M attempts
mcc = dspy.MultiChainComparison("question -> answer", M=3, temperature=0.7)
# Reasoning — extended thinking (as output field type)
class DeepAnalysis(dspy.Signature):
    question: str = dspy.InputField()
    reasoning: dspy.Reasoning = dspy.OutputField()  # str-like, captures thinking
    answer: str = dspy.OutputField()

Building Programs: Modules
class MyPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # Declare sub-modules here — DSPy tracks them as parameters
        self.step1 = dspy.ChainOfThought(Step1Signature)
        self.step2 = dspy.Predict(Step2Signature)

    def forward(self, **inputs):
        # Your logic — regular Python; call sub-modules like functions
        result1 = self.step1(field=inputs["field"])
        result2 = self.step2(context=result1.output)
        return dspy.Prediction(final=result2.answer)

# Use it
pipeline = MyPipeline()
output = pipeline(field="some input")
print(output.final)

Composition patterns: Sequential (A → B → C), Branching (if/else on intermediate results), Parallel (dspy.Parallel), Fallback (try/except with dspy.context).
Agents and Tools
import requests

# Wrap any function as a tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return requests.get(f"https://api.example.com/search?q={query}").text

tool = dspy.Tool(search_web)

# ReAct — reasoning + acting loop
agent = dspy.ReAct(
    "question -> answer",
    tools=[tool],
    max_iters=5,
)
result = agent(question="What's the latest on DSPy?")
# ProgramOfThought — generates code to solve problems
pot = dspy.ProgramOfThought("question -> answer")
# CodeAct — combines tools + code generation
code_agent = dspy.CodeAct("question -> answer", tools=[tool])

Multimodal Inputs
from PIL import Image as PILImage
# Images — from PIL, URL, file path, or bytes
img = dspy.Image(PILImage.new("RGB", (100, 100), "red")) # PIL
img = dspy.Image("https://example.com/photo.jpg") # URL
img = dspy.Image("/path/to/photo.png") # File
# Audio and documents
audio = dspy.Audio.from_file("interview.wav")
doc = dspy.File.from_path("report.pdf")
# Use in signatures
class AnalyzeImage(dspy.Signature):
    image: dspy.Image = dspy.InputField(desc="Product photo")
    caption: str = dspy.InputField()
    analysis: str = dspy.OutputField()

Retrieval (RAG)
# Set up an embedder
embedder = dspy.Embedder("openai/text-embedding-3-small", batch_size=200)
# Build a retrieval module
retriever = dspy.Embeddings(embedder=embedder, corpus=chunks, k=5)  # chunks: your list of text passages
# Use it
results = retriever("What does dspy.Module do?")
# Save and reload the index
retriever.save("my_index")
loaded_retriever = dspy.Embeddings.from_saved("my_index", embedder=embedder)

Optimizers
The magic of DSPy. These tune your program's prompts and demos automatically.
| Optimizer | What It Tunes | Best For | Cost |
|---|---|---|---|
| LabeledFewShot | Demo selection | When you have labeled examples and want a quick win | $ |
| BootstrapFewShot | Self-generated demos | Bootstrapping good examples from a teacher | $$ |
| BootstrapFewShotWithRandomSearch | Demos + hyperparams | More thorough than BootstrapFewShot | $$$ |
| MIPROv2 | Instructions + demos | Best general-purpose optimizer | $$–$$$$ |
| SIMBA | Instructions + demos | Memory-efficient alternative to MIPROv2 | $$$ |
| BootstrapFinetune | Model weights | When you want to distill into a smaller model | $$$$$ |
| GRPO | Model weights (RL) | When you have a reward function but no labels | $$$$$ |
| BetterTogether | Prompts + weights | Combined optimization (experimental) | $$$$$ |
| GEPA | Instructions (evolutionary) | Sophisticated prompt optimization (experimental) | $$$$ |
from dspy.teleprompt import BootstrapFewShot, MIPROv2

# Define a metric
def my_metric(example, prediction, trace=None):
    return prediction.sentiment == example.sentiment

# Optimize
optimizer = MIPROv2(metric=my_metric, auto="light")
optimized = optimizer.compile(
    MyPipeline(),
    trainset=train_examples,
)

# Save the optimized program
optimized.save("optimized_pipeline.json")

Escalation path: start with LabeledFewShot. If you need better → BootstrapFewShot. Still not enough → MIPROv2(auto="light"). For max quality → MIPROv2(auto="heavy"). For fine-tuning → BootstrapFinetune. For RL → GRPO. For experimental cutting-edge → GEPA or BetterTogether.

Evaluation
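Metrics don't have to be boolean: evaluation averages whatever your metric returns, so a float awards partial credit. A minimal sketch, using plain objects as stand-ins for dspy.Example and the module's Prediction (the field names and weights are illustrative; returning a hard pass/fail when trace is not None is a common DSPy convention for bootstrapping):

```python
from types import SimpleNamespace

def graded_metric(example, prediction, trace=None):
    """Partial credit: 0.6 for the right sentiment, 0.4 for a non-empty analysis."""
    score = 0.0
    if prediction.sentiment == example.sentiment:
        score += 0.6
    if getattr(prediction, "analysis", ""):
        score += 0.4
    # During bootstrapping (trace is not None), return a hard pass/fail
    if trace is not None:
        return score >= 0.6
    return score

gold = SimpleNamespace(sentiment="positive")
pred = SimpleNamespace(sentiment="positive", analysis="Great battery life.")
print(graded_metric(gold, pred))
```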
evaluator = dspy.Evaluate(
    devset=test_examples,
    metric=my_metric,
    num_threads=4,
    display_progress=True,
    display_table=5,  # show 5 example rows
)
score = evaluator(my_pipeline)
print(f"Accuracy: {score}%")

Production Patterns
Streaming
streaming = dspy.streamify(module)
for chunk in streaming(question="Tell me about DSPy"):
    if isinstance(chunk, dspy.Prediction):
        final_result = chunk  # Last item is the complete prediction
    else:
        print(chunk, end="")  # Intermediate text chunks

Async
async_module = dspy.asyncify(module)
result = await async_module(question="Tell me about DSPy")

Batch Processing
results = module.batch(
    examples,
    num_threads=8,
    return_failed_examples=True,
)

Cost Tracking
with dspy.track_usage() as tracker:
    result = module(question="Something")

print(f"Cost: ${tracker.total_cost:.4f}")
print(f"Tokens: {tracker.tokens}")

Caching
from dspy.clients import configure_cache
configure_cache(
    enable_disk_cache=True,
    enable_memory_cache=True,
    disk_cache_dir="~/.dspy_cache",
    disk_size_limit_bytes=2 * 1024**3,  # 2 GB
    memory_max_entries=10_000,
)

Callbacks (Observability)
import logging
from dspy.utils.callback import BaseCallback

log = logging.getLogger(__name__)

class MyLogger(BaseCallback):
    def on_module_end(self, call_id, outputs, exception):
        if exception:
            log.error(f"Module failed: {exception}")
        else:
            log.info(f"Module completed: {outputs}")

dspy.configure(lm=lm, callbacks=[MyLogger()])

Save / Load
# Save (must use .json or .pkl extension)
module.save("my_pipeline_v1.json")
# Load
module = MyPipeline()
module.load("my_pipeline_v1.json")

Per-Request Config (Thread-Safe)
# dspy.configure() is NOT thread-safe for concurrent calls
# Use dspy.context() for per-request overrides in web servers
with dspy.context(lm=request_specific_lm):
    result = module(inputs)

Adapters
# Global adapter
dspy.configure(lm=lm, adapter=dspy.JSONAdapter())
# Per-request adapter
with dspy.context(adapter=dspy.XMLAdapter()):
    result = module(inputs)

# Available adapters:
# dspy.ChatAdapter() — default, uses [[ ## field ## ]] delimiters
# dspy.JSONAdapter() — forces JSON output
# dspy.XMLAdapter() — forces XML tags

Debugging
# See what DSPy actually sent to the LLM
dspy.inspect_history(n=3) # Last 3 LLM calls
# Check module parameters
for name, param in module.named_parameters():
    print(f"{name}: {type(param)}")

Common Import Patterns
# The essentials
import dspy
from pydantic import BaseModel, Field
from typing import Literal, Optional
# Optimizers
from dspy.teleprompt import (
    LabeledFewShot,
    BootstrapFewShot,
    BootstrapFewShotWithRandomSearch,
    MIPROv2,
    SIMBA,
    BootstrapFinetune,
    BetterTogether,
    GEPA,
)
from dspy.teleprompt.grpo import GRPO  # Not in __init__.py

# Callbacks and cache
from dspy.utils.callback import BaseCallback
from dspy.clients import configure_cache

# Tools for agents
# dspy.Tool(function) — wraps any callable

The Optimizer Decision Tree
The single most-asked question in the DSPy Discord: “Which optimizer should I use?” Start at the top and follow the branches.
Do you have labeled training examples?
- YES → How many?
  - Fewer than 50 → LabeledFewShot
  - 50 or more → BootstrapFewShot
  - Not good enough? → MIPROv2(auto="light")
    - Still not enough? → MIPROv2(auto="medium")
      - Still not enough? Want to tune model weights?
        - YES → BootstrapFinetune
        - NO → SIMBA or GEPA (experimental)
- NO → Can you write a reward/metric function?
  - YES → Do you have fine-tuning infrastructure?
    - YES → GRPO (RL)
    - NO → MIPROv2(auto="light")
      - Not good enough? → MIPROv2(auto="heavy")

Optimizer Cheat Sheet
Start here — solves 90% of cases:
LabeledFewShot
You have labeled examples and want a quick baseline. Takes seconds, costs almost nothing. Always try this first.
BootstrapFewShot
You have some labels but want DSPy to generate better demonstrations automatically. The teacher model runs your pipeline, keeps the traces that pass your metric. 5–10 minutes, low cost.
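Conceptually, the bootstrapping loop is a filter: run the teacher over your training inputs and keep only the outputs your metric accepts as demonstrations. A toy sketch of that idea, with plain functions standing in for the teacher pipeline and metric (BootstrapFewShot does this for you, plus full trace capture):

```python
def bootstrap_demos(trainset, teacher, metric, max_demos=4):
    """Keep (input, output) pairs whose teacher output passes the metric."""
    demos = []
    for example in trainset:
        prediction = teacher(example)
        if metric(example, prediction):
            demos.append((example, prediction))
        if len(demos) >= max_demos:
            break
    return demos

# Toy stand-ins: an uppercasing "teacher", a metric that checks the label
trainset = [{"q": "hi", "label": "HI"}, {"q": "bye", "label": "nope"}]
teacher = lambda ex: ex["q"].upper()
metric = lambda ex, pred: pred == ex["label"]
print(bootstrap_demos(trainset, teacher, metric))
```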
MIPROv2(auto="light")
The recommended general-purpose optimizer. Generates optimized instructions AND selects demos. Start with "light" (6 candidates), move to "medium" (12) or "heavy" (18) if needed.
When you need more:
SIMBA
Self-reflective mini-batch optimization. Good when MIPROv2's cost is too high, or when you have larger datasets.
BootstrapFinetune
When prompt optimization isn't enough and you want to fine-tune model weights. Requires a fine-tuning-capable model.
GRPO
Reinforcement learning for LMs. No labeled outputs needed — just a reward function. For advanced users with RL infrastructure.
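Because GRPO needs no gold labels, the reward function must score a prediction on its own merits. A hypothetical sketch (the checks and weights are purely illustrative, with a plain object standing in for the prediction):

```python
from types import SimpleNamespace

def reward_fn(example, prediction, trace=None):
    """Label-free reward: non-empty, concise, and cites a source."""
    answer = getattr(prediction, "answer", "") or ""
    reward = 0.0
    if answer:
        reward += 0.5  # produced something
    if 0 < len(answer.split()) <= 100:
        reward += 0.3  # stayed concise
    if "http" in answer:
        reward += 0.2  # cited a source
    return reward

pred = SimpleNamespace(answer="DSPy optimizes prompts. See https://dspy.ai")
print(reward_fn(None, pred))
```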
Experimental (but powerful):
BetterTogether
Alternates between prompt optimization and fine-tuning. Strategy string controls the sequence: "p -> w -> p" means optimize prompts, fine-tune weights, optimize prompts again.
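The strategy string is a tiny program over two step types. A toy interpreter makes the sequencing concrete (the stub steps merely record what a real prompt-optimization or fine-tuning pass would do):

```python
def run_strategy(strategy, program):
    """Apply 'p' (prompt) and 'w' (weight) steps in the order given."""
    steps = {
        "p": lambda prog: prog + ["prompts optimized"],
        "w": lambda prog: prog + ["weights fine-tuned"],
    }
    for token in strategy.split("->"):
        program = steps[token.strip()](program)
    return program

print(run_strategy("p -> w -> p", []))
```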
GEPA
Evolutionary prompt optimization with reflection. The most sophisticated prompt-only optimizer. Requires a reflection_lm parameter.
The Golden Rule
Start simple. Measure. Only escalate when the numbers say you need to. LabeledFewShot → BootstrapFewShot → MIPROv2(auto="light") solves 90% of real-world cases. The fancy optimizers exist for the other 10%.
Where to Go From Here
You've made it to the end. If you started at Chapter 1 and worked your way through, you've gone from “what is a Signature?” to building multimodal pipelines with model cascading, ensemble reasoning, and advanced optimizers. That's a significant journey.
Stay Current
The DSPy GitHub repository is the source of truth. Watch the releases. The official docs at dspy.ai are the best API reference. The DSPy Discord is where the community lives.
Build Something
The projects in this book are starting points, not endpoints. Extend a chapter project. Combine patterns across chapters — an agent (Ch 5) that uses RAG (Ch 3) with streaming and observability (Ch 6) composes naturally in DSPy.
The Bigger Picture
DSPy is a bet that LLM development shifts from prompting to programming. That bet is paying off. The answer might be 42, but the question — how do we build reliable, maintainable, optimizable AI systems? — is worth spending your career on.
“The ships hung in the sky in much the same way that bricks don't. Your DSPy programs, however, will soar.”
Don't Panic. And don't forget your towel.