It is a well-known fact that those people who most want to rule people are, ipso facto, those least suited to do it. Similarly, those DSPy programs that work perfectly in your notebook are the ones least prepared for production traffic.
— Douglas Adams (adapted)
There's a special kind of confidence that comes from watching your DSPy pipeline ace every test case in your notebook. The structured outputs parse perfectly. The chain-of-thought reasoning is eloquent. The metrics look great. You push to production with the swagger of someone who has clearly figured it all out.
Then traffic hits.
The first user sends a 4,000-word manifesto about their cat's dietary preferences. Your context window explodes. The second user submits the same spam message 200 times in a row, and you watch your API bill climb in real-time. The third user sends content in a language your test suite never considered, and the LLM returns a response that makes your Pydantic parser weep.
This is the chapter that stands between your brilliant prototype and a service that survives contact with real users. We're going to take a straightforward DSPy pipeline — a content moderation system — and systematically armor it with every production feature DSPy offers: caching, cost tracking, observability callbacks, fallback chains, streaming, async processing, batch handling, and deployment behind a FastAPI service.
None of this is glamorous work. It's the engineering equivalent of wearing a seatbelt. But it's the difference between "works on my machine" and "handles 10,000 requests a day without waking me up at 3 AM."
In this chapter, we build a production-ready content moderation pipeline that caches responses, tracks costs, emits observability callbacks, falls back across models, streams output, handles async and batch traffic, and ships behind a FastAPI service. First, the project scaffolding:
mkdir ch06_production && cd ch06_production
poetry init --name ch06-production --python ">=3.10,<3.15" --no-interaction

# pyproject.toml
[tool.poetry]
name = "ch06-production"
version = "0.1.0"
description = "Chapter 6: Mostly Harmless (in Production)"
authors = ["Your Name <you@example.com>"]
[tool.poetry.dependencies]
python = ">=3.10,<3.15"
dspy = ">=3.1.3,<4.0.0"
python-dotenv = ">=1.2.2,<2.0.0"
fastapi = ">=0.115.0,<1.0.0"
uvicorn = ">=0.34.0,<1.0.0"
[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]
build-backend = "poetry.core.masonry.api"

poetry lock && poetry install && poetry shell

(Note: Poetry 2.x removed the built-in `poetry shell` command; either install the shell plugin, use `poetry env activate`, or prefix commands with `poetry run`.)

Your .env file:
LLM_API_KEY=your-anthropic-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here

Why both keys? LLM_API_KEY is what we use in our code. ANTHROPIC_API_KEY is what LiteLLM (DSPy's backend) looks for automatically. Belt and suspenders.
Before we bolt on production armor, we need a pipeline worth protecting. Content moderation is ideal for this chapter because it's a real-world problem with clear requirements: take user-generated content, classify it, and decide whether to approve, flag, or reject it.
from pydantic import BaseModel, Field
from typing import Literal
import dspy
class ModerationDecision(BaseModel):
"""Structured output for a moderation decision."""
category: Literal[
"safe", "spam", "toxic", "misinformation",
"adult", "violence", "self_harm"
]
confidence: float = Field(
ge=0.0, le=1.0,
description="Confidence score from 0 to 1"
)
action: Literal["approve", "flag_for_review", "reject"]
explanation: str = Field(
description="Brief explanation of the moderation decision"
    )

The ModerationDecision Pydantic model does three things for us. First, it constrains the category to exactly seven values — the LLM can't invent new ones. Second, it enforces that confidence is a float between 0 and 1. Third, it requires an explanation, which is critical for audit trails in production.
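You can see all three guarantees without touching an LLM, because Pydantic enforces them at construction time. A quick check (the sample values here are made up for illustration):

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class ModerationDecision(BaseModel):
    category: Literal[
        "safe", "spam", "toxic", "misinformation",
        "adult", "violence", "self_harm",
    ]
    confidence: float = Field(ge=0.0, le=1.0)
    action: Literal["approve", "flag_for_review", "reject"]
    explanation: str

# A well-formed decision constructs cleanly.
ok = ModerationDecision(
    category="spam",
    confidence=0.97,
    action="reject",
    explanation="Repeated promotional links",
)
print(ok.action)  # reject

# An invented category AND an out-of-range confidence are both caught.
try:
    ModerationDecision(
        category="gibberish",
        confidence=1.5,
        action="approve",
        explanation="nope",
    )
except ValidationError as exc:
    print(len(exc.errors()))  # 2
```

This is exactly the check DSPy runs on the LLM's parsed output, which is why a malformed response fails loudly instead of slipping a novel category into your database.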
Now the signature and module:
class ModerateContent(dspy.Signature):
"""You are a content moderator for a social platform. Analyze the given
user-generated content and determine whether it should be approved,
flagged for human review, or rejected. Be fair and avoid over-censoring
legitimate speech. Only reject content that clearly violates policies."""
content: str = dspy.InputField(
desc="The user-generated content to moderate"
)
context: str = dspy.InputField(
desc="Additional context about where this content appears",
default="general social media post",
)
decision: ModerationDecision = dspy.OutputField(
desc="The structured moderation decision"
)
class ContentModerator(dspy.Module):
    def __init__(self):
        super().__init__()
        self.moderate = dspy.ChainOfThought(ModerateContent)
def forward(self, content, context="general social media post"):
result = self.moderate(content=content, context=context)
        return dspy.Prediction(decision=result.decision)