Deep Thought was asked for the Answer to the Ultimate Question of Life, the Universe, and Everything. It replied that it would need 7.5 million years to compute. Nobody thought to ask what would happen if you just handed it the document.
— Douglas Adams (adapted)
There's a thing that happens when you hand a language model a really large document.
You paste in 200 pages of contract, or a 50,000-line codebase, or six months of server logs. You ask your question. The model responds with something that sounds confident and detailed. You nod along. Then you check its work and realize it completely missed the critical clause on page 94. Or the security issue on line 31,847. Or the spike in error rate at 3:17 AM on the 14th.
The model wasn't lying. It was doing its best. But it was doing its best while holding an entire library in its working memory and trying to find Waldo. Researchers call this context rot: the empirical finding that LLM performance degrades significantly as context length grows, even when the input fits within the advertised context window. The model processes every token, but its attention to any one of them degrades. The farther the relevant information sits from the question, the more likely the model is to miss it, misweight it, or quietly hallucinate something plausible instead.
It's the digital equivalent of asking someone to read an encyclopedia and then immediately answer a specific question. They technically processed all the words. They retained approximately none of the useful information. This is fine. This is normal. This is why we're here.
For a long time, the answer to this was RAG: chunk the document, embed the chunks, retrieve the relevant ones, stuff them into a smaller context. That works well enough when you know what to retrieve. But what about when you don't? What about when the answer requires comparing section 3.2 against section 11.7, or aggregating information across a thousand log lines, or finding a pattern that only emerges when you look at the whole thing?
This is the problem that Recursive Language Models (RLMs) were designed to solve. And in DSPy 3.x, dspy.RLM ships as an experimental module that brings this capability to your programs with a single class.
The insight behind RLMs comes from a 2025 Stanford/MIT paper by Zhang, Kraska, and Khattab. Instead of stuffing the entire context into the LLM's token space, RLMs treat the context as external data — stored in a Python variable, not in the prompt. The LLM sees metadata about the data (its type, length, a preview) and then writes Python code to programmatically explore it.
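To make the "metadata, not content" idea concrete, here is a minimal sketch of how such a summary could be built. The helper name describe_variable and the exact wording of the summary are assumptions for illustration; dspy.RLM builds its own prompt internally and its format may differ.

```python
def describe_variable(name: str, value: object, preview_chars: int = 500) -> str:
    """Summarize a variable (type, length, preview) without its full content.

    Hypothetical helper for illustration only; not part of DSPy.
    """
    type_name = type(value).__name__
    # Non-string values are summarized via their repr.
    text = value if isinstance(value, str) else repr(value)
    preview = text[:preview_chars]
    return (
        f"You have a variable `{name}`: {type_name}, {len(text):,} chars. "
        f"Preview of the first {len(preview)} chars:\n{preview}"
    )


# Stand-in for a long contract; only the short summary would reach the model.
contract = "TERMINATION. Either party may terminate with 30 days notice. " * 5000
summary = describe_variable("contract", contract, preview_chars=80)
print(summary)
```

However large the variable gets, the summary stays a few hundred characters, which is the whole point: the prompt grows with the preview size, not with the document.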
Think of it like the difference between reading a city's entire phone book (impossible, exhausting, mostly irrelevant) versus knowing the phone book exists and writing code to search it. The information is all there. The LLM just doesn't have to hold it in its head at once.
The code the LLM writes runs in a sandboxed REPL. The LLM sees the output and writes more code. It can call llm_query(snippet) to semantically analyze specific chunks it has extracted. When it has the answer, it calls SUBMIT(answer) and execution stops.
The loop looks like this:
LLM sees: "You have a variable `contract` — str, 487,234 chars. Here's a 500-char preview..."
LLM writes: print(contract[:3000])
REPL runs it, returns output
LLM sees the output, writes: import re; matches = re.findall(r'termination.*?(\d+ days)', contract, re.IGNORECASE | re.DOTALL); print(matches)
REPL runs it, returns output
LLM sees: ['30 days', '90 days', '30 days']
LLM writes: result = llm_query(f"What does this termination clause mean: {contract[45230:46800]}"); print(result)
REPL runs the sub-LLM call, returns semantic analysis
LLM writes: SUBMIT(answer="Either party may terminate with 30 days written notice, or 90 days for convenience.")
The LLM never tries to read 487,000 characters at once. It navigates the document like a detective — peek at the structure, search for what matters, zoom into specific sections with semantic understanding, aggregate the findings. The context window stays small. The quality stays high.
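If you want to see the control flow of that loop in isolation, here is a toy version. It is a sketch under loud assumptions: the "model" is a scripted list of snippets rather than an LLM, llm_query is a stub, run_loop is a hypothetical name, and plain exec() is not a sandbox (dspy.RLM uses the Deno/Pyodide interpreter described in this chapter instead).

```python
import contextlib
import io
import re


class Submitted(Exception):
    """Raised by SUBMIT to stop the loop and carry the final answer."""

    def __init__(self, answer):
        super().__init__(answer)
        self.answer = answer


def SUBMIT(answer):
    raise Submitted(answer)


def llm_query(prompt):
    # Stub for the sub-LLM call; a real system would send `prompt` to a model.
    return f"[sub-LLM analysis of {len(prompt)} chars]"


def run_loop(next_code, variables, max_iterations=10):
    """Run model-written snippets one at a time until SUBMIT is called.

    `next_code(history)` stands in for the LLM: it receives the list of
    (code, output) pairs so far and returns the next snippet to run.
    NOTE: exec() is NOT a sandbox; this only illustrates the control flow.
    """
    namespace = {"llm_query": llm_query, "SUBMIT": SUBMIT, "re": re, **variables}
    history = []
    for _ in range(max_iterations):
        code = next_code(history)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
        except Submitted as done:
            return done.answer, history
        history.append((code, buffer.getvalue()))
    return None, history


# Scripted "model": replays fixed steps instead of calling an LLM.
steps = [
    "print(len(contract))",
    "hits = re.findall(r'(\\d+) days', contract); print(hits)",
    "note = llm_query(contract[:200]); print(note)",
    "SUBMIT(answer=f'Notice periods found: {sorted(set(hits))}')",
]
contract = "Filler text. " * 1000 + "Termination requires 30 days notice, or 90 days for convenience."
answer, history = run_loop(lambda h: steps[len(h)], {"contract": contract})
print(answer)
```

Notice what the history contains: short snippets and their short outputs. The 13,000-odd characters of contract live in the namespace the whole time and never appear in what the "model" sees.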
Before anything else: dspy.RLM requires Deno to run the sandboxed Python interpreter (Pyodide over WASM). This is the one setup step that's different from the rest of the book.
# macOS / Linux
curl -fsSL https://deno.land/install.sh | sh
# Then restart your shell (important!)
# Verify:
deno --version
If you're on Windows, check the Deno installation docs; the process is similar but uses a PowerShell command.
🚨 Gotcha: Restart Your Shell. After installing Deno, you must restart your shell before DSPy can find it. If you hit a "Deno not found" or cache error, run which deno to confirm that it's on your PATH. Users on some systems have also reported Pyodide cache discovery failures. If every REPL iteration errors, run dspy install pyodide to pre-warm the cache manually.
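If you'd rather run the PATH check from Python (say, at the top of a script, before any RLM call), a small standard-library sketch looks like this. The function names find_deno and deno_status are made up for this example and are not part of DSPy.

```python
import shutil
import subprocess


def find_deno():
    """Return the path to the deno executable, or None if it is not on PATH."""
    return shutil.which("deno")


def deno_status():
    """Return a one-line, human-readable status for the Deno install."""
    path = find_deno()
    if path is None:
        return "deno not found on PATH; install it and restart your shell"
    # Use the first line of `deno --version` output as the version string.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    first_line = result.stdout.splitlines()[0] if result.stdout else "unknown version"
    return f"deno found at {path} ({first_line})"


print(deno_status())
```

This checks the PATH of the Python process itself, which is what matters here: a terminal opened before the install can still report Deno as missing even though a fresh shell finds it.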
Once Deno is installed, the sandbox needs no other system-level dependencies. The WASM sandbox is self-contained.
$ mkdir ch08_rlm && cd ch08_rlm
$ poetry init --name "ch08-rlm" \
--description "Chapter 8: The Infinite Improbability Context" \
--python ">=3.10,<3.15" \
--no-interaction
$ poetry add dspy python-dotenv
Your .env file from Chapter 1 still applies: same ANTHROPIC_API_KEY. Nothing new to configure. Deno handles its own sandbox setup the first time an RLM runs.