Karpathy’s shift: why the highest-leverage agent use is not writing code, it’s building knowledge
TL;DR
Karpathy stopped using AI to write code and started using it to compile knowledge. His personal wiki, grown from raw research dumps, has reached about 100 linked articles and 400,000 words. The April 4, 2026 “Idea Files” post (246 upvotes, 77 comments on Hacker News) reframed agent workflow design around spec-first knowledge synthesis, not prompt-driven code generation. This post unpacks why that reframing matters and what to build with it. For the judgment skills side of the same shift, see the agentic engineering judgment post.

What did Karpathy actually stop doing?
On April 3, 2026, Karpathy posted: “A large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge.” The next day he followed up with a GitHub gist he called an Idea File: plain markdown describing the system, no code attached. The Hacker News thread reached 246 points and 77 comments in a day.
The shift is not cosmetic. Karpathy described a working pipeline where he dumps raw sources (papers, transcripts, bookmarks) into a folder, points an LLM at it, and the model builds an interlinked wiki from scratch. Articles get written. Backlinks get drawn. Concepts get categorized. A periodic linting pass scans for contradictions and stale entries. His wiki on a single research topic now runs about 100 articles and 400,000 words, and he says he rarely touches it directly. The agent maintains it.
I think the contrarian read on this is the important one. The dominant framing in 2025 was that senior engineers should delegate more code to agents. Karpathy’s framing in 2026 is that the highest-leverage delegation is not code at all. It is the research artifacts that make the code worth writing.
Why does spec-first beat code-first for long-horizon work?
Code rots. Knowledge compounds. That is the whole argument in one line, and it is worth stopping on because most agent workflows ignore it.
Code you shipped six months ago is probably deprecated, refactored, or sitting behind a feature flag you forgot about. The agent runtime that produced it is a sunk cost. Knowledge you synthesized six months ago keeps paying dividends every time someone revisits the decision. The comparison of three vector database options. The notes on why your team ruled out event sourcing. The wiki page on how your auth system actually works. None of that rots the way a React component does.
The Idea Files pattern formalizes this. Instead of asking an agent “write me a Redis cache layer,” you write a markdown spec describing the caching behavior, invalidation rules, and failure modes, and hand the spec to the agent. The spec survives the code. When the code gets rewritten in a new framework next year, the spec still works. When a new engineer joins, they read the spec, not the 800-line diff. When you later ask the agent to extend the system, the spec is the shared substrate, not a half-remembered prompt.
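A spec of that shape might look like this; the specifics (endpoint behavior, TTL value) are illustrative, not from the post:

```markdown
# Spec: response cache layer
## Behavior
- Cache GET responses keyed by normalized URL plus auth scope.
## Invalidation
- TTL 300s; explicit purge on any write to the underlying resource.
## Failure modes
- Cache backend down: serve uncached, log the miss, never fail the request.
```

Note what is absent: no framework, no library names. That is why it survives the rewrite.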
The implication for agent workflow design is direct. Every significant agent invocation should produce two artifacts, not one: the code (or the answer, or the refactor) and a durable knowledge artifact describing what was done and why. Run the second track long enough and the code becomes the cheap, regenerable thing.
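The two-artifact rule is easy to enforce mechanically. A minimal sketch, with hypothetical names (`AgentResult`, `record_invocation` are not from the gist): every invocation's decision notes get written to a dated markdown file next to whatever the agent produced.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path

@dataclass
class AgentResult:
    output: str          # the code, answer, or refactor (the cheap, regenerable thing)
    decision_notes: str  # what was done and why, in the agent's words

def record_invocation(result: AgentResult, notes_dir: Path, topic: str) -> Path:
    """Persist the durable knowledge artifact alongside the agent's output."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    note = notes_dir / f"{date.today().isoformat()}-{topic}.md"
    note.write_text(f"# {topic}\n\n{result.decision_notes}\n")
    return note
```

The point of the wrapper is that the note is written unconditionally, not when someone remembers to document.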
What does “knowledge that compounds” actually mean?
Here is the architecture from Karpathy’s gist, stripped to three layers and three operations:
```
raw/        immutable sources: PDFs, transcripts, notes, gists
wiki/       LLM-generated markdown: entity pages, concepts, cross-refs
CLAUDE.md   schema: how to ingest, query, and lint
```
Operations:
- `ingest`: read new raw/ material, update wiki/, draw wikilinks
- `query`: answer questions using wiki/ as primary context
- `lint`: scan wiki/ for contradictions, stale entries, orphan concepts
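The mechanical skeleton of the three operations fits in a few functions. A sketch, assuming for simplicity that article filenames mirror source names and that the LLM fills in the actual writing; `lint_orphans` implements only the one check that needs no model at all:

```python
import re
from pathlib import Path

def ingest_queue(raw: Path, wiki: Path) -> list[Path]:
    """Raw sources with no wiki article yet; in the real pipeline the LLM
    writes the missing articles and draws the wikilinks."""
    covered = {p.stem for p in wiki.glob("*.md")}
    return sorted(p for p in raw.iterdir() if p.is_file() and p.stem not in covered)

def query_context(wiki: Path, term: str) -> list[Path]:
    """query uses wiki/ as primary context: the matching articles are what gets
    loaded into the model's context window, not chunks of raw documents."""
    return sorted(p for p in wiki.glob("*.md") if term.lower() in p.read_text().lower())

def lint_orphans(wiki: Path) -> list[str]:
    """One fully mechanical lint check: [[wikilinks]] whose target article is missing."""
    flags = []
    for page in sorted(wiki.glob("*.md")):
        for target in re.findall(r"\[\[([^\]]+)\]\]", page.read_text()):
            if not (wiki / f"{target}.md").exists():
                flags.append(f"{page.name}: orphan link to [[{target}]]")
    return flags
```

Everything model-shaped (writing articles, spotting contradictions) sits behind these entry points; the file layout itself needs no intelligence.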
The pattern bypasses retrieval-augmented generation in an interesting way. RAG re-derives an answer from chunks of raw documents on every query. The wiki compiles the answer once at ingestion, stores it as interlinked markdown, and reuses it forever. Traditional RAG is stateless retrieval. The wiki is a persistent compiled artifact.
```mermaid
flowchart LR
    A[Raw sources<br/>PDFs, transcripts, notes] -->|ingest| B[LLM Wiki<br/>linked markdown]
    B -->|query| C[Answer with<br/>compiled context]
    B -->|lint| D[Consistency pass<br/>flag contradictions]
    D -->|revise| B
    E[New material] -->|append| A
```
The compounding effect comes from two places. First, every new source gets connected to prior sources through wikilinks. The tenth paper on the same topic updates twenty existing articles instead of living in isolation. Second, the lint pass surfaces structural holes the human would not notice: concepts mentioned but never defined, articles that contradict each other, index entries that drifted stale. Nathan Lambert and Chip Huyen both picked up on the same point in commentary: the wiki pattern makes reading other people’s research durable in a way that RAG over PDFs never did.
| Dimension | Code-first workflow | Knowledge-first workflow |
|---|---|---|
| Primary artifact | Code diffs | Linked markdown |
| Shelf life | Months | Years |
| Compounds? | No (code rots) | Yes (wiki accretes) |
| Shareable unit | PR / repo | Spec / idea file |
| Failure mode | Rewrites on framework change | Drift without lint passes |
| Agent runtime use | Ephemeral generation | Persistent curation |
How do the community implementations reveal the pattern?
The strongest signal that the shift is real is not the HN thread. It is the velocity of community distillations. The forrestchang/andrej-karpathy-skills repo, a single CLAUDE.md file that encodes Karpathy’s principles for Claude Code, gained 9,263 stars in one day on April 14, reaching 48,602 total. It is structured as a Claude Code plugin with guidelines like “don’t assume, don’t hide confusion, surface tradeoffs” and “define success criteria and loop until verified.” Spec-first, explicit, declarative.
The second signal is thedotmack/claude-mem, a Claude Code plugin that automatically captures session observations, compresses them with Anthropic’s agent SDK, and injects relevant context into future sessions. At 46,100 stars, 3,500 forks, and 92 contributors, it solves the other half of the Idea Files pattern: how do you avoid re-teaching the agent your project context on every session? Combined, the two repos sketch a working stack. Skills provide the agent’s prior. Claude-mem provides persistent cross-session memory. The Idea Files pattern provides the knowledge artifact the agent reads from and writes to.
These are not research projects. They are widely installed tools that reveal what senior practitioners actually want: an agent that remembers, a spec it can read, and a wiki it maintains. The stars are a signal that the code-first framing has hit its ceiling.
Deep dive: how the lint pass works
The lint operation in the Karpathy gist is underspecified on purpose: you tune it to your domain. A typical lint prompt looks like:

```
Read every article in wiki/. For each, check:
1. Does it reference a concept that lacks its own article? Flag for creation.
2. Does it contradict any claim in another article? Flag both with a note.
3. Are its citations in raw/ still present? Flag broken references.
4. Has it been updated in the last N days since related source ingests? Flag stale.
Write the flags to wiki/_lint_report.md. Do not modify articles directly.
```

Running this weekly keeps a 100-article wiki coherent. Skipping it produces the same rot code suffers, just more slowly. The human decision is when to trust the lint pass enough to let the agent apply fixes automatically rather than surfacing them for review.

What should you build this week?
The cheapest experiment is a single-topic wiki. Pick one research area you keep revisiting. Mine is speech agent latency budgets. Yours might be RAG evaluation, agent memory, or payments infrastructure.
Create three folders in a git repo: raw/, wiki/, and a root CLAUDE.md describing the ingest and lint rules. Dump five or six sources into raw/: a paper, two blog posts, a podcast transcript, a documentation page. Run the ingest with Claude Code. Review the wiki output in Obsidian, where wikilinks render natively. Then run the lint pass and see what the agent flags.
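The scaffold is a couple of commands; the topic folder name here is a placeholder, and the seeded CLAUDE.md is a stub you would expand with your own ingest and lint rules:

```shell
# Three-part layout: raw/ (immutable sources), wiki/ (agent-written), CLAUDE.md (schema).
mkdir -p idea-wiki/raw idea-wiki/wiki
git init -q idea-wiki
# Seed the schema the agent reads before every ingest and lint pass.
cat > idea-wiki/CLAUDE.md <<'EOF'
# Wiki schema
- ingest: read new files in raw/, write or update articles in wiki/, add [[wikilinks]]
- query: answer from wiki/ articles as primary context
- lint: write flags to wiki/_lint_report.md; do not modify articles directly
EOF
```

From there, dropping sources into raw/ and running the agent is the whole loop.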
Three observations from running this on my own research:
The first ingest feels rough. The agent’s article structure is generic. The cross-references are shallow. This is fine. It is a first pass. The artifact gets better with every subsequent ingest as the agent has more existing articles to link into. The first ten sources are the hardest.
The schema matters more than the prompt. A well-designed CLAUDE.md with explicit rules for article length, link density, and naming conventions produces dramatically better wiki output than a vague instruction. This echoes the judgment shift: specification writing is the high-leverage skill.
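Concretely, "explicit rules" can be as blunt as a few lines in CLAUDE.md. A sketch; the numbers are illustrative, not from the gist:

```markdown
## Article rules
- One concept per article, 200–600 words, filename in Title Case matching the concept.
- Every article carries at least two [[wikilinks]]; no orphan pages after an ingest.
- Cite sources by raw/ filename; never cite material not present in raw/.
- Prefer updating an existing article over creating a near-duplicate.
```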
Pair it with persistent memory. Running the wiki pattern in isolation is useful. Running it alongside persistent cross-session memory, via something like claude-mem, makes the agent aware of past wiki ingests, past queries, and past decisions. That is when the compounding really starts.
The goal is not to build Karpathy’s wiki. It is to stop asking your agent the wrong question. “Write this code” produces a diff you throw away. “Compile this knowledge” produces an artifact you return to. For senior engineers who have felt the ceiling of code-gen agents, this is what sits above it.
Key takeaways
- Karpathy’s April 2026 shift: from code generation to knowledge compilation, with a 100-article personal wiki as the working artifact.
- Idea Files (markdown specs shared instead of code) reached 246 HN points on April 4. Spec-first beats code-first on long-horizon work because specs outlive implementations.
- The LLM Wiki pattern (raw/, wiki/, schema) compiles knowledge once and keeps it current via lint passes, bypassing the per-query rediscovery cost of RAG.
- Community velocity confirms the shift: forrestchang/andrej-karpathy-skills (+9,263 stars in a day), thedotmack/claude-mem (46.1k stars). The stack for agent-authored knowledge is assembling in public.
- The practical experiment: pick one topic, build a three-folder wiki, run ingest and lint, observe what compounds after ten cycles.
Further reading
- LLM Wiki gist by Karpathy: the original architecture.
- forrestchang/andrej-karpathy-skills: Claude Code plugin distilling the principles.
- thedotmack/claude-mem: persistent cross-session memory for Claude Code.
- Karpathy’s autoresearch framework: the experiment-loop companion to the knowledge-synthesis loop.
- Persistent agent memory via Memori: the storage layer underneath the wiki pattern.
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch