Karpathy’s shift: why the highest-leverage agent use is not writing code, it’s building knowledge
TL;DR
Karpathy stopped using AI to write code and started using it to compile knowledge. His personal wiki, grown from raw research dumps, has reached about 100 linked articles and 400,000 words. The April 4, 2026 “Idea Files” post (246 upvotes, 77 comments on Hacker News) reframed agent workflow design around spec-first knowledge synthesis, not prompt-driven code generation. This post unpacks why that reframing matters and what to build with it. For the judgment skills side of the same shift, see the agentic engineering judgment post.

What did Karpathy actually stop doing?
On April 3, 2026, Karpathy posted: “A large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge.” The next day he followed up with a GitHub gist he called an Idea File: plain markdown describing the system, no code attached. The Hacker News thread reached 246 points and 77 comments in a day.
The shift is not cosmetic. Karpathy described a working pipeline where he dumps raw sources (papers, transcripts, bookmarks) into a folder, points an LLM at it, and the model builds an interlinked wiki from scratch. Articles get written. Backlinks get drawn. Concepts get categorized. A periodic linting pass scans for contradictions and stale entries. His wiki on a single research topic now runs about 100 articles and 400,000 words, and he says he rarely touches it directly. The agent maintains it.
I think the contrarian read on this is the important one. The dominant framing in 2025 was that senior engineers should delegate more code to agents. Karpathy’s framing in 2026 is that the highest-leverage delegation is not code at all. It is the research artifacts that make the code worth writing.
Why does spec-first beat code-first for long-horizon work?
Code rots. Knowledge compounds. That is the whole argument in one line, and it is worth stopping on because most agent workflows ignore it.
Code you shipped six months ago is probably deprecated, refactored, or sitting behind a feature flag you forgot about. The agent runtime that produced it is a sunk cost. Knowledge you synthesized six months ago keeps paying dividends every time someone revisits the decision. The comparison of three vector database options. The notes on why your team ruled out event sourcing. The wiki page on how your auth system actually works. None of that rots the way a React component does.
The Idea Files pattern formalizes this. Instead of asking an agent “write me a Redis cache layer,” you write a markdown spec describing the caching behavior, invalidation rules, and failure modes, and hand the spec to the agent. The spec survives the code. When the code gets rewritten in a new framework next year, the spec still works. When a new engineer joins, they read the spec, not the 800-line diff. When you later ask the agent to extend the system, the spec is the shared substrate, not a half-remembered prompt.
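A spec of that shape might look like this; the specifics (endpoint behavior, TTL value) are illustrative, not from the post:

```markdown
# Spec: response cache layer
## Behavior
- Cache GET responses keyed by normalized URL plus auth scope.
## Invalidation
- TTL 300s; explicit purge on any write to the underlying resource.
## Failure modes
- Cache backend down: serve uncached, log the miss, never fail the request.
```

Note what is absent: no framework, no library names. That is why it survives the rewrite.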
The implication for agent workflow design is direct. Every significant agent invocation should produce two artifacts, not one: the code (or the answer, or the refactor) and a durable knowledge artifact describing what was done and why. Run the second track long enough and the code becomes the cheap, regenerable thing.
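The two-artifact rule is easy to enforce mechanically. A minimal sketch, with hypothetical names (`AgentResult`, `record_invocation` are not from the gist): every invocation's decision notes get written to a dated markdown file next to whatever the agent produced.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path

@dataclass
class AgentResult:
    output: str          # the code, answer, or refactor (the cheap, regenerable thing)
    decision_notes: str  # what was done and why, in the agent's words

def record_invocation(result: AgentResult, notes_dir: Path, topic: str) -> Path:
    """Persist the durable knowledge artifact alongside the agent's output."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    note = notes_dir / f"{date.today().isoformat()}-{topic}.md"
    note.write_text(f"# {topic}\n\n{result.decision_notes}\n")
    return note
```

The point of the wrapper is that the note is written unconditionally, not when someone remembers to document.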
What does “knowledge that compounds” actually mean?
Here is the architecture from Karpathy’s gist, stripped to three layers and three operations:
```
raw/        immutable sources: PDFs, transcripts, notes, gists
wiki/       LLM-generated markdown: entity pages, concepts, cross-refs
CLAUDE.md   schema: how to ingest, query, and lint
```
Operations:
- `ingest`: read new raw/ material, update wiki/, draw wikilinks
- `query`: answer questions using wiki/ as primary context
- `lint`: scan wiki/ for contradictions, stale entries, orphan concepts
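The mechanical skeleton of the three operations fits in a few functions. A sketch, assuming for simplicity that article filenames mirror source names and that the LLM fills in the actual writing; `lint_orphans` implements only the one check that needs no model at all:

```python
import re
from pathlib import Path

def ingest_queue(raw: Path, wiki: Path) -> list[Path]:
    """Raw sources with no wiki article yet; in the real pipeline the LLM
    writes the missing articles and draws the wikilinks."""
    covered = {p.stem for p in wiki.glob("*.md")}
    return sorted(p for p in raw.iterdir() if p.is_file() and p.stem not in covered)

def query_context(wiki: Path, term: str) -> list[Path]:
    """query uses wiki/ as primary context: the matching articles are what gets
    loaded into the model's context window, not chunks of raw documents."""
    return sorted(p for p in wiki.glob("*.md") if term.lower() in p.read_text().lower())

def lint_orphans(wiki: Path) -> list[str]:
    """One fully mechanical lint check: [[wikilinks]] whose target article is missing."""
    flags = []
    for page in sorted(wiki.glob("*.md")):
        for target in re.findall(r"\[\[([^\]]+)\]\]", page.read_text()):
            if not (wiki / f"{target}.md").exists():
                flags.append(f"{page.name}: orphan link to [[{target}]]")
    return flags
```

Everything model-shaped (writing articles, spotting contradictions) sits behind these entry points; the file layout itself needs no intelligence.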
The pattern bypasses retrieval-augmented generation in an interesting way. RAG re-derives an answer from chunks of raw documents on every query. The wiki compiles the answer once at ingestion, stores it as interlinked markdown, and reuses it forever. Traditional RAG is stateless retrieval. The wiki is a persistent compiled artifact.
```mermaid
flowchart LR
    A[Raw sources<br/>PDFs, transcripts, notes] -->|ingest| B[LLM Wiki<br/>linked markdown]
    B -->|query| C[Answer with<br/>compiled context]
    B -->|lint| D[Consistency pass<br/>flag contradictions]
    D -->|revise| B
    E[New material] -->|append| A
```
The compounding effect comes from two places. First, every new source gets connected to prior sources through wikilinks. The tenth paper on the same topic updates twenty existing articles instead of living in isolation. Second, the lint pass surfaces structural holes the human would not notice: concepts mentioned but never defined, articles that contradict each other, index entries that drifted stale. Nathan Lambert and Chip Huyen both picked up on the same point in commentary: the wiki pattern makes reading other people’s research durable in a way that RAG over PDFs never did.
| Dimension | Code-first workflow | Knowledge-first workflow |
|---|---|---|
| Primary artifact | Code diffs | Linked markdown |
| Shelf life | Months | Years |
| Compounds? | No (code rots) | Yes (wiki accretes) |
| Shareable unit | PR / repo | Spec / idea file |
| Failure mode | Rewrites on framework change | Drift without lint passes |
| Agent runtime use | Ephemeral generation | Persistent curation |
How do the community implementations reveal the pattern?
The strongest signal that the shift is real is not the HN thread. It is the velocity of community distillations. The forrestchang/andrej-karpathy-skills repo, a single CLAUDE.md file that encodes Karpathy’s principles for Claude Code, gained 9,263 stars in one day on April 14, reaching 48,602 total. It is structured as a Claude Code plugin with guidelines like “don’t assume, don’t hide confusion, surface tradeoffs” and “define success criteria and loop until verified.” Spec-first, explicit, declarative.
The second signal is thedotmack/claude-mem, a Claude Code plugin that automatically captures session observations, compresses them with Anthropic’s agent SDK, and injects relevant context into future sessions. At 46,100 stars, 3,500 forks, and 92 contributors, it solves the other half of the Idea Files pattern: how do you avoid re-teaching the agent your project context on every session? Combined, the two repos sketch a working stack. Skills provide the agent’s prior. Claude-mem provides persistent cross-session memory. The Idea Files pattern provides the knowledge artifact the agent reads from and writes to.
These are not research projects. They are widely installed tools that reveal what senior practitioners actually want: an agent that remembers, a spec it can read, and a wiki it maintains. The stars are a signal that the code-first framing has hit its ceiling.
Deep dive: how the lint pass works
The lint operation in the Karpathy gist is underspecified on purpose: you tune it to your domain. A typical lint prompt looks like:

```
Read every article in wiki/. For each, check:
1. Does it reference a concept that lacks its own article? Flag for creation.
2. Does it contradict any claim in another article? Flag both with a note.
3. Are its citations in raw/ still present? Flag broken references.
4. Has it been updated in the last N days since related source ingests? Flag stale.
Write the flags to wiki/_lint_report.md. Do not modify articles directly.
```

Running this weekly keeps a 100-article wiki coherent. Skipping it produces the same rot code suffers, just more slowly. The human decision is when to trust the lint pass enough to let the agent apply fixes automatically rather than surfacing them for review.

What should you build this week?
The cheapest experiment is a single-topic wiki. Pick one research area you keep revisiting. Mine is speech agent latency budgets. Yours might be RAG evaluation, agent memory, or payments infrastructure.
Create three folders in a git repo: raw/, wiki/, and a root CLAUDE.md describing the ingest and lint rules. Dump five or six sources into raw/: a paper, two blog posts, a podcast transcript, a documentation page. Run the ingest with Claude Code. Review the wiki output in Obsidian, where wikilinks render natively. Then run the lint pass and see what the agent flags.
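The scaffold is a couple of commands; the topic folder name here is a placeholder, and the seeded CLAUDE.md is a stub you would expand with your own ingest and lint rules:

```shell
# Three-part layout: raw/ (immutable sources), wiki/ (agent-written), CLAUDE.md (schema).
mkdir -p idea-wiki/raw idea-wiki/wiki
git init -q idea-wiki
# Seed the schema the agent reads before every ingest and lint pass.
cat > idea-wiki/CLAUDE.md <<'EOF'
# Wiki schema
- ingest: read new files in raw/, write or update articles in wiki/, add [[wikilinks]]
- query: answer from wiki/ articles as primary context
- lint: write flags to wiki/_lint_report.md; do not modify articles directly
EOF
```

From there, dropping sources into raw/ and running the agent is the whole loop.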
Three observations from running this on my own research:
The first ingest feels rough. The agent’s article structure is generic. The cross-references are shallow. This is fine. It is a first pass. The artifact gets better with every subsequent ingest as the agent has more existing articles to link into. The first ten sources are the hardest.
The schema matters more than the prompt. A well-designed CLAUDE.md with explicit rules for article length, link density, and naming conventions produces dramatically better wiki output than a vague instruction. This echoes the judgment shift: specification writing is the high-leverage skill.
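Concretely, "explicit rules" can be as blunt as a few lines in CLAUDE.md. A sketch; the numbers are illustrative, not from the gist:

```markdown
## Article rules
- One concept per article, 200–600 words, filename in Title Case matching the concept.
- Every article carries at least two [[wikilinks]]; no orphan pages after an ingest.
- Cite sources by raw/ filename; never cite material not present in raw/.
- Prefer updating an existing article over creating a near-duplicate.
```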
Pair it with persistent memory. Running the wiki pattern in isolation is useful. Running it alongside persistent cross-session memory, via something like claude-mem, makes the agent aware of past wiki ingests, past queries, and past decisions. That is when the compounding really starts.
The goal is not to build Karpathy’s wiki. It is to stop asking your agent the wrong question. “Write this code” produces a diff you throw away. “Compile this knowledge” produces an artifact you return to. For senior engineers who have felt the ceiling of code-gen agents, this is what sits above it.
Key takeaways
- Karpathy’s April 2026 shift: from code generation to knowledge compilation, with a 100-article personal wiki as the working artifact.
- Idea Files (markdown specs shared instead of code) reached 246 HN points on April 4. Spec-first beats code-first on long-horizon work because specs outlive implementations.
- The LLM Wiki pattern (raw/, wiki/, schema) compiles knowledge once and keeps it current via lint passes, bypassing the per-query rediscovery cost of RAG.
- Community velocity confirms the shift: forrestchang/andrej-karpathy-skills (+9,263 stars in a day), thedotmack/claude-mem (46.1k stars). The stack for agent-authored knowledge is assembling in public.
- The practical experiment: pick one topic, build a three-folder wiki, run ingest and lint, observe what compounds after ten cycles.
Further reading
- LLM Wiki gist by Karpathy: the original architecture.
- forrestchang/andrej-karpathy-skills: Claude Code plugin distilling the principles.
- thedotmack/claude-mem: persistent cross-session memory for Claude Code.
- Karpathy’s autoresearch framework: the experiment-loop companion to the knowledge-synthesis loop.
- Persistent agent memory via Memori: the storage layer underneath the wiki pattern.
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch