What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang
Eric Jang walks through how to rebuild AlphaGo from scratch using contemporary AI tooling. The talk uses AlphaGo as a clear, worked example of core intelligence primitives—search, learning from experience, and self-play—to show why Monte Carlo Tree Search (MCTS) provides stronger training signals than naive policy-gradient RL and to explain what that implies for future LLM training and automated research loops.
Linked assets
RL: framework and technical implications for RL, self-play, and LLM training; not a company or stock recommendation.
Technical synthesis of AlphaGo’s components (MCTS, policy/value networks, self-play) and their relevance to reinforcement learning practices used with LLMs.
Eric Jang reconstructs AlphaGo using modern tooling to illustrate why MCTS offers a clearer training target than naive policy-gradient RL: at each move, MCTS suggests a strictly better action, which eases the credit-assignment problem that plagues long token trajectories in LLM training. The talk also documents an Autoresearch loop and shows where LLMs can already automate research tasks (running experiments, writing code, hyperparameter optimization) and where they currently fail (choosing productive research directions and escaping dead ends). Links include flashcards and a transcript; errata and technical appendices are available at evjang.com/alphago-tutorial/llm_rl_variance.pdf.
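To make the credit-assignment point concrete, here is a toy sketch of our own construction (not taken from the talk): T independent binary moves, a Bernoulli-sigmoid policy with p = 0.5 at every move, and a single end-of-episode return equal to the number of good moves taken. It compares single-trajectory REINFORCE estimates of the gradient for the first move, using a constant mean baseline, against the gradient from a per-move target that labels the good move directly, a one-hot stand-in for a search-derived target. The function names `reinforce_grad_samples` and `per_move_target_grad` are hypothetical.

```python
import math
import random
import statistics


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_grad_samples(horizon: int, n_samples: int, seed: int = 0) -> list[float]:
    """Single-trajectory REINFORCE estimates of d(expected return)/d(theta_0).

    Toy assumption (hypothetical setup): `horizon` independent binary moves,
    each with P(action=1) = sigmoid(theta_t) and theta_t = 0, so p = 0.5.
    Action 1 is always the good move; the only feedback is one scalar
    return at the end of the episode: the number of good moves taken.
    """
    rng = random.Random(seed)
    p = sigmoid(0.0)
    baseline = horizon * p  # expected return, used as a constant baseline
    samples = []
    for _ in range(n_samples):
        actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
        ret = sum(actions)
        # For a Bernoulli-sigmoid policy, d/d theta_0 of log pi(a_0) is (a_0 - p).
        samples.append((ret - baseline) * (actions[0] - p))
    return samples


def per_move_target_grad() -> float:
    """Gradient of log pi(good move) at move 0 when a search procedure labels
    the good move directly (one-hot stand-in for an MCTS-style target).
    It ignores every other move, so it is deterministic."""
    p = sigmoid(0.0)
    return 1.0 - p


for horizon in (1, 10, 100):
    g = reinforce_grad_samples(horizon, n_samples=20_000)
    print(f"T={horizon:4d}  mean={statistics.fmean(g):.3f}  var={statistics.pvariance(g):.3f}")
print("per-move target gradient:", per_move_target_grad(), "(no sampling noise)")
```

Under these assumptions the REINFORCE estimate keeps the same mean (0.25, the true gradient of expected return with respect to theta_0), but its population variance is (T - 1)/16: the other T - 1 moves inject noise into the one shared return. The per-move target has no such cross-talk between moves.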
Source proof
Strong source proof | 1 extracted claim | 1 directional asset | 1 supporting author | headline-like title review
The primary source is Eric Jang’s lecture/podcast reconstructing AlphaGo with modern tools. It links to the transcript, flashcards, and errata (evjang.com/alphago-tutorial/llm_rl_variance.pdf) and references an Autoresearch experiment and the use of Cursor’s agent SDK.
Eric Jang walks through rebuilding AlphaGo with modern AI tools to expose the primitives of intelligence: search, learning from experience, and self-play. He explains why AlphaGo’s MCTS yields clearer training targets and discusses how RL applies (and often fails to apply) to LLMs, how LLMs can automate parts of research, and which research tasks remain hard for current models. Supporting materials include flashcards, a transcript, and errata.
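One way to see why search yields a clearer per-move target is the PUCT selection rule, combined with the AlphaGo Zero practice of training the policy toward normalized visit counts. The sketch below is a simplification under stated assumptions: it searches only at the root (one ply), treats `action_values` as fixed value-network outputs rather than backed-up leaf evaluations, and uses `sqrt(total + 1)` so the very first simulation already follows the prior. The function name and example numbers are hypothetical.

```python
import math


def puct_visit_targets(priors, action_values, n_simulations=200, c_puct=1.5):
    """Root-only sketch of PUCT selection (hypothetical helper).

    priors: policy-network probabilities P(a) for each legal move.
    action_values: stand-in for value-network evaluations of each move,
        assumed deterministic here; full MCTS would back up leaf values
        through a deeper tree.
    Returns normalized visit counts (temperature 1), the search-improved
    policy used as a training target in AlphaGo Zero-style training.
    """
    n_actions = len(priors)
    visits = [0] * n_actions
    value_sums = [0.0] * n_actions
    for _ in range(n_simulations):
        total = sum(visits)
        best_a, best_score = 0, -math.inf
        for a in range(n_actions):
            # Mean value of this move so far (0 for unvisited moves).
            q = value_sums[a] / visits[a] if visits[a] else 0.0
            # Exploration bonus: large for high-prior, rarely visited moves.
            # sqrt(total + 1) is a small deviation so the prior breaks ties at the start.
            u = c_puct * priors[a] * math.sqrt(total + 1) / (1 + visits[a])
            if q + u > best_score:
                best_a, best_score = a, q + u
        visits[best_a] += 1
        value_sums[best_a] += action_values[best_a]
    return [n / n_simulations for n in visits]


# The prior favors move 0, but the value estimates favor move 1.
print(puct_visit_targets([0.5, 0.3, 0.2], [-0.2, 0.6, 0.1]))
```

In this example most visits shift from the prior's favorite (move 0) to the move with the best value estimate (move 1), so the returned distribution is a sharper, per-move supervised target than the prior it started from.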
Supporting authors
Single-author analysis based on Eric Jang’s presentation and associated materials (transcript and flashcards).
Read the full transcript and supporting flashcards to follow the technical walkthrough and the Autoresearch examples. Both are useful background for understanding the limitations of RL for LLMs and where automation can already help research.