What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang

Eric Jang walks through how to rebuild AlphaGo from scratch using contemporary AI tooling. The talk uses AlphaGo as a clear, worked example of core intelligence primitives—search, learning from experience, and self-play—to show why Monte Carlo Tree Search (MCTS) provides stronger training signals than naive policy-gradient RL and to explain what that implies for future LLM training and automated research loops.

Confidence: 60 / 100
Assets: 1
Authors: 1
Outcome: open

Linked assets

RL — Framework and technical implications about RL, self-play, and LLM training; not a company or stock recommendation.

RL | sell | open

Technical synthesis of AlphaGo’s components (MCTS, policy/value networks, self-play) and their relevance to reinforcement learning practices used with LLMs.

Confidence: 60 / 100 | Start: $326.81 | Latest: $363.92 | Return: -11.36%

Eric Jang reconstructs AlphaGo using modern tooling to illustrate why MCTS offers a clearer training target than naive policy-gradient RL: the search proposes a strictly better action at each move, which eases the credit assignment problem that plagues long token trajectories in LLM training. The talk also documents an Autoresearch loop and shows where LLMs can already automate research tasks (running experiments, coding, hyperparameter optimization) and where they currently fail (choosing productive research directions and escaping dead ends). Flashcards and a transcript are linked; errata and technical appendices are available at evjang.com/alphago-tutorial/llm_rl_variance.pdf.
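
The contrast between the two training signals can be made concrete with a toy sketch. This is not code from the talk: the function names, the three-move position, and the visit counts are all made up for illustration, and it assumes an AlphaGo Zero-style setup in which normalized MCTS visit counts serve as the per-move policy target. The naive policy-gradient update instead scales the log-probability of the sampled move by the final game outcome, so a good move played in a game that was later lost gets pushed down.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mcts_target(visit_counts, temperature=1.0):
    """Turn search visit counts into a per-move policy target (hypothetical helper)."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [p / total for p in powered]

def cross_entropy_grad(logits, target):
    """Gradient of -sum(target * log softmax(logits)) with respect to the logits."""
    probs = softmax(logits)
    return [p - t for p, t in zip(probs, target)]

def reinforce_grad(logits, action, outcome):
    """Gradient of -outcome * log softmax(logits)[action] with respect to the logits."""
    probs = softmax(logits)
    return [outcome * (p - (1.0 if i == action else 0.0))
            for i, p in enumerate(probs)]

# A position with three legal moves and an untrained (uniform) policy.
logits = [0.0, 0.0, 0.0]

# Assumed search result: move 1 received most of the visits.
target = mcts_target([10, 80, 10])
search_grad = cross_entropy_grad(logits, target)

# Naive policy gradient: move 1 was sampled, but the game was lost
# many moves later, so the only signal is the final outcome of -1.
outcome_grad = reinforce_grad(logits, action=1, outcome=-1.0)

# Gradient descent subtracts the gradient, so a negative entry
# raises that move's logit and a positive entry lowers it.
print("search-target gradient:", search_grad)
print("outcome-only gradient: ", outcome_grad)
```

In this sketch the search-derived gradient for move 1 is negative (its logit goes up under gradient descent) regardless of how the game ended, while the outcome-only gradient for the same move is positive (its logit goes down) because the single end-of-game reward is smeared across every move in the trajectory.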

Source proof

Strong source proof | 1 extracted claim | 1 directional asset | 1 supporting author | headline-like title review

The primary source is Eric Jang’s lecture/podcast, which reconstructs AlphaGo with modern tools, links to a transcript, flashcards, and errata (evjang.com/alphago-tutorial/llm_rl_variance.pdf), and references an Autoresearch experiment and the use of Cursor’s agent SDK.

Sarah Paine - Why Russia and China can't escape geography
Dwarkesh Patel · Jun 9, 2026, 2:14 PM EDT

Lecture-level geopolitical framework contrasting continental land powers and maritime trading powers, with a brief mention of Russia/Putin targeting global agriculture. Conceptual material only; implications for markets are indirect and would be second-order (defense spending, supply-chain resilience, agriculture/food security).

What remains scarce after AGI? – Alex Imas and Phil Trammell
Dwarkesh Patel · Jun 4, 2026, 12:37 PM EDT

Podcast discussing economics of AGI: taxation and redistribution of AI-generated wealth, distribution of gains across countries, and inequality risks. Contains sponsor mentions (Jane Street; Google Gemini). No concrete near-term catalysts or firm-specific fundamentals.

How do AI chips actually work? – Reiner Pope
Dwarkesh Patel · May 22, 2026, 12:11 PM EDT

Entry contains only a title with no substantive body text. No details on companies, products, demand drivers, competitive dynamics, or time-bound catalysts for tradable theses.

What rebuilding AlphaGo teaches us about self-play, RL, and future of LLMs - Eric Jang
Dwarkesh Patel · May 15, 2026, 12:20 PM EDT

Eric Jang walks through rebuilding AlphaGo with modern AI tools to expose the primitives of intelligence: search, learning from experience, and self-play. He explains why AlphaGo’s MCTS yields clearer training targets and discusses how RL applies (and often fails to apply) to LLMs, how LLMs can automate parts of research, and which research tasks remain hard for current models. Supporting materials include flashcards, a transcript, and errata.

David Reich – Bronze Age shock, the Neanderthal puzzle, & farming’s sudden spread
Dwarkesh Patel · May 8, 2026, 1:09 PM EDT

Skipped non-finance YouTube video; content does not contain a clear market or investable-stock discussion.

The math that explains AI lab economics – Reiner Pope
Dwarkesh Patel · Apr 29, 2026, 1:20 PM EDT

Skipped non-finance YouTube video; content does not contain a clear market or investable-stock discussion.

Jensen Huang – Will Nvidia’s moat persist?
Dwarkesh Patel · Apr 14, 2026, 8:00 PM EDT

Teaser of a conversation with Nvidia’s CEO on the durability of Nvidia’s AI-chip moat, covering competition from TPUs/hyperscaler accelerators, chip supply-chain bottlenecks, and geopolitics of selling chips to China. No new quantitative disclosures or time-bound catalysts in the excerpt.

Michael Nielsen – Why aliens will have a different tech stack than us
Dwarkesh Patel · Apr 6, 2026, 8:00 PM EDT

Skipped non-finance YouTube video; content does not contain a clear market or investable-stock discussion.

Supporting authors

Single-author analysis based on Eric Jang’s presentation and associated materials (transcript and flashcards).

Read the full transcript and supporting flashcards to follow the technical walkthrough and the Autoresearch examples; useful background for understanding RL limitations in LLMs and where automation can already help research.