Stanford CS336 Language Modeling from Scratch | Spring 2026 | Lecture 15: Mid/Post-Training
Lecture 15 of Stanford CS336 emphasizes mid- and post-training (SFT → RLHF) as a key driver of large language model quality. The lecture highlights that instruction data is trending longer and more interactive (chatty, tool-using), which raises compute requirements, memory-bandwidth pressure, and inference complexity. That added intensity supports demand for infrastructure and semiconductor suppliers.
Linked assets
Key hardware and infrastructure suppliers that benefit if post-training and inference intensity remain elevated: NVDA (data-center AI accelerators), AMD (alternative accelerators), TSM (foundry capacity for leading nodes), AVGO (networking/interconnect), MU (DRAM/HBM memory bandwidth), and ASML (leading-edge lithography).
NVIDIA Corporation operates as a data center scale AI infrastructure company.
Direct beneficiary of sustained post-training cycles and rising inference cost from longer, tool-using interactions → more GPU demand.
Taiwan Semiconductor Manufacturing Company Limited (TSM): its products are used in high-performance computing, smartphones, Internet of Things, automotive, and digital consumer electronics.
Foundry bottleneck for leading-edge AI accelerators; benefits from sustained capacity demand.
Advanced Micro Devices, Inc.
Alternative accelerator supplier; benefits if overall accelerator TAM stays elevated from post-training/inference intensity.
Broadcom Inc.
Scaling AI clusters to support heavier training and inference increases networking and interconnect needs, and AVGO is leveraged to that demand.
Micron Technology, Inc.
Memory-bandwidth intensity rises with long context and training/inference workloads → supports DRAM/HBM demand.
ASML Holding N.V.
Long-run capex cycle support if scaling/post-training keeps driving demand for leading-edge nodes.
Source proof
Strong source proof | 4 extracted claims | 6 directional assets | 1 supporting author | headline-like title review
Lecture notes and fragments identify SFT and RLHF as central techniques, note an evolution toward longer instruction/chatty/tool-using outputs, and reference open-source SFT efforts. The discussion is technical/academic with no company announcements; investable signal is thematic: higher training/inference intensity supports picks-and-shovels suppliers.
Stanford seminar framing an “AI supercycle” centered on hyperscaler AI capex and the buildout of gigawatt-scale “AI factories” (data centers + power + cooling + networking). While the excerpt is introductory (few concrete numbers/ticker mentions), the investable implication is continued, multi-year demand for GPU/accelerator supply chains, AI networking, data-center power/cooling equipment, engineering & construction, and select data-center REITs/utilities—offset by cyclical/valuation and power-availability constraints.
Only a title/body were provided; no transcript, link, speaker names, or concrete technical claims to verify. From the topic (“AI in healthcare,” “open evidence,” “cyber risks”), the most plausible tradable implications are: (1) increased adoption of AI/LLMs in clinical workflow and imaging, (2) stronger demand for healthcare data infrastructure/interop tooling, and (3) heightened healthcare cybersecurity spend due to AI-enabled attack surface and regulatory scrutiny. All conclusions are high-uncertainty pending the actual video content.
Lecture summary (Altman @ Stanford CS153): argues scaling laws continue to deliver emergent capabilities; AI development pipeline (pre-train/post-train/RL) likely needs a rewrite potentially designed by AI; intelligence becomes a utility (like electricity); key risk fork is democratization vs concentration (~20% chance of concentrated outcome); near-term binding constraint is an underappreciated compute shortage, implying structurally rising demand for GPUs/ASICs, networking, data center buildouts, and power/grid capacity.
Transcript fragments from a Stanford HCI seminar discussion about modern “play” motivators in games: relaxation, immersion, PvP, and monetization mechanics (skins, XP boosts, optional single‑player purchases). Also touches on UX misconceptions and longitudinal/user understanding. No concrete technical breakthroughs in AI/robotics/semis/biotech/energy; the only investable angle is gaming UX-driven monetization and live-services design.
Transcript fragment discusses an “AI going to hyperscalers” thesis: enterprises prefer AWS/GCP/Azure-managed AI stacks vs building on newer GPU-cloud providers (e.g., CoreWeave, Nebius) where customers must solve integration/ops and margin structure themselves. It also implies strong forward demand for NVIDIA Blackwell B200 (mention of ~150k units needed in ~12–15 months) and highlights Google’s TPU path plus strong TSMC relationship. Content is noisy/partial; actionable signal mainly around hyperscaler capture vs GPU-neocloud margin risk, and continued NVDA/TSMC demand strength.
Lecture snippet focuses on LLM inference mechanics—especially KV-cache growth during long-context + tool-call workflows—and the resulting systems bottlenecks. Key technical signal: inference scaling is increasingly constrained by memory capacity/bandwidth and storage hierarchy (GPU HBM → CPU DRAM → SSD), not just raw GPU FLOPs. Mentions industry “rumblings” (unverified) about OpenAI buying up SSD/DRAM, and references Nvidia plus emerging inference-focused chips (e.g., Groq, which is private).
Stanford robotics seminar discusses geometric inductive biases (SE(3)/SO(3)/SO(2) equivariance, discrete rotation subgroups like C4) applied to robot learning/vision-language-action (VLA) style models and diffusion-policy/transformer approaches using RGB inputs and rotation-equivariant convolutions. Content is academic/architectural; no explicit commercialization timeline or company/product link is given, so tradability is indirect via enabling compute (GPUs), edge inference silicon, and robotics stacks.
Stanford CS25 seminar discusses the evolution from text-only LLMs to *native multimodal* models (text+vision+audio/video), focusing on transferable LLM training/architecture principles, plus emerging directions like *sparsity* (e.g., MoE/conditional compute) and *modality specialization*. While not a company-specific catalyst, it reinforces a medium-term technical direction: more multimodal data + larger context + higher throughput inference, with an increasing need for efficient routing (sparsity) and specialized encoders—supportive of compute, memory bandwidth, networking, and inference-serving infrastructure. Actionability is moderate-low (academic, non-catalyst), but the thesis maps cleanly to public “picks-and-shovels.”
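The KV-cache growth mentioned in the inference-mechanics summary above can be made concrete with a rough size estimate. The sketch below uses the standard accounting for a decoder-only transformer (one key and one value vector per layer, per KV head, per token). The model dimensions are hypothetical and chosen only for round numbers; they do not come from the lecture or from any specific model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Approximate KV-cache size in bytes for a decoder-only transformer.

    The leading factor of 2 counts the key tensor and the value tensor.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption, not a given).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


# Hypothetical configuration: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes(32, 8, 128, seq_len=1)
long_context = kv_cache_bytes(32, 8, 128, seq_len=128 * 1024)

print(f"per token:   {per_token / 1024:.0f} KiB")        # 128 KiB
print(f"128k tokens: {long_context / 1024**3:.0f} GiB")  # 16 GiB
```

Under these assumed dimensions, a single 128k-token conversation holds about 16 GiB of cache, and the total grows linearly with both context length and the number of concurrent requests. That is the sense in which long, tool-using sessions push on memory capacity and bandwidth rather than only on raw FLOPs.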
Supporting authors
Academic lecture material from Stanford CS336 and related Stanford courses and seminars. Content is technical and educational rather than company-specific; authorship reflects course instructors and seminar presenters.
Consider thematic exposure to AI infrastructure and semiconductor suppliers tied to training and inference intensity. This is a medium-term, thematic insight rather than an event-driven trade—align position sizing and time horizon accordingly.