Links • Some things I like

Continuous batching from first principles

26 Nov 2025
Olmo 3 Technical Report

20 Nov 2025
Weights & Biases gets a new terminal UI

20 Nov 2025
sws - Minimal, predictable, footgun-free config library - lucasb-eyer

20 Nov 2025
RL Learning with LoRA: A Diverse Deep Dive | kalomaze's kalomazing blog

9 Nov 2025
OlmoEarth: A new state-of-the-art Earth observation foundation model family | Ai2

5 Nov 2025
The Smol Training Playbook: The Secrets to Building World-Class LLMs - a Hugging Face Space by HuggingFaceTB

4 Nov 2025
NeurIPS 2025 Papers

4 Nov 2025
the bug that taught me more about PyTorch than years of using it | Elana Simon

26 Oct 2025
Evaluating Long Context (Reasoning) Ability | wh

16 Oct 2025
State of Vision-Language-Action (VLA) Research at ICLR 2026 – Moritz Reuss

15 Oct 2025
State of AI Report 2025

9 Oct 2025
Maintain the unmaintainable - a Hugging Face Space by transformers-community

9 Oct 2025
LoRA Without Regret - Thinking Machines Lab

2 Oct 2025
How to Detect, Track, and Identify Basketball Players with Computer Vision

2 Oct 2025
Astronaut Photo Interactive Map

27 Sept 2025
Online versus Offline RL for LLMs

17 Sept 2025
What is a color space? | Making Software

17 Sept 2025
AI just Broke Trackmania's most Legendary Record

13 Sept 2025
Defeating Nondeterminism in LLM Inference - Thinking Machines Lab

11 Sept 2025
Attention Is All You Need | Why Self-Attention

6 Sept 2025
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

6 Sept 2025
Inside vLLM: Anatomy of a High-Throughput LLM Inference System - Aleksa Gordić

4 Sept 2025
FineVision: Open Data is All You Need - a Hugging Face Space by HuggingFaceM4

4 Sept 2025
How To Become A Mechanistic Interpretability Researcher

4 Sept 2025
Big O

27 Aug 2025
PrimeIntellect | Environments Hub

27 Aug 2025
Adventures in State Space

24 Aug 2025
Do LLMs Have Good Music Taste?

20 Aug 2025
How to Think About GPUs | How To Scale Your Model

19 Aug 2025
How Social Media Shortens Your Life - by Gurwinder

14 Aug 2025
The Circuits Research Landscape: Results and Perspectives - August 2025 | Neuronpedia

12 Aug 2025
How Attention Sinks Keep Language Models Stable

12 Aug 2025
avatarl: training language models from scratch with pure reinforcement learning

12 Aug 2025
blogs and resources 101 | by @himanshustwts

12 Aug 2025
How Does A Blind Model See The Earth? - by henry

12 Aug 2025
There Are No New Ideas in AI… Only New Datasets

22 Jul 2025
Efficient MultiModal Data Pipeline

22 Jul 2025
All AI Models Might Be The Same - by Jack Morris

20 Jul 2025
Word Embeddings

15 Jul 2025
The Era of Exploration | Yiding's blog

15 Jul 2025
Foundations of Computer Vision

15 Jul 2025
Reinforcement Learning of Large Language Models

15 Jul 2025
The Case for More Ambition - Jack Morris

11 Jun 2025
DJing and its potential Neurophysiological Implications

1 Jun 2025
Open-sourcing circuit-tracing tools \ Anthropic

1 Jun 2025
Dummy's Guide to Modern Samplers

24 May 2025
Why We Think | Lil'Log

24 May 2025
DumPy: NumPy except it's OK if you're dum

24 May 2025
On the speed of ViTs and CNNs

18 May 2025
How To Scale

18 May 2025
Multimodal Dataloaders go brrrrrrr - by Haoli Yin

18 May 2025
Vision Language Models (Better, faster, stronger)

18 May 2025
Neel Nanda - How I Think About My Research Process: Explore, Understand, Distill

27 Apr 2025
Is Gemini now better than Claude at Pokémon?

26 Apr 2025
My dream VLM

25 Apr 2025
torch.compile, the missing manual - Google Docs

25 Apr 2025
Dario Amodei - The Urgency of Interpretability

25 Apr 2025
Prof. Judy Fan: Cognitive Tools for Making the Invisible Visible

11 Apr 2025
The Colors Of Her Coat - by Scott Alexander

7 Apr 2025
attention is logarithmic, actually

24 Mar 2025
Factorio Learning Environment

13 Mar 2025
The Genius of DeepSeek’s 57X Efficiency Boost [MLA]

6 Mar 2025
Learning Pokémon With Reinforcement Learning | Pokémon RL

5 Mar 2025
Feather - lightweight, efficient, and locally hosted YouTube Music TUI built with Rust

4 Mar 2025
GRPO Judge Experiments: Findings & Empirical Observations | kalomaze's kalomazing blog

4 Mar 2025
Attention Is Off By One - Evan Miller

27 Feb 2025
darkspark

27 Feb 2025
geohints

27 Feb 2025
Removing Jeff Bezos From My Bed

21 Feb 2025
Being a High-Leverage Generalist - char.blog

20 Feb 2025
The Ultra-Scale Playbook - a Hugging Face Space by nanotron

20 Feb 2025
kudzueye/boreal-hl-v1 · Hugging Face

17 Feb 2025
What if Eye...?

17 Feb 2025
A calculator app? Anyone could make that.

17 Feb 2025
The Breakthrough Behind Modern AI Image Generators | Diffusion Models Part 1

14 Feb 2025
Everyone knows your location

8 Feb 2025
WikiTok

8 Feb 2025
All the Transformer Math You Need to Know | How To Scale Your Model

4 Feb 2025
I’m Lovin’ It: Exploiting McDonald’s APIs to hijack deliveries and order food for a penny

2 Feb 2025
A reinforcement learning guide

1 Feb 2025
Attribution-based parameter decomposition

1 Feb 2025
Mapping the Latent Space of Llama 3.3 70B - Goodfire Papers

1 Feb 2025
NCCL: ACCELERATED MULTI-GPU COLLECTIVE COMMUNICATIONS

23 Jan 2025
Learning CUDA by optimizing softmax: A worklog | Maharshi's blog

23 Jan 2025
Understanding LSTM Networks -- colah's blog

23 Jan 2025
Dino-V2 Large Microscope

23 Jan 2025
AI and Stress

11 Jan 2025
model merging

11 Jan 2025
The Best Tacit Knowledge Videos on Every Subject

11 Jan 2025
Long-Term Thinking, 2nd Order Consequences & Effect Horizons

11 Jan 2025
Weighted Skip Connections are Not Harmful for Deep Nets

11 Jan 2025
2024 letter | Zhengdong

1 Jan 2025
Things we learned about LLMs in 2024

1 Jan 2025
Building Machine Learning Systems for a Trillion Trillion Floating Point Operations

1 Jan 2025
Visualizing transformers and attention | Talk for TNG Big Tech Day '24

29 Dec 2024
The Octalysis Framework for Gamification & Behavioral Design

29 Dec 2024
You could have designed state of the art positional encoding

29 Dec 2024
GPU Glossary

29 Dec 2024
Can we control AI?

29 Dec 2024
Building effective agents \ Anthropic

29 Dec 2024