Blogs
Latent Space
Writing • Eugene Yan
Dwarkesh Patel | Substack
Writing and mumblings - Jason Liu
Chip Huyen
Lilian Weng's Blog
Ludwig Abap's Blog
Maxime Labonne's Blog
CV
History of Residuals and a Word of Caution
YT - Advanced CV - UCF Center
Distributed Training
CMU Advanced NLP Spring 2025 (16): Parallelism and Scaling
The Ultra-Scale Playbook - a Hugging Face Space by nanotron
NCCL: Accelerated Multi-GPU Collective Communications
Communication Patterns
Everything about Distributed Training and Efficient Finetuning
FSDP & CUDACachingAllocator: an outsider newb perspective
ML Engineering: Model Parallelism
NERSC SC23 DL Tutorial
Interpretability
Learning Multi-Level Features with Matryoshka SAEs — AI Alignment Forum
Lens
SAE Explorer
Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers — LessWrong
Transformer Circuits
Latent taxonomy
dev.log - Gazing in the Latent Space with Sparse Autoencoders
LLM
FP64, FP32, FP16, BFLOAT16, TF32, and other members of the ZOO
Fast LLM Inference From Scratch
How to make LLMs go fast
Physics of Language Models
Where do LLMs spend their FLOPs?
You could have designed state of the art positional encoding
Others
umarjamilai
WelchLabsVideo
GPU MODE
CUDA Tutorial
Building Blocks for Theoretical Computer Science
GPU Glossary
GenAI Handbook
How computers work - Building Scott's CPU
ML Resources
My ML Resources! | Water
YT - Kay Lach
YT - Quantum Sense
rsrch space
RL
Reinforcement Learning of Large Language Models
a reinforcement learning guide
Repositories
VizTracer
huggingface/nanoVLM: The simplest, fastest repository for training/finetuning small-sized VLMs
LLM Training Puzzles
Instructor: Caching
ML Engineering
Transformers
The Genius of DeepSeek’s 57X Efficiency Boost [MLA]
Transformers and Self-Attention (DL 19)
Intro to Transformers
A Mathematical Framework for Transformer Circuits
Attention in transformers, visually explained | Chapter 6, Deep Learning
LLM Visualization
Random Transformer
The Transformer Blueprint: A Holistic Guide
The annotated transformer
The illustrated Transformer
The math behind Attention: Keys, Queries, and Values matrices
Transformer Explainer: LLM Transformer Model Visually Explained
VLM
SkalskiP - VLMs zero to hero
Minimind-V
vikhyat/moondream (torch implementation)