my ai reading list
2025-06-21
llm foundations
papers
- Understanding LSTM Networks - Christopher Olah - [blog]
- The Unreasonable Effectiveness of Recurrent Neural Networks - Andrej Karpathy - [blog]
- Sequence to Sequence Learning with Neural Networks - (Google) - [paper]
- Neural Machine Translation by Jointly Learning to Align and Translate - (Université de Montréal) - [paper]
- Word2Vec: Efficient Estimation of Word Representations in Vector Space - (Google) - [paper]
- GloVe: Global Vectors for Word Representation - (Stanford) - [paper]
- ELMo: Deep Contextualized Word Representations - (Allen Institute) - [paper]
- The Illustrated Transformer - Jay Alammar - [blog]
- A Primer in BERTology: What we know about how BERT works - Rogers et al. - [paper]
- A Survey of Transformers - Lin et al. - [paper]
books
language models
papers
- Attention Is All You Need - (Google) - [paper]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - (Google) - [paper]
- Improving Language Understanding by Generative Pre-Training - (OpenAI) - [paper]
- Language Models are Unsupervised Multitask Learners (GPT‑2) - (OpenAI) - [paper]
- Language Models are Few-Shot Learners (GPT‑3) - (OpenAI) - [paper]
- Scaling Laws for Neural Language Models - (OpenAI) - [paper]
- Training Compute-Optimal Large Language Models (Chinchilla) - (DeepMind) - [paper]
- PaLM: Scaling Language Models with Pathways - (Google) - [paper]
- LLaMA: Open and Efficient Foundation Language Models - (Meta) - [paper]
- The Llama 3 Herd of Models - (Meta) - [paper]
- DeepSeek‑V3 Technical Report - (DeepSeek) - [paper]
- Tülu 3: Pushing Frontiers in Open Language Model Post-Training - (Allen Institute for AI) - [paper]
- Large Concept Models: Language Modeling in a Sentence Representation Space - (Meta) - [paper]
advanced language models & RL training
papers
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - (DeepSeek) - [paper]
- Kimi k1.5: Scaling Reinforcement Learning with LLMs - (Moonshot AI) - [paper]
- Self-Rewarding Language Models - (Meta) - [paper]
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models - (Google) - [paper]
- Training language models to follow instructions with human feedback (InstructGPT) - (OpenAI) - [paper]
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback - (Google) - [paper]
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models - (DeepSeek) - [paper]
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits - (Microsoft) - [paper]
blogs
- MagiAttention - (SandAI) - [blog]
diffusion & discrete generation models
papers
architectural advances & fast inference
papers
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness - (Stanford) - [paper] (also read FlashAttention-2 & FlashAttention-3)
- LoRA: Low-Rank Adaptation of Large Language Models - (Microsoft) - [paper]
- Mixtral of Experts - (Mistral AI) - [paper]
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - (DeepSeek) - [paper]
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces - (CMU / Princeton) - [paper] [blog]
- s1: Simple test-time scaling - (Stanford / UW / Allen Institute for AI) - [paper]
model behaviour & new insights
papers
- Alignment Faking in Large Language Models - (Anthropic) - [paper]
- On the Biology of a Large Language Model - (Anthropic) - [paper]
- How Much Do Large Language Models Memorize? - (Meta / Google / Cornell / NVIDIA) - [paper]
- When Can Transformers Reason with Abstract Symbols? - (Apple / MIT) - [paper]
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning - (Qwen) - [paper]
- Let the Code LLM Edit Itself When You Edit the Code - (ByteDance) - [paper]