Yao et al. · ICLR 2023 · 2023
Interleaving chain-of-thought reasoning with action execution gave LLMs a practical agentic loop. This became the skeleton of how I think about tool-using agents.
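To make the loop in the note above concrete, here is a toy sketch of interleaving thoughts, actions, and observations. It assumes a hard-coded stand-in for the language model; `scripted_policy`, `multiply`, and `TOOLS` are hypothetical names invented for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (thought, action name, action argument)


def scripted_policy(transcript: List[str]) -> Step:
    """Hypothetical stand-in for an LLM: picks the next step from the transcript."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return ("I need the product of the two numbers.", "multiply", "6,7")
    result = observations[-1].split(": ", 1)[1]
    return ("I have the result.", "finish", result)


def multiply(arg: str) -> str:
    """Toy tool: multiplies two comma-separated integers."""
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)


TOOLS: Dict[str, Callable[[str], str]] = {"multiply": multiply}


def react_loop(
    question: str,
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate reasoning and tool calls until the policy emits 'finish'."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, transcript
        # Feed the tool result back so the next thought can condition on it.
        transcript.append(f"Observation: {tools[action](arg)}")
    raise RuntimeError("no answer within max_steps")


answer, trace = react_loop("What is 6 times 7?", scripted_policy, TOOLS)
```

In a real agent the policy would be a model call that reads the whole transcript; the point here is only the control flow.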
Research papers I've read over the years — with notes on what I took away and why it mattered.
Dosovitskiy et al. · ICLR 2021 · 2021
Bridged NLP and vision by treating image patches as tokens. The simplicity is surprising, but it only worked at scale, which raised important questions about inductive biases versus data.
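As a rough illustration of "patches as tokens", the sketch below cuts a single-channel image into non-overlapping square patches and flattens each one into a vector. It assumes a plain nested-list image and omits the learned linear projection and position embeddings that a real model would apply; `patchify` is a hypothetical helper name.

```python
from typing import List


def patchify(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W image into flattened patch x patch tokens, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch into a single token vector.
            tokens.append(
                [
                    image[row][col]
                    for row in range(top, top + patch)
                    for col in range(left, left + patch)
                ]
            )
    return tokens


# A 4 x 4 image with pixel values 0..15 becomes four tokens of length 4.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
```

The resulting sequence can then be fed to a standard transformer encoder exactly as word tokens would be.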
Carion et al. · ECCV 2020 · 2020
Eliminated the need for hand-designed anchors and NMS. Directly relevant to my thesis work on attention-based detection architectures at TUM.
Faysse et al. · arXiv 2024 · 2024
Multi-vector late interaction over page-level visual embeddings — a clean solution to the problem of searching inside PDFs without any text extraction step.
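The late-interaction scoring mentioned above can be sketched as a sum of per-query-vector maxima (the MaxSim operator popularized by ColBERT). This is a minimal sketch assuming tiny hand-written embeddings; in practice the vectors come from the query and page encoders, and the names `late_interaction_score` and `rank_pages` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """For each query vector take its best-matching page vector, then sum."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page names sorted by descending late-interaction score."""
    scores = {name: late_interaction_score(query_vecs, vecs) for name, vecs in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): two query vectors, two pages with one or two patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5]],
}
ranking = rank_pages(query, pages)
```

Because every query vector is matched against every page vector, a page can score well when different regions of it answer different parts of the query.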
Dao et al. · NeurIPS 2022 · 2022
IO-awareness showed that attention's practical bottleneck is memory traffic rather than FLOPs, and solved it elegantly. A great example of systems thinking applied to ML.
Frantar & Alistarh · ICML 2023 · 2023
One-shot unstructured pruning at 50% sparsity with minimal accuracy loss. Important for understanding how to deploy large models in resource-constrained environments.
Kaplan et al. · arXiv 2020 · 2020
Empirical power laws relating loss to compute, data, and model size. Informed how I reason about the cost-performance tradeoff when choosing models for production.
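One way to work with a power law of this kind is to fit it in log-log space, where it becomes a straight line. The sketch below assumes a loss curve of the form loss = a * N^(-alpha) and uses synthetic numbers (a = 5.0, alpha = 0.08) chosen purely for illustration; they are not values from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**(-alpha) in log-log space; returns (a, alpha)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    slope = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic measurements: model sizes and the losses a made-up power law implies.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.08 for n in sizes]

a, alpha = fit_power_law(sizes, losses)
# Extrapolate to a model ten times larger than the largest one measured.
predicted_loss = a * 1e10 ** -alpha
```

With real measurements the fit is noisy and holds only over a limited range, so extrapolations like `predicted_loss` are a rough guide rather than a guarantee.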
Radford et al. · ICML 2021 · 2021
Contrastive pre-training on image-text pairs produced remarkably general representations. Foundational to my work on document retrieval and OCR pipelines.
Vaswani et al. · NeurIPS 2017 · 2017
The paper that reshaped the field. Self-attention replacing recurrence was the key insight. I re-read this every time I need to reason about sequence models from first principles.
Lewis et al. · NeurIPS 2020 · 2020
Grounding generation in retrieved documents was an elegant way to reduce hallucination. The retrieve-then-generate approach is still the dominant RAG paradigm.