Papers

Research papers I've read over the years — with notes on what I took away and why it mattered.

AI Agents · Computer Vision · Document AI · Efficient ML · LLMs · Multimodal · NLP / Architectures · NLP / Retrieval

AI Agents

ReAct: Synergizing Reasoning and Acting in Language Models

Yao et al. · ICLR 2023 · 2023

Interleaving chain-of-thought reasoning with action execution gave LLMs a practical agentic loop. This became the skeleton of how I think about tool-using agents.

Agents · Reasoning · LLMs
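The interleaved thought/action/observation loop from the ReAct note can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `react_loop`, `scripted_policy`, and the `calculator` tool are hypothetical names, and the scripted policy stands in for what would be an LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a tool maps a string argument to a string observation,
# and a policy maps the trace so far to (thought, action name, action argument).
Tools = Dict[str, Callable[[str], str]]
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(policy: Policy, tools: Tools, question: str,
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Alternate reasoning and acting until the policy emits 'finish'."""
    trace = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(trace)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            trace.append(f"Answer: {arg}")
            return arg, trace
        # Execute the chosen tool and feed its output back into the trace.
        observation = tools[action](arg)
        trace.append(f"Action: {action}[{arg}]")
        trace.append(f"Observation: {observation}")
    return "", trace  # step budget exhausted without an answer


def scripted_policy(trace: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: call the calculator once, then finish."""
    observations = [line for line in trace if line.startswith("Observation: ")]
    if not observations:
        return ("I need to compute the product.", "calculator", "6*7")
    result = observations[-1].split(": ", 1)[1]
    return ("The observation is the answer.", "finish", result)


def calculator(expr: str) -> str:
    """Toy tool that only understands 'a*b' with integer operands."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


answer, trace = react_loop(scripted_policy, {"calculator": calculator},
                           "What is 6 times 7?")
print("\n".join(trace))
```

The point of the structure is that every observation is appended to the same trace the policy reads next, so reasoning can react to what the tools returned.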

Computer Vision

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy et al. · ICLR 2021 · 2021

Bridged NLP and vision by treating image patches as tokens. The simplicity is surprising, but it required large-scale pre-training to work, which raised important questions about inductive biases versus data.

Vision Transformers · Image Recognition
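"Treating image patches as tokens" amounts to cutting the image into a grid of non-overlapping squares and flattening each one into a vector. A minimal sketch, assuming a single-channel image stored as nested lists; `patchify` is a hypothetical helper, and the learned linear projection and position embeddings that ViT applies afterwards are omitted.

```python
from typing import List

Image = List[List[float]]  # single-channel H x W image, for simplicity


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch squares and
    flatten each square (row-major) into one token vector."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens


# A 4x4 image with 2x2 patches yields a sequence of 4 tokens of length 4.
image = [[float(row * 4 + col) for col in range(4)] for row in range(4)]
tokens = patchify(image, 2)
print(len(tokens), tokens[0])
```

With 16x16 patches on a 224x224 image the same procedure gives 196 tokens, which is the sequence the transformer encoder sees.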

End-to-End Object Detection with Transformers

Carion et al. · ECCV 2020 · 2020

Eliminated the need for hand-designed anchors and NMS. Directly relevant to my thesis work on attention-based detection architectures at TUM.

Object Detection · Transformers · DETR

Document AI

ColPali: Efficient Document Retrieval with Vision Language Models

Faysse et al. · arXiv 2024 · 2024

Multi-vector late interaction over page-level visual embeddings — a clean solution to the problem of searching inside PDFs without any text extraction step.

VLMs · Document Retrieval · ColBERT
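The late-interaction scoring mentioned in the ColPali note follows the ColBERT-style MaxSim rule: each query vector is matched to its best page vector, and the per-query maxima are summed. A minimal sketch with made-up 2-dimensional embeddings; `maxsim_score` is a hypothetical name, and real systems use batched tensor operations over learned embeddings.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Sum, over query vectors, of the best dot product with any page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # covers both query vectors
    "page_b": [[1.0, 0.0], [0.6, 0.0]],              # covers only the first
}

ranked = sorted(pages, key=lambda name: maxsim_score(query, pages[name]),
                reverse=True)
print(ranked)
```

Because each page keeps many vectors instead of one pooled embedding, a query term can match a small region of the page without being averaged away.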

Efficient ML

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Dao et al. · NeurIPS 2022 · 2022

IO-awareness showed that attention's real bottleneck is memory traffic between GPU HBM and on-chip SRAM rather than FLOPs, and tiling solved it elegantly without approximating the result. A great example of systems thinking applied to ML.

Attention · Efficiency · Hardware-Aware

SparseGPT: Massive Language Models Can be Accurately Pruned in One Shot

Frantar & Alistarh · ICML 2023 · 2023

One-shot unstructured pruning at 50% sparsity with minimal accuracy loss. Important for understanding how to deploy large models in resource-constrained environments.

Pruning · LLMs · Compression

LLMs

Scaling Laws for Neural Language Models

Kaplan et al. · arXiv 2020 · 2020

Empirical power-law relationships between compute, data, and model size. Informed how I reason about the cost-performance tradeoff when choosing models for production.

Scaling · LLMs · Empirical
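For reference, the power laws in the scaling-laws note take roughly the following form when one resource is the limiting factor. This is written from memory of the paper's notation, so treat the symbols as an approximation: L is test loss, N the non-embedding parameter count, D the dataset size in tokens, C_min the compute budget, and N_c, D_c, C_c with their exponents are fitted constants.

```latex
\[
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) = \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}
\]
```

The exponents are small, so each constant-factor reduction in loss costs a multiplicative increase in the resource, which is what drives the cost-performance tradeoff.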

Multimodal

Learning Transferable Visual Models From Natural Language Supervision

Radford et al. · ICML 2021 · 2021

Contrastive pre-training on image-text pairs produced remarkably general representations. Foundational to my work on document retrieval and OCR pipelines.

CLIP · Vision-Language · Contrastive Learning

NLP / Architectures

Attention Is All You Need

Vaswani et al. · NeurIPS 2017 · 2017

The paper that reshaped the field. Self-attention replacing recurrence was the key insight. I re-read this every time I need to reason about sequence models from first principles.

Transformers · Attention · NLP
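As a first-principles reminder, scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. A minimal single-head sketch in plain Python; `attention` and `softmax` are hypothetical helper names, and the learned projections, masking, and multiple heads of the full model are left out.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row becomes a weighted
    average of the value rows, weighted by query-key similarity."""
    d_k = len(k[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k)
                  for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# With identical keys the weights are uniform, so the output is the mean of
# the value rows.
q = [[1.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[0.0, 2.0], [4.0, 6.0]]
result = attention(q, k, v)
print(result)
```

Every output position depends on all input positions in a single step, which is what lets the model drop recurrence.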

NLP / Retrieval

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis et al. · NeurIPS 2020 · 2020

Grounding generation in retrieved documents was an elegant way to reduce hallucination. The retrieve-then-generate approach is still the dominant RAG paradigm.

RAG · Retrieval · LLMs