Lovish Madaan

email address

twitter

Hi, I'm Lovish! I am pursuing my PhD at MSL, Meta and UCL NLP. My current focus is on thinking models, reinforcement learning, and self-improvement.

In the past, I worked at Google Research India in the Machine Learning and Optimization Team with Prateek Jain and Srinadh Bhojanapalli on inference-efficient machine learning and natural language processing.

I graduated with a B.Tech and M.Tech in Computer Science and Engineering from the Indian Institute of Technology, Delhi. I worked with Parag Singla, Sayan Ranu, and Aaditeshwar Seth on a variety of research projects during my time there.

The Art of Scaling Reinforcement Learning Compute for LLMs
Devvrit Khatri*, Lovish Madaan*, Rishabh Tiwari, Rachit Bansal, Sai Surya Duvvuri, Manzil Zaheer, Inderjit S. Dhillon, David Brandfonbrener, and Rishabh Agarwal
ICLR 2026

Rethinking Thinking Tokens: LLMs as Improvement Operators
Lovish Madaan, Aniket Didolkar, Suchin Gururangan, John Quan, Ruan Silva, Ruslan Salakhutdinov, Manzil Zaheer, Sanjeev Arora, and Anirudh Goyal
arXiv Preprint

Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan, and Rémi Munos
NeurIPS 2025

MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Deepak Nathani, Lovish Madaan, Nicholas Roberts, Nikolay Bashlykov, Ajay Menon, Vincent Moens, Amar Budhiraja, Despoina Magka, Vladislav Vorotilov, Gaurav Chaurasia, Dieuwke Hupkes, Ricardo Silveira Cabral, Tatiana Shavrina, Jakob Foerster, Yoram Bachrach, William Yang Wang, and Roberta Raileanu
COLM 2025

HARP: A challenging human-annotated math reasoning benchmark
Albert S. Yue, Lovish Madaan, Ted Moskovitz, DJ Strouse, and Aaditya K. Singh
arXiv Preprint

Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models
Lovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, and Dieuwke Hupkes
NAACL 2025

The Llama 3 Herd of Models
(Core Contributor) Llama Team, AI @ Meta
Research Paper Link

Quantifying Variance in Evaluation Benchmarks
Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, and Dieuwke Hupkes
Regulatable ML, NeurIPS 2024

Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, and Prateek Jain
ICLR 2023
Google AI Blog Coverage

Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-profits in Improving Maternal and Child Health
Aditya Mate*, Lovish Madaan*, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, and Milind Tambe
AAAI 2022
MLPH Workshop, NeurIPS 2021 (Best Paper Award)
Google AI Blog Coverage

* indicates equal contribution.