I'm a Research Scientist at Meta FAIR in Menlo Park. I study how to make language models reason better while using less compute. My current work focuses on two questions: how can we teach models to reflect on their own reasoning process using RL, and how can we serve capable models efficiently through quantization, compression, and compact architectures?
These questions are grounded in my PhD work at NYU, where I studied generalization through the lens of information theory and compression. I showed that compression is not just a practical tool for efficiency but can also explain when and how deep learning models generalize, producing the first non-vacuous generalization bounds for billion-parameter LLMs.
My research was recognized with an ICML 2022 Outstanding Paper Award for my work on Bayesian model selection and a Best Paper Award at the ICML 2024 Theoretical Foundations Workshop for my work on understanding generalization in LLMs through the lens of compression. I was distinguished as a Rising Star in EECS by MIT and a Rising Star in Machine Learning by UMD.
I completed my PhD at NYU with Andrew Gordon Wilson, supported by the Microsoft Research PhD Fellowship and the Google DeepMind Fellowship. Prior to NYU, I worked with Andrea Lodi and Dominique Orban at Polytechnique Montreal on optimization for large-scale machine learning. I received the Best Master's Thesis Award for this work.
You can contact me at sanaelotfi[at]meta[dot]com
Recent News
📢 October 2025: I gave a talk on Understanding Generalization through the Lens of Compression at the Princeton Alg-ML Seminar.
🥳 September 2025: Small Batch Size Training for Language Models got accepted to NeurIPS!
🥳 July 2025: I joined Meta Superintelligence Labs as a Research Scientist, working in the Fundamental AI Research (FAIR) team.
July 2025: Small Batch Size Training for Language Models is now on arXiv!
⭐ May 2025: I gave a Rising Star talk at the International Symposium on Trustworthy Foundation Models @ MBZUAI.
🥳 May 2025: Customizing the Inductive Biases of Softmax Attention using Structured Matrices got accepted to ICML!
👩‍🎓 April 2025: I successfully defended my Ph.D. thesis on Understanding Generalization in Deep Learning Through Occam's Razor!
📢 December 2024: I'm a keynote speaker and panelist at the Machine Learning and Compression Workshop @ NeurIPS 2024.
December 2024: I'm organizing the Scientific Methods for Understanding Neural Networks Workshop @ NeurIPS 2024.
🥳 September 2024: Our latest work Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models got accepted to NeurIPS as a spotlight!
⭐ August 2024: I was selected as a Rising Star in EECS by MIT.
July 2024: Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models won the Best Paper Award at the ICML Theoretical Foundations Workshop.
📢 June 2024: I gave a talk on Non-Vacuous Generalization Bounds for Large Language Models at ML Collective.
👩‍💻 June 2024: I started my summer internship at Microsoft Research NYC, where I will be working on large language model merging for multi-task learning.
🥳 May 2024: Non-Vacuous Generalization Bounds for Large Language Models got accepted to ICML!
📢 May 2024: I gave a talk on Non-Vacuous Generalization Bounds for Large Language Models at Cohere for AI and the UIUC ML Reading Group.
Selected Publications
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram, Andrew Gordon Wilson, Micah Goldblum
NeurIPS 2025
[arxiv, code]
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi*, Yilun Kuang*, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
NeurIPS 2024
Spotlight Presentation
ICML Workshop on Theoretical Foundations of Foundation Models, 2024
Best Paper Award
[arxiv, code]
Non-Vacuous Generalization Bounds for Large Language Models
Sanae Lotfi*, Marc Finzi*, Yilun Kuang*, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson
ICML 2024
[arxiv, code]
Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
ICML 2022, JMLR 2023
ICML Outstanding Paper Award, JMLR Best Papers Track
[arxiv, code, poster, talk, slides]
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi*, Marc Finzi*, Sanyam Kapoor*, Andres Potapczynski*, Micah Goldblum, Andrew Gordon Wilson
NeurIPS 2022
[arxiv, code]
Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
NeurIPS 2021
[arxiv, code, poster]
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
ICML 2021
Spotlight Presentation
[arxiv, code, slides]
Stochastic First and Second Order Optimization Methods for Machine Learning
Sanae Lotfi
Master's Thesis, Polytechnique Montreal 2020
Best Thesis Award
