I am a PhD candidate at NYU advised by Andrew Gordon Wilson. I work on the foundations of deep learning, focusing on understanding the generalization properties of deep neural networks through notions such as model compression, the marginal likelihood, PAC-Bayes bounds, and loss surface analysis. Using these insights into generalization, my goal is to build more robust and reliable machine learning models.

My PhD research has been recognized with an ICML Outstanding Paper Award and is generously supported by the Microsoft Research PhD Fellowship, the Google DeepMind Fellowship, and the Meta AI Mentorship Program. I was also recently named a Rising Star in Machine Learning by the University of Maryland Center for Machine Learning.

I am currently interning at Microsoft Research, where I work with Miro Dudik and Jordan Ash on novel methods for efficiently merging large language models for multi-task learning. In 2022-2023, I was a Visiting Researcher at Meta FAIR, where I worked with Brandon Amos on deriving generalization bounds for LLMs and understanding the benefits of input-dependent augmentations in image classification. In summer 2022, I worked with Bernie Wang and Richard Kurle at Amazon on understanding and quantifying distribution shift in time series.

Prior to NYU, I worked with Andrea Lodi and Dominique Orban at Polytechnique Montreal on designing stochastic algorithms for large-scale optimization with compelling theoretical and empirical properties. I received the Best Master’s Thesis Award for this work.

I will graduate in Spring 2025 and will begin looking for postdoc and research scientist positions in Fall 2024. Feel free to reach out if you see a fit!

You can contact me at sl8160[at]nyu[dot]edu.

Recent News

Selected Publications

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi*, Yilun Kuang*, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
ICML Workshop on Theoretical Foundations of Foundation Models, 2024
Oral Presentation

Non-Vacuous Generalization Bounds for Large Language Models
Sanae Lotfi*, Marc Finzi*, Yilun Kuang*, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson
ICML 2024
[arxiv, code]

Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
ICML 2022, JMLR 2023
ICML Outstanding Paper Award, JMLR Best Papers Track
[arxiv, code, poster, talk, slides]

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi*, Marc Finzi*, Sanyam Kapoor*, Andres Potapczynski*, Micah Goldblum, Andrew Gordon Wilson
NeurIPS 2022
[arxiv, code]

Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
NeurIPS 2021
[arxiv, code, poster]

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
ICML 2021
Spotlight Presentation
[arxiv, code, slides]