I am a PhD candidate at NYU advised by Andrew Gordon Wilson. I work on the science of deep learning, with a focus on understanding the generalization properties of deep neural networks through tools such as model compression and loss surface analysis. Using these insights into generalization, my goal is to build improved, scalable, and robust deep learning models.
My PhD research has been recognized with an ICML 2022 Outstanding Paper Award for my work on Bayesian model selection and a Best Paper Award at the ICML 2024 Theoretical Foundations Workshop for my work on understanding generalization in LLMs through the lens of compression. My research is generously supported by the Microsoft Research PhD Fellowship, the Google DeepMind Fellowship, and the Meta AI Mentorship Program. I was recently named a Rising Star in EECS by MIT and a Rising Star in Machine Learning by the University of Maryland.
I am currently interning at Microsoft Research, where I work with Miro Dudik and Jordan Ash on novel methods for efficiently merging large language models for multi-task learning. In 2022-2023, I was a Visiting Researcher at Meta FAIR, where I worked with Brandon Amos to derive generalization bounds for LLMs and to understand the benefits of input-dependent augmentations in image classification. In summer 2022, I worked with Bernie Wang and Richard Kurle at Amazon to understand and quantify distribution shifts in time series.
Prior to NYU, I worked with Andrea Lodi and Dominique Orban at Polytechnique Montreal to design stochastic algorithms for large-scale optimization with compelling theoretical and empirical properties. I received the Best Master's Thesis Award for this work.
I will be on the job market in Fall 2024. Feel free to reach out if you see a fit!
You can contact me at sl8160[at]nyu[dot]edu
Recent News
September 2024: Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models was accepted to NeurIPS as a spotlight!
August 2024: I was selected as a Rising Star in EECS by MIT.
July 2024: Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models won the Best Paper Award at the ICML Theoretical Foundations Workshop.
July 2024: I will be a keynote speaker at the Machine Learning and Compression Workshop @ NeurIPS 2024.
July 2024: I'm co-organizing the Scientific Methods for Understanding Neural Networks Workshop @ NeurIPS 2024.
June 2024: I gave a talk on Non-Vacuous Generalization Bounds for Large Language Models at ML Collective.
June 2024: I started my summer internship at Microsoft Research NYC, where I will be working on large language model merging for multi-task learning.
May 2024: Non-Vacuous Generalization Bounds for Large Language Models was accepted to ICML!
May 2024: I gave a talk on Non-Vacuous Generalization Bounds for Large Language Models at Cohere for AI and the UIUC ML reading group.
Selected Publications
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi*, Yilun Kuang*, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
NeurIPS 2024
Spotlight Presentation
ICML Workshop on Theoretical Foundations of Foundation Models, 2024
Best Paper Award
[arxiv]
Non-Vacuous Generalization Bounds for Large Language Models
Sanae Lotfi*, Marc Finzi*, Yilun Kuang*, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson
ICML 2024
[arxiv, code]
Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
ICML 2022, JMLR 2023
ICML Outstanding Paper Award, JMLR Best Papers Track
[arxiv, code, poster, talk, slides]
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi*, Marc Finzi*, Sanyam Kapoor*, Andres Potapczynski*, Micah Goldblum, Andrew Gordon Wilson
NeurIPS 2022
[arxiv, code]
Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
NeurIPS 2021
[arxiv, code, poster]
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
ICML 2021
Spotlight Presentation
[arxiv, code, slides]
Stochastic First and Second Order Optimization Methods for Machine Learning
Sanae Lotfi
Master's Thesis, Polytechnique Montreal, 2020
Best Thesis Award