Contact us if you are interested in giving a talk to our community.

**Jan. 18, 2023** [1 pm EST; Joint with the Theory lunch]: Christopher Tosh (Columbia)

**Title**: Simple and near-optimal algorithms for hidden stratification and multi-group learning

**Abstract**: Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population. This criterion addresses recent practical concerns in machine learning such as subgroup fairness and hidden stratification. In this talk, we will discuss the structure of solutions to the multi-group learning problem and provide corresponding simple and near-optimal algorithms.
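As a toy illustration of the conditional risks this criterion targets (illustrative code, not an algorithm from the talk; the function name and data are hypothetical), one can evaluate a predictor's zero-one risk separately within each subgroup:

```python
import numpy as np

def group_risks(y_true, y_pred, groups):
    """Zero-one risk of a predictor, conditioned on each subgroup."""
    risks = {}
    for g in np.unique(groups):
        mask = groups == g
        risks[g] = float(np.mean(y_true[mask] != y_pred[mask]))
    return risks

# A predictor with zero overall-looking risk on group 'a' but high risk on 'b'
risks = group_risks(
    y_true=np.array([0, 1, 1, 0]),
    y_pred=np.array([0, 1, 0, 0]),
    groups=np.array(["a", "a", "b", "b"]),
)
```

Multi-group agnostic learning asks for a single predictor whose risk within each such group is competitive with the best predictor for that group.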

**Bio**: Christopher Tosh is an associate research scientist in the Department of Epidemiology and Biostatistics at Memorial Sloan Kettering Cancer Center. His current interests are in problems arising in interactive learning, representation learning, and robust learning, particularly with applications to cancer research. Before coming to MSK, he was a postdoc at Columbia University supervised by Daniel Hsu; before that, he received his PhD in Computer Science from UC San Diego, where he was advised by Sanjoy Dasgupta.

**Nov. 16, 2022** [1 pm EST; Joint with the Theory lunch]: Daniel Reichman (WPI)

**Title**: Computational problems arising from stopping contagious processes over networks

**Abstract**: Networks can be conducive to the spread of undesirable phenomena, from false information to bankruptcy of financial institutions and contagious disease. How can we leverage algorithms to stop or slow down contagious processes? I will survey some of the literature, focusing mostly on the bootstrap percolation model, a simple contagion model in which every uninfected node becomes infected once it has at least r infected neighbors, for some r greater than 1. I will present some improved algorithms for the binomial random graph and conclude with several open questions. Based on joint works with Hermish Mehta, Uri Feige, Michael Krivelevich, and Amin Coja-Oghlan.
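The bootstrap percolation dynamics are simple enough to simulate directly. A minimal sketch (illustrative code, not from the talk) on an adjacency-list graph:

```python
from collections import defaultdict

def bootstrap_percolation(adj, seeds, r):
    """Simulate r-neighbor bootstrap percolation.

    adj: dict mapping each node to its set of neighbors
    seeds: initially infected nodes
    r: activation threshold
    Returns the final infected set.
    """
    infected = set(seeds)
    counts = defaultdict(int)  # infected-neighbor count per uninfected node
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v in infected:
                continue
            counts[v] += 1
            if counts[v] >= r:  # node activates once r neighbors are infected
                infected.add(v)
                frontier.append(v)
    return infected
```

With r = 2 on a small graph, seeding two adjacent nodes of a dense neighborhood can infect everything, while a single seed spreads nowhere; the algorithmic question in the talk is which nodes to immunize or seed to control this cascade.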

**Bio**: Daniel Reichman is an assistant professor in the Department of Computer Science at WPI. Daniel received his PhD at the Weizmann Institute and was a postdoc at Cornell University, UC Berkeley, and Princeton University. His research interests are in theoretical machine learning (in particular, theoretical questions related to neural networks), AI, and the study of algorithmic approaches that go beyond worst-case analysis.

**Nov. 9, 2022** [4 pm EST]: Margalit Glasgow (Stanford University)

**Title**: Max Margin Works while Large Margin Fails: Generalization without Uniform Convergence

**Abstract**: A major challenge in modern machine learning is theoretically understanding the generalization properties of overparameterized models. Many existing tools rely on uniform convergence (UC), a property that, when it holds, guarantees that the test loss will be close to the training loss, uniformly over a class of candidate models. However, prior work has shown that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. I'll talk about a novel way to prove generalization in two such classification settings: one with a linear model, and one in a 2-layer neural network. Our technique is a new type of margin bound showing that above a certain signal-to-noise threshold, any near-max-margin classifier will achieve almost no test loss in these two settings. Our results show that near-max-margin is important: while any model that achieves at least a (1 - epsilon)-fraction of the max-margin generalizes well, a classifier achieving half of the max-margin may fail terribly.
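To make the normalized-margin quantity concrete, here is a small numerical illustration (toy data, not the settings analyzed in the talk) of how two linear separators of the same data can achieve very different normalized margins:

```python
import numpy as np

def normalized_margin(w, X, y):
    # min_i y_i <w, x_i> / ||w||_2, for labels y in {-1, +1}
    return float(np.min(y * (X @ w)) / np.linalg.norm(w))

# Two linearly separable points on the x-axis
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1, -1])

w_max = np.array([1.0, 0.0])  # max-margin direction for this data
w_bad = np.array([1.0, 1.0])  # also separates, but with a smaller margin

# normalized_margin(w_max, X, y) == 1.0
# normalized_margin(w_bad, X, y) == 1/sqrt(2) ≈ 0.707
```

The talk's results concern classifiers whose normalized margin is within a (1 - epsilon) factor of the maximum; separators like `w_bad`, at only half the max margin, can fail badly.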

**Oct. 26, 2022** [1 pm EST; Joint with the Theory lunch]: Pratik Chaudhari (University of Pennsylvania)

**Title**: Does the Data Induce Capacity Control in Deep Learning?

**Abstract**: Accepted statistical wisdom suggests that the larger the model class, the more likely it is to overfit the training data. And yet, deep networks generalize extremely well: the larger the deep network, the better its accuracy on new data. This talk seeks to shed light upon this apparent paradox. We will argue that deep networks are successful because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a peculiar ("sloppy") eigenspectrum where, in addition to a few large eigenvalues (salient features), there are a large number of small eigenvalues that are distributed uniformly over a very large range. This structure in the input data is strongly mirrored in the representation learned by the network. A number of quantities such as the Hessian and the Fisher Information Matrix, as well as others such as correlations of activations or Jacobians, are also sloppy. Even if the model class for deep networks is very large, there is only a tiny subset of models that fit such sloppy tasks. Using these ideas, this talk will demonstrate an analytical non-vacuous generalization bound for deep networks that does not use compression. It will also discuss how these ideas can be harnessed into algorithms that learn from unlabeled data optimally.

**References**:
1. Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML'22]
2. Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, Pratik Chaudhari. [ICML'22]

**Bio**: Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of
Pennsylvania. He is a member of the GRASP Laboratory. From 2018-19, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, and his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014-16. He received the NSF CAREER award and the Intel Rising Star Faculty Award in 2022.