The empirical success of Generative Adversarial Networks (GANs) has led to growing interest in their theoretical analysis. The statistical literature focuses mainly on Wasserstein GANs and generalizations thereof, which in particular allow for good dimension reduction properties. Statistical results for Vanilla GANs, the original optimization problem, are still rather limited and require assumptions such as smooth activation functions and equal dimensions of the latent space and the ambient space. To bridge this gap, we draw a connection from Vanilla GANs to the Wasserstein distance. By doing so, problems caused by the Jensen-Shannon divergence can be avoided and existing results for Wasserstein GANs can be extended to Vanilla GANs. In particular, we obtain an oracle inequality for Vanilla GANs in Wasserstein distance. The assumptions of this oracle inequality are designed to be satisfied by network architectures commonly used in practice, such as feedforward ReLU networks. Using Hölder-continuous ReLU networks, we derive a rate of convergence for estimating probability distributions.
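For orientation, the standard textbook forms of the objects named in this abstract can be written as follows. The notation is an assumption made here for illustration and is not taken from the talk: $P$ is the data distribution, $U$ the latent distribution, $G$ the generator, $D$ the discriminator, and $P_G$ the law of $G(Z)$.

```latex
% Vanilla GAN objective (generator G, discriminator D, data law P, latent law U)
\min_{G}\,\max_{D}\; \mathbb{E}_{X\sim P}\big[\log D(X)\big]
  + \mathbb{E}_{Z\sim U}\big[\log\big(1 - D(G(Z))\big)\big]

% With an unrestricted discriminator, the inner maximum equals
2\,\mathrm{JS}\big(P,\,P_{G}\big) - \log 4

% Wasserstein-1 distance in its Kantorovich-Rubinstein dual form
W_1(P, Q) = \sup_{\|f\|_{\mathrm{Lip}}\le 1}\;
  \mathbb{E}_{X\sim P}[f(X)] - \mathbb{E}_{Y\sim Q}[f(Y)]
```

Both problems are suprema over a class of test functions, which gives a rough idea of why results for one may carry over to the other. The precise connection is the subject of the talk and is not reproduced here.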
9:25am - 9:50am
Asymptotic Theory for Constant Step Size Stochastic Gradient Descent
Jiaqi Li1, Zhipeng Lou2, Stefan Richter3, Wei Biao Wu4
1Washington University in St. Louis; 2University of Pittsburgh; 3Heidelberg University; 4University of Chicago
We investigate the statistical behavior of Stochastic Gradient Descent (SGD) with constant step size under the framework of iterated random functions. Unlike previous studies establishing the convergence of SGD in a distance between probability measures, e.g., the Wasserstein distance, our approach provides convergence in Euclidean distance by showing the Geometric Moment Contraction (GMC) of SGD. This new convergence can address the non-stationarity of SGD due to fixed initial points and can provide a more refined asymptotic analysis of SGD. Specifically, we prove a quenched central limit theorem and a quenched invariance principle for averaged SGD (ASGD) regardless of the initial points. Furthermore, we provide a novel perspective to understand the impact of step sizes in SGD by studying its derivative with respect to the step size. The existence of stationary solutions for the first and second derivative processes is shown under mild conditions. Subsequently, we utilize multiple step sizes and show an enhanced Richardson-Romberg extrapolation with improved bias representation, which brings ASGD estimates closer to the global optimum. Finally, we propose a new online inference method and a bias-reduced variant for the extrapolated ASGD. Empirical confidence intervals are constructed and the coverage probabilities are shown to be asymptotically correct by numerical experiments.
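The step-size bias and its reduction by extrapolation can be illustrated with a minimal sketch. It uses the classical two-step-size Richardson-Romberg combination, not the enhanced version from the talk, and the toy objective, noise model, and function names are all assumptions made for illustration. The objective is f(theta) = exp(theta) - theta with minimizer theta* = 0. Because the objective is not quadratic, the ASGD average carries a bias roughly proportional to the step size, which the combination 2 * avg(step) - avg(2 * step) cancels to first order.

```python
import math
import random


def averaged_sgd(step, n_iter=400_000, burn_in=1_000, seed=0):
    """Constant-step SGD on f(theta) = exp(theta) - theta; returns the iterate average.

    Toy assumptions: the minimizer is theta* = 0 and each step sees the noisy
    gradient exp(theta) - 1 + xi with xi ~ Uniform(-1, 1).
    """
    rng = random.Random(seed)
    theta = 0.0
    running_sum = 0.0
    for k in range(burn_in + n_iter):
        noisy_grad = math.exp(theta) - 1.0 + rng.uniform(-1.0, 1.0)
        theta -= step * noisy_grad
        if k >= burn_in:  # average only after the burn-in phase
            running_sum += theta
    return running_sum / n_iter


def richardson_romberg(step, n_iter=400_000):
    """Classical extrapolation from two ASGD runs with step sizes step and 2 * step."""
    fine = averaged_sgd(step, n_iter=n_iter, seed=1)
    coarse = averaged_sgd(2.0 * step, n_iter=n_iter, seed=2)
    # If bias(step) is roughly c * step, then 2 * fine - coarse removes the c * step term.
    return 2.0 * fine - coarse, fine, coarse


if __name__ == "__main__":
    rr, fine, coarse = richardson_romberg(0.1)
    print(f"ASGD step 0.1: {fine:+.4f}")
    print(f"ASGD step 0.2: {coarse:+.4f}")
    print(f"extrapolated : {rr:+.4f}")
```

In this toy problem both plain averages sit below the optimum, with the larger step size giving the larger bias, while the extrapolated value lands much closer to zero. The two runs use independent noise streams here. Sharing the noise would be another reasonable design choice.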
9:50am - 10:15am
A Continuous-time Stochastic Gradient Descent Method for Continuous Data
Kexin Jin1, Jonas Latz2, Chenguang Liu3, Carola-Bibiane Schönlieb4
1Princeton University; 2University of Manchester; 3TU Delft; 4University of Cambridge
Optimization problems with continuous data appear in, e.g., robust machine learning, functional data analysis, and variational inference. Here, the target function is an integral over a family of (continuously) indexed target functions, integrated with respect to a probability measure. Such problems can often be solved by stochastic optimization methods: performing optimization steps for the indexed target function with randomly switched indices. In this talk, we will discuss a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data. The stochastic gradient process consists of a gradient flow minimizing an indexed target function coupled with a continuous-time process determining the index. Index processes are, e.g., reflected diffusions and pure jump processes in compact spaces. Thus, we study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime of the algorithm. We analyze the approximation properties of the stochastic gradient process and study its long-time behavior and ergodicity under constant and decreasing learning rates.
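The coupling of a gradient flow with an index process can be illustrated with a minimal sketch. It is a toy instance under stated assumptions, not the authors' implementation. The data space is [0, 1] with the uniform measure and the indexed target is f(theta, y) = 0.5 * (theta - y)^2, so the integrated target is minimized at 0.5. The index process is a pure jump process that holds its value for an exponentially distributed time and then jumps to a fresh uniform sample. The mean holding time `mean_wait` is a hypothetical parameter that plays the role of a constant learning rate. For this quadratic target the gradient flow between jumps has a closed-form solution, so no time discretization is needed.

```python
import math
import random


def stochastic_gradient_process(theta0=0.0, mean_wait=0.1, n_jumps=20_000, seed=0):
    """Simulate d(theta)/dt = -(theta - V(t)) driven by a pure jump index process V.

    Toy assumptions: V takes values in [0, 1], holds each value for an
    Exp-distributed time with mean `mean_wait`, then jumps to a new uniform draw.
    Returns the state theta recorded at each jump time.
    """
    rng = random.Random(seed)
    theta = theta0
    path = []
    for _ in range(n_jumps):
        index = rng.random()                     # current data index V(t)
        wait = rng.expovariate(1.0 / mean_wait)  # holding time until the next jump
        # Exact gradient-flow solution over the holding interval:
        # theta relaxes exponentially toward the current index.
        theta = index + (theta - index) * math.exp(-wait)
        path.append(theta)
    return path


if __name__ == "__main__":
    path = stochastic_gradient_process()
    print("mean of theta at jump times:", sum(path) / len(path))
```

With a constant mean holding time the state does not converge to a point but keeps fluctuating around the minimizer 0.5. Shorter holding times give smaller fluctuations, which mirrors the role of the learning rate in discrete-time SGD.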