Inference via robust optimal transportation: theory and methods
Davide La Vecchia, Hang Liu, Matthieu Lerasle, Yiming Ma
University of Geneva, Switzerland
Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p \geq 1$) are widely applied in statistics and machine learning. Despite their popularity, inference based on these tools has some issues: it is sensitive to outliers, and it may not even be defined when the underlying model has infinite moments. To cope with these problems, first we consider a robust version of the primal transportation problem and show that it defines the robust Wasserstein distance, $W^{(\lambda)}$, which depends on a tuning parameter $\lambda > 0$. Second, we illustrate the link between $W_1$ and $W^{(\lambda)}$ and study its key measure-theoretic aspects. Third, we derive some concentration inequalities for $W^{(\lambda)}$. Fourth, we use $W^{(\lambda)}$ to define minimum distance estimators, provide their statistical guarantees, and illustrate how to apply the derived concentration inequalities for a data-driven selection of $\lambda$. Fifth, we provide the dual form of the robust optimal transportation problem and apply it to machine learning problems (generative adversarial networks and domain adaptation). Numerical exercises provide evidence of the benefits yielded by our novel methods.
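For orientation, the primal problem behind $W_1$ and one common way of robustifying it can be written as follows. This is only a sketch: the truncated cost $c_\lambda$ is an assumption made here for illustration, since the abstract does not state the exact form of the robust problem. $\Pi(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$, and $d$ the ground metric.

```latex
% Standard primal (Kantorovich) form of the 1-Wasserstein distance
W_1(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int d(x,y)\, \mathrm{d}\pi(x,y)

% Assumed robust variant: the same problem with a bounded (truncated) cost
c_\lambda(x,y) = \min\{ d(x,y),\, 2\lambda \}, \qquad
W^{(\lambda)}(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int c_\lambda(x,y)\, \mathrm{d}\pi(x,y)
```

Under this assumed form, $c_\lambda \leq d$ and $c_\lambda \leq 2\lambda$, so $W^{(\lambda)} \leq \min\{W_1, 2\lambda\}$: the robust quantity is finite without any moment condition, and mass transported over long distances is charged at most $2\lambda$ per unit.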
4:55pm - 5:20pm
Inference for topological data analysis
Wolfgang Polonik1, Johannes Krebs2, Benjamin Roycraft3
1University of California, Davis, United States of America; 2Catholic University of Eichstätt, Germany; 3University of California, Davis, United States of America
This talk presents some novel contributions to persistent-homology-based statistical inference for Topological Data Analysis (TDA). Along the way, we also discuss, on a more general level, the statistical challenges underlying the construction of such inference methods. The novel inference methods presented consist of bootstrap-based confidence regions for (persistent) Betti numbers and Euler characteristic curves. In contrast to most other existing inference methods for TDA, our methods are based on a single data set of size $n$, and large-sample guarantees are thus established for $n$ tending to infinity. On a technical level, the results depend critically on the notion of stabilization that has been developed in geometric probability theory.
5:20pm - 5:45pm
Multi-study learning approaches
Roberta De Vito
Brown University, United States of America
Biostatistics increasingly faces the urgent challenge of efficiently dealing with extensive experimental data. In particular, high-throughput assays are transforming the study of biology as they generate a complex and diverse collection of high-dimensional data sets. Through compelling statistical analysis, these extensive data sets lead to otherwise inaccessible discoveries and knowledge. Building such systematic knowledge is a cumulative process requiring analyses that integrate multiple sources, studies, and technologies. The increased availability of studies on related clinical populations poses two important multi-study statistical questions: 1) To what extent is biological signal reproducibly shared across different studies? 2) How can we detect and quantify local signals that may be masked by strong global signals? We will answer these questions by introducing a novel class of methodologies for the joint analysis of different studies. The goal is to separately identify and estimate common factors reproduced across multiple studies and study-specific factors. We present different medical and biological applications. In all cases, we clarify the benefits of a joint analysis compared to the standard methods. Our method could accelerate the pace at which we can combine unsupervised analysis across different studies and understand the cross-study reproducibility of signals in multivariate data.
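One way to make the split between common and study-specific factors concrete is a multi-study factor model of the following form. This is a sketch under assumed notation, not necessarily the exact specification used in the talk: $x_{is}$ is the $p$-dimensional observation for subject $i$ in study $s$, $\Phi$ holds the loadings shared by all $S$ studies, $\Lambda_s$ holds the loadings specific to study $s$, and $\Psi_s$ is a diagonal noise covariance.

```latex
% Observation = shared factors + study-specific factors + noise
x_{is} = \Phi f_{is} + \Lambda_s l_{is} + e_{is}, \qquad i = 1,\dots,n_s,\; s = 1,\dots,S,

% Assumed independent latent factors and noise
f_{is} \sim N_k(0, I_k), \quad l_{is} \sim N_{j_s}(0, I_{j_s}), \quad e_{is} \sim N_p(0, \Psi_s),

% Implied covariance decomposition for study s
\Sigma_s = \Phi \Phi^\top + \Lambda_s \Lambda_s^\top + \Psi_s
```

In this reading, $\Phi \Phi^\top$ captures the signal that is reproduced across studies (question 1), while $\Lambda_s \Lambda_s^\top$ isolates the local signal of study $s$ that a pooled analysis could mask (question 2).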