Signpost testing to navigate the high-dimensional parameter space of the linear regression model
Wessel N. van Wieringen
Amsterdam UMC, The Netherlands
We present a hypothesis test to guide the search for the location of the parameter of the linear regression model in the high-dimensional setting. This parameter value may be unknown, but often (part of) it is known in a different but similar context. Such external information can serve as a signpost in the vast parameter space. Our statistical hypothesis test evaluates whether the signpost points toward the true parameter's location. Our test statistic measures the relevance of the signpost's direction. We derive the test statistic's limiting distribution and provide approximations for other cases. The signpost's significance is assessed by comparing the signpost's direction to that of random rotations of this direction. We present a Bayesian interpretation of the signpost test and its connection to the global test. In simulation we investigate the signpost test's type I error and power, with particular interest in the effect of regularization and high-dimensionality in finite samples on these properties, and under misspecification of the alternative hypothesis.
We close with an application of the signpost test to a breast cancer study, which shows that the regression parameter estimate of a more prevalent subtype is informative for learning the same parameter in a less prevalent one.
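The idea of judging a signpost against random rotations of its direction can be illustrated with a minimal sketch. The statistic used here, the squared projection of X'y onto the unit signpost direction, is an assumption chosen for illustration and is not the test statistic of the talk; the function names, the simulated data, and the Monte Carlo p-value are likewise hypothetical.

```python
import numpy as np

def alignment_stat(X, y, direction):
    """Illustrative statistic (an assumption, not the talk's statistic):
    squared projection of X'y onto the unit-normalized direction."""
    d = direction / np.linalg.norm(direction)
    return float((d @ (X.T @ y)) ** 2)

def rotation_pvalue(X, y, signpost, n_rotations=2000, seed=0):
    """Monte Carlo p-value comparing the signpost with random rotations of it."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    observed = alignment_stat(X, y, signpost)
    # A uniformly random rotation of a fixed unit vector is uniform on the
    # unit sphere, so normalized Gaussian draws serve as rotated signposts.
    draws = rng.standard_normal((n_rotations, p))
    draws /= np.linalg.norm(draws, axis=1, keepdims=True)
    null_stats = (draws @ (X.T @ y)) ** 2
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_rotations)

# Simulated high-dimensional example (p > n) with an informative signpost.
data_rng = np.random.default_rng(1)
n, p = 50, 200
signpost = np.zeros(p)
signpost[:5] = 1.0
signpost /= np.linalg.norm(signpost)
beta = 3.0 * signpost                      # true parameter lies along the signpost
X = data_rng.standard_normal((n, p))
y = X @ beta + data_rng.standard_normal(n)

p_value = rotation_pvalue(X, y, signpost)
print(f"rotation p-value: {p_value:.4f}")
```

In this simulated setup the true parameter lies exactly along the signpost, so the rotated directions rarely align with X'y as well as the signpost does and the p-value is small. An uninformative signpost would behave like one of the random rotations.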
1:55pm - 2:20pm
High-dimensional vine copula regression for mixed continuous-ordinal features
Özge Şahin
Delft University of Technology, The Netherlands
Vine copulas are a flexible class of multivariate distributions that go beyond normality and allow for asymmetric dependence structures in data. Their conditional distributions, which are important for prediction tasks, provide a flexible data representation. However, as the number of features in the data increases, the computational complexity of estimating model parameters increases and an overfitting problem arises. Hence, we propose methods to identify relevant, irrelevant, and redundant variables for estimating conditional distributions of vine copulas in the presence of mixed continuous-ordinal features. We provide some empirical results to compare vine copula and machine learning models for prediction.
2:20pm - 3:10pm
Reviving pseudo-inverses: Asymptotic properties of large dimensional Moore-Penrose and Ridge-type inverses with applications
Taras Bodnar1, Nestor Parolya2
1Stockholm University, Sweden; 2Delft University of Technology, The Netherlands
In this paper, we derive high-dimensional asymptotic properties of the Moore-Penrose inverse and the ridge-type inverse of the sample covariance matrix. In particular, analytical expressions for the weighted sample trace moments are deduced for both generalized inverse matrices and are presented in terms of the partial exponential Bell polynomials, which can easily be computed in practice. The existing results are extended in several directions: (i) the population covariance matrix is not assumed to be a multiple of the identity matrix; (ii) the assumption of normality is not used in the derivation; (iii) the asymptotic results are derived under the high-dimensional asymptotic regime. Our findings are used in constructing improved shrinkage estimators of the precision matrix, which asymptotically minimize the quadratic loss with probability one. Also, shrinkage estimators for the weights of the global minimum variance portfolio are obtained. Finally, the finite sample properties of the derived theoretical results are investigated via an extensive simulation study.
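As a minimal sketch of the two generalized inverses named in the abstract, the NumPy snippet below computes the Moore-Penrose inverse and a ridge-type inverse of a singular sample covariance matrix, then plugs each into the standard global minimum variance weight formula w = P1 / (1'P1), with P a precision matrix estimate. The simulated data, the ridge parameter `t`, the cutoff `rcond`, and the helper `gmv_weights` are assumptions for illustration; these are plain plug-in weights, not the shrinkage estimators derived in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100                      # more variables than observations
X = rng.standard_normal((n, p))     # simulated data, for illustration only

# Sample covariance matrix (p x p); its rank is at most n - 1, so it is singular.
S = np.cov(X, rowvar=False)

# Moore-Penrose inverse; an explicit cutoff guards against inverting
# eigenvalues that are zero up to rounding error.
S_pinv = np.linalg.pinv(S, rcond=1e-10, hermitian=True)

# Ridge-type inverse (S + t I)^{-1}; t = 0.1 is an arbitrary illustrative choice.
t = 0.1
S_ridge = np.linalg.inv(S + t * np.eye(p))

def gmv_weights(precision):
    """Plug-in global minimum variance weights w = P 1 / (1' P 1)."""
    ones = np.ones(precision.shape[0])
    w = precision @ ones
    return w / w.sum()              # w.sum() equals 1' P 1

w_pinv = gmv_weights(S_pinv)
w_ridge = gmv_weights(S_ridge)
print(w_pinv[:3], w_ridge[:3])
```

Both weight vectors sum to one by construction. The ridge-type inverse exists for any t > 0 because S + tI is positive definite, whereas the Moore-Penrose inverse only inverts S on its column space.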