Joint empirical risk minimization for instance-dependent positive-unlabeled data
Jan Mielniczuk1, Wojciech Rejchel2, Paweł Teisseyre3
1Warsaw University of Technology, Poland; 2Nicolaus Copernicus University, Poland; 3Institute of Computer Science, PAS, Poland
Learning from positive and unlabeled (PU) data is an actively researched machine learning task. The goal is to train a binary classification model on a training dataset in which only some of the positive instances are labeled; the unlabeled set contains the remaining positives together with all negative observations. Unlike many prior works, we consider the realistic setting in which the probability of label assignment, i.e. the propensity score, is instance-dependent. In the proposed approach we investigate the minimizer of an empirical counterpart of a joint risk that depends on both the posterior probability of belonging to the positive class and the propensity score. The non-convex empirical risk is optimized alternately with respect to the parameters of the two functions. In the theoretical analysis we establish risk consistency of the minimizers using recently derived methods from the theory of empirical processes. An important development is also a novel implementation of the optimization algorithm, in which a sequential approximation of the set of positive observations among the unlabeled ones is crucial. Experiments conducted on 20 datasets under various labeling scenarios show that the proposed method performs on par with or better than state-of-the-art methods based on propensity function estimation.
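For concreteness, the sketch below illustrates the kind of alternating scheme the abstract describes, assuming logistic models for both the posterior eta(x) = P(Y=1 | x) and the instance-dependent propensity e(x) = P(S=1 | Y=1, x), with the joint risk taken as the negative log-likelihood of the observed labeling indicator. All model and function names are illustrative assumptions; the authors' exact risk and their sequential approximation of the positive set are not reproduced here.

```python
# A minimal sketch of alternating minimization of a joint PU risk, assuming
# logistic models for the posterior eta(x) and the propensity e(x).
# Under the PU model, P(S=1 | x) = eta(x) * e(x).
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def joint_risk(theta, gamma, X, s):
    """Negative log-likelihood of the observed labels s in {0, 1}."""
    p = sigmoid(X @ theta) * sigmoid(X @ gamma)  # eta(x) * e(x)
    eps = 1e-12
    return -np.mean(s * np.log(p + eps) + (1 - s) * np.log(1 - p + eps))

def alternating_fit(X, s, n_outer=20):
    """Alternately optimize posterior parameters (theta) and
    propensity parameters (gamma) of the non-convex joint risk."""
    d = X.shape[1]
    theta, gamma = np.zeros(d), np.zeros(d)
    for _ in range(n_outer):
        # Step 1: update theta with gamma held fixed.
        theta = minimize(lambda t: joint_risk(t, gamma, X, s),
                         theta, method="L-BFGS-B").x
        # Step 2: update gamma with theta held fixed.
        gamma = minimize(lambda g: joint_risk(theta, g, X, s),
                         gamma, method="L-BFGS-B").x
    return theta, gamma
```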
1:10pm - 1:40pm
Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
Michael Kohler1, Adam Krzyzak2
1Technical University of Darmstadt, Germany; 2Concordia University, Canada
In deep learning, the task is to estimate the functional relationship between input and output using deep neural networks. In image classification, the input data consist of observed images and the output data represent the classes of the corresponding images, describing what kinds of objects are present in them. The most successful methods, especially in image classification, can be attributed to deep learning approaches and, in particular, to convolutional neural networks (CNNs). Recently, Kohler, Krzyzak and Walter showed that CNN image classifiers that minimize empirical risk are able to achieve dimension reduction; in practice, however, it is not possible to compute the empirical risk minimizer, and gradient descent methods are used instead to obtain a small empirical risk. Furthermore, the network topologies used in practice are over-parameterized, i.e., they have many more trainable parameters than training samples. The goal of this work is to derive rate-of-convergence results for over-parameterized CNN image classifiers trained by gradient descent, thereby providing a better theoretical understanding of the empirical success of CNN image classifiers.
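The training setup under study can be illustrated with a short sketch: an over-parameterized CNN fitted by plain full-batch gradient descent on the empirical cross-entropy risk. The architecture, the width parameter controlling over-parameterization, and the hyperparameters below are assumptions made for illustration, not the network analyzed in the paper.

```python
# A minimal sketch (PyTorch) of gradient-descent training of an
# over-parameterized CNN image classifier; all choices are illustrative.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=10, width=256):
        super().__init__()
        # Choosing "width" large makes the number of trainable parameters
        # far exceed the number of training samples (over-parameterization).
        self.features = nn.Sequential(
            nn.Conv2d(1, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(width, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_gd(model, X, y, steps=200, lr=0.1):
    # Full-batch gradient descent on the empirical risk (cross-entropy),
    # i.e. the algorithm whose convergence rate is analyzed.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model
```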
Bayes Risk Consistency of Nonparametric Classification Rules for Spike Trains Data
Miroslaw Pawlak
University of Manitoba, Canada
Spike train data find a growing number of applications in computational neuroscience, imaging, streaming data, and finance. Machine learning strategies for spike trains are based on various neural network and probabilistic models. The probabilistic approach relies on parametric or nonparametric specifications of the underlying spike generation model. In this paper we consider the two-class statistical classification problem for a class of spike train data characterized by nonparametrically specified intensity functions. We derive the optimal Bayes rule and then form the plug-in nonparametric kernel classifier. Asymptotic properties of the rules are established, including limits with respect to the increasing recording time interval and the size of the training set. In particular, the convergence of the kernel classifier to the Bayes rule is proved. The results are supported by finite-sample simulation studies.
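To make the plug-in construction concrete, the sketch below assumes class-conditional inhomogeneous Poisson processes on a fixed interval [0, T] and Gaussian-kernel estimates of the class intensities. The Poisson likelihood, kernel choice, and bandwidth are assumptions of the sketch, not necessarily the paper's exact specification.

```python
# A minimal sketch of a plug-in kernel classifier for spike trains,
# assuming class-conditional inhomogeneous Poisson processes on [0, T].
import numpy as np

def kernel_intensity(train_trials, t, bandwidth=0.1):
    """Gaussian-kernel estimate of the class intensity lambda(t),
    averaged over the training trials (lists of spike times) of a class."""
    n = len(train_trials)
    spikes = np.concatenate(train_trials) if n else np.array([])
    if spikes.size == 0:
        return np.full_like(t, 1e-12)
    diffs = (t[:, None] - spikes[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (n * bandwidth) + 1e-12

def classify(spikes, class0_trials, class1_trials, T, prior1=0.5, grid=500):
    """Plug-in version of the Poisson Bayes rule: compare estimated
    log-likelihoods of the new spike train under the two classes."""
    t = np.linspace(0.0, T, grid)
    lam0 = kernel_intensity(class0_trials, t)
    lam1 = kernel_intensity(class1_trials, t)
    def loglik(lam):
        # Inhomogeneous Poisson log-likelihood:
        # sum_i log lam(t_i) - integral of lam over [0, T]
        return np.sum(np.log(np.interp(spikes, t, lam))) - np.trapz(lam, t)
    score = loglik(lam1) - loglik(lam0) + np.log(prior1 / (1 - prior1))
    return int(score > 0)
```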