


SS12  L'interface entre les probabilités et la statistique / SS12  The Probability/Statistics Interface Org: P. Besse (Toulouse) et/and L. Devroye (McGill)
 PETER BARTLETT, University of California at Berkeley
Statistical Properties of Large Margin Classifiers

Many successful methods for classification, including AdaBoost,
support vector machines and neural networks, can be viewed as convex
relaxations of an intractable empirical risk minimization problem. We
consider the statistical properties of these large margin classifiers,
and in particular what convex optimization problems lead to accurate
classifiers. We also study the relationship between these methods and
probability models, and characterize when these methods lead to
asymptotic estimates of conditional probabilities. For methods that
predict using elements of a reproducing kernel Hilbert space, such as
support vector machines, we show that there is a tight tradeoff
between the ability to estimate conditional probabilities and the
sparseness of the solution.
Joint work with Mike Jordan, Jon McAuliffe, and Ambuj Tewari.
 YOSHUA BENGIO, Université de Montréal, Montréal, Québec, Canada
Learning the Density Structure of HighDimensional Data

Research in statistical learning algorithms aims at discovering
computer methods to learn from examples so as to generalize well to
new cases. Many nonparametric methods have recently been proposed to
discover salient features of the unknown data distribution, such as
nonlinear manifolds. We first show how many of them can be unified
under a shared framework, as estimators of particular eigenfunctions.
We then show a fundamental difficulty with such models, when the
structure of the density is too complex (or the data too noisy) given
the amount and the dimensionality of the data. We propose a generic
approach to address this problem based on sharing information about
the distribution across large subsets of the data range.
 STEPHANE BOUCHERON, CNRS Orsay

 OLIVIER BOUSQUET, Max Planck Institute for Biological Cybernetics, Tuebingen, Germany
Localized Rademacher Averages

Rademacher averages are important quantities in empirical process
theory. They also have nice concentration properties. Recently they
have been exploited in statistics in order to obtain risk bounds that
can be computed from the data. This allows to design model selection
procedures with empirical penalties.
When one considers Mestimators, it has been shown that tighter
error bounds can be obtained when the Rademacher averages are
localized, that is, when they are computed for a small ball around the
best function in the model. We give general results of this type and
show in particular some consequences for popular classification
algorithms such as Support Vector Machines and related kernel
methods.
 NICOLA CESABIANCHI, University of Milan, via Comelico 39, 20135 Milano, Italy
Selective sampling and label efficient learning

In statistical binary classification, an unknown i.i.d. source
generates a stream of data elements (e.g., vectors of features) whose
binary labels must be predicted. A classification system may improve
its performance by occasionally asking whether it made a prediction
mistake or not. In learning, this process is known as selective
sampling. In first part of the talk we will introduce a selective
sampling algorithm for simple linear i.i.d. sources and prove a bound
on its theoretical performance (mistake probability versus number of
queried labels).
In the second part of the talk we will show that a simple randomized
variants of the selective sampling algorithm works well even in the
case the unknown source generating the data is deterministic (or, in
other words, the data is generated adversarially).

