Statistics
Org: Yannick Baraud (Nice) and Boris Levit (Queen's) [PDF]
 SYLVAIN ARLOT, Université ParisSud 11, Orsay F91405, France
Vfold crossvalidation improved: Vfold penalization
[PDF] 
We investigate the efficiency of Vfold crossvalidation (VFCV) for model selection from the nonasymptotic viewpoint, and suggest an improvement on it, which we call "Vfold penalization".
First, considering a particular (though simple) regression problem, we will show that VFCV with a bounded V is suboptimal for model selection. The main reason for this is that VFCV "overpenalizes" all the more that V is large. Hence, asymptotic optimality requires V to go to infinity. However, when the signaltonoise ratio is low, it appears that overpenalizing is necessary, so that the optimal V is not always the larger one, despite of the variability issue. This is confirmed by some simulated data.
In order to improve on the prediction performance of VFCV, we propose a new model selection procedure, called "Vfold penalization" (penVF). It is a Vfold subsampling version of Efron's bootstrap penalties, so that it has the same computational cost as VFCV, while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, penVF is proven to satisfy a nonasymptotic oracle inequality with a leading constant almost one. In particular, this implies adaptivity to the smoothness of the regression function, even with a highly heteroscedastic noise. Moreover, it is easy to overpenalize with penVF, independently from the V parameter. As shown by a simulation study, this results in a significant improvement on VFCV in several nonasymptotic situations.
 PROF. MIKLÓS CSÖRGÖ, Carleton University, 1125 Colonel By Drive, Ottawa, ON, Canada K1S 5B6
Longrange dependent vs. i.i.d.r.v.'s based BahadurKiefer, Vervaat and Vervaat error processes
[PDF] 
This talk will review recent developments on longrange dependent sequences based processes as in M. Csörgö, B. Szyszkowicz and L. Wang, Strong invariance principles for sequential BahadurKiefer and Vervaat error processes for longrange dependent sequences, Ann. Statist. 34(2006), 10131044; ibid., (6) 35(2007); M. Csörgö and R. Kulik, Reduction principles for quantile and BahadurKiefer processes of longrange dependent linear sequences, Probab. Theory Rel., to appear; M. Csörgö and R. Kulik, Weak convergence of Vervaat and Vervaat error processes of longrange dependent sequences, J. Theoret. Probab., to appear; and their references. The paper by E. Csáki, M. Csörgö, A. Földes, Z. Shi and R. Zitikis, Pointwise and uniform asymptotics of the Vervaat error process, J. Theoret. Probab. 15(2002), 845875, constitutes an i.i.d. backdrop for the recent developments as in the just mentioned papers on longrange dependent sequences based processes. For example, it is well known that the BahadurKiefer and Vervaat error processes cannot converge weakly in the i.i.d. case. In contrast to this, when these processes are based on certain longrange dependent sequences, then we, for example, conclude that they do converge weakly to appropriate DehlingTaqqu type limit processes that are based on H. Dehling, M.S. Taqqu, The empirical process of some long range dependent sequences with an application to Ustatistics, Ann. Statist. 17(1989), 17671783.
 ARNAK DALALYAN, Université Pierre et Marie Curie, Paris 6
Sparsity and sharp PACBayesian bounds for the EWaggregate
[PDF] 
We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PACBayesian risk bounds for aggregates defined via exponential weights (EWaggregate), under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
This is a joint work with Alexandre Tsybakov.
 ELISABETH GASSIAT, Université ParisSud 11, Batiment 425, 91405 Orsay Cedex, France
A non parametric BersteinVon Mises theorem
[PDF] 
Given a probability mass function q = ( q(i) )_{i Î N*} on the set of positive integers, let the observations be independent and identically distributed according to q. We consider the asymptotic behaviour of posterior distributions of ( q(i) )_{1 £ i £ kn} where k_{n} is an increasing sequence tending to infinity, n being the number of observations. Under suitable assumptions on the prior (smoothness and prior mass put on Fisher balls) and on the true probability q_{0}, we prove a non parametric BernsteinVon Mises theorem: the posterior distribution concentrates around the true q_{0} at speed Ön and its variation distance to the Gaussian distribution with variance the inverse of the Fisher information tends to zero in probability.
We use this result for the construction of confidence intervals for functionals of q such as the Shanonn entropy or Renyi entropies. Indeed, BernsteinVon Mises theorems hold for the posterior distribution of the plugin estimator.
 CHRISTOPHE GIRAUD, Université de Nice, laboratoire J.A. Dieudonné, Parc Valrose, 06108 Nice, France; et INRA, laboratoire MIA, Parc Vilvert, 78352 JouyenJosas, France
Estimation of Gaussian Graphs by Model Selection
[PDF] 
Biological systems involve complex networks of interactions between entities such as genes or proteins. One of the challenge of the postgenomic is to infer these networks from highthroughoutput data produced by recent biotechnological tools. The task is challenging for the statistician due to the very highdimensional nature of the data. For example, microarrays measure the expression level of a few thousand genes (typically 4000) whereas the sample size n is no more than a few tens.
Valuable tools for analyzing these network of interactions are the Gaussian Graphical Models. The vector X = (X_{1},¼,X_{p}) of the expression levels of the p genes is modeled by a Gaussian variable in R^{p}. Then, the Gaussian Graph has an edge between the genes i and j if and only if X_{i} is not independent of X_{j} conditionally on the other variables. The goal of the statistician is to infer these edges from a nsample of the variables X.
We propose a statistical procedure to estimate the graph of conditional dependences from X. We first introduce a collection of candidate graphs and then select one of them by minimizing a penalized empirical risk. The performance of the procedure is assessed in a nonasymptotic setting without any hypotheses on the covariance matrix. These good theoretical properties of the procedure are confirmed by numerical results. We pay a special attention to the maximal degree D of the graphs that we can handle, which turns to be roughly n/( 2log(p/D) ).
 HANNA JANKOWSKI, York University, Toronto
Nonparametric Estimation of a BathtubShaped Hazard
[PDF] 
In the analysis of lifetime data, a key object of interest is the hazard function, or instantaneous failure rate. One natural assumption is that the hazard is bathtub, or Ushaped (i.e., first decreasing, then increasing). In particular, this is often the case in reliability engineering or human mortality.
In this talk I will discuss the nonparametric MLE and LSE of the hazard function under the additional assumption that it is also convex. These estimators are consistent and converge locally at a rate of n^{2/5}.
Both iid sampling and iid sampling under right censoring will be considered.
This is joint work with Jon Wellner.
 JOSEPH JOHNSON, Queen's

 PETER KIM, University of Guelph, Guelph, Ontario N1G 2W1
Multivariate Topological Data Analysis
[PDF] 
Assume that a finite set of points is randomly sampled from a subspace of a metric space. Recent advances in computational topology have provided several approaches to recovering the geometric and topological properties of the underlying space. In this talk we take a nonparametric statistical approach to this problem. We assume that the data is randomly sampled from an unknown probability distribution. We define two filtered complexes with which we can calculate the persistent homology of a probability distribution. Using nonparametric statistical density estimators, we show that we can recover the persistent homology of the underlying distribution.
 KEITH KNIGHT, University of Toronto
Some asymptotics for elemental regression estimators
[PDF] 
Consider a linear regression model Y_{i} = x_{i}^{T} b+ e_{i} (i = 1,...,n). Elemental regression estimators are defined to be estimators based on exactly p cases where p is the dimension of the predictors {x_{i}}. Some estimators (for example, L_{1} estimators) are exactly elemental estimators while in other cases, estimators can be wellapproximated by elemental estimators. In this paper, we will consider the asymptotic distribution for the approximation error for a certain class of estimators as well as the asymptotic distribution of the predictors in the best elemental set.
 OLEG LEPSKI, Université de Provence, CMI 53, rue F. Joliot Curie, 13453 Marseille, France
Universal estimation routine
[PDF] 
We develop a general estimation procedure that is based on selection of estimators from a large collection. An upper bound on the risk described by an arbitrary seminorm is established, and it is shown that the proposed selection procedure specialized for different collections of estimators leads to minimax and adaptive minimax estimators in various settings.
 BORIS LEVIT, Queen's University, Department of Mathematics and Statistics, Kingston, ON K7L 3N6
A fresh look at the minimax
[PDF] 
Some new nonasymptotic upper and lower bounds for the minimax quadratic risk will be presented both in the problem of estimating restricted normal means and for the White Gaussian Noise model on the real line.
In the latter case, some general ellipsoidal functional classes will be introduced, including classes of entire functions of exponential type, PaleyWiener classes of analytic functions, and Sobolev classes. A comparison of the minimax risks for these classes will be discussed, based on the proposed risk bounds, as well as some numerical results.
These results demonstrate that the commonly perceived notion of a relation between the smoothness of unknown function and the accuracy of estimation can be misguided. In particular, the notion of optimal rates of convergence, dominating asymptotic statistics for the last three decades, can be misleading and indeed counterproductive.
 JIM RAMSEY, McGill University, 1205 Dr. Penfield Ave., Montreal
Parameter Estimation for Differential Equations: A Generalized Smoothing Approach
[PDF] 
We propose a new method for estimating parameters in models defined by a system of nonlinear differential equations from noisy measurements on a subset of variables. The approach is based on a modification of data smoothing methods along with a generalization of profiled estimation, and we refer to it as parameter cascading. We derive parameter estimates and confidence intervals for these estimates, and show that these have low bias and good coverage properties, respectively, for data simulated from models in chemical engineering and neurobiology. The performance of the method is demonstrated using realworld data from chemistry and from the progress of the autoimmune disease lupus.
 PATRICIA REYNAUDBOURET, CNRS Ecole Normale Superieure (DMA)
Threshold methods for Poisson processes with unknown or infinite support
[PDF] 
The first problem is to estimate the intensity of a Poisson process with unknown or infinite support. The intensity is assumed to be L^{1} and L^{2}. We propose a complete datadriven threshold that we calibrate in an almost optimal way and this from a theoretical and practical point of view. The maxiset of the procedure is the intersection of a classical Besov space and a weak Besov space. The weak Besov space corresponds to sparse functions: a very small number of wavelet coefficients are significant but their location is not fixed. Our procedure is minimax on these sets with the correct power of the logarithmic term.
The second problem is to test whether a Poisson process is homogeneous or not on a finite interval. Let us suppose that the alternative belongs to a weak Besov space. This corresponds, roughly speaking, to the case where there are a small number of locations where the alternative is different from the uniform, but the position of these locations are unknown. Then, surprisingly, the separation rate is the same as the minimax estimation rate, meaning that it is as difficult to test as to estimate. Moreover a threshold procedure achieves this rate.
The first problem is a joint work with V. Rivoirard. The second problem is a joint work with M. Fromont and B. Laurent.
 MUNI SHANKER SRIVASTAVA, University of Toronto, Department of Statistics, 100 St. George Street, Toronto, Ontario
Analyzing High Dimensional Data with Fewer Observations
[PDF] 
In this talk, I consider the problem of analyzing datasets obtained on several characteristics m of an individual, such as in DNA microarrays in which such characteristics are typically in several thousands. The number, however, of subjects N in which these observations can be taken is typically small, often less than 50. Standard multivariate theory requires that m be substantially smaller than N, and very little theory is available to analyze the case when N is less than m. In the last decade or so, results have been obtained for this case also. I will present some of these results.
Using these results, I will analyze two datasets from microarrays. Both datasets require the comparison of two groups, the socalled twosample test. Often, it is required to find the components (or genes) that may have caused the rejection of the equality of two mean vectors. An alternative to the currently used False Discovery rate (FDR) will be presented.
 GILLES STOLTZ, CNRS; Ecole Normale Supérieure, Paris; HEC Paris
Strategies for prediction under imperfect monitoring
[PDF] 
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini
(1999) who first proved the existence of such consistent predictors.
The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence.
In the case of deterministic feedback, these rates are optimal up to logarithmic terms.
Joint work with Shie Mannor, McGill University, Montreal, and Gabor Lugosi, ICREA and Universitat Pompeu Fabra, Barcelona.
 JEANPHILIPPE VERT, Mines ParisTech and Institut Curie, Paris, France
Collaborative filtering in Hilbert spaces with spectral regularization
[PDF] 
Collaborative Filtering (CF) refers to the task of learning preferences of customers for products, such as books or movies, from a set of known preferences. More formally, this can be seen as the task of filling missing entries in a matrix where some entries are known. A standard approach to CF is to find a low rank approximation to the matrix. This problem is computationally difficult and some authors have proposed recently to search instead for a low trace norm matrix, which results in a convex optimization problem. We generalize this approach to the estimation of a compact operator, of which matrix estimation is a special case. We develop a notion of spectral regularization which captures both rank constraint and trace norm regularization, as well as many others. The major advantage of this approach is that it provides a natural method of utilizing sideinformation, such as age and gender, about the customers (or objects) in questiona formerly challenging limitation of the lowrank approach. We provide a number of algorithms, and test our results on a standard CF dataset with promising results.
This is a joint work with Jacob Abernethy (UC Berkeley), Francis Bach (INRIA), and Theodoros Evgeniou (INSEAD).
