
Recently, generalized inference has become an efficient and useful tool that yields more accurate intervals for a variety of otherwise intractable problems, such as the Behrens-Fisher problem. In this talk, we present a generalized inference solution to the Behrens-Fisher problem in general location-scale families. The proposed solution is based on minimum risk equivariant estimators; the underlying approach thus extends methods based on maximum likelihood estimators and conditional inference, which have so far been applied only to specific distributions. Finally, we present simulation results as well as analyses of two real data sets.
Empirical likelihood is a popular nonparametric or semiparametric statistical method with many nice statistical properties. Yet when the sample size is small, or the dimension of the accompanying estimating function is high, the application of the empirical likelihood method can be hindered by low precision of the chi-square approximation and by nonexistence of solutions to the estimating equations. In this paper, we show that the adjusted empirical likelihood is effective at addressing both problems. With a specific level of adjustment, the adjusted empirical likelihood achieves the high-order precision of the Bartlett correction, in addition to the advantage of a guaranteed solution to the estimating equations. Simulation results indicate that the confidence regions constructed by the adjusted empirical likelihood have coverage probabilities comparable to, or substantially more accurate than, those of the original empirical likelihood enhanced by the Bartlett correction.
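To make the adjustment concrete, here is a minimal numerical sketch of adjusted empirical likelihood for a univariate mean. The adjustment level a = log(n)/2 is one common choice assumed here for illustration (the specific Bartlett-matching level in the abstract may differ), and the function names are illustrative only.

```python
import numpy as np

def el_log_ratio(g):
    """Empirical log-likelihood ratio statistic for E[g] = 0, given the
    n evaluated estimating-function values g.  The Lagrange multiplier
    is found by a safeguarded Newton iteration."""
    lam = 0.0
    for _ in range(100):
        denom = 1.0 + lam * g
        score = np.sum(g / denom)
        hess = -np.sum(g**2 / denom**2)
        step = score / hess
        new = lam - step
        while np.any(1.0 + new * g <= 0):   # keep all implied weights positive
            step /= 2.0
            new = lam - step
        if abs(new - lam) < 1e-10:
            lam = new
            break
        lam = new
    return 2.0 * np.sum(np.log(1.0 + lam * g))

def adjusted_el_log_ratio(x, theta, a=None):
    """Adjusted EL for a univariate mean: append one pseudo-observation
    -a * gbar so the inner optimization always has a solution."""
    g = x - theta
    if a is None:
        a = np.log(len(g)) / 2.0   # assumed level of adjustment
    g_adj = np.append(g, -a * g.mean())
    return el_log_ratio(g_adj)

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, size=30)
val = adjusted_el_log_ratio(x, 0.0)
print(val)   # compare with the chi-square(1) critical value 3.84
```

The appended pseudo-observation guarantees that zero lies in the convex hull of the adjusted estimating-function values, which is exactly the "guaranteed solution" property the abstract refers to.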
We study estimation and variable selection problems in mixture-of-experts (MOE) models. A new modified maximum likelihood estimation (MMLE) method is proposed. It is shown that the MMLE is root-n consistent, and simulations indicate better finite-sample behavior than the ordinary MLE. For variable selection, we apply two penalty functions to the modified likelihood. The method is computationally efficient, and it is shown theoretically to be consistent in variable selection. Two Bayesian information criteria are suggested for data-adaptive choice of tuning parameters. A modified EM-Newton-Raphson algorithm is developed for numerical computations. The performance of the method is also studied through simulations, and a real data analysis is presented.
There is a large amount of publicly available financial information on publicly traded corporations, usually reported quarterly. These same corporations may also exit the market through bankruptcy or acquisition by merger. Given the nature of the data, it is natural to model these events in a discrete-time framework. We consider a bivariate discrete-time hazard model. The framework is similar to classical biostatistical modeling, treating the two forms of exit from the system, namely bankruptcy and merger/acquisition, but with additional information on the type of exit; in biostatistics the cause of exit (usually death) is not known explicitly.
Such models are constructed and fit to a database of some 12,000 publicly traded US corporations. With a large number of covariates, some data reduction is needed. Both in-sample and out-of-sample prediction are considered. A constant baseline hazard model does not fit well, so a smooth baseline hazard model is considered. This latter model gives a reasonable fit in terms of prediction and has a nice robustness property. Some tools for model assessment are developed; one useful tool is a limit theorem on rare multinomials originally due to McDonald (1980).
This is joint work with Dr Taehan Bae.
In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probability parameters is a major challenge. In this talk we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis-Ylvisaker conjugate priors on the log-linear parameters subject to "baseline constraints" under multinomial sampling. We also derive the induced prior on the cell probabilities and show that it is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.
This work was done in collaboration with Jinnan Liu and Adrian Dobra.
It is often reasonable to assume that the dependence structure of a bivariate continuous distribution belongs to the class of extreme-value copulas. The latter are characterized by their Pickands dependence function. The talk is concerned with a procedure for testing whether this function belongs to a given parametric family. The test is based on a Cramér-von Mises statistic measuring the distance between an estimate of the parametric Pickands dependence function and either of two nonparametric estimators thereof studied by Genest and Segers (2009). As the limiting distribution of the test statistic depends on unknown parameters, it must be estimated via a parametric bootstrap procedure, whose validity is established. Monte Carlo simulations are used to assess the power of the test, and an extension to dependence structures that are left-tail decreasing in both variables is considered.
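As a rough sketch of the kind of statistic involved (not the authors' exact procedure, which uses the endpoint-corrected estimators of Genest and Segers (2009) and calibrates the statistic by parametric bootstrap), the following computes a Cramér-von Mises-type distance between a naive rank-based Pickands estimator and a fitted Gumbel (logistic) family; the Gumbel family and the crude grid fit of its parameter are assumptions made purely for illustration.

```python
import numpy as np

def pickands_np(u, v, t):
    """Naive nonparametric Pickands estimator (no endpoint correction):
    A_n(t) = n / sum_i min(S_i/(1-t), T_i/t) with unit-exponential margins."""
    s, w = -np.log(u), -np.log(v)
    t = np.atleast_1d(t)
    xi = np.minimum(s[None, :] / (1.0 - t[:, None]), w[None, :] / t[:, None])
    return len(u) / xi.sum(axis=1)

def A_gumbel(t, theta):
    """Pickands dependence function of the Gumbel (logistic) family."""
    return ((1.0 - t)**theta + t**theta)**(1.0 / theta)

def cvm_stat(u, v, theta_grid=None):
    """CvM-type distance between nonparametric and fitted parametric
    Pickands functions, approximated on a grid over (0, 1)."""
    if theta_grid is None:
        theta_grid = np.linspace(1.0, 5.0, 81)
    t = np.linspace(0.01, 0.99, 99)
    An = pickands_np(u, v, t)
    dists = [np.mean((An - A_gumbel(t, th))**2) for th in theta_grid]
    i = int(np.argmin(dists))
    return len(u) * dists[i], theta_grid[i]

rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(size=n), rng.uniform(size=n)   # independence: theta = 1
u = (np.argsort(np.argsort(x)) + 1) / (n + 1)     # rescaled ranks
v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
stat, th_hat = cvm_stat(u, v)
print(stat, th_hat)
```

Under the null the statistic's distribution depends on the unknown parameter, which is why the full procedure needs the parametric bootstrap mentioned in the abstract.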
In this talk, we study the detection of multiple change points in the parameters of generalized lambda distributions (GLDs). The advantage of the GLD is that the family is broad and flexible compared with other distributions, imposing fewer restrictions when fitting the data. We combine the binary segmentation procedure with the Schwarz information criterion (SIC) to search for all possible change points in the data. The method is applied to publicly available fibroblast cancer cell line data, and change points are successfully located.
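A minimal sketch of binary segmentation driven by SIC, substituting a normal mean/variance change model for the GLD (fitting a GLD at each candidate split is the talk's setting but is beyond a short example); the parameter counts in the penalties follow one common convention and are an assumption.

```python
import numpy as np

def _sic0(x):
    """SIC for 'no change' on a segment: 2 free parameters (mean, variance);
    constants shared by both models are dropped."""
    n = len(x)
    return n * np.log(max(x.var(), 1e-12)) + 2 * np.log(n)

def _sic1(x, k):
    """SIC for one change at position k: two means, two variances and the
    change location, counted as 5 parameters (one common convention)."""
    n = len(x)
    v1, v2 = x[:k].var(), x[k:].var()
    return (k * np.log(max(v1, 1e-12)) + (n - k) * np.log(max(v2, 1e-12))
            + 5 * np.log(n))

def binary_segmentation(x, offset=0, min_len=10, found=None):
    """Split recursively wherever the best one-change SIC beats no-change SIC."""
    if found is None:
        found = []
    n = len(x)
    if n < 2 * min_len:
        return found
    ks = list(range(min_len, n - min_len + 1))
    sics = [_sic1(x, k) for k in ks]
    k_best = ks[int(np.argmin(sics))]
    if min(sics) < _sic0(x):
        found.append(offset + k_best)
        binary_segmentation(x[:k_best], offset, min_len, found)
        binary_segmentation(x[k_best:], offset + k_best, min_len, found)
    return found

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
cps = sorted(binary_segmentation(x))
print(cps)   # should contain a point near the true change at 100
```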
Repeated measurements are collected in a variety of situations and are generally characterized by a mixed model in which the correlation within subjects is specified by the random effects. In such a mixed model, we propose a multiple comparison procedure based on a variant of the Schwarz information criterion (SIC; Schwarz, 1978). The derivation of SIC indicates that it serves as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model, so an approximate posterior probability for each candidate model can be calculated from its SIC. We suggest a variant of SIC that retains terms that are asymptotically negligible in the derivation of SIC; the variant improves upon the performance of SIC in small and moderate sample-size applications. Based upon the proposed variant, the corresponding posterior probability is calculated for each candidate model. A hypothesis in a multiple comparison involves one or more models in the candidate class, and its posterior probability is therefore evaluated as the sum of the posterior probabilities of the associated models. The approximate posterior probability based on the variant accommodates the effect of the prior on each model in the candidate class, and is therefore a more effective approximation than that based on SIC for conducting multiple comparisons. We derive the computational formula for this approximate posterior probability in the mixed model. Applications demonstrate that the proposed procedure based on the SIC variant performs effectively in multiple comparisons.
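The basic SIC-to-posterior-probability step can be sketched as follows; this shows only the standard exp(−SIC/2) normalization, not the extra terms of the proposed variant, and the SIC values are made-up numbers.

```python
import numpy as np

def sic_posterior(sic):
    """Approximate posterior model probabilities from SIC values,
    P(M_i | data) ~ exp(-SIC_i / 2) normalized over the candidate class,
    computed stably by subtracting the minimum SIC first."""
    sic = np.asarray(sic, dtype=float)
    w = np.exp(-(sic - sic.min()) / 2.0)
    return w / w.sum()

# hypothetical SIC values for three candidate models
p = sic_posterior([210.3, 208.1, 215.9])
print(p)   # the smallest-SIC model receives the largest probability
```

The posterior probability of a hypothesis covering, say, the first two models is then `p[:2].sum()`, mirroring the summation over associated models described above.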
In recent decades, stochastic graphs have been used in many fields to explain the evolution of a set of random objects (vertices) along with a relationship structure (edges). We consider statistical inference in a dynamic random graph in the absence of edge information. It is shown that the dynamic behavior of the graph, together with the vertex information, is useful in making inference about the edges. The problem is motivated by the foot-and-mouth disease (FMD) outbreak in the UK in 2001. A stochastic Euclidean graph model with the Markov property is introduced to model this epidemic. In addition, it is shown that the available information is sufficient to draw inference about the model and hence the missing edges.
In this talk I will discuss Bayesian hypothesis testing in the two-sample problem. I will introduce some procedures based on a Bayesian nonparametric formulation and examine their performance in comparison to classical nonparametric procedures.
This is joint work with Chris Holmes (Oxford), François Caron (Bordeaux) and Jim Griffin (Kent).
Estimation of the incidence rate of a disease generally entails the follow-up of a disease-free cohort until a sufficient number of incident cases of the disease have been observed. Sometimes it is possible, however, to avoid the time and cost of carrying out an incidence study by following prevalent cases with the disease forward for a relatively short time period. That is, we may identify prevalent cases through a cross-sectional survey and follow them forward as part of what is known as a prevalent cohort study with follow-up. In this presentation we show how one may find the maximum likelihood estimator of the age-specific constant incidence rate from a prevalent cohort study with follow-up. Our key expression is related to the well-known epidemiological relationship between incidence, prevalence and disease duration. We apply our results to estimate the incidence rate of dementia in Canada.
Joint work with Victor Addona (Macalester College, St. Paul, MN) and Masoud Asgharian (McGill University).
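The well-known steady-state relation behind the key expression can be sketched numerically; this moment-type calculation, with made-up numbers, is only the classical prevalence-incidence-duration identity, not the maximum likelihood estimator developed in the talk.

```python
def incidence_from_prevalence(prevalence, mean_duration):
    """Moment-type estimate of a constant incidence rate from the classic
    steady-state identity: prevalence odds = incidence x mean duration."""
    return prevalence / ((1.0 - prevalence) * mean_duration)

# hypothetical numbers: 8% prevalence, mean disease duration 6 years
rate = incidence_from_prevalence(0.08, 6.0)
print(rate)   # incidence per person-year
```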
Suppose that Y = (Y_{i}) is a normal random vector with mean Xb and covariance σ^{2} I_{n}, where b is a p-dimensional vector (b_{j}) and X = (X_{ij}) is an n×p matrix with X_{ij} ∈ {−1, 1}; this corresponds to a factorial design, with −1, 1 representing the low and high levels respectively, or to a weighing design, with −1, 1 representing an object j with weight b_{j} placed on the left or right pan of a chemical balance respectively. E-optimal designs Z are chosen that are robust in the sense that they remain E-optimal when the covariance of Y_{i} and Y_{i′} is ρ > 0 for i ≠ i′. Within a smaller class of designs, similar results are obtained with respect to a general class of optimality criteria that includes the A- and D-criteria.
The talk is based on my three joint papers with Joe Masaro published in 2008 in JSPI and LAA.
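For uncorrelated errors, the E-criterion is the smallest eigenvalue of X^T X, which a tiny sketch can illustrate; the robustness analysis under correlation ρ in the talk is not reproduced here.

```python
import numpy as np

def e_criterion(X):
    """E-criterion value: the smallest eigenvalue of X^T X
    (np.linalg.eigvalsh returns eigenvalues in ascending order)."""
    return np.linalg.eigvalsh(X.T @ X)[0]

# 4x3 +-1 design taken from a 4x4 Hadamard matrix with its all-ones
# column dropped; the remaining columns are mutually orthogonal.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])
X = H4[:, 1:]
val = e_criterion(X)    # X^T X = 4*I, so the smallest eigenvalue is 4
print(val)
```

Maximizing this smallest eigenvalue minimizes the worst-direction variance of the least-squares estimate of b, which is why orthogonal ±1 columns are so attractive in weighing designs.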
In this talk, the mode estimator based on the Parzen-Rosenblatt kernel density estimator is considered (Parzen, 1962). In light of Shi et al. (2009), under mild conditions, we establish the relationship between the convergence rate of the mode estimator and the window width, thereby obtaining a better convergence rate for the mode estimator.
This is joint work with X. Shi and B. Miao.
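A minimal sketch of the Parzen-Rosenblatt mode estimator with a Gaussian kernel follows; the bandwidth used is Silverman's density-oriented rule of thumb, an assumption for illustration, whereas the window width delivering the improved convergence rate in the talk will generally differ.

```python
import numpy as np

def kde_mode(x, h, grid_size=2000):
    """Mode of the Parzen-Rosenblatt kernel density estimate with a
    Gaussian kernel, located by a grid search over the sample range."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    # f_hat(t) = (1/(n h)) * sum_i K((t - x_i) / h)
    z = (grid[:, None] - x[None, :]) / h
    f = np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    return grid[int(np.argmax(f))]

rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=500)
h = 1.06 * x.std() * len(x) ** (-1 / 5)   # Silverman's rule of thumb
m = kde_mode(x, h)
print(m)   # should be close to the true mode at 5
```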
The composite likelihood method has been proposed and systematically discussed by Besag (1974), Lindsay (1988), and Cox and Reid (2004). The approach based on the composite likelihood, especially the pairwise likelihood, has received increasing attention in recent years due to the simplicity of defining the objective function and its computational advantages when dealing with data with complex structures. In this talk, I will discuss some modeling issues concerning the composite likelihood formulation.
This is joint work with Nancy Reid.
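A small sketch of a pairwise likelihood: estimating a common correlation ρ in a standardized exchangeable multivariate normal by maximizing the sum of bivariate normal log-densities over all variable pairs. This is a standard textbook illustration of the pairwise-likelihood idea, not the talk's specific models.

```python
import numpy as np
from itertools import combinations

def pairwise_loglik(rho, X):
    """Pairwise (composite) log-likelihood for a common correlation rho in
    a standardized multivariate normal: the sum of bivariate normal
    log-densities over all variable pairs (additive constants dropped)."""
    c = 1.0 - rho**2
    ll = 0.0
    for j, m in combinations(range(X.shape[1]), 2):
        x, y = X[:, j], X[:, m]
        ll += np.sum(-0.5 * np.log(c) - (x**2 - 2*rho*x*y + y**2) / (2*c))
    return ll

# simulate d = 5 equicorrelated standard normals with rho = 0.5
rng = np.random.default_rng(3)
d, n, rho = 5, 400, 0.5
cov = rho * np.ones((d, d)) + (1 - rho) * np.eye(d)
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

grid = np.linspace(0.01, 0.95, 95)
rho_hat = grid[int(np.argmax([pairwise_loglik(r, X) for r in grid]))]
print(rho_hat)   # should land near the true value 0.5
```

The objective is simply a sum of low-dimensional likelihood contributions, which is what makes the approach attractive when the full joint likelihood is intractable.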
The geometric downweighting method can be applied to enlarge an existing discrete distribution family. The enlarged family has one more parameter, which regulates the rate of decrease of the probability mass function and thus yields new moment features. Applying the geometric downweighting method to a family with infinite mean, we obtain an enlarged family that can have both finite and infinite means. Such an enlarged family can accommodate heavy-tailed count data because it allows varying degrees of tail heaviness. In particular, applying this method to the two-parameter discrete stable family, which has infinite mean, yields a three-parameter discrete distribution family called the generalized Poisson-inverse Gaussian (GPIG). Apart from the extremely heavy-tailed discrete stable, the GPIG family extends the over-dispersed Poisson-inverse Gaussian (PIG) and also includes the equi-dispersed Poisson. The GPIG family is therefore flexible in handling both less and more over-dispersed count data. We illustrate the GPIG family with an application to the citation counts of articles published in 1990 in JASA and JSPI.
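One generic way a decay-regulating parameter can tame an infinite-mean count law is geometric (exponential) tilting, q_k ∝ φ^k p_k. The sketch below applies this to a zeta-type base law with infinite mean purely to illustrate how such a parameter yields finite moments; it is not necessarily the talk's exact construction, and the function name is illustrative.

```python
import numpy as np

def tilted_zeta_pmf(phi, s, kmax=200000):
    """Geometrically tilted zeta-type pmf: q_k proportional to phi**k * k**(-s),
    k = 1, 2, ...  For s <= 2 the base law has infinite mean; any phi < 1
    makes every moment of the tilted law finite."""
    k = np.arange(1, kmax + 1, dtype=float)
    w = phi**k * k**(-s)     # phi**k underflows harmlessly to 0 far in the tail
    return k, w / w.sum()

k, q = tilted_zeta_pmf(phi=0.99, s=1.5)   # base zeta(1.5) has infinite mean
mu = (k * q).sum()
print(mu)                                  # finite mean after tilting
```

Letting φ → 1 recovers the heavy-tailed base law, so a single extra parameter interpolates between finite- and infinite-mean behavior, the flexibility the abstract describes.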