Antedependence Models for Longitudinal Data. Dale Zimmerman and Vicente Núñez Antón. Chapman & Hall/CRC, 2010, 270 pages.
Reviewed work: Dale ZIMMERMAN and Vicente NÚÑEZ ANTÓN, Antedependence Models for Longitudinal Data. Chapman & Hall/CRC, 2010.
Machiavelli, Raúl E.; Thu, 13 Jul 2017 04:06:55 GMT; http://ddd.uab.cat/record/178169; 2010

La práctica del análisis de correspondencias. Michael Greenacre. Fundación BBVA, Rubes Editorial, 2008. ISBN: 978-84-96515-71-0
Reviewed work: Michael GREENACRE, La práctica del análisis de correspondencias. Fundación BBVA, Rubes Editorial, 2008.
Valls Marsal, Joan; Thu, 13 Jul 2017 04:06:53 GMT; http://ddd.uab.cat/record/178157; 2008

Statistical modeling of warm-spell duration series using hurdle models
Regression models for counts can be applied in the earth sciences, for instance when studying trends of extremes of climatological quantities. Hurdle models are modified count models which can be regarded as mixtures of distributions. In this paper, hurdle models are applied to model the sums of lengths of periods of high temperatures. A modification to the common versions found in the literature is presented, because the problem requires left truncation as well as a particular treatment of zeros. The outcome of the model is compared to those of simpler count models.
Rydén, Jesper; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176151; 2017

A Bayesian stochastic SIRS model with a vaccination strategy for the analysis of respiratory syncytial virus
Our objective in this paper is to model the dynamics of respiratory syncytial virus in the region of Valencia (Spain) and analyse the effect of vaccination strategies from a health-economic point of view. Compartmental mathematical models based on differential equations are commonly used in epidemiology to both understand the underlying mechanisms that influence disease transmission and analyse the impact of vaccination programs. However, a recently proposed Bayesian stochastic susceptible-infected-recovered-susceptible model in discrete time provided an improved and more natural description of disease dynamics. In this work, we propose an extension of that stochastic model that allows us to simulate and assess the effect of a vaccination strategy that consists of vaccinating a proportion of newborns.
Jornet-Sanz, Marc; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176150; 2017

A quadtree approach based on European geographic grids: reconciling data privacy and accuracy
Methods to preserve confidentiality when publishing geographic information conflict with the need to publish accurate data. The goal of this paper is to create a European geographic grid framework to disseminate statistical data over maps. We propose a methodology based on quadtree hierarchical geographic data structures. We create a varying-size grid adapted to local area densities. Highly populated zones are disaggregated into small squares to allow dissemination of accurate data. Conversely, information on sparsely populated zones is published in large squares to avoid identification of individual data. The methodology has been applied to the 2014 population register data in Catalonia.
Lagonigro, Raymond; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176149; 2017

Goodness-of-fit test for randomly censored data based on maximum correlation
In this paper we study a goodness-of-fit test based on the maximum correlation coefficient, in the context of randomly censored data. We construct a new test statistic under general right-censoring and prove its asymptotic properties. Additionally, we study a special case, when the censoring mechanism follows the well-known Koziol-Green model. We present an extensive simulation study on the empirical power of these two versions of the test statistic, showing their advantages over the widely used Pearson-type test. Finally, we apply our test to the head-and-neck cancer data.
Strzalkowska-Kominiak, Ewa; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176148; 2017

Corrigendum to "Transmuted geometric distribution with applications in modelling and regression analysis of count data"
Chakraborty, Subrata; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176147; 2017

Bayesian correlated models for assessing the prevalence of viruses in organic and non-organic agroecosystems
Cultivation of horticultural species under organic management has increased in importance in recent years. However, the sustainability of this new production method needs to be supported by scientific research, especially in the field of virology. We studied the prevalence of three important virus diseases in agroecosystems with regard to their management system: organic versus non-organic, with and without greenhouse. Prevalence was assessed by means of a Bayesian correlated binary model which connects the risk of infection of each virus within the same plot and was defined in terms of a logit generalized linear mixed model (GLMM). Model robustness was checked through a sensitivity analysis based on different hyperprior scenarios. Inferential results were examined in terms of changes in the marginal posterior distributions, both for fixed and for random effects, through the Hellinger distance and a derived measure of sensitivity. Statistical results suggested that organic systems show a prevalence lower than or similar to that of non-organic ones in both single and multiple infections, and highlighted the relevance of the prior specification of the random effects in the inferential process.
Lázaro, Elena; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176146; 2017

Comparison of two discrimination indexes in the categorisation of continuous predictors in time-to-event studies
The Cox proportional hazards model is the most widely used survival prediction model for analysing time-to-event data. To measure the discrimination ability of a survival model, the concordance probability index is widely used. In this work we studied and compared the performance of two different estimators of the concordance probability when a continuous predictor variable is categorised in a Cox proportional hazards regression model. In particular, we compared the c-index and the concordance probability estimator. We evaluated the empirical performance of both estimators through simulations. To categorise the predictor variable we propose a methodology which considers the maximal discrimination attained for the categorical variable. We applied this methodology to a cohort of patients with chronic obstructive pulmonary disease; in particular, we categorised the predictor variable forced expiratory volume in one second in percentage.
Barrio Beraza, Irantzu; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176145; 2017

On a property of Lorenz curves with monotone elasticity and its application to the study of inequality by using tax data
The Lorenz curve is the most widely used graphical tool for describing and comparing inequality of income distributions. In this paper, we show that the elasticity of this curve is an indicator of the effect, in terms of inequality, of a truncation of the income distribution. As an application, we consider tax returns as equivalent to the truncation from below of a hypothetical income distribution. Then, we replace this hypothetical distribution by the income distribution obtained from a general household survey and use the dual Lorenz curve to anticipate this effect.
Sordo Díaz, Miguel Ángel; Fri, 23 Jun 2017 04:14:27 GMT; http://ddd.uab.cat/record/176144; 2017

Thirty years of progeny from Chao's inequality: estimating and comparing richness with incidence data and incomplete sampling
In the context of capture-recapture studies, Chao (1987) derived an inequality among capture frequency counts to obtain a lower bound for the size of a population based on individuals' capture/non-capture records for multiple capture occasions. The inequality has been applied to obtain a non-parametric lower bound of species richness of an assemblage based on species incidence (detection/non-detection) data in multiple sampling units. The inequality implies that the number of undetected species can be inferred from the species incidence frequency counts of the uniques (species detected in only one sampling unit) and duplicates (species detected in exactly two sampling units). In their pioneering paper, Colwell and Coddington (1994) gave the name "Chao2" to the estimator for the resulting species richness. (The "Chao1" estimator refers to a similar type of estimator based on species abundance data.) Since then, the Chao2 estimator has been applied to many research fields and led to fruitful generalizations. Here, we first review Chao's inequality under various models and discuss some related statistical inference questions: (1) Under what conditions is the Chao2 estimator an unbiased point estimator? (2) How many additional sampling units are needed to detect any arbitrary proportion (including 100%) of the Chao2 estimate of asymptotic species richness? (3) Can other incidence frequency counts be used to obtain similar lower bounds? We then show how the Chao2 estimator can also be used to guide a non-asymptotic analysis in which species richness estimators can be compared for equally-large or equally-complete samples via sample-size-based and coverage-based rarefaction and extrapolation. We also review the generalization of Chao's inequality to estimate species richness under other sampling-without-replacement schemes (e.g. a set of quadrats, each surveyed only once), to obtain a lower bound of undetected species shared between two or multiple assemblages, and to allow inferences about undetected phylogenetic richness (the total length of undetected branches of a phylogenetic tree connecting all species), with associated rarefaction and extrapolation. A small empirical dataset for Australian birds is used for illustration, using online software SpadeR, iNEXT, and PhD.
Chao, Anne; Fri, 23 Jun 2017 04:14:26 GMT; http://ddd.uab.cat/record/176143; 2017

Editor report
Guillén, Montserrat; Thu, 15 Jun 2017 04:36:06 GMT; http://ddd.uab.cat/record/175924; 2008

The asymptotic relative efficiency and the ratio of sample sizes when testing two different null hypotheses
Composite endpoints, consisting of the union of two or more outcomes, are often used as the primary endpoint in time-to-event randomized clinical trials. Previously, Gómez and Lagakos provided a method to guide the choice between a composite endpoint and one of its components when testing the effect of a treatment in a randomized clinical trial. Consider the problem of testing the null hypotheses of no treatment effect by means of either the single component or the composite endpoint. In this paper we prove that the usual interpretation of the asymptotic relative efficiency as the reciprocal ratio of the sample sizes required for two test procedures, for the same null and alternative hypothesis, and attaining the same power at the same significance level, can be extended to the test procedures considered here for two different null and alternative hypotheses. A simulation to study the relationship between asymptotic relative efficiency and finite sample sizes is carried out.
Gómez Melis, Guadalupe; Fri, 02 Jun 2017 04:09:21 GMT; http://ddd.uab.cat/record/174928; 2014

Modelling extreme values by the residual coefficient of variation
The possibilities of the use of the coefficient of variation over a high threshold in tail modelling are discussed. The paper also considers multiple threshold tests for a generalized Pareto distribution, together with a threshold selection algorithm. One of the main contributions is to extend the methodology based on moments to all distributions, even without finite moments. These techniques are applied to euro/dollar daily exchange rates and to Danish fire insurance losses.
Castillo, Joan del; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168854; 2016

Smoothed landmark estimators of the transition probabilities
One important goal in clinical applications of multi-state models is the estimation of transition probabilities. Recently, landmark estimators were proposed to estimate these quantities, and their superiority with respect to the competing estimators has been proved in situations in which the Markov condition is violated. As a weakness, they yield large standard errors in some circumstances. In this article, we propose two approaches that can be used to reduce the variability of the landmark estimator. Simulations show that the proposed estimators may be much more efficient than the unsmoothed estimator. A real data illustration is included.
Meira-Machado, Luís; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168853; 2016

Log-ratio methods in mixture models for compositional data sets
When traditional methods are applied to compositional data, misleading and incoherent results may be obtained. Finite mixtures of multivariate distributions are becoming increasingly important. In this paper, traditional strategies to fit a mixture model to compositional data sets are revisited and the major difficulties are detailed. A new proposal using a mixture of distributions defined on orthonormal log-ratio coordinates is introduced. A real data set analysis is presented to illustrate and compare the different methodologies.
Comas-Cufí, Marc; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168852; 2016

Using robust FPCA to identify outliers in functional time series, with applications to the electricity market
This study proposes two methods for detecting outliers in functional time series. Both methods take dependence in the data into account and are based on robust functional principal component analysis. One method seeks outliers in the series of projections on the first principal component. The other obtains uncontaminated forecasts for each data set and flags as outliers those observations whose residuals have an unusually high norm. A simulation study shows the performance of these proposed procedures and the need to take dependence in the time series into account. Finally, the usefulness of our methodology is illustrated in two real datasets from the electricity market: daily curves of electricity demand and price in mainland Spain, for the year 2012.
Vilar, Juan M.; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168851; 2016

A construction of continuous-time ARMA models by iterations of Ornstein-Uhlenbeck processes
We present a construction of a family of continuous-time ARMA processes based on p iterations of the linear operator that maps a Lévy process onto an Ornstein-Uhlenbeck process. The construction resembles the procedure to build an AR(p) from an AR(1). We show that this family is in fact a subfamily of the well-known CARMA(p,q) processes, with several interesting advantages, including a smaller number of parameters. The resulting processes are linear combinations of Ornstein-Uhlenbeck processes all driven by the same Lévy process. This provides a straightforward computation of covariances, a state-space model representation and methods for estimating parameters. Furthermore, the discrete and equally spaced sampling of the process turns out to be an ARMA(p, p−1) process. We propose methods for estimating the parameters of the iterated Ornstein-Uhlenbeck process when the noise is either driven by a Wiener or a more general Lévy process, and show simulations and applications to real data.
Arratia, Argimiro; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168850; 2016

Kernel-based estimation of P(X > Y) in ranked set sampling
This article is directed at the problem of reliability estimation using ranked set sampling. A nonparametric estimator based on kernel density estimation is developed. The estimator is shown to be superior to its analog in simple random sampling. Monte Carlo simulations are employed to assess performance of the proposed estimator. Two real data sets are analysed for illustration.
Mahdizadeh, Mahdi; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168849; 2016

Improving the resolution of the simple assembly line balancing problem type E
The simple assembly line balancing problem type E (abbreviated as SALBP-E) occurs when the number of workstations and the cycle time are variables and the objective is to maximise the line efficiency. In contrast with other types of SALBPs, SALBP-E has received little attention in the literature. In order to solve SALBP-E optimally, we propose a mixed integer linear programming model and an iterative procedure. Since SALBP-E is NP-hard, we also propose heuristics derived from the aforementioned procedures for solving larger instances. Extensive experimentation is carried out and its results show the improvement of the SALBP-E resolution.
Corominas, Albert; Tue, 10 Jan 2017 07:36:39 GMT; http://ddd.uab.cat/record/168848; 2016

A statistical learning based approach for parameter fine-tuning of metaheuristics
Metaheuristics are approximation methods used to solve combinatorial optimization problems. Their performance usually depends on a set of parameters that need to be adjusted. The selection of appropriate parameter values causes a loss of efficiency, as it requires time and advanced analytical and problem-specific skills. This paper provides an overview of the principal approaches to tackle the Parameter Setting Problem, focusing on the statistical procedures employed so far by the scientific community. In addition, a novel methodology is proposed, which is tested using an already existing algorithm for solving the Multi-Depot Vehicle Routing Problem.
Calvet, Laura; Tue, 21 Jun 2016 06:15:02 GMT; http://ddd.uab.cat/record/158313; 2016

Compound distributions motivated by linear failure rate
Motivated by three failure data sets (lifetime of patients, failure time of hard drives and failure time of a product), we introduce three different three-parameter distributions, study basic mathematical properties, address estimation by the method of maximum likelihood and investigate finite sample performance of the estimators. We show that one of the new distributions provides a better fit to each data set than eight other distributions each having three parameters and three distributions each having two parameters.
Gitifar, Narjes; Tue, 21 Jun 2016 06:15:02 GMT; http://ddd.uab.cat/record/158312; 2016

Transmuted geometric distribution with applications in modeling and regression analysis of count data
A two-parameter transmuted geometric distribution is proposed as a new generalization of the geometric distribution by employing the quadratic transmutation techniques of Shaw and Buckley. The additional parameter plays the role of controlling the tail length. Distributional properties of the proposed distribution are investigated. The maximum likelihood estimation method is discussed along with some data fitting experiments to show its advantages over some existing distributions in the literature. The tail flexibility of the density of the aggregate loss random variable, assuming the proposed distribution as primary distribution, is outlined and presented along with an illustrative modelling of aggregate claims in vehicle insurance data. Finally, we present a count regression model based on the proposed distribution and carry out its comparison with some established models.
Chakraborty, Subrata; Tue, 21 Jun 2016 06:15:02 GMT; http://ddd.uab.cat/record/158311; 2016

Exploring Bayesian models to evaluate control procedures for plant disease
Tigernut tubers are the main ingredient in the production of orxata in Valencia, a popular sweet white soft drink. In recent years, the appearance of black spots in the skin of tigernuts has led to important economic losses in orxata production because severely diseased tubers must be discarded. In this paper, we discuss three complementary statistical models to assess the disease incidence of harvested tubers from selected or treated seeds, and propose a measure of effectiveness for different treatments against the disease based on the probability of germination and the incidence of the disease. Statistical methods for these studies are approached from Bayesian reasoning and include mixed-effects models, Dirichlet-multinomial inferential processes and mixed-effects logistic regression models. Statistical analyses provide relevant information to carry out measures to palliate the black spot disease and achieve a high-quality production. For instance, the study shows that avoiding affected seeds increases the probability of harvesting asymptomatic tubers. It is also revealed that the best chemical treatment, when prioritizing germination, is disinfection with hydrochloric acid, while sodium hypochlorite performs better if the priority is to have a reduced disease incidence. The reduction of the incidence of the black spots syndrome by disinfection with chemical agents supports the hypothesis that the causal agent is a pathogenic organism.
Alvares, Danilo; Tue, 21 Jun 2016 06:15:02 GMT; http://ddd.uab.cat/record/158310; 2016

A goodness-of-fit test for the multivariate Poisson distribution
Bivariate count data arise in several different disciplines and the bivariate Poisson distribution is commonly used to model them. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap and by a weighted bootstrap. The goodness of these bootstrap estimators and the power for finite sample sizes are numerically studied. It is shown that the proposed test can be naturally extended to the multivariate Poisson distribution.
Novoa-Muñoz, Francisco; Tue, 21 Jun 2016 06:15:02 GMT; http://ddd.uab.cat/record/158309; 2016