Seminar Details
Title:
Confidence Regions for the Intensity Function of a Cyclic Poisson Process
Speaker: Professor Roelof Helmers, Centre of Mathematics and Computer Science, Amsterdam, The Netherlands
Date: 15 December 2008 (Monday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, DSAP Seminar Room
Abstract
We construct and investigate various confidence bands for the intensity function of a cyclic Poisson process via extreme value type asymptotic results for the appropriately normalized supremum of the difference between the intensity function and its empirical estimator of nonparametric kernel type.
We formulate our results separately for two cases: the case of known period and the (more general) case of an unknown period. In the case that the period is unknown we need to impose certain assumptions that are somewhat stronger than those needed in the case of known period.
This is joint work with Qiying Wang (Sydney) and Ricardas Zitikis (London, Ont.).
Title:
Model Selection Methods and Their Applications In Genome-Wide Association Studies
Speaker: Ms Zhao Jingyuan, Department of Statistics and Applied Probability, National University of Singapore
Date: 27 November 2008 (Thursday)
Time: 3:00pm - 4:00pm
Venue:
S16-06-118, DSAP Seminar Room
Abstract
As data in which the number of features greatly exceeds the number of observations appear frequently in many areas, high-dimensional model selection has become a common and pressing problem. We propose the generalized tournament approach cum EBIC in the context of generalized linear models with both main effects and interactions. In the screening step, main effects and interactions are screened in consecutive stages until the dimension of the feature space is reduced to a desirable level. In the final selection step, a modified SCAD method combined with EBIC is developed to choose the causal features. It is shown that the modified SCAD method guarantees finite parameter estimates in the presence of separation. In genome-wide association studies, there is a growing demand for statistical methods to identify causal genes with interaction structure. The generalized tournament approach cum EBIC is applied in genome-wide association studies to detect genetic variants associated with some common diseases. On a number of simulated data sets, we demonstrate that the generalized tournament approach cum EBIC enjoys a higher positive selection rate and a lower false discovery rate than other approaches.
Title:
Quantile Regression with Time-Varying Regressors
Speaker: Professor Chen Songnian, Department of Economics, National University of Singapore
Date: 20 November 2008 (Thursday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, DSAP Seminar Room
Abstract
Since the seminal work of Koenker and Bassett (1978), quantile regression has become a widely used tool in econometric duration analysis and statistical survival analysis. The existing literature, however, has focused on cases with time-invariant regressors. In this paper, we introduce a quantile regression framework with time-varying regressors, thus providing an attractive alternative to the Cox regression model with time-varying regressors. Our approach is motivated by the maximum score estimation of Manski (1975) for the binary choice model and its link to the censored regression model. We show that the proposed estimator is consistent and asymptotically normal under some regularity conditions.
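As a minimal illustration of the quantile regression objective of Koenker and Bassett (1978) mentioned in the abstract above, the sketch below evaluates the check loss and confirms that a sample median minimises it over constants. It is not the time-varying estimator of the talk; the function names and the toy durations are illustrative assumptions.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(data, q, tau):
    """Sum of check losses when the constant q is used to fit the data."""
    return sum(check_loss(y - q, tau) for y in data)


def best_constant(data, tau):
    """Search the observed values for the minimiser of the total check loss."""
    return min(sorted(data), key=lambda q: total_loss(data, q, tau))


durations = [2.0, 3.5, 1.0, 7.0, 4.0, 9.5, 5.0]  # toy data, not from the talk
median_fit = best_constant(durations, 0.5)
```

With tau = 0.5 the loss is half the sum of absolute deviations, so the fitted constant is the sample median; other values of tau target other quantiles.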
Title:
TDT for Human QTL Mapping and Genome-wide Association Study
Speaker: Ms Hao Ying, Department of Statistics and Applied Probability, National University of Singapore
Date: 11 November 2008 (Tuesday)
Time: 2:00pm - 3:00pm
Venue:
S16-06-118, DSAP Seminar Room
Abstract
An efficient and economical sampling approach, ERS, is proposed to extend the TDT to QTL mapping, and its effect on the power of various TDTs is studied. A simulation study is carried out to compare the power of the TDT with ERS and with the conventional truncation approach. We also introduce a generalized TDT approach based on a penalized logistic model and a new variable selection criterion, EBIC, to apply the TDT in genome-wide association studies. The validity and advantages of this approach are demonstrated by comparing its PSR and FDR with those of the multiple-comparison method. Our numerical studies also illustrate that, compared to EBIC, the traditional variable selection criterion BIC tends to select too many spurious variables in high-dimensional spaces. Handling the data in a certain way, we obtain a logistic model with grouped covariates. An efficient algorithm for sparse solutions is proposed, based on a series of optimality conditions of the optimization problem.
Title:
Nonparametric Components in Discrete Choice Models
(with an Application to Credit Scoring)
Speaker: Professor Marlene Muller, Fraunhofer Institute of Industrial Mathematics, Department of Financial Mathematics, Germany
Date: 05 November 2008 (Wednesday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
The talk reviews semiparametric extensions to the generalized linear regression model (GLM). Nonparametric components can be incorporated into the GLM in different ways. A wide class of models is given by using nonparametric function estimates within the argument of the link function. This class includes generalized additive and generalized partial linear models as well as the combination of these components.
The aim of this talk is to introduce and to compare different estimation approaches that have been proposed for this class. This covers in particular backfitting and marginal integration techniques which may lead to different results if the data are not consistent with the underlying model. A focus is given to applicable and easily available techniques.
Title:
Penalized Spline Regression and Mixed Models - a Promising Alliance
Speaker: Professor Göran Kauermann, University of Bielefeld, Germany
Date: 31 October 2008 (Friday)
Time: 4:00pm - 5:00pm
Venue:
S16-05-102, DSAP Computer Lab2
Abstract
Penalized spline fitting as a smoothing method has achieved considerable popularity in recent years. Tracing back to Eilers and Marx (Statistical Science, 1996), the book by Ruppert, Wand & Carroll (2003, Cambridge University Press) shows the versatility and flexibility of the approach.
The presentation starts with an introduction to penalized spline smoothing. Basic ideas are explained and supplemented by data examples. In particular, the link between penalized spline smoothing and linear mixed models will be exhibited. This allows the smoothing parameters to be selected using the maximum likelihood principle. The idea is extended with a number of research results from spatial statistics and supported with economic data examples.
Title:
Efficient Estimation in Semiparametric
Transformation Models for Current Status Data
Speaker: Professor Zeng Donglin, University of North Carolina, Chapel Hill
Date: 31 October 2008 (Friday)
Time: 12:00noon - 1:00pm
Venue:
S16-06-118, Seminar Room
Abstract
Current status data arise when the event time is not observed exactly but is only known to lie before or after some monitoring time. Examples of such events include the onset of cancer in a screening study and the onset of seroconversion in AIDS patients. We propose semiparametric transformation hazards models for analyzing current status data; our models incorporate time-dependent covariates. Nonparametric maximum likelihood estimation is used for inference. The estimators are shown to be consistent, asymptotically normal and efficient. Additionally, we propose a simple algorithm for computing the estimates. Simulation studies and real data analysis are used to illustrate our approach.
Title:
Tournament Screening cum EBIC for Feature
Selection with High Dimensional Feature Spaces
Speaker: A/Prof Chen Zehua, Department of Statistics and Applied Probability, National University of Singapore
Date: 22 October 2008 (Wednesday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
Feature selection characterized by a relatively small sample size and an extremely
high dimensional feature space is common in many areas of contemporary
statistics. The high dimensionality of the feature space causes serious
difficulties: (i) the sample correlations between features become high even if
the features are stochastically independent; (ii) the computation becomes intractable.
These difficulties make conventional approaches either inapplicable
or inefficient. The reduction of dimensionality of the feature space followed by
low dimensional approaches appears to be the only feasible way to tackle the problem.
Along this line, we developed a tournament screening cum EBIC approach for
feature selection with high dimensional feature space. The procedure of tournament
screening mimics that of a tournament. We show that the tournament
screening has a sure screening property, a necessary property which should be
satisfied by any valid screening procedure. The EBIC is an extended Bayesian
information criterion which incorporates the model complexity, a property of
the model space, into the criterion. We also show that the EBIC is consistent
in the case of extremely high dimensional feature spaces. It is demonstrated by
numerical studies that the tournament screening cum EBIC approach enjoys
desirable properties such as having higher positive selection rate and lower false
discovery rate than other approaches.
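To make the role of the model-space term concrete, here is a minimal sketch of an extended BIC computation. It assumes the form of the criterion commonly given in the EBIC literature, namely the usual BIC plus 2 * gamma * log C(p, k); the exact form used in the talk is not stated in the abstract, and the function and parameter names are illustrative.

```python
import math


def log_binomial(p, k):
    """Natural log of the binomial coefficient C(p, k), computed via lgamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood, k, n, p, gamma=1.0):
    """Assumed form: EBIC = -2 logL + k log(n) + 2 * gamma * log C(p, k).

    k: number of selected features, n: sample size, p: number of candidate
    features, gamma: weight on the model-space term (gamma = 0 gives BIC).
    """
    bic = -2.0 * log_likelihood + k * math.log(n)
    return bic + 2.0 * gamma * log_binomial(p, k)
```

When p is much larger than n, log C(p, k) grows quickly with k, so the extra term penalises large models more heavily than BIC does.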
Title:
Modeling Longitudinal Semicontinuous Emesis Volume
Data with Application to Acupuncture Clinical Trial
- update
Speaker: Professor Pulak Ghosh, Department of Biostatistics and Winship Cancer Institute, Emory University, Atlanta, USA
Date: 15 October 2008 (Wednesday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room changed to S16-05-102, Computer Lab2
Abstract
In many biomedical applications, researchers encounter semicontinuous data whereby data are either continuous positive values or zero. When the data are collected over time the observations may be correlated. Analysis of these kinds of longitudinal data is challenging, due to the presence of strong skewness and large proportion of zeros in the data. In this talk, we develop a flexible class of zero-inflated models in a longitudinal setting. We use the method to analyze longitudinal data from an acupuncture clinical trial, in which we compare the effects of active acupuncture, sham acupuncture and standard medical care on chemotherapy-induced nausea in patients being treated for advanced stage breast cancer. A spline model is introduced into the linear predictor of the model to explore the possibility of nonlinear treatment effect. We also account for possible serial correlation between successive observations using Brownian motion resulting in a more flexible modeling framework for semicontinuous data. We illustrate the Bayesian methodology with the acupuncture clinical trial data.
Title:
The Simple Random Sample
Speaker: Dr Yap Von Bing, Department of Statistics and Applied Probability, National University of Singapore
Date: 08 October 2008 (Wednesday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
When you sample at random without replacement from a finite population, you get a simple random sample. This model is immediately applicable to the simplest surveys. The associated random variables are perhaps the simplest example of exchangeable random variables. The simple random sample is equivalent to a single random sample from all subpopulations of a given size. By leaning on the uniform label distribution, moment calculations are easy, and the hypergeometric distribution is derived. In a randomised experiment (Fisher 1935) where the responses are as modelled by Neyman (1923), we show that the large-sample randomisation test can be conservatively approximated by the 2-sample t-test. This is an elegant application of probability on an important scientific problem; pharmaceutical products among others are tested in randomised trials. It offers a valuable perspective on modern approaches such as regression, the applicability of which is not as obvious as commonly assumed (Freedman 2008). This talk is targeted at senior undergraduate and graduate students, but all are welcome.
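The link between the simple random sample and the hypergeometric distribution can be checked numerically. The sketch below is an illustration only: the population size, the number of marked units and the sample size are hypothetical, and the labelling convention (the first K labels are the marked units) is an assumption made for the example.

```python
import random
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k marked units in a simple random sample of size n) for a population
    of N units of which K are marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def marked_in_sample(N, K, n, rng):
    """Draw one simple random sample without replacement; count marked units."""
    sample = rng.sample(range(N), n)   # labels 0..K-1 are the marked units
    return sum(1 for label in sample if label < K)


N, K, n = 20, 8, 5                      # hypothetical population and sample
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))   # equals n * K / N

rng = random.Random(0)
draws = [marked_in_sample(N, K, n, rng) for _ in range(10000)]
simulated_mean = sum(draws) / len(draws)
```

Both the exact mean and the simulated mean should be close to n * K / N, the value obtained from the moment calculation on uniformly distributed labels.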
Title:
Feature Selection in High Dimensional Regression:
A look at LASSO and Correlation Screening
Speaker: Mr Lim Chinghway, University of California, Berkeley and Department of Statistics and Applied Probability, National University of Singapore
Date: 21 August 2008 (Thursday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
In the modern information age, it is increasingly common to find statistical applications with a large number of covariates. This presents a new challenge for the problem of feature selection, which is further complicated when there are fewer observations than covariates. In the realm of linear models, the Lasso has emerged as a popular technique to recover sparsity. There has been a considerable amount of work on its consistency results as well as on extensions to the technique.
Separately, correlation screening, as simple as it sounds, has gathered renewed interest in recent work. In this talk, I will give an overview of the two methods and present some of their consistency results. I will discuss their limitations and how using both together can harvest their individual merits. I will also give a brief summary of their extensions to the generalized linear model and highlight some problems of interest.
Title:
On a Constrained ARCH Model for the Prediction of VaR
Speaker: Mr Wang Mengxi, Department of Statistics and Applied Probability, National University of Singapore
Date: 15 August 2008 (Friday)
Time: 3:00pm - 4:00pm
Venue:
S16-07-107, DSAP Reading Room
Abstract
Risk management is an important aspect of the financial industry and has gained increasing attention after the recent financial turmoil triggered by insufficient or ineffective risk management practice. Value-at-Risk (VaR), an advanced technique for modeling the risk of assets, has been widely accepted and has grown in popularity over the past two decades, not only among financial firms but also among industrial corporations. This method is appealing because of its ability to integrate several market risk factors into one single measure of the potential loss over a specific period of time, often expressed as a dollar amount or a percentage of the asset value. As the most important input for estimating VaR, the financial volatility forecast has become the center of the problem and an important field of statistical research.
It is widely recognized that financial time series have some prominent characteristics: volatility clustering, i.e. large changes tend to be followed by large changes and small changes by small changes; leptokurtosis, i.e. fat-tailed distributions of financial returns; and the leverage effect, i.e. changes in stock prices tend to be negatively correlated with changes in volatility, meaning that volatility is higher after negative shocks than after positive shocks of the same magnitude. Much research has been done in this area, and it has been shown that popular models such as ARCH and GARCH can capture the non-normality of returns as well as changing conditional volatilities, and hence provide good forecasts. Improved models such as GJR and EGARCH have been developed to capture the leverage effect of financial returns. In this project, a new model, the Constrained Volatility ARCH (CARCH) model, is proposed to provide a better balance between model flexibility and stability by imposing constraints on the coefficients. These constraints result in a natural selection process driven by the data itself, yielding a more parsimonious model with better flexibility than the GARCH(1,1) model and better prediction, as measured by a smaller MAD (Mean Absolute Deviation). Several financial time series were tested in this study, and the results suggest that the new CARCH model significantly outperforms conventional GARCH models in terms of quantile prediction and risk management.
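For readers unfamiliar with how a volatility forecast feeds into VaR, the following minimal sketch runs the standard GARCH(1,1) variance recursion and converts the last forecast into a VaR figure. It is not the CARCH model of the talk; the parameter values a caller supplies, the zero-mean normal assumption for returns and the function names are all assumptions made for illustration.

```python
from statistics import NormalDist


def garch11_variances(returns, omega, alpha, beta, initial_variance):
    """One-step-ahead conditional variances from the GARCH(1,1) recursion
    sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]."""
    variances = [initial_variance]
    for r in returns:
        variances.append(omega + alpha * r * r + beta * variances[-1])
    return variances  # the last entry is the forecast for the next period


def normal_var(variance, level=0.99):
    """Value-at-Risk as a positive loss fraction, assuming zero-mean normal returns."""
    return NormalDist().inv_cdf(level) * variance ** 0.5
```

Because VaR is a quantile of the loss distribution, any improvement in the variance forecast translates directly into the quality of the quantile prediction.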
Title:
Integration of Heterogeneous Datasets for the
Prediction of Directly Regulated Genes
Speaker: Mr Deng Niantao, Department of Statistics and Applied Probability, National University of Singapore
Date: 01 August 2008 (Friday)
Time: 4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
Estrogen Receptor (ER) is a master transcriptional regulator in breast cancer and is an archetype of a molecular therapeutic target. Experiments have been performed to map ER binding sites on a genome-wide basis using various chromatin immunoprecipitation (ChIP) techniques. Lin and Vega (2007) applied the ChIP-PET strategy to map ER binding sites in MCF-7 cancer cells and found that only 5% of the ER binding sites were within the proximal gene promoter regions, while the majority were mapped further away from genes. In order to understand the impact of ER on regulation, we integrated various datasets and explored the association between binding sites and regulated genes in terms of their distance, the binding strength and the concentration of the binding regions. We identified some important factors that contribute to direct regulation and tentatively proposed a score function for genes to measure their potential to be directly regulated. Numerical results are presented for a control gene group and an expressed gene group and are compared using Receiver Operating Characteristic (ROC) curve analysis.
Title:
A Moment Substitution Approach to Fitting Linear Regression Models
with Categorical Covariates Subject to Randomized Response
Speaker: Mr Wang Zijian, Gerald, Department of Statistics and Applied Probability, National University of Singapore
Date: 28 July 2008 (Monday)
Time: 3:00pm - 4:00pm
Venue:
S16-06-118, Seminar Room
Abstract
In this paper, we present an alternative approach to Van den Hout and
Kooiman (2006) for estimating the linear regression model with categorical
covariates subject to randomized response (RR). Specifically, we consider
Warner's (1965) scheme of randomization. Our approach essentially consists
of moment substitution, where we estimate the latent first, second and
cross product moments in the usual least squares estimator for the centred
model with their associated observed unbiased estimates. For the problem
of estimating subgroup means in a dichotomous population, we show that
this moment substitution approach is equivalent to Selen's (1986) estimator
under appropriate distributional assumptions. Assuming independent randomizations,
this approach is further adapted to the case of multiple linear
regression, when some or all of the covariates are subject to RR. Ultimately,
it is shown that the estimates yielded by this method are asymptotically
equivalent to the measurement error model estimates of Fuller (1987) under
suitable transformations.
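As background to Warner's (1965) scheme referred to above, the sketch below shows the basic moment correction for a single sensitive proportion: the observed "yes" rate is mapped back to the latent proportion. This is only the elementary building block, not the regression estimator of the talk; the function names and the simulation settings are illustrative assumptions.

```python
import random


def warner_estimate(yes_count, n, p):
    """Unbiased estimate of the sensitive proportion pi under Warner's design.

    p is the known probability that a respondent is asked the direct question;
    P(yes) = p * pi + (1 - p) * (1 - pi), so pi = (P(yes) + p - 1) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    lam_hat = yes_count / n
    return (lam_hat + p - 1.0) / (2.0 * p - 1.0)


def simulate_yes_count(n, pi, p, rng):
    """Simulate n randomized responses and return the number of 'yes' answers."""
    yes = 0
    for _ in range(n):
        member = rng.random() < pi          # latent group membership
        direct = rng.random() < p           # which question the device selects
        yes += member if direct else not member
    return yes
```

For example, 58 "yes" answers out of 100 with p = 0.7 give an estimate of 0.7 for the latent proportion.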
Title:
Statistical Analysis of a Time-Course Nasopharyngeal Carcinoma Gene Expression Data
Speaker: Mr Md. Atikur Rahman Khan, Department of Statistics and Applied Probability, National University of Singapore
Date: 23 July 2008 (Wednesday) changed to 25 July 2008 (Friday)
Time:
1:30pm - 2:30pm changed to 3pm - 4pm
Venue:
S16-06-118, Seminar Room
Abstract
A common goal of microarray experiments is to identify genes that are differentially expressed
under different biological conditions. Time-course microarray experiments can be used to
detect temporal differential gene expression under these conditions. Our aim in this study is
to investigate the time-course regulation and differential expression of genes, using
a dataset from an in vitro experiment that applies a cyclin-dependent kinase (CDK) inhibitor
to 3 Nasopharyngeal Carcinoma (NPC) cells. We explored different aspects of this
dataset, using hierarchical clustering based on a distance measure and
principal component analysis. Time-course regulation of genes was studied using
time-course pattern and profile analysis. We performed gene ontology (GO) category
enrichment together with the differential gene expression analysis and hypothesized that some
genes on different pathways responded significantly to the CDK inhibitor.
Title:
Bootstrap Methods for Semi-Parametric Goodness of Fit Tests
Speaker: Prof G. Jogesh Babu, Department of Statistics, The Pennsylvania State University
Date: 23 July 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
Nonparametric goodness of fit tests are generally based on the empirical distribution function. A well-discussed problem of goodness of fit tests when parameters are estimated will be revisited. Bootstrap methods to estimate the null distributions of various goodness of fit test statistics will be presented. These results hold not only in the univariate case but also in the multivariate setting. These ideas are taken a step further to develop nonparametric resampling methods for inference when the data come from an unknown distribution which may or may not belong to a specified family of distributions.
Title:
Multivariate Linear and Nonlinear Causality Tests with Applications
Speaker: Miss Zhang Bingzhi, Department of Statistics and Applied Probability, National University of Singapore
Date: 21 July 2008 (Monday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118, Seminar Room
Abstract
The traditional linear Granger test has been widely used to examine the linear
causality between any pair of time series. Hiemstra and Jones (1994) developed
a nonlinear Granger causality test to investigate the nonlinear causality between
stock prices and trading volume. In this thesis, we extend their work by developing
both linear and non-linear causality tests in multivariate settings instead of in
pairwise context. We then apply the tests to identify the linear and non-linear
multivariate causality relationships among the indices of the Chinese segmented
stock markets.
Title:
Option Pricing with Aggregation of Physical Models and Empirical Learning [Joint Seminar Between DSAP and RMI]
Speaker: Prof Jianqing Fan (Princeton University) and Prof Loriano Mancini (Bendheim Centre for Finance, Princeton University)
Date: 10 July 2008 (Thursday)
Time:
3:30pm - 5:30pm
Venue:
S14-03-10, CRA, Department of Mathematics, changed to S16-06-118 (Seminar Room)
Abstract
Financial mathematical models are useful tools for option pricing. These physical models provide a good first order approximation to the underlying dynamics in the financial market. Their pricing performance can be significantly enhanced when they are combined with statistical learning approaches, which empirically learn and correct pricing errors through estimating state price densities. In this paper, we propose a new semiparametric technique for estimating state price densities and pricing financial derivatives. This method is based on a semiparametric approach to estimating the survivor function of a normalized state variable and is easy to implement. Our method can be combined with any model-based pricing formula to correct the systematic biases of pricing errors and enhance the predictive power. Empirical studies based on S&P 500 index options show that our method outperforms several competing pricing models in terms of predictive and hedging ability.
Title:
Estimation of High-dimensional Covariance Matrix [Joint Seminar Between DSAP and RMI]
Speaker: Prof Jianqing Fan, Princeton University
Date: 10 July 2008 (Thursday)
Time:
2:00pm - 3:00pm
Venue:
S14-03-10, CRA, Department of Mathematics, changed to S16-06-118 (Seminar Room)
Abstract
High dimensionality comparable to the sample size is a common feature in portfolio allocation, risk management, genetic networks and climatology. In this talk, we first use a multi-factor model to reduce the dimensionality and to estimate the covariance matrix for portfolio allocation and risk assessment. The impacts of dimensionality on the estimation of the covariance matrix and its inverse are examined. We identify the situations in which the factor approach can substantially improve performance and the cases where the gains are only marginal, in comparison with the sample covariance matrix. Furthermore, the impacts of covariance matrix estimation on portfolio allocation and risk management are studied. Viable covariance modeling and sparse and robust portfolio allocations are recommended based on our mathematical results.
In other classes of problems, such as genetic networks or climatology, sparsity of the covariance matrix or its inverse arises naturally. We then estimate high-dimensional covariance matrices using the penalized likelihood method to exploit the sparsity. New algorithms are proposed. Optimal rates of convergence, sparsistency, and asymptotic normality are established. Our theoretical results are verified by simulation studies and illustrated by several applications.
Title:
Testing for Interactions in General Semiparametric Analysis
of Repeated Measures Data, With Application to Testing for
Main Effects of Genes with Possible Environmental Applications
Speaker: Prof Raymond J. Carroll, Distinguished Professor, Professor in the Department of Statistics, Professor of Nutrition and Toxicology, Department of Statistics, Texas A&M University, TAMU
Date: 09 July 2008 (Wednesday)
Time:
11:00am - 12:00noon
Venue:
Abstract
This talk considers the general problem where the data for an individual are repeated measures in the most general sense, with a parametric component and a nonparametric component. In gene-environment interaction studies, it is often of interest to test for the main effects of genes (the parametric components) when there might be interactions with the environment (the nonparametric component). Rather than build complex models for the interactions, we use a Tukey-type 1-degree of freedom formulation that has the promise to improve power for testing whether there are any genetic effects. We derive a general profile-type score statistic and show how to implement it, which involves circumventing the need to solve an integral equation. Extensions to semiparametric additive models with repeated measures are described.
Title:
Risk-Adjusted Cumulative Sum Control Charting Procedures
Speaker: Miss Lin Lin, Department of Statistics and Applied Probability, National University of Singapore
Date: 07 July 2008 (Monday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
Risk-adjusted charts for monitoring the performance of a surgeon or a group of surgeons have recently gained prominence in the literature. It started in 1997 with the introduction of a chart that plots cumulatively the expected mortality counts minus the observed counts. The statistic plotted is intuitive and it has gained widespread attention and adoption. However, the run length performance of this chart is still not clearly understood because of the lack of a signalling rule. A cumulative sum chart based on testing the odds ratio that a patient dies was proposed in 2000. The run length performance of this chart is optimal, but the chart is not as easy to interpret because of the inherent difficulty in interpreting odds ratios. Between 2000 and 2008, many papers were published comparing these two charts. Although these two charts look different, we show that they are in fact mathematically identical and we present a unified approach based on testing the risks directly.
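The two charts contrasted in this abstract can be sketched in a few lines. The code below assumes the usual textbook forms: a cumulative "expected minus observed deaths" path, and a cumulative sum of log-likelihood-ratio scores for testing an odds ratio of 1 against a specified alternative. The default odds ratio, the control limit of 4.5 and all function names are hypothetical choices for illustration, and the unified approach of the talk is not reproduced here.

```python
import math


def vlad_path(risks, outcomes):
    """Cumulative expected minus observed deaths (outcome 1 = death)."""
    total, path = 0.0, []
    for p, y in zip(risks, outcomes):
        total += p - y
        path.append(total)
    return path


def cusum_weight(p, y, odds_ratio):
    """Log-likelihood-ratio score for testing odds ratio 1 against odds_ratio,
    for a patient with predicted death risk p and outcome y."""
    denominator = 1.0 - p + odds_ratio * p
    if y == 1:
        return math.log(odds_ratio / denominator)
    return math.log(1.0 / denominator)


def risk_adjusted_cusum(risks, outcomes, odds_ratio=2.0, limit=4.5):
    """Return the CUSUM path and the index of the first signal (or None).

    The limit is a hypothetical control limit chosen only for this example.
    """
    level, path, signal_at = 0.0, [], None
    for i, (p, y) in enumerate(zip(risks, outcomes)):
        level = max(0.0, level + cusum_weight(p, y, odds_ratio))
        path.append(level)
        if signal_at is None and level >= limit:
            signal_at = i
    return path, signal_at
```

The first function has no built-in signalling rule, which is the difficulty noted in the abstract, whereas the second signals as soon as the path crosses the chosen limit.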
Title:
Pricing and Hedging of Barrier Options under Transaction Costs
Speaker: Miss Lim Pei Ling, Department of Statistics and Applied Probability, National University of Singapore
Date: 07 July 2008 (Monday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
A barrier option is one of the most popular exotic options for structured products. Barrier options can be divided into two categories: knock-out and knock-in options. A knock-out (resp. knock-in) option expires (resp. becomes exercisable) automatically when the underlying stock price hits the predetermined barrier level. The problems of pricing and hedging barrier options in the presence of proportional transaction costs can be formulated as singular stochastic control problems. Thus far, the optimal hedging strategies have been computed numerically using the Markov chain approximation and discrete-time dynamic programming. However, this approach is computationally intensive. Lai and Lim (2006) proposed a new approach and an efficient backward algorithm to solve the problem of option pricing and hedging by using the equivalence of optimal stopping to the class of singular stochastic control problems. In this paper, we apply this newly proposed approach, in the case of negative exponential utility, to study the hedging strategies for the “up-and-out” barrier option in the presence of transaction costs. The technique results in an optimal hedging strategy that involves two optimal boundaries, one for buying and one for selling. Numerical results are also studied in this paper.
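To make the knock-out and knock-in definitions concrete, here is a minimal sketch of the payoffs of up-and-out and up-and-in call options along one discretely monitored price path. It ignores transaction costs and is not the backward algorithm of Lai and Lim (2006); the function names and the assumption that the barrier is checked only at the listed prices are illustrative.

```python
def barrier_hit(path, barrier):
    """True if any monitored price reaches or exceeds the (upper) barrier."""
    return any(price >= barrier for price in path)


def up_and_out_call_payoff(path, strike, barrier):
    """Call payoff at expiry, worthless if the barrier was hit (knock-out)."""
    if barrier_hit(path, barrier):
        return 0.0
    return max(path[-1] - strike, 0.0)


def up_and_in_call_payoff(path, strike, barrier):
    """Call payoff at expiry, paid only if the barrier was hit (knock-in)."""
    if barrier_hit(path, barrier):
        return max(path[-1] - strike, 0.0)
    return 0.0
```

On any path the two payoffs add up to the payoff of a plain call with the same strike, which is a convenient sanity check for an implementation.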
Title:
Regression Spline via Penalizing Derivatives
Speaker: Miss Zhu Yeying, Department of Statistics and Applied Probability, National University of Singapore
Date: 04 July 2008 (Friday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
Regression spline based on a truncated power basis has proved to be a very useful nonparametric method. One way to implement this method is to approximate the unknown underlying function as a linear combination of the truncated power basis and estimate the coefficient vector appropriately. In situations where the coefficient vector is high-dimensional and sparse, the SCAD method can be used to select and estimate the non-zero components simultaneously. In other cases, when the coefficient vector is not sparse but the pth-order derivatives of the regression spline function are sparse, directly applying the SCAD method is less effective. In this thesis, we attempt to re-parameterize the coefficient vector as a linear function of a certain derivative vector, whose last K + 1 components are the pth-order derivatives of the regression spline function. We then apply the SCAD method to estimate the new coefficient vector. Numerical results show that the newly proposed method is much more accurate than the usual regression spline methods, especially when the true curve is piecewise polynomial with different orders on different segments.
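The truncated power basis and the piecewise-constant pth derivative mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming a spline of degree p >= 1 with K knots and coefficients ordered as polynomial terms followed by truncated terms; the function names are hypothetical, and neither the SCAD step nor the re-parameterization of the thesis is implemented.

```python
from math import factorial


def truncated_power_basis(x, degree, knots):
    """Row of basis values [1, x, ..., x^p, (x - k_1)_+^p, ..., (x - k_K)_+^p].

    Assumes degree >= 1.
    """
    polynomial = [x ** j for j in range(degree + 1)]
    truncated = [max(x - knot, 0.0) ** degree for knot in knots]
    return polynomial + truncated


def spline_value(x, degree, knots, coefficients):
    """Evaluate the regression spline as a linear combination of the basis."""
    row = truncated_power_basis(x, degree, knots)
    return sum(b * c for b, c in zip(row, coefficients))


def segment_pth_derivatives(degree, coefficients):
    """pth derivative of the spline on each of the K + 1 segments.

    On segment j it equals p! times the x^p coefficient plus the first j
    truncated-term coefficients, so it is constant within each segment.
    """
    running = coefficients[degree]
    values = [factorial(degree) * running]
    for jump in coefficients[degree + 1:]:
        running += jump
        values.append(factorial(degree) * running)
    return values
```

A spline whose pth derivative changes at only a few knots has many zero truncated-term coefficients, but the reverse need not hold, which is the situation the abstract describes.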
Title:
Pattern Theorem on Hexagonal Lattice
Speaker: Mrs Pritha Guha, Department of Statistics and Applied Probability, National University of Singapore
Date: 16 June 2008 (Monday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
Please click here for abstract.
Title:
Insights into the Mammal Radiation from Weird Australian Mammals
Speaker: Dr Gavin Huttley, Australian National University, Australia
Date: 21 May 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
The recent publication of the Platypus genome sequence 'completes' the sampling of all major mammal taxonomic divisions, with genome sequence now available for a monotreme, a marsupial and multiple eutherian lineages. The availability of the marsupial and monotreme lineages, combined with an additional bird outgroup, provides the essential references from which to infer the molecular events responsible for the emergence of mammals. They also allow examination of the mode and tempo of evolutionary divergence among eutherian lineages. So can we explain the molecular basis of uniquely mammalian traits such as lactation and mammogenesis; X chromosome inactivation; and homeothermy? Can we identify the genes at which molecular changes arose that underpin these characteristic phenotypes? Can we even resolve the relatively straightforward question of the evolutionary relationships among eutherian lineages? I will illustrate how one of the simpler genomic properties that differs between the sampled mammal genomes -- genomic nucleotide composition -- is confounding efforts to estimate relationships, rates of evolution and adaptive evolution.
Title:
Estimating Population Size from Multiple Lists
Speaker: Dr Mao Changxuan, Department of Statistics, University of California, Riverside
Date: 28 April 2008 (Monday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
The Rasch model is adopted to estimate the unknown population size in multi-list surveillance studies (disease, drug abuse, etc.). It takes both list effectiveness and case heterogeneity into account. A stepwise approach is proposed in which the optimization problems are solved conveniently.
The sharpest lower bound to the odds that a case is unseen is introduced, which can be calculated by linear programming. There are also some less sharp lower bounds. Estimating a lower bound leads to an estimator for the population size. Real examples are investigated.
Title:
Expected number of real zeros of a random polynomial with independent, identically distributed, symmetric, long-tailed coefficients
Speaker: Prof Larry Shepp, Department of Statistics, Rutgers University
Date: 24 April 2008 (Thursday)
Time: 11:00am - 12:00pm
Venue:
Abstract
Please click here for abstract.
Title:
Systems Bioinformatic Approaches for Characterizing, Engineering and Designing Complex Biological Systems
Speaker: Dr Lee Dong-Yup, Department of Chemical and Biomolecular Engineering, National University of Singapore
Date: 23 April 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
Recent advances in high-throughput experimental techniques are now allowing us to study various omics data sets for the global understanding of complex biological systems. Concurrently with the high-throughput experiments, it is also increasingly accepted that in silico modeling and simulation improve our capability to elucidate the functions and characteristics of complex cellular systems. Thus, it is highly desirable to establish a systems bioinformatic platform for integrating wet experiments, concomitant statistical data analysis and in silico modeling at the systems level. My research projects at NUS and BTI focus on the development of systemic, integrative and bioinformatic approaches and their applications to complex biological systems, in order to understand and characterize such systems and to achieve desirable properties of the systems effectively by resorting to modeling, control, optimization and data analysis techniques. This talk highlights several on-going projects, including statistical analysis of various omics data. Future perspectives on systems bioinformatics and some technical challenges are also discussed.
Dong-Yup Lee is an assistant professor in the Department of Chemical and Biomolecular Engineering at the National University of Singapore (NUS), with a joint appointment at the Bioprocessing Technology Institute (BTI) of A*STAR. He received his PhD in Chemical and Biomolecular Engineering from KAIST. Prior to joining NUS and BTI in 2005, he was a senior researcher at the Bioinformatics Research Center at KAIST. His main research interests are in the application of systems methodologies to understanding and designing biological and biomedical systems on a global scale. His main research fields include Systems Biology/Biotechnology/Bioinformatics, Drug & Disease Modeling and Control, and Supply Chain Management. He has coauthored about 30 research articles on these and other topics.
Title:
Probabilistic and Statistical Study of Markov Models using Regeneration Techniques
Speaker: Prof. Stephan Clemencon, Telecom Paristech - LTCI UMR No. 5141 Institut Telecom/CNRS & Metarisk - INRA
Date: 22 April 2008 (Tuesday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
In this talk, we shall describe new concepts and results in the probabilistic and statistical analysis of Markov chains, discrete-time processes widely used in applications to model random phenomena with a causal structure. The description of the behavior of the chain in terms of renewal processes is used here not only as a tool for proving theoretical results of a probabilistic nature (deviation inequalities, Edgeworth expansions, etc.) but also as a practical means of developing statistical procedures for a wide variety of problems, such as confidence interval construction, the bootstrap, robust inference and extreme value statistics.
Title:
Monotone Penalised Spline Smoothing
Speaker: A/Prof Turlach Berwin Ashoka, Department of Statistics and Applied Probability, National University of Singapore
Date: 16 April 2008 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
Penalised spline smoothing (Eilers and Marx, 1996; Ruppert and Carroll, 2000) is, arguably, fast becoming the method of choice for non- and semiparametric regression models. The attractiveness of penalised spline smoothers is twofold. First, compared with other smoothing methods, e.g. smoothing splines or kernel smoothers, fitting penalised spline smoothers is computationally less complex.
Secondly, the connection between smoothing methods and mixed models (Speed, 1991) is particularly easy to establish for penalised spline smoothers. Thus, it is easy to incorporate a penalised spline smoother into a semiparametric regression model and fit the model using standard software available for fitting (linear) mixed models (Ruppert, Wand and Carroll, 2003).
However, in some situations, one would like to combine the flexibility of nonparametric smoothing techniques with prior knowledge in the form of constraints on the response curve given by, say, a physical or economic theory. In this talk, we discuss how monotonicity constraints can be imposed on penalised spline smoothers.
Title:
Counting Without Sampling: Asymptotics of the Log-Partition Function
Speaker: Professor Antar Bandyopadhyay, Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, New Delhi Centre, New Delhi, India
Date: 02 April 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
In this talk we will propose new methods for computing the asymptotic value of the logarithm of the partition function for certain statistical physics models on certain types of finite graphs, as the size of the underlying graph goes to infinity. We will consider two models, namely the hard-core model when the activity parameter is small, and the model for counting the number of proper q-colorings, and we will only consider graphs with large girth. In particular, we will show that the logarithm of the number of independent sets of any r-regular graph with large girth, rescaled by the number of nodes, is asymptotically constant when r ≤ 5. For example, we will show that every 4-regular n-node graph with large girth has approximately (1.494...)^n independent sets, for large n. Similarly, we will prove that for every r-regular graph with r ≥ 2, with n nodes and large girth, the number of proper q-colorings with q ≥ r + 1 is approximately c^n for a constant c (which can be written explicitly in terms of q and r), when n is large. Similar results also hold for random regular graphs.
As a byproduct of our method we will show that one can obtain some simple approximate counting algorithms for the problem of enumerating the number of independent sets, and proper colorings, in low-degree graphs with large girth. These algorithms will be deterministic, as opposed to the Markov chain sampling schemes which are typically used in this context.
Our main approach will be to use a (strong) correlation decay property for the corresponding Gibbs measure (in a certain parameter regime), along with a simple cavity trick which is well known in the physics literature.
(This is joint work with David Gamarnik, Sloan School of Management, MIT.)
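To make the quantity under discussion concrete, the sketch below computes the hard-core partition function of a small graph by brute-force enumeration. This is not the method of the talk (which avoids both enumeration and sampling); it is an exponential-time illustration, and the function names and the choice of a cycle as the test graph are assumptions made for this example.

```python
from itertools import combinations
from math import log


def is_independent(vertices, edges):
    """True if no edge has both endpoints among the chosen vertices."""
    chosen = set(vertices)
    return not any(u in chosen and v in chosen for u, v in edges)


def hard_core_partition_function(n, edges, activity=1.0):
    """Sum of activity^|I| over all independent sets I of an n-node graph.

    With activity = 1 this is simply the number of independent sets.
    Brute force over all 2^n subsets, so only usable for small n.
    """
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if is_independent(subset, edges):
                total += activity ** size
    return total


def cycle_edges(n):
    """Edges of the n-cycle, a 2-regular graph whose girth is n."""
    return [(i, (i + 1) % n) for i in range(n)]


z = hard_core_partition_function(5, cycle_edges(5))
rescaled_log = log(z) / 5  # the log-partition function per node
```

The abstract's claim concerns the limit of the rescaled logarithm as the number of nodes grows; enumeration like this quickly becomes infeasible, which is what motivates approximate counting schemes.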
Title:
Semiparametric Regression And The Computer Science Interface
Speaker: Professor Matthew Wand, University of Wollongong, Australia
Date: 26 March 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
Semiparametric regression is concerned with flexible incorporation of nonlinear functional relationships in regression analyses, and is also the title of a 2003 book co-authored by the speaker. Examples of semiparametric regression include generalised additive models and additive mixed models for longitudinal data. The field has evolved almost entirely within Statistics. In this talk we discuss semiparametric regression in light of the dissolving frontier between Statistics and Computer Science. In particular, we will discuss ways in which semiparametric regression can benefit from, and be beneficial to, Computer Science research.
Title:
Statistical Estimation for Informatively Censored Survival Data
Speaker: Professor Zhang Wenyang, Department of Mathematical Sciences, University of Bath, UK
Date: 19 March 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
Partial likelihood estimation is a commonly used way to deal with censored data. The vital assumption for partial likelihood estimation is that the censoring is noninformative. However, sometimes the censoring is indeed informative. One pays a price in efficiency of the resulting estimator if partial likelihood estimation is still used when the censoring is informative. In this talk, I will take a complete likelihood estimation approach, and appeal to local polynomial modelling to deal with informatively censored survival data. Simulation studies show that the complete likelihood estimation approach indeed improves the efficiency of the estimator. Traditionally, in survival analysis, when the complete likelihood estimation approach is taken, the baseline function is usually modelled by the least informative approach; see Fan and Gijbels (1996). While this approach is very appealing for estimating the coefficients, it does not seem to work very well for estimating the baseline function itself.
The approach I take in this talk to deal with the baseline function is quite different from the traditional one, though it is based on local constant modelling. I will show that the baseline function can be estimated accurately by the proposed estimation method. I will also show that direct local linear modelling would not work; the local constant modelling has to be conducted in an indirect way. Finally, I will use the proposed methods to analyse the second birth interval in Bangladesh, which leads to some interesting findings.
Title:
Dealing with Spreadsheet Addiction
Speaker: Professor J. C. Nash, School of Management, University of Ottawa
Date: 05 March 2008 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
Many organizations and managers suffer a quiet addiction to spreadsheets. First turned on through easy availability, they typically get drawn into overuse through the attraction of cells that can be whatever they want them to be, designer macros, and charts in psychedelic colours. Yet all of these attractions hide the propensity of these uncontrolled programming environments to accidentally lose VERY large amounts of money, and to make it difficult or impossible to detect misreporting. Empirical studies demonstrate that the proportion of spreadsheets without serious errors is 0% (yes, ZERO %).
It seems unlikely we can wrest the quick fix of spreadsheets from the grip of determined users. What, then, can we do to minimize the harm that spreadsheet addicts do to themselves and to their employers?
In particular, we will consider how to ensure: enforceable audit trails; better function management, and better functions; more rigorous testing methods (these items will be expanded for a mathematical audience); and platform independence.
The speaker will present ideas arising from his involvement with two ongoing projects:
1) to provide tests of spreadsheet functions; and
2) to offer audit trail and collaboration capability for spreadsheets and other office-suite software.
He will also touch on a few of the many ideas and projects that have been presented at the European Spreadsheet Risks Interest Group conferences. Despite its name, EuSpRIG has world-wide participation.
Title:
Parameter Estimation and Bias Correction for Diffusion Processes
Speaker: Dr Tang Cheng Yong, Department of Statistics, Iowa State University
Date: 03 March 2008 (Monday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
This paper considers parameter estimation for continuous-time diffusion processes, which are commonly used to model the dynamics of financial securities, including interest rates. To understand why the drift parameters are more difficult to estimate than the diffusion parameter, as observed in many empirical studies, we develop expansions for the bias and variance of parameter estimators for two of the most widely used interest rate processes. A parametric bootstrap procedure, with a theoretical justification, is proposed to correct bias in parameter estimation for general diffusion processes. Simulation studies confirm the theoretical findings and show that the bootstrap proposal can effectively reduce both the bias and the mean square error of parameter estimates for both univariate and multivariate processes. The advantages of using more accurate parameter estimators when calculating various option prices in finance are demonstrated by an empirical study of Fed funds rate data.
Title:
Reconstructing the Effect of Alternative Intervention Strategies on Historic Epidemics
Speaker: Dr Alex R. Cook, Department of Plant Sciences, University of Cambridge, England, UK
Date: 20 February 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
Data from historical epidemics provide a vital and sometimes under-used resource from which to devise strategies for future control of disease. Previous methods for retrospective analysis of epidemics, in which alternative interventions are compared, do not make full use of the information; by using only partial information on the historical trajectory, augmentation of control may lead to predictions of a paradoxical increase in disease. Here we introduce a novel statistical approach that takes full account of the available information in constructing the effect of alternative intervention strategies in historic epidemics. The key to the method lies in identifying a suitable mapping between the historic and notional outbreaks, under alternative control strategies. This is done by using the Sellke construction as a latent process linking epidemics. The application of the method is illustrated by two examples. First, using temporal data for the common human cold, the improvement under the new method in the precision of predictions for different control strategies is shown. Secondly, the generality of the method for retrospective analysis of epidemics is shown by applying it to a spatially-extended arboreal epidemic in which the relative effectiveness of host culling strategies that differ in frequency and spatial extent is compared. Some of the inferential and philosophical issues that arise are discussed, along with the scope of potential application of the new method.
Title:
Bayesian Hierarchical Modeling for Extreme Values Observed Over Space and Time - Cancelled
Speaker: Dr Sang Huiyan, Department of Statistical Sciences, Duke University, Durham, NC
Date: 13 February 2008 (Wednesday)
Time:
2:00pm - 3:00pm
Venue:
Abstract
In this talk, I will begin with extreme value theory and a discussion on issues in modeling multivariate extremes. I will then present our hierarchical modeling approach for explaining a collection of spatially-referenced time series of extreme values. The univariate distributions of extreme values are extended to higher dimensions using latent multivariate Markov random field models specified through coregionalization, which allows the interpretation of high dimensional extreme value analysis including the nature of spatial association and the nature of temporal trend. By relaxing the assumption of conditional independence in the hierarchical models, we extend our approach to describe extreme values with a smoothed spatial process, which can be used in spatial interpolation with extremes.
Title:
Statistical Issues that Arise in Modeling and Regulating Air Pollution Fields - Joint Seminar With Institute for Mathematical Sciences (IMS)
Speaker: Professor Jim Zidek, Department of Statistics, University of British Columbia
Date: 23 January 2008 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
Abstract
The earth's atmosphere is a complex stochastic system which includes, amongst other things, pollution fields, a part of each deriving from anthropogenic sources and activities. Because of their negative health impacts, these fields are now subject to regulation.
However, setting the air quality standards needed to regulate them is itself a complex business, and that leads to a need for good models for these fields.
This talk, drawing on the speaker's recent experience and research connected with ozone, will describe physical, computational and statistical approaches to modeling pollution fields and how these might be combined.
Finally, he will describe some of the ways in which the results of these models play into the process of developing standards. Although the talk focuses on random pollution fields, the modeling issues it raises have become quite pervasive in current research in statistical science.
Title:
Structural Models of Corporate Bond Pricing with Maximum Likelihood Estimation - Joint Seminar With Risk Management Institute (RMI)
Speaker: Dr Hoi Ying Wong, The Chinese University of Hong Kong
Date: 16 January 2008 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
Abstract
Testing structural models of corporate bond pricing is equivalent to examining the performance of credit risk models in finance. This study empirically examines the proxy, volatility-restriction (VR) and maximum likelihood (ML) approaches to implementing structural corporate bond pricing models, and documents that ML estimation is the best among the three implementation methods. Empirical studies using either the proxy approach or the VR method conclude that barrier-independent models significantly underestimate corporate bond yields. Although barrier-dependent models tend to overestimate the yield on average, they generate a sizable degree of underestimation. The present work shows that the proxy approach is an upwardly biased estimator of the corporate assets and makes the empirical framework work systematically against structural models of corporate bond pricing. The VR approach may generate inconsistent corporate bond prices or may fail to give a positive corporate bond price for some structural models. When the Merton, LS, BD and LT models are implemented with ML estimation, we find substantial improvement in their performances. Our empirical analysis shows that the LT model is very accurate for predicting short-term bond yields, whereas the LS and BD models are good predictors for medium-term and long-term bonds. The Merton model however significantly overestimates short-term bond yields and underestimates long-term bond yields. Unlike empirical studies in the past, the Merton model implemented with ML estimation does not consistently underestimate corporate bond yields. This research gives an example in favor of using statistics in empirical finance rather than using a simplifying accounting rule and spells out the potential proxy risk in empirical studies.
Title:
Statistical Inference for GARCH type Models
Speaker: Dr Chi Tim Ng, Timothy, Department of Statistics, College of Natural Sciences, Seoul National University, Seoul, South Korea
Date: 08 January 2008 (Tuesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Since Engle's work, ARCH models have received considerable attention among economists, and various generalizations of the ARCH models have been proposed. Among these models, those incorporating the notions of fractional differencing and non-stationarity are the most interesting, as they offer many challenging theoretical problems.
One commonly used technique for estimating the parameters of ARCH-type models is quasi-maximum likelihood estimation (QMLE). To establish the asymptotic properties of the QMLE, one usually has to impose stringent assumptions; see Robinson and Zaffaroni (2006) and Straumann (2005). They have to assume that a stationary solution to the true model exists and that this solution has some finite moments. These two assumptions are too restrictive to be applied to non-stationary GARCH models exhibiting explosive behavior. Also, there are still controversies over the stationarity of certain fractional-differencing models.
In this talk, I will give a brief review of the well-established results for the stationary GARCH model and present new results for two generalized ARCH-type models, namely the non-stationary GARCH model (see Jensen and Rahbek, 2004) and the fractionally-integrated GARCH model (see Baillie et al., 1996). The regularity conditions under which the strong consistency and asymptotic normality of the QMLE of the fractionally-integrated GARCH model hold are given in this presentation. In addition, the results for non-stationary GARCH(1,1) models in Jensen and Rahbek (2004) will be extended to general non-stationary GARCH(p,q) models.
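For readers unfamiliar with QMLE in this setting, the sketch below writes out the standard GARCH(1,1) conditional variance recursion and the Gaussian quasi-log-likelihood that a QMLE would maximize over the parameters. It is a minimal illustration, not the speaker's procedure; the function names, the parameter names (omega, alpha, beta) and the use of a user-supplied initial variance are assumptions made for this example.

```python
from math import log, pi


def garch11_variances(returns, omega, alpha, beta, initial_variance):
    """Conditional variances h_t = omega + alpha * x_{t-1}^2 + beta * h_{t-1}.

    The first variance is taken as given (an assumed starting value).
    """
    variances = [initial_variance]
    for x in returns[:-1]:
        variances.append(omega + alpha * x * x + beta * variances[-1])
    return variances


def gaussian_quasi_loglik(returns, omega, alpha, beta, initial_variance):
    """Gaussian quasi-log-likelihood of the returns under GARCH(1,1).

    "Quasi" because the Gaussian form is used as a criterion even if the
    true innovations are not Gaussian.
    """
    variances = garch11_variances(returns, omega, alpha, beta, initial_variance)
    return -0.5 * sum(
        log(2 * pi * h) + x * x / h for x, h in zip(returns, variances)
    )


h = garch11_variances([1.0, 2.0, 0.0], 0.1, 0.2, 0.5, 1.0)
value = gaussian_quasi_loglik([1.0, 2.0, 0.0], 0.1, 0.2, 0.5, 1.0)
```

In practice the criterion would be passed to a numerical optimizer over (omega, alpha, beta); the recursion itself is well defined whether or not the parameters correspond to a stationary model, which is the regime the talk is concerned with.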