Seminar Details
Title:
On The Discounted Penalty Function In A Discrete Time Renewal Risk Model With General Interclaim Times
Speaker: Prof Xueyuan Wu, Centre for Actuarial Studies, Department of Economics, The University of Melbourne
Date: 13 December 2007 (Thursday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
In this paper a discrete time renewal risk model with arbitrary interclaim times is discussed. We show that the expected discounted penalty function satisfies a recursive formula. In particular, the probability generating function of the time of ruin, as a function of the initial surplus, has a compound geometric tail. When the claim amounts follow a geometric distribution, an explicit expression for the Gerber-Shiu function can be obtained for a specially chosen penalty function. Constant claim amounts and mixed geometric claim amounts are also examined.
Title:
Some Bayesian Solutions for Zero-Inflated Poisson Model Selection
Speaker: Prof Gauri S. Datta, University of Georgia
Date: 12 December 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Count data are often encountered in agriculture, biology, economics, engineering, health services, industry, meteorology and sociology, to name a few. Numbers of insurance claims, product defects, traffic fatalities, terrorist incidents, hurricanes, infections, and deaths from AIDS or some other disease are among the many examples of count data. The Poisson distribution, which is usually considered as a model for such datasets, sometimes does not work well if there are too many zeros. To account for excessive zeros in count data, a zero-inflated Poisson (ZIP) distribution is suggested in the literature. A ZIP distribution is a mixture of a standard Poisson distribution and a degenerate distribution at zero.
The ZIP distribution has been used both for independent and identically distributed (i.i.d.) observations and for non-i.i.d. observations where suitable auxiliary variables are available to model the mean. In the latter case, which is referred to as a ZIP regression model, each count is assumed to have a different distribution depending on some explanatory variable(s) and suitable generalized linear models are fitted to the Poisson parameter and/or to the mixing probability. Although there are a number of frequentist solutions discussing statistical inference for such models, Bayesian contribution to this area is rather limited. In this talk, we propose two Bayesian solutions to this problem. In our first solution, treating it as a model selection problem, we rewrite the ZIP model as a mixture of a zero-truncated Poisson distribution and a degenerate distribution at zero. We justify an objective prior for the new parameters. Using this prior and the standard Jeffreys' prior for the Poisson mean we obtain the Bayes factor for the ZIP model versus the standard Poisson model. In the second approach, for the i.i.d. setup we embed the ZIP model into a larger class of models by suitable extension of the parameter space. Our Bayesian test depends on the posterior probability of the hypothesis of zero inflation. Some applications of both solutions and suitable extension to the regression case will be discussed.
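As a brief illustration of the two ways of writing the ZIP model mentioned in this abstract, the sketch below computes the probability mass function first as a mixture of a point mass at zero and a Poisson distribution, and then as a mixture of a point mass at zero and a zero-truncated Poisson distribution. The function names and the parameter names `pi` (mixing probability), `p` (total probability of a zero) and `lam` (Poisson mean) are illustrative assumptions, not notation from the talk.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P(X = k) for mean lam > 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """ZIP pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from Poisson(lam)."""
    if k < 0:
        return 0.0
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zero_prob(pi: float, lam: float) -> float:
    """Total probability of observing a zero under the ZIP model."""
    return pi + (1.0 - pi) * math.exp(-lam)


def zip_pmf_truncated(k: int, p: float, lam: float) -> float:
    """Same model, rewritten as a mixture of a point mass at zero (weight p)
    and a zero-truncated Poisson(lam) distribution (weight 1 - p)."""
    if k < 0:
        return 0.0
    if k == 0:
        return p
    return (1.0 - p) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

Setting `p = zero_prob(pi, lam)` makes the two functions agree for every count, which is the sense in which the second form is only a reparametrisation of the first.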
Title:
Cross-Profile Shrinkage in Multivariate Bayesian Variable Selection -
With Applications to Gene Set Enrichment Analysis
Speaker: Dr Sierra M. Li, Division of Oncology Biostatistics,
Sidney Kimmel Cancer Center, Johns Hopkins School of Medicine,
Baltimore, MD
Date: 04 December 2007 (Tuesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Variable selection is an important problem in statistical modelling in both association and prediction studies. Previous research on multivariate Bayesian variable selection has used the same definition of latent variable types as in univariate regressions. We focus on the multivariate nature of the problem and introduce the new concept of 3 latent types that distinguish variables by common and differential magnitude in their regression coefficients. We propose a Bayesian hierarchical model that is flexible for both conjugate and non-conjugate structure. The dimension of the model is not fixed when the prior puts a point mass at zero. We are able to integrate out parameters that affect the dimensionality of the model and obtain the marginal posterior of the latent variable types.
Simulation studies, comparing the sensitivity and specificity in variable classification, show that the 3-type model outperforms the traditional 2-type model when there are heterogeneous signals. The model framework is general enough for a wide range of applied problems. We demonstrate the model by two case studies. The first study is to find gene-phenotype association in random recombinant yeast segregants treated by diverse small molecules. The second example is to analyze the enrichment of KEGG pathways in a breast cancer study. Gene set enrichment analysis (GSEA) examines the pre-defined, biologically meaningful sets of genes with increased power and robustness to find subtle changes. With gene expression data measured by a profile of multiple responses, such as in different cell lines and under different drug treatments, it is of great interest to elucidate which and how the enrichment differs among these multiple responses. The 3-type Bayesian variable selection model leads to the new concept of common and differential enrichment modelling cross-profile mean and variance of regression coefficients. The case studies show that the 3-type hierarchical model is generally applicable at both gene and gene-set levels.
Title:
Cure Model with Current Status Data
Speaker: Prof Shuangge Ma, Steven, Department of Epidemiology and Public Health, Yale University
Date: 21 November 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Current status data arise when only random censoring time and event status at censoring are available. We consider current status data under the cure model, where a proportion of the subjects are not susceptible to the event of interest and the cure probability satisfies a generalized linear model. We assume Cox proportional hazards models for the event time of susceptible subjects. We investigate the maximum likelihood estimate for the linear Cox model and the penalized maximum likelihood estimate for the partly linear Cox model. It is shown that estimates of the parametric regression coefficients are root-n consistent, asymptotically normal and efficient. The nonparametric baseline function and nonparametric covariate effect can be estimated with $n^{1/3}$ convergence rate. We propose inference for estimates of the regression coefficients using the weighted bootstrap. Simulation studies are used to assess finite sample performance of the proposed estimates. We also analyze the Calcification data for demonstration.
Title:
Another Look at the Moment Method for Large Dimensional Random Matrices - I & II
Speaker: Prof Arup Bose, Indian Statistical Institute, Calcutta
Date: 14 November 2007 (Wednesday) - Part I
Date: 16 November 2007 (Friday) - Part II
Time:
4:00pm - 5:00pm - Part I & II
Venue:
S16-06-118 (Seminar Room) - Part I
Venue:
S16-05-101, Computer Lab 1 - Part II
Abstract
The methods to establish the limiting spectral distribution (LSD) of large dimensional random matrices include the well known moment method, which invokes the trace formula. Its success has been demonstrated for several types of matrices such as the Wigner matrix and the sample variance covariance matrix. In a recent article Bryc, Dembo and Jiang (Annals of Probability, 2006) establish the LSD for the random Toeplitz and Hankel matrices using the moment method. They perform the necessary counting of terms in the trace by splitting the relevant sets into equivalence classes and relating the limits of the counts to certain volume calculations.
We build on their work and present a unified approach. This helps provide relatively short and easy proofs for the LSD of several common matrices while at the same time providing insight into the nature of different LSD and their interrelations. By extending these methods we are also able to deal with matrices with appropriate dependent entries.
[This work is joint with Dr. Anindya Roy, University of Maryland, Baltimore County, U.S.A.]
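For orientation, the trace formula behind the moment method can be written as follows. The notation is assumed here for illustration and is not taken from the talk: $A_n$ is an $n \times n$ symmetric random matrix with entries $a_{ij}$ and eigenvalues $\lambda_1, \dots, \lambda_n$, $F_n$ is its empirical spectral distribution, and any scaling of the entries is taken to be absorbed into $A_n$.

```latex
\beta_k(F_n) \;=\; \int x^k \, dF_n(x)
\;=\; \frac{1}{n} \sum_{i=1}^{n} \lambda_i^k
\;=\; \frac{1}{n} \operatorname{tr}\!\left(A_n^k\right)
\;=\; \frac{1}{n} \sum_{1 \le i_1, \dots, i_k \le n}
      a_{i_1 i_2} \, a_{i_2 i_3} \cdots a_{i_k i_1}.
```

The counting of terms mentioned in the abstract concerns which index tuples $(i_1, \dots, i_k)$ contribute to this sum in the limit.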
Title:
On Coverage of Generalized Confidence Intervals
Speaker: Prof Arup Bose, Indian Statistical Institute, Calcutta
Date: 07 November 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Generalized confidence intervals do not have exact frequentist coverage, but often provide coverage close to the nominal value and have the correct asymptotic coverage.
Many articles have shown that for messy parametric problems with certain pivotal structure, the generalized intervals perform adequately in the repeated sampling set up (even though the generalized intervals are not motivated from a repeated sampling argument).
Generalized procedures have been successfully applied to several problems of practical importance. Several simulation studies have demonstrated the success of the generalized procedure in many problems where the classical approach fails to yield adequate confidence intervals.
There has been some theoretical investigation of the success of generalized intervals in the frequentist sense. Hannig, Iyer and Patterson (2006) have shown that asymptotically the generalized intervals maintain the target coverage level for a large class of problems. Hannig (2006) has also investigated the connection between the generalized procedures and fiducial inference.
The focus of this talk is to provide theoretical explanation of the observed empirical behavior of the generalized intervals and to suggest ways of improving the finite sample performance of the generalized intervals.
We derive expansions of coverage probabilities of one-sided generalized confidence intervals and use the expansions to explain the nonuniform performance of the generalized intervals. We establish that in general the generalized confidence intervals are not first order accurate, i.e., they are accurate only up to the $n^{-1/2}$ term. We provide a necessary and sufficient condition for the generalized intervals to be first order accurate.
We then show how to use these expansions to obtain improved coverage by suitable calibration. The benefits of the proposed modification are illustrated in the context of several examples.
[This work is joint with Dr. Anindya Roy, University of Maryland, Baltimore County, U.S.A.]
Title:
Random Continued Fractions
Speaker: Prof Alok Goswami
Date: 11 October 2007 (Thursday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Given a terminating or non-terminating sequence of positive integers, the continued fraction determined by this sequence gives a positive real number. Moreover, every positive real can be represented this way. Research on properties of this continued fraction representation has been a significant part of classical mathematics. The most important of these has been the study of the Gauss dynamical system. A stochastic counterpart of this is when the continued fractions are generated by sequences of random variables, giving rise to Random Continued Fractions. For the case of a sequence of i.i.d. non-negative random variables, the random continued fraction converges almost surely. A related Markov chain and its ergodic properties play a critical role in deriving interesting properties of this limit random variable. Some special cases give rise to interesting distributions for the limit random variable. These ideas extend in a natural way to higher dimensions.
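To make the representation in the first sentence concrete, the sketch below evaluates a terminating continued fraction exactly. It assumes the common convention a0 + 1/(a1 + 1/(a2 + ...)); the talk may use a different convention, and the function name `continued_fraction` is illustrative only.

```python
from fractions import Fraction


def continued_fraction(terms):
    """Evaluate a0 + 1/(a1 + 1/(a2 + ...)) exactly for a finite,
    non-empty list of positive integers."""
    if not terms:
        raise ValueError("need at least one term")
    # Work from the innermost term outwards.
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value
```

For example, `continued_fraction([3, 7, 15, 1])` returns `Fraction(355, 113)`, the classical rational approximation to pi.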
Title:
On Distribution Estimation and Prediction for Bivariate Extreme-Value Distributions
Speaker: Prof Nader Tajvidi, Mathematical Statistics, Centre for Mathematical Sciences, Lund Institute of Technology, Lund, Sweden
Date: 10 October 2007 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Two new methods are suggested for estimating the dependence function of a bivariate extreme-value distribution. One is based on a multiplicative modification of an earlier technique suggested by Pickands, and the other employs spline smoothing under constraints.
Both produce estimators that satisfy all the conditions that define a dependence function, including convexity and the restriction that its curve lie within a certain triangular region. The first approach does not require selection of smoothing parameters; the second does, and for that purpose we suggest explicit tuning methods, one of them based on cross-validation. Applications of our dependence function estimators to estimating the full bivariate distribution, and its density, are described, as too are applications to prediction. Indeed, the cross-validation algorithm is designed to provide near-optimal performance when estimating the bivariate density, and is particularly useful for constructing compact prediction regions by the method of profiling.
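As background, one common way to write the dependence function and the constraints referred to in this abstract is sketched below, for margins standardised to unit exponential. The symbols $A$, $X$ and $Y$ are a conventional choice assumed here, not notation taken from the talk.

```latex
P(X > x,\, Y > y) \;=\; \exp\!\left\{ -(x + y)\, A\!\left( \frac{y}{x + y} \right) \right\},
\qquad x, y > 0,
\\[4pt]
A \text{ convex on } [0, 1],
\qquad \max(t,\, 1 - t) \;\le\; A(t) \;\le\; 1 \quad \text{for } 0 \le t \le 1 .
```

The two inequalities confine the curve of $A$ to the triangle with vertices $(0, 1)$, $(1/2, 1/2)$ and $(1, 1)$; $A \equiv 1$ corresponds to independence and $A(t) = \max(t, 1 - t)$ to complete dependence.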
Title:
Estimating the Error Distribution In Multivariate Heteroscedastic Time Series Models
Speaker: Prof M.J. Silvapulle, Department of Econometrics and Business Statistics, Monash University, Australia
Date: 28 September 2007 (Friday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Copulas have attracted considerable interest for modelling multivariate observations and for stress testing in quantitative finance. In this paper, a semiparametric method is studied for estimating the copula parameter and the joint distribution of the error term in a class of multivariate time series models when the marginal distributions of the errors are unknown. The proposed method first obtains √n-consistent estimates of the parameters of each univariate marginal time series, and computes the corresponding residuals. These are then used to estimate the joint distribution of the multivariate error terms, which is specified using a copula. The proposed estimator of the copula parameter of the multivariate error term is asymptotically normal, and a consistent estimator of its large sample variance is also given so that confidence intervals may be constructed. A simulation study was carried out to compare the estimators, particularly when the error distributions are unknown. In this simulation study, our proposed semiparametric method performed better than the well-known parametric methods. An example on exchange rates is used to illustrate the method.
Title:
Skew Hedging of the Barrier Options
Speaker: Dr Szymon Borak, Humboldt University, Berlin
Date: 19 September 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
The price of a barrier option depends on the shape of the implied volatility surface. Barrier options can be understood, for instance, as options on the implied volatility skew. The implied volatility surface, however, is a highly dynamic object that is subject to considerable deformations as time passes. Consequently, the hedging performance of these options crucially depends on the strategy used to extract the key factors of the implied volatility surface dynamics. We extract these factors by applying a dynamic semiparametric factor model and study the hedging performance of knock-out options.
Title:
A Bayes Method of a Monotone Hazard Rate Via S-paths
Speaker: Dr Ho Man Wai
Date: 12 September 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
A class of random hazard rates, which is defined as a mixture of an indicator kernel convoluted with a completely random measure, is of interest. We provide an explicit characterization of the posterior distribution of this mixture hazard rate model via S-paths. A closed-form and tractable Bayes estimator for the hazard rate is derived to be a finite sum over S-paths. The path characterization or the estimator is proved to be a Rao-Blackwellization of an existing partition characterization or partition-sum estimator. This accentuates the importance of S-paths in Bayesian modeling of monotone hazard rates. An efficient Markov chain Monte Carlo method for sampling the S-paths is proposed to approximate this class of estimates. Numerical studies show that it performs better than existing popularly used partition-based sampling methods.
Title:
Parameter Estimation Techniques for Statistical Process Monitoring in the Presence of Data Autocorrelation
Speaker: Prof Thaung Lwin, CSIRO Mathematical and Information Sciences, Australia
Date: 05 September 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
The present paper considers an application of the first-order autoregressive (AR(1)) model to realizations of an unobservable variable representing a quality characteristic of a process monitored at a sequence of 'time' intervals in mineral processing or manufacturing production. The unknown realizations are observed subject to errors, implying an errors-in-variables model, (AR(1)_EIV), for the observed sequence of data. The model has a reasonably wide range of applications in process monitoring with autocorrelated data.
Application of such a model to process data requires both estimation of the unobservables, in constructing one-step-ahead predictions, and estimation of all the underlying model parameters.
For given values of the underlying model parameters, estimation of the unobservables can be carried out most efficiently by the Kalman filter technique. Estimation of the model parameters can be handled by a number of techniques. Specific contributions of the present paper are: (i) a parametric approach comprising a comprehensive development of the full maximum likelihood technique for estimation of the model parameters in the presence of random effects, the number of which increases with the number of observations, and (ii) a semi-parametric approach combining a direct or indirect fitting of a variogram with the method of moments and minimum prediction error sum of squares techniques for estimation of model parameters.
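The Kalman filter step described above can be sketched as follows for known parameters. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is taken to be a zero-mean AR(1) process x_t = phi * x_(t-1) + w_t observed as y_t = x_t + v_t, with state noise variance `q` and measurement error variance `r`. All names (`kalman_ar1`, `phi`, `q`, `r`, `x0`, `p0`) are hypothetical.

```python
def kalman_ar1(ys, phi, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = phi * x_{t-1} + w_t, y_t = x_t + v_t.

    ys     : observed sequence
    phi    : AR(1) coefficient (assumed known)
    q, r   : variances of the state noise w_t and measurement error v_t
    x0, p0 : mean and variance of the state before the first observation

    Returns (preds, filts): one-step-ahead predictions of x_t made before
    seeing y_t, and filtered estimates of x_t made after seeing y_t.
    """
    x, p = x0, p0
    preds, filts = [], []
    for y in ys:
        # Prediction step: propagate the state estimate and its variance.
        x_pred = phi * x
        p_pred = phi * phi * p + q
        # Update step: weight the new observation by the Kalman gain.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        preds.append(x_pred)
        filts.append(x)
    return preds, filts
```

When `r` is zero the gain equals one and the filtered estimates reproduce the observations; a larger `r` pulls the estimates towards the AR(1) predictions.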
Title:
Nonparametric Monotone Regression for Generalized Linear Models
Speaker: Prof Jyh-Jen Horng Shiau, National Chiao Tung University
Date: 08 August 2007 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
In this study, motivated by the WAT-EC problem in semiconductor manufacturing, we develop a new nonparametric monotone smoothing spline smoother for analyzing responses from exponential families. The new method modifies the monotone smoothing spline smoother developed by Zhang (2004) and combines it with the methodology developed by Gu (2002) for data from exponential families. An algorithm with implementation details is provided. Computation is efficient because we utilize the characteristics of the natural cubic splines. The effectiveness of the proposed method is studied by simulation and the results demonstrate that the proposed method performs well in regression models with both Bernoulli and Poisson responses. In terms of the average squared error, the proposed monotone estimator outperforms the unconstrained smoother when the latter produces non-monotone estimates, while retaining about the same performance when the latter produces monotone estimates. As an illustrative example for applications, we demonstrate that the proposed method can be used in screening WAT test items for more stringent engineering control and in setting appropriate control limits.
Title:
Analysis of Least Absolute Deviation
Speaker: Prof Ying Zhiliang, Columbia University
Date: 25 June 2007 (Monday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
In this talk, I will describe a least absolute deviation-based method for testing linear hypotheses. Like ANOVA, this method is coordinate-free, and admits singular design matrices. A simple approximation using stochastic perturbation is developed to obtain cut-off values for the resulting tests. Theoretical justification, computer implementation and simulation will be presented. Focus will be given to the special cases of one-way and multi-way layouts.
Title:
The Markovian Frame of the Bayesian Inference Upon the
Missing Data Models
Speaker:
Prof Gang Wei, School of Mathematics and System Sciences,
Shandong University, Jinan, Shandong
Date: 22 June 2007 (Friday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
The EM algorithm and the
Data Augmentation scheme have been taken as the most fundamental
approaches to statistical inference for
missing data models. Though it has been recognized
since 1995 that the exact posterior probability density function
might be obtained for some missing data models, statisticians
do not seem to have paid enough attention to this idea.
In this talk we will demonstrate that in most missing
data models with a low dimensional parameter space,
Bayesian/likelihood inference can be performed more
efficiently than with the iterative/sampling schemes
of the EM algorithm and Data Augmentation. The Markovian
frame for this not so well-known approach, the so-called
Inverse Bayesian Formula, will be briefly introduced and
discussed.
Title:
Spectral Analysis of Faint Astronomical Objects: Bayesian
Modeling, Computation, and Inference
Speaker:
Prof David A. Van Dyk, Department of Statistics, University
of California, Irvine
Date: 25 April 2007
(Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
The development of ever
more sophisticated space-based telescopes brings forth
richer astronomical data that are opening a new window
on the cosmos. Instruments designed to record high-energy
electromagnetic radiation (X-rays and gamma-rays), for
example, are clarifying our understanding of some of the
most energetic events in the universe. Matter falling
into black holes, the birth and death of stars, and the
collisions of galaxies all can be explored through their
high-energy spectra. This cosmic exploration, however,
requires careful quantitative analysis of sometimes very
limited photon counts.
Statistical methods must
account not only for the complexity of the astronomical
objects themselves, but also of the instruments and the
scientific questions that are posed. In this talk we discuss
the search for narrow emission lines in spectra. The spectra
are the distribution of photon energies and the emission
lines are narrow ranges of energy with excess photon emission.
The search for lines involves constructing a multi-level
model that accounts for data degradation, instrumental
effects, and the structure in the astronomical sources.
The complexity of the model leads to highly multimodal
likelihoods and complicated inferential questions. Standard
test statistics cannot be directly used to evaluate the
evidence for including lines and computational methods
must be specially tailored to the problem. In the talk
I will emphasize the use of profile methods for exploratory
data analysis and a generalization of the Gibbs sampler
that samples incompatible conditional distributions but
is guaranteed to have the target posterior distribution
as its stationary distribution.
This is joint work
with Taeyoung Park and the California-Harvard Statistics
Collaboration.
Title:
Kernel Methods For Optimal Change-Points Estimation In
Derivatives
Speaker:
Prof Ming-Yen Cheng, National Taiwan University and Marc
Raimondo, University of Sydney
Date: 18 April 2007
(Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
We propose an implementation
of the so-called zero-crossing-time detection technique
specifically designed for estimating the location of jump-points
in the first derivative (kinks) of a regression function.
Our algorithm relies on a new class of kernel functions
having a second derivative with vanishing moments and
an asymmetric first derivative steep enough near the origin.
We provide a software package which, for a sample of size
$n$, produces estimators with an accuracy of order at
least $O(n^{-2/5})$. This contrasts with current algorithms
for kink estimation which at best provide an accuracy
of order $O(n^{-1/3})$. In the software, the kernel statistic
is standardised and compared to the universal threshold
to assess the significance of the kink scenario. A simulation
study shows that our algorithm enjoys very good finite
sample properties even at high noise levels. The method
reveals kink features in real data sets with high noise
levels where modern regression methods tend to oversmooth
the data.
Title:
Cluster Identification via Projection Pursuit
Speaker:
Prof Yannis Yatracos, Department of Statistics & Applied
Probability, NUS
Date: 11 April 2007
(Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
In a sample variance decomposition,
the largest component "I" (for index) determines
two least homogeneous sample clusters. For multivariate
data, "I" can be used in the pursuit of two
clusters with the least homogeneous one-dimensional data
projection.
The properties of "I",
of its population counterpart and of the associated projection
pursuit index are examined. Applications include the determination
of:
a) clusters from a mixture
distribution,
b) remote observations
in regression,
c) a separating hyperplane
in support vector machines,
d) data structures.
With the proposed method
the "curse of dimensionality" turns into an
advantage in cluster detection.
Title:
Bayesian Functional Mapping of Complex Dynamic Traits
Speaker:
Dr. Liu Tian, Department of Statistics, University of
Florida
Date: 4 April 2007
(Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118 (Seminar Room)
Abstract
Understanding the genetic
control of complex dynamic traits is fundamental to agricultural,
evolutionary, and biomedical genetic research. In the
past, the so-called functional mapping model was derived
within the maximum likelihood context to characterize
the genetic and developmental mechanisms for many biological
processes. However, when dealing with such a high-dimensional
problem, identifiability problems tend to occur for the
maximum likelihood method. Moreover, the computational load
of performing the significance tests and
obtaining the confidence interval estimators by using repeated
sampling techniques is substantial. To cope with those problems, we propose
a Bayesian approach that can identify multiple QTLs for
a dynamic complex trait simultaneously within the functional
mapping framework. Bayesian parameter estimation and hypothesis
testing, in our approach, are implemented via Markov chain
Monte Carlo algorithms. Mouse body mass data from
an F2 population are used to demonstrate the effectiveness
of the proposed method.
Title:
Global and Local Stationary Modelling in Finance: Theory
and Empirical Evidence
Speaker:
Prof Guégan Dominique, Département d'Economie et Gestion,
E.N.S, Cachan, France
Date: 3 April 2007
(Tuesday)
Time:
4:00pm - 5:00pm
Venue:
S16-05-101, Computer Lab 1
Abstract
To model real data sets
using second order stochastic processes requires that the
data sets satisfy the second order stationarity condition.
This stationarity condition concerns the unconditional
moments of the process. It is in that context that most
of the models developed since the sixties have been
studied; we refer to the ARMA processes (Brockwell and
Davis, 1988), the ARCH, GARCH and EGARCH models (Engle,
1982, Bollerslev, 1986, Nelson, 1990), the SETAR process
(Lim and Tong, 1980 and Tong, 1990), the bilinear model
(Granger and Andersen, 1978, Guégan, 1994), the
EXPAR model (Haggan and Ozaki, 1980), the long memory process
(Granger and Joyeux, 1980, Hosking, 1981, Gray, Zhang and Woodward,
1989, Beran, 1994, Giraitis and Leipus, 1995, Guégan,
2000), the switching process (Hamilton, 1988). For all
these models, we get an invertible causal solution under
specific conditions on the parameters, so that the forecast
points and the forecast intervals are available.
Thus, the stationarity
assumption is the basis for a general asymptotic theory
for identification, estimation and forecasting. It guarantees
that the increase of the sample size leads to more and
more information of the same kind which is basic for an
asymptotic theory to make sense.
Non-stationarity modelling
also has a long tradition in econometrics. It is
based on the conditional moments of the data generating
process. It appears mainly in the heteroscedastic and
volatility models, like the GARCH and related models,
and stochastic volatility processes (Ghysels, Harvey and
Renault (1997)). This non-stationarity also appears in
a different way with structural change models like the
switching models (Hamilton, 1988), the stopbreak model
(Diebold and Inoue, 2001, Breidt and Hsu, 2002, Granger
and Hyung, 2004) and the SETAR models, for instance. It
can also be observed from linear models with time varying
coefficients (Nicholls and Quinn, 1982, Tsay, 1987).
Thus, using stationary
unconditional moments suggests a global stationarity for
the model, but using non-stationary unconditional moments
or non-stationary conditional moments or assuming the existence
of states suggests that this global stationarity fails
and that we only observe a locally stationary behavior.
The growing evidence of
instability in the stochastic behavior of stocks, of exchange
rates, and of some economic data sets such as growth rates,
characterized by the existence of volatility or
the existence of jumps in the variance or in the levels of
the prices, makes it necessary to discuss the assumption of global
stationarity and its consequences in modelling, particularly
in forecasting. Thus we can address several questions
with respect to these remarks.
1. What kinds of non-stationarity
affect the major financial and economic data sets? How
to detect them?
2. Local and global stationarities:
How are they defined?
3. What is the impact
of evidence of non-stationarity on the statistics computed
from the globally non-stationary data sets?
4. How can we analyze
data sets in the non-stationary global framework? Does
the asymptotic theory work in a non-stationary framework?
5. What kind of models
create local stationarity instead of global stationarity?
How can we use them to develop a modelling and a forecasting
strategy?
These questions have begun
to be discussed in some papers in the economic literature.
For some of these questions, the answers are known; for
others, very few works exist. In this paper we discuss
all these problems and we propose new strategies and modelling
to solve them. Several interesting topics in empirical
finance awaiting future research are also discussed.
Title:
Asymptotics of Eigenvectors of Large Sample Covariance
Matrices
Speaker:
Dr. Pan Guangming, University of Science & Technology
of China
Date: 21 March 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
The eigenvectors of sample
covariance matrices play an important role in principal
component analysis, wireless communication and some other fields.
However, relatively little work has been done on the asymptotic
behavior of eigenvectors in the research on large dimensional
sample covariance matrices, compared to the eigenvalues.
In this talk, we define a new form of empirical spectral
distribution, which involves the eigenvectors and the
eigenvalues. It is shown that this empirical spectral
distribution and the classical empirical spectral distribution
converge to the same limiting spectral distribution. Based
on this new empirical spectral distribution, the central
limit theorem for linear spectral statistics involving
the eigenvectors and eigenvalues is also established.
Finally, we demonstrate how large sample covariance matrix
theory works in the wireless communication area.
Title:
Marginal Models For Analyzing Data On Recurrent And Terminal
Events
Speaker:
Prof John D. Kalbfleisch, Saw Swee Hock Professor of Statistics,
Department of Statistics & Applied Probability (NUS)
and University of Michigan
Date: 14 March 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
In clinical and observational
studies, recurrent event data (e.g. repeated hospitalizations)
are often encountered and, in some important applications,
the recurrent events are censored by a terminal event
(e.g. death). In such situations, the terminal and recurrent
event rates are often strongly correlated. We review models
and methods of analysis for data on recurrent and terminal
events, which have for the most part been based on complete
intensity models with strong Poisson type assumptions
for the recurrent event process. We develop a new approach
that retains the use of shared frailties to build in correlations,
but relaxes the complete intensity assumption. Specifically,
methods based on estimating functions with nonparametric
components are used to assess dependence on covariates
and to estimate the correlations between the recurrent
and terminal processes. Asymptotic results and approximations
parallel closely those available in the analysis of semiparametric
models. The approach is compared with others in the literature
and illustrated on data on recurrent hospitalizations
and failure of treatment that arise in a Canada/USA study
of peritoneal dialysis as a treatment for end stage renal
disease.
Title:
The General
Dynamic Factor Model: Determining the Number of Factors
Speaker:
Prof Marc Hallin, Institut de Statistique, E.C.A.R.E.S.,
and Department of Mathematics, Université
Libre de Bruxelles
Date: 7
March 2007 (Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
In this talk we briefly
review estimation methods in the dynamic factor model,
and propose an information criterion for determining the
number q of factors in the general model developed by
Forni et al. (2000), as opposed to the static and restricted
dynamic models considered in Bai and Ng (2002, 2005) or
Amengual and Watson (2006). Our criterion is based on
the fact that this number q is also the number of diverging
eigenvalues of the spectral density matrix of the observations
as the cross-sectional dimension n goes to infinity. We
provide sufficient conditions for consistency of the criterion
for large n and T (where T is the series length). We show
how the method can be implemented, and provide simulations
and empirics illustrating its excellent finite sample
performance. Application to real data brings some new
empirical contribution in the ongoing debate on the number
of factors driving the US economy.
*This is joint work with Roman Liska.
Title:
A Covariate-Adjusted
Adaptive Design For Two-Stage Clinical Trials With Survival
Data
Speaker:
Dr. Atanu Biswas, Indian Statistical Institute
Date: 27 February 2007
(Tuesday)
Time:
4:00pm - 5:00pm
Venue:
S16-05-101, Computer Lab 1
Abstract
A new two-stage response-adaptive
design for phase III clinical trials is proposed with
survival data in the presence of covariates. Several exact
and asymptotic properties of the design are studied. The
procedure is illustrated by using some real data.
(Joint work with Uttam Bandyopadhyay and Rahul Bhattacharya)
Title: A New Approach to Singular Stochastic Control in
Optimal Hedging
and Investment-Consumption Under Transaction Costs
Speaker:
Dr. Lim Tiong Wee, Department of Statistics and Applied
Probability, NUS
Date: 28 February 2007
(Wednesday)
Time:
4:00pm - 5:00pm
Venue:
S16-06-118, Seminar Room
Abstract
The problems of optimal
investment and consumption and of option pricing and hedging
in the presence of proportional transaction costs can
be formulated as singular stochastic control problems.
Up till now, numerical computation of the optimal trading
or hedging strategy has been based on the method of Markov
chain approximation and discrete-time dynamic programming
applied directly to the control problem, which necessitates
the comparison of maximum attainable utilities from buying
stock, selling stock, or doing nothing. This approach
is computationally intensive. In this talk, we propose
a new approach. Beginning with a class of singular stochastic
control problems that can be transformed to optimal stopping
problems, we use the equivalence to optimal stopping to
develop an efficient backward induction algorithm. We
then use the method of finite differences to modify the
backward induction algorithm for much more general stochastic
control problems, including those that arise in applications
to finite-horizon optimal investment and consumption and
to option pricing and hedging in the presence of transaction
costs. Specific algorithms and numerical results are provided
for these applications.
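The backward induction idea for optimal stopping mentioned above can be illustrated on a much simpler textbook problem. The sketch below prices an American put on a Cox-Ross-Rubinstein binomial tree; it is not the speaker's algorithm, it ignores transaction costs, and the function name and all parameter values are assumptions chosen for illustration only.

```python
import math


def american_put_binomial(s0, strike, rate, sigma, maturity, steps):
    """Value an American put by backward induction on a binomial tree.

    Illustrative only: a standard optimal stopping example, without
    transaction costs, and not the method presented in the talk.
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    # Risk-neutral probability of an up-move over one step.
    q = (math.exp(rate * dt) - down) / (up - down)

    # Terminal payoffs: node j has had j up-moves out of `steps`.
    values = [
        max(strike - s0 * up ** j * down ** (steps - j), 0.0)
        for j in range(steps + 1)
    ]

    # Step backwards: at each node compare stopping now with continuing.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            exercise = max(strike - s0 * up ** j * down ** (n - j), 0.0)
            values[j] = max(continuation, exercise)
    return values[0]


price = american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200)
print(round(price, 2))
```

At each node the value is the larger of the immediate exercise payoff and the discounted expected continuation value, which is the comparison that any backward induction scheme for an optimal stopping problem performs.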
Title:
High-Dimensional Data Analysis
Speaker:
Prof Bai Zhidong, Department of Statistics and Applied
Probability, NUS
Dates: 7 February 2007
(Talks 1 & 2), 9 February 2007 (Talk 3) & 14 February
2007 (Talks 4 & 5)
Time:
3:00pm - 4:00pm (7 February 2007), 12:00pm - 1:00pm (9
February 2007), 3:00pm - 4:00pm (14 February 2007)
Venue:
S16-06-118, Seminar Room
Summary
Talk 1: In the first talk,
I will introduce some examples to show the difference
between large and small data analysis, and how serious the
errors made by classical limiting theorems can be in
statistical inference.
Talk 2: I will introduce
some methodologies used in random matrix theory (RMT), in
particular how the moment method and Stieltjes transforms are used.
Talk 3: I will introduce
some results on Wigner's semicircular law and the Marcenko-Pastur
law.
Talk 4: I will introduce
spectral analysis of products of two random matrices and
the limit spectral law of large F-matrices.
Talk 5: I will introduce
the CLT of Linear Spectral Statistics constructed by eigenvalues
and those by eigenvectors.
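As a rough illustration of the moment method named in Talk 2, the sketch below simulates a sample covariance matrix S = XX'/n with i.i.d. standard normal entries and computes the second spectral moment (1/p) tr(S^2) without any eigenvalue routine. Under the Marcenko-Pastur law with ratio y = p/n this moment tends to 1 + y, whereas classical fixed-dimension theory would suggest 1. The function name, dimensions and seed are assumptions made for this example; it is not taken from the talks.

```python
import random


def second_spectral_moment(p, n, seed=0):
    """Return (1/p) * tr(S^2) for S = X X^T / n, X a p-by-n N(0,1) matrix."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    total = 0.0
    for i in range(p):
        for j in range(i, p):
            s_ij = sum(a * b for a, b in zip(rows[i], rows[j])) / n
            # tr(S^2) is the sum of all squared entries of the symmetric S,
            # so off-diagonal entries are counted twice.
            total += s_ij * s_ij if i == j else 2.0 * s_ij * s_ij
    return total / p


# Large dimension: p/n = 0.5, so the limit 1 + y is 1.5.
print(second_spectral_moment(100, 200))
# Small dimension relative to sample size: close to the classical value 1.
print(second_spectral_moment(10, 2000))
```

The gap between the two printed values is one simple way to see how far classical limit theorems can be off when the dimension grows with the sample size.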
Title:
Rounded Data Analysis
Speaker:
Prof Bai Zhidong, Department of Statistics and Applied
Probability, NUS
Dates: 31 January 2007
(Talks 1 & 2) & 2 February 2007 (Talk 3)
Time:
3:00pm - 4:00pm (31 January 2007), 12:00pm - 1:00pm (2
February 2007)
Venue:
S16-06-118, Seminar Room
Abstract
Except for categorical data,
all continuous data need to be rounded when they are collected
and recorded. Rounding errors inevitably affect the
accuracy of statistical inference. In the past, this was
not treated seriously because statistical problems were
encountered only for small samples. However, with the wide application
of advanced computers, statisticians increasingly have
to deal with large samples of high-dimensional data.
This forces statisticians to pay serious attention
to rounding errors. It has been found in the literature
that the usual t-test will reject a true null hypothesis
with probability close to 1 when the sample size is large
enough. This shows that we have to look for new methodologies
for analyzing data when the observations are
rounded and the rounding scale is large relative to
the sample size. In a series of recent research works,
we have proposed some new methods for the analysis
of rounded data.
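The t-test behaviour described above can be reproduced in a small simulation. The sketch below draws normal data whose true mean equals the null value, rounds each observation to the nearest integer, and compares the one-sample t-statistics before and after rounding. The mean 0.3, standard deviation 0.3, sample size and seed are assumptions chosen so that the rounding interval is wide relative to the spread of the data; they do not come from the talks.

```python
import math
import random


def t_statistic(sample, mu0):
    """One-sample t-statistic for H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n)


def raw_and_rounded_t(n, mu=0.3, sigma=0.3, seed=0):
    """Return t-statistics for the raw data and for the rounded data."""
    rng = random.Random(seed)
    raw = [rng.gauss(mu, sigma) for _ in range(n)]
    rounded = [float(round(x)) for x in raw]  # round to the nearest integer
    return t_statistic(raw, mu), t_statistic(rounded, mu)


t_raw, t_rounded = raw_and_rounded_t(100_000)
print(t_raw, t_rounded)
```

Rounding shifts the mean of the recorded data away from the true mean by a fixed amount, so the t-statistic of the rounded sample grows like the square root of the sample size and the true null hypothesis is eventually rejected.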
In the first talk, I will
give some examples to show how seriously rounding
errors affect statistical inference.
In the second talk, I
will revisit an example discussed by Dempster and Rubin
in their 1982 JRSSB paper. We will show that Sheppard
corrections are superior to the BRB corrections only for
their example, and that this is not the case in general. Further, we
propose a new method to find consistent and asymptotically
normal estimates in rounded linear models. The new method
is called two-stage estimation.
In the third talk, I will
discuss the estimation problem for a rounded time
series model. We call the method snake-cutting (the sc
method). We show that the sc-estimates are consistent
and asymptotically normal.
Title:
Testing for Threshold Moving Average with Conditional
Heteroscedasticity
Speaker:
Dr. Li Guodong, Department of Statistics and Actuarial
Science, University of Hong Kong
Date: 22 January 2007
(Monday)
Time:
10:30am - 11:30am
Venue:
S16-06-118, Seminar Room
Abstract
The recent paper by Ling
and Tong (2005) considered a quasi-likelihood ratio test
for the threshold in moving average models with i.i.d.
errors. This article generalizes their results to the
case with GARCH errors and a new quasi-likelihood ratio
test is derived. The generalization is not direct since
the techniques developed for TMA models depend heavily
on the property of p-dependence, which is no longer satisfied
by time series models with conditional heteroscedasticity.
The new test statistic in this article is shown to converge
weakly to a functional of a centered Gaussian process
under the null hypothesis of no threshold and it is also
proved that the test has nontrivial asymptotic power under
local alternatives. Monte Carlo experiments demonstrate
the necessity of our test when a moving average time series
has a time-varying conditional variance. As further
support, two real data examples are also reported.
Title:
Informative Transmission Disequilibrium Test (i-TDT): Combined
Linkage and Association Mapping that Includes Unaffected
Offspring as well as Affected Offspring
Speaker:
Dr Guo Chao-Yu, Department of Mathematics & Statistics,
Boston University
Date: 18 January 2007
(Thursday)
Time:
3:00pm - 4:00pm
Venue:
S16-05-101, Computer Lab 1
Abstract
To date, there is no test
valid for the composite null hypothesis of no linkage
or no association that utilizes transmission information
from heterozygous parents to their unaffected offspring
as well as the affected offspring from ascertained nuclear
families. Since the unaffected siblings also provide information
about linkage and association, we introduce a new strategy
called the informative-transmission disequilibrium test
(i-TDT), which uses transmission information from heterozygous
parents to all of the affected and unaffected offspring
in ascertained nuclear families and provides a valid chi-square
test for both linkage and association. The i-TDT can be
used in various study designs and can accommodate all
types of independent nuclear families with at least one
affected offspring. We show that the transmission/disequilibrium
test (TDT) [Spielman et al., 1993] is a special case of
the i-TDT, if the study sample contains only case-parent
trios. If the sample contains only affected and unaffected
offspring without parental genotypes, the i-TDT is equivalent
to the sibship disequilibrium test (SDT) [Horvath and
Laird, 1998]. In addition, the test statistic of i-TDT
is simple, explicit and can be implemented easily without
intensive computing. Through computer simulations, we
demonstrate that the power of the i-TDT can be higher in many
circumstances than that of a method that uses affected
offspring only. Applying the i-TDT to the Framingham Heart
Study data, we found that the apolipoprotein E (APOE)
gene is significantly linked and associated with cross-sectional
measures and longitudinal changes in total cholesterol.
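For readers unfamiliar with the classical TDT that the abstract names as a special case, the sketch below computes the standard McNemar-type statistic (b - c)^2 / (b + c), where b and c are the numbers of heterozygous parents who did and did not transmit the allele of interest to an affected child, and refers it to a chi-square distribution with one degree of freedom. It does not implement the i-TDT, whose statistic is not given in the abstract, and the counts 60 and 40 are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT statistic from heterozygous-parent transmission counts."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts, for illustration only.
stat = tdt_statistic(60, 40)
print(stat, chi2_1df_pvalue(stat))
```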
Title:
Changing Patterns of Myopia and Eye Growth in Singapore
Children - a Cohort Study
Speaker: Assoc.
Prof Saw Seang Mei, Department of Community, Occupational
and Family Medicine, Yong Loo Lin School of Medicine,
NUS
Date: 17 January 2007
(Wednesday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118, Seminar Room
Abstract
The Singapore Cohort study
Of the Risk factors for Myopia was conducted to determine
the longitudinal patterns of refractive error and risk
factors for incident myopia. 1,979 children from 3 schools
have been examined yearly. There were 1,478 Chinese, 349
Malays and 152 children who were Indian or of other races;
among them were 851 children aged 7 years, 630 children
aged 8 years and 498 children aged 9 years. During the
first visit, the parents completed a questionnaire that
included questions about possible risk factors for myopia
such as the number of books read per week and whether
the parents were myopic. Yearly eye examinations, including
vision chart testing, eye testing using an autorefractor
machine and biometry tests of eye size and shape have
been conducted and continue to be conducted in the schools.
We have examined the children in Yio Chu Kang Primary
School and Tao Nan School every year for the past 8 years
and Rulang Primary School yearly for the previous 6 years.
We plan to continue the yearly examinations of the children
even after the commencement of secondary school education.
The prevalence rates of myopia in Singapore school children
are among the highest in the world: 28% in 7 year olds,
34% in 8 year olds, 43% in 9 year olds, 62.5% in 10 year
olds, 67.1% in 11 year olds, and 63% in 12 year olds.
The 3-year increases in axial length, anterior chamber
depth, lens thickness, vitreous chamber depth and corneal
curvature were 0.89 mm, -0.02 mm, -0.01 mm, 0.92 mm and
0.01 mm, respectively. Children who were younger, female
and who had a parental history of myopia were more likely
to have greater increases in axial length. In a cohort
analysis of three-year data, the relative risk (RR) of
myopia was 1.37 [95% confidence interval (CI) 1.05 to 1.80]
for two versus no myopic parents, after controlling for
school, age, gender, income, books read per week
and intelligence quotient (IQ). The multivariate RR of
myopia for IQ in the third versus first tertile was 1.47
(95% CI 1.16 to 1.87). Among children with IQ in the highest
tertile, the RR of high myopia was 2.72 (95% CI 1.26 to
5.84) for those reading more than 2 books per week as
compared to those reading 2 books or less per week. This
cohort provides valuable data about the aetiology of the
incidence and progression of myopia.
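The reported relative risks and confidence intervals can be checked for internal consistency. The sketch below assumes that each interval is a Wald interval that is symmetric on the log scale, which the abstract does not state; under that assumption the point estimate should be close to the geometric mean of the two limits, and the standard error of log(RR) can be recovered from the interval width. The function name is hypothetical and the only data used are the figures quoted in the abstract.

```python
import math


def log_scale_summary(lower, upper, z=1.959964):
    """Return (geometric mean of the limits, implied SE of log RR).

    Assumes a 95% Wald interval that is symmetric on the log scale.
    """
    centre = math.sqrt(lower * upper)
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return centre, se_log


# (reported RR, lower limit, upper limit) as quoted in the abstract.
reported = [(1.37, 1.05, 1.80), (1.47, 1.16, 1.87), (2.72, 1.26, 5.84)]
for rr, lower, upper in reported:
    centre, se_log = log_scale_summary(lower, upper)
    print(rr, round(centre, 3), round(se_log, 3))
```

For all three results the geometric mean of the limits agrees with the reported RR to within rounding, which is what a log-scale interval would produce.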
Title:
Higher Order Semiparametric Frequentist Inference Based
on the Profile Sampler
Speaker: Mr
Cheng Guang, Institute of Statistics & Decision Sciences,
Duke University
Date: 15 January 2007 (Monday)
Time:
10:30am - 11:30am
Venue:
S16-06-118, Seminar Room
Abstract
In this talk, we
systematically construct a higher order frequentist
validation of semiparametric estimation procedures through
easy-to-implement Bayesian MCMC methodology. Specifically,
inference for the parametric component of a
semiparametric model based on sampling from the posterior
profile distribution, called "the profile sampler",
is thoroughly investigated from a frequentist viewpoint.
We first derive the second order asymptotic frequentist
properties of the profile sampler in terms of distributions,
moments and confidence intervals. Further, by a delicate
analysis of the entropy of the semiparametric models involved,
we find that the accuracy of inferences based on the profile
sampler improves as the convergence rate of the nuisance
parameter increases.
From the above analysis,
we notice that the estimation accuracy of the profile
sampler method is intrinsically determined by the semiparametric
model specifications. Therefore it is natural to question
how to control the degree of accuracy. In the last section,
we address this by proposing the penalized profile sampler
method, in which we profile the penalized likelihood rather
than the full likelihood. Thus, we can achieve the desired
estimation accuracy for the parameter of interest by tuning
the associated smoothing parameters.
Our theory is verified
in several popular semiparametric models arising from
Survival Analysis, Epidemiology and Econometrics. As far
as we are aware, the above results are the first higher
order frequentist inferences obtained for semiparametric
estimation.
Title: Testing Hypothesis of Errors / Innovations
in Non-parametric Regression
Speaker: Prof
Estate V. Khmaladze, Victoria University of Wellington
Date: 11 January
2007 (Thursday)
Time:
3:00pm - 4:00pm
Venue:
S16-05-101, Computer Lab 1
Title: Approximating the Variance of the
Conditional Probability of the state of a Hidden Markov
Model
Speaker: Prof
David Siegmund, Stanford University
Date: 10 January
2007 (Wednesday)
Time:
3:00pm - 4:00pm
Venue:
S16-06-118, Seminar Room
Abstract
For a hidden Markov model
the variance of the conditional probability of the underlying
state given the observations measures the information
lost by failure to observe directly the state of the hidden
process. In the case when changes of state occur slowly
relative to the speed at which information about the underlying
state accumulates in the observed data, the variance of
this conditional probability is computed approximately
in terms of functionals of Brownian motion that arise
in change-point analysis. Applications in gene-mapping,
where this variance plays a role in standardizing the
score statistic and in evaluating the loss of noncentrality
due to incomplete information, are discussed. Numerical
examples illustrate the range of validity and limitations
of our results.