Data Modelling and Applied Statistics

ROBUST ESTIMATORS AND LARGE DEVIATIONS
1975-77

During my two years of research in the Statistics Department of the University of California, Berkeley, I participated intensively in seminars led by Lucien LE CAM, and became deeply interested in LE CAM’s powerful approach to the asymptotics of statistical experiments and parameter estimation. At the same time, I absorbed the innovative ideas of HUBER and BICKEL on robust parameter estimators, built to resist random perturbations of the observed data and to avoid overfitting of stochastic models. The natural links between their approaches and the LE CAM distance between experiments, the Kullback distance between probability models, and the BAHADUR and ZABELL exponential rates of convergence for parameter estimators meshed remarkably well with my ongoing work on large deviations theory.
This led me to launch, at the University of Paris 11 and in collaboration with Didier DACUNHA-CASTELLE, an intensive one-year seminar on large deviations applied to asymptotic statistics. I thus obtained concrete results for two large classes of robust estimators of the unknown mean of independent random variables: the so-called M-estimators and R-estimators, generated by adaptively censoring extreme observations.
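To make the two estimator families concrete, here is a minimal sketch of two textbook location estimators; they are standard illustrations, not the specific estimators studied in the chapters cited below. The first is Huber's M-estimator, computed by iteratively reweighted averaging with the scale held fixed at the normalized median absolute deviation; the second is the Hodges-Lehmann R-estimator, obtained by inverting the Wilcoxon signed-rank test. The function names and the tuning constant k = 1.345 are assumptions chosen for illustration.

```python
import statistics


def huber_location(x, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location (hypothetical helper).

    Solves sum(psi((x_i - mu) / s)) = 0 for mu, where psi is Huber's
    function (identity on [-k, k], clipped to +/-k outside) and the scale
    s is held fixed at the normalized median absolute deviation (MAD).
    """
    mu = statistics.median(x)
    s = 1.4826 * statistics.median([abs(xi - mu) for xi in x])
    if s == 0:
        return mu  # more than half the data coincide: keep the median
    for _ in range(max_iter):
        # Weights psi(r)/r: 1 for small residuals, k/|r| for large ones,
        # so extreme observations are down-weighted rather than discarded.
        weights = []
        for xi in x:
            r = abs(xi - mu) / s
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * xi for w, xi in zip(weights, x)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def hodges_lehmann(x):
    """R-estimate of location derived from the Wilcoxon signed-rank test:
    the median of all Walsh averages (x_i + x_j) / 2 with i <= j."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


data = [1, 2, 3, 4, 100]          # one gross outlier
print(statistics.mean(data))      # 22, dragged by the outlier
print(hodges_lehmann(data))       # 3.0
print(huber_location(data))       # close to 3
```

Both estimators stay near the bulk of the data, while the sample mean is pulled far away by a single extreme value.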

References:
Two chapters in the book
Robust Estimation (Ed. Dacunha-Castelle), Astérisque, Soc. Math. France, vol. 43, 1977
Parameters Estimation through Rank Tests, R. Azencott, pp. 41-64
Robustness of R-estimators, R. Azencott, pp. 189-202

 

TIME SERIES AND ARMA MODELS
1980-1989

In 1975, I had created at the Universities of Paris 7 and Paris 1 a new degree program, “Mathematics and Economics”, which I directed until 1982, and in which I taught several applied graduate courses on autoregressive moving-average (ARMA) models for the modeling and forecasting of random time series. These popular statistical techniques essentially rely on estimating suitable rational functions to approximate the spectral density of second-order stationary random processes. A key problem in this context, as noticed in the mid-seventies by AKAIKE, is to avoid over-parametrization of the statistical model, which inexorably leads to asymptotically inconsistent estimators. But even AKAIKE’s intuitive criterion for controlling the estimated dimension of ARMA models had just been shown to yield asymptotically inconsistent estimates of the dimension of the underlying model.
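For reference, the textbook form of the rational spectral density that these techniques estimate, in notation assumed here for illustration (φ, θ, B and σ² do not appear in the text above): a causal, invertible ARMA(p, q) process satisfies φ(B)X_t = θ(B)ε_t, where B is the backshift operator and ε_t is white noise of variance σ², and its spectral density is a ratio of trigonometric polynomials.

```latex
\phi(z) = 1 - \sum_{j=1}^{p} \phi_j z^j, \qquad
\theta(z) = 1 + \sum_{k=1}^{q} \theta_k z^k, \qquad
f(\lambda) = \frac{\sigma^2}{2\pi}\,
  \frac{\lvert \theta(e^{-i\lambda}) \rvert^2}{\lvert \phi(e^{-i\lambda}) \rvert^2},
  \quad \lambda \in [-\pi, \pi].
```

The model dimension is the pair (p, q). If both orders are chosen larger than the true ones, φ and θ can share a common factor that cancels in f(λ), so the coefficients are no longer identifiable; this is one concrete way in which over-parametrization destroys consistency.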
I thus undertook, with D. DACUNHA-CASTELLE, a rigorous and thorough study of computable, consistent estimates of the parametric dimension of ARMA models, which led us to publish an in-depth book on random time series and their forecasting. The book appeared successively in French, English, and Japanese editions.
The statistical know-how involved in adequately modeling real-life time series with ARIMA techniques was widely disseminated for practical purposes but not always well formalized. I felt that this situation could be improved, and collaborated with two remarkably competent applied statisticians and computer scientists, Yvonne and Bernard GIRARD, as well as with P. ASTIER and M.M. MARTIN (EDF), to build desktop scientific software offering quick and easy implementations of consistent dimension estimates, as well as efficient expert parametric fitting of random time series by ARIMA models.
This became a successful joint project with EDF/GDF, the French national electricity and gas companies, which supported the research and adopted our scientific software, MANDRAKE, in their R&D department.
A collaboration with INRETS (the French national research institute for transportation) on efficient stochastic modeling and forecasting of the massive national data recorded on highway vehicle accidents led me to supervise Anne RICORDEAU’s applied PhD thesis, and to focus her work on Markov field models for interacting families of Poisson stochastic processes.
A couple of years later, the INRETS collaboration was extended with the financial support of the French Ministry of Transportation, and I led an effective scientific team (Y. and B. GIRARD, J. LACAILLE, B. DURAND) in the delicate extension of MANDRAKE’s automatic ARIMA modeling to multivariate time series.
The number of ARIMA parameters to estimate grows very fast with the dimension of the observed random vectors (a d-dimensional autoregression of order p already has p·d² autoregressive coefficients), so that a sound mathematical way to avoid over-parametrization becomes a key practical question. It was an exciting challenge to link RISSANEN’s theoretical results on “model dimension estimates” to new, concrete algorithms implementable in easy-to-use scientific software. We then collaborated with EDF/GDF to deploy at the national operational level an enhanced, customized version of our MANDRAKE software, which automated online the massive nationwide set of regional short-term forecasts of gas consumption.
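As a hedged illustration of penalized dimension selection, the sketch below fits pure autoregressions AR(p) of increasing order through the Levinson-Durbin recursion on sample autocovariances (Yule-Walker estimates), and picks p by minimizing n·log σ̂²_p + c(n)·p, where σ̂²_p is the fitted innovation variance. With c(n) = 2 this is AKAIKE’s AIC; with c(n) = log n it is the form taken by RISSANEN’s MDL criterion (equivalently Schwarz’s BIC), whose penalty grows with n and yields consistent order estimates. The univariate, AR-only setting is a simplification assumed for brevity; this is not the MANDRAKE algorithm, and all function names are hypothetical.

```python
import math
import random


def autocovariances(x, max_lag):
    """Biased sample autocovariances gamma(0..max_lag) of the series x."""
    n = len(x)
    m = sum(x) / n
    xc = [v - m for v in x]
    return [sum(xc[t] * xc[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def levinson_durbin(gamma, max_order):
    """Innovation variances sigma2[p] of Yule-Walker AR(p) fits, p = 0..max_order."""
    phi = []              # coefficients of the current AR(p-1) fit
    sigma2 = [gamma[0]]   # order 0: the variance of the series itself
    for p in range(1, max_order + 1):
        acc = gamma[p] - sum(phi[j] * gamma[p - 1 - j] for j in range(p - 1))
        kappa = acc / sigma2[-1]   # partial autocorrelation at lag p
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        sigma2.append(sigma2[-1] * (1.0 - kappa * kappa))
    return sigma2


def select_order(x, max_order, penalty):
    """Order p minimizing n * log(sigma2[p]) + penalty(n) * p."""
    n = len(x)
    sigma2 = levinson_durbin(autocovariances(x, max_order), max_order)
    scores = [n * math.log(s) + penalty(n) * p for p, s in enumerate(sigma2)]
    return min(range(len(scores)), key=scores.__getitem__)


def aic_penalty(n):
    return 2.0            # fixed cost per parameter: not consistent


def mdl_penalty(n):
    return math.log(n)    # cost grows with n: consistent order estimate


# Usage: a simulated AR(1) series with coefficient 0.7.
rng = random.Random(0)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))
print(select_order(x, 6, aic_penalty), select_order(x, 6, mdl_penalty))
```

Because log n exceeds 2 once n > 7, the MDL/BIC choice can never be larger than the AIC choice on the same data; the heavier penalty is what removes AIC’s persistent tendency to overfit.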

References:
(Book) Time Series of Irregular Observations
R. Azencott, D. Dacunha-Castelle; French edition: Masson, Paris, 1983
English edition: Springer, New York, 1985; Japanese edition: Tokyo, 1989

Mandrake, expert software and algorithms for time series forecast
R. Azencott, B. and Y. Girard, R. Astier, P. Jacoubowitz, M.M. Martin
Proceedings of the 8th Int. Symp. on Forecasting, Amsterdam, 1988