Analysing large datasets of functional data: a survey sampling point of view

Pauline Lardin-Puech, Hervé Cardot, Camelia Goga

Abstract


In the age of Big Data, it is now common to deal with very large datasets of phenomena that evolve over time. When the aim is to estimate simple quantities such as the mean or the median trajectory, as well as the main modes of variation of the data captured through a principal component analysis, survey sampling techniques may be employed successfully. They can offer an interesting trade-off between the size of the data and the accuracy of the estimators. This paper reviews survey sampling approaches recently developed to deal with large datasets of functional data. We present different sampling techniques that can be employed to build confidence bands and to improve, with the help of auxiliary information, the accuracy of estimators compared to simple random sampling without replacement. These procedures are illustrated on a dataset of electricity load curves measured every half-hour over a period of one week.
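The simplest case mentioned above, estimating the mean trajectory from a simple random sample drawn without replacement (SRSWOR), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the population of curves is synthetic and generated by a hypothetical helper `make_population`, every curve is assumed fully observed on a common grid of 336 half-hourly instants (one week), and `srswor_mean_curve` is likewise a hypothetical name. Under SRSWOR each unit has inclusion probability n/N, so the Horvitz-Thompson estimator of the mean curve reduces to the pointwise sample mean, with estimated variance (1 - n/N) s²(t)/n at each instant t.

```python
import math
import random

T = 48 * 7  # half-hourly measurements over one week

def make_population(N, seed=0):
    """Hypothetical synthetic population of N load curves (not real data)."""
    rng = random.Random(seed)
    population = []
    for _ in range(N):
        level = rng.uniform(0.5, 2.0)  # unit-specific consumption level
        # daily cycle (period of 48 half-hours) plus measurement noise
        curve = [level * (1.0 + 0.5 * math.sin(2 * math.pi * t / 48)) + rng.gauss(0, 0.1)
                 for t in range(T)]
        population.append(curve)
    return population

def srswor_mean_curve(population, n, seed=1):
    """Estimate the mean curve and its pointwise standard error from an SRSWOR sample of size n."""
    N = len(population)
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # simple random sampling without replacement
    # Horvitz-Thompson with pi_k = n/N: (1/N) * sum(y_k / pi_k) = sample mean at each instant
    mean = [sum(y[t] for y in sample) / n for t in range(T)]
    se = []
    for t in range(T):
        s2 = sum((y[t] - mean[t]) ** 2 for y in sample) / (n - 1)  # sample variance at instant t
        se.append(math.sqrt((1 - n / N) * s2 / n))  # includes the finite population correction
    return mean, se

# Usage: sample 10% of a synthetic population and compare with the true mean curve.
pop = make_population(2000)
mu_hat, se = srswor_mean_curve(pop, 200)
true_mu = [sum(y[t] for y in pop) / len(pop) for t in range(T)]
# Pointwise 95% normal intervals; a simultaneous band over all T instants needs a larger critical value.
covered = sum(abs(m - mu) <= 1.96 * s for m, mu, s in zip(mu_hat, true_mu, se))
print(f"{covered}/{T} instants covered by the pointwise intervals")
```

The intervals in this sketch are pointwise only; the confidence bands and the estimators using auxiliary information discussed in the paper require further machinery that is not shown here.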



This work is licensed under a Creative Commons Attribution 3.0 License.

SFdS / SMF - Journal de la Société Française de Statistique - ISSN 2102-6238