Analysing large datasets of functional data: a survey sampling point of view


  • Pauline Lardin-Puech
  • Hervé Cardot
  • Camelia Goga


In the era of Big Data, it is now common to deal with very large datasets of phenomena that evolve over time. When the aim is to estimate simple quantities such as the mean or median trajectory, or the main modes of variation of the data as captured by principal components analysis, survey sampling techniques can be employed successfully. They can offer an attractive trade-off between the size of the data and the accuracy of estimators. This paper reviews survey sampling approaches recently developed to deal with large datasets of functional data. We present different sampling techniques that can be employed to build confidence bands and to improve, with the help of auxiliary information, the accuracy of estimators compared with simple random sampling without replacement. These procedures are illustrated on a dataset of electricity load curves measured every half-hour over a period of one week.
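
As a rough illustration of the baseline mentioned above, the following minimal sketch estimates a mean trajectory from a simple random sample drawn without replacement and attaches the usual pointwise variance estimate with the finite population correction. It is not the authors' code: the function name srswor_mean_curve, the synthetic population of curves, and the sizes N = 2000 and n = 200 are assumptions made for this example only. The 336 time points correspond to half-hourly measurements over one week. The variance here is pointwise, so it does not by itself give the simultaneous confidence bands discussed in the paper.

```python
import math
import random


def srswor_mean_curve(curves, n, rng):
    """Estimate the population mean curve from a simple random sample
    of n curves drawn without replacement (hypothetical helper)."""
    N = len(curves)
    T = len(curves[0])
    sample = rng.sample(range(N), n)  # n distinct indices, i.e. without replacement
    # Under equal-probability sampling the estimator is the sample mean curve.
    mean_hat = [sum(curves[i][t] for i in sample) / n for t in range(T)]
    var_hat = []
    for t in range(T):
        # Sample variance at time t, then the finite population correction (1 - n/N).
        s2 = sum((curves[i][t] - mean_hat[t]) ** 2 for i in sample) / (n - 1)
        var_hat.append((1 - n / N) * s2 / n)
    return mean_hat, var_hat


# Synthetic population (an assumption, not the electricity data of the paper):
# a daily sinusoidal pattern plus Gaussian noise, 48 half-hours per day, 7 days.
rng = random.Random(0)
N, T = 2000, 336
curves = [
    [1.0 + 0.5 * math.sin(2 * math.pi * t / 48) + rng.gauss(0, 0.3) for t in range(T)]
    for _ in range(N)
]

mean_hat, var_hat = srswor_mean_curve(curves, n=200, rng=rng)

# Compare with the true population mean curve, known here because the data are simulated.
true_mean = [sum(c[t] for c in curves) / N for t in range(T)]
max_err = max(abs(a - b) for a, b in zip(mean_hat, true_mean))
print(f"largest pointwise error over the week: {max_err:.3f}")
```

In this sketch only one tenth of the curves are read to estimate the mean trajectory, which is the size versus accuracy trade-off the abstract refers to.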






Special issue: survey sampling