A general approach to account for dependence in large-scale multiple testing

Chloé Friguet


Data generated by high-throughput biotechnologies are characterized by their high dimension and
heterogeneity. In the statistical analysis of such data, the usual tried and tested inference approaches are called
into question. Motivated by issues raised by the analysis of gene expression data, I focus on the impact of
dependence on the properties of multiple testing procedures in high dimension. This article presents the main
results: after introducing the issues raised by dependence among variables, it studies more specifically the impact
of dependence on the error rates and on the procedures developed to control them. This leads to the description of
an innovative methodology based on a factor structure to model data heterogeneity, which provides a general
framework to deal with dependence in multiple testing. The proposed framework reduces the variability of error
rates and consequently yields large improvements in the power and stability of simultaneous inference with respect
to existing multiple testing procedures. The estimation of the model parameters in a high-dimensional setting and
the determination of the number of factors to include in the model are also discussed. These results are then
illustrated on real data from microarray experiments, analyzed using the R package FAMT.
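
To make the claim about the variability of error rates concrete, here is a minimal simulation sketch. It is not the FAMT methodology and is not taken from the article: the one-factor equicorrelation model, the function name `false_positive_counts`, and all parameter values are assumptions chosen for illustration. It compares the number of false positives among true null hypotheses when the test statistics are independent and when they share a common latent factor.

```python
from statistics import NormalDist

import numpy as np


def false_positive_counts(m=500, n_sim=2000, rho=0.0, alpha=0.05, seed=0):
    """Count false positives among m true-null z-tests, over n_sim replicates.

    Hypothetical one-factor model (an assumption for illustration):
        z_j = sqrt(rho) * f + sqrt(1 - rho) * e_j,
    with f and e_j independent standard normal, so that each z_j is N(0, 1)
    marginally and corr(z_j, z_k) = rho for j != k.
    """
    rng = np.random.default_rng(seed)
    factor = rng.standard_normal((n_sim, 1))   # one shared factor per replicate
    noise = rng.standard_normal((n_sim, m))    # test-specific noise
    z = np.sqrt(rho) * factor + np.sqrt(1.0 - rho) * noise
    # Two-sided critical value at level alpha for a standard normal statistic.
    threshold = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (np.abs(z) > threshold).sum(axis=1)


indep = false_positive_counts(rho=0.0)  # independent test statistics
dep = false_positive_counts(rho=0.5)    # statistics sharing a latent factor

print("independent: mean", indep.mean(), "std", indep.std())
print("dependent:   mean", dep.mean(), "std", dep.std())
```

In this sketch the expected number of false positives is the same in both settings (about `m * alpha`), because each test keeps its marginal level, but the count is far more dispersed across replicates when a common factor is present. This is the kind of instability that modelling the dependence through a factor structure is meant to reduce.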


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

SFdS / SMF - Journal de la Société Française de Statistique - ISSN 2102-6238