F. Markowetz, R. Spang

Molecular Diagnosis. Classification, Model Selection and Performance Evaluation

Keywords: Microarrays, statistical classification, generalization error, model assessment, gene selection

OBJECTIVES: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. We focus on strategies of adaptive model selection that avoid overfitting in high-dimensional spaces.

METHODS: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small-sample situations, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and the estimation of model parameters in nested-loop cross-validation.

RESULTS: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. Feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.

CONCLUSIONS: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is essential.
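
The difference between in-loop and out-of-loop feature selection can be illustrated with a small simulation. This Python sketch is a hypothetical toy setup, not the authors' experiment. It uses pure-noise data (40 samples, 1000 genes, labels unrelated to expression), filters genes by the absolute Welch t-statistic, and classifies with a nearest-centroid rule. The data sizes, the filter, the classifier and all function names are assumptions chosen for illustration. If the genes are selected once on all samples (out-of-loop) before leave-one-out cross-validation, the reported error should fall far below the 50% that pure noise allows. If they are re-selected inside each fold (in-loop), it should not.

```python
import random


def mean_var(values):
    """Sample mean and unbiased variance of a list of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def rank_genes(X, y, idx):
    """Feature filtering: gene indices ordered by |Welch t|, computed on samples idx only."""
    idx0 = [i for i in idx if y[i] == 0]
    idx1 = [i for i in idx if y[i] == 1]
    scores = []
    for g in range(len(X[0])):
        m0, v0 = mean_var([X[i][g] for i in idx0])
        m1, v1 = mean_var([X[i][g] for i in idx1])
        scores.append((abs(m1 - m0) / (v0 / len(idx0) + v1 / len(idx1)) ** 0.5, g))
    return [g for _, g in sorted(scores, reverse=True)]


def predict(X, y, train, genes, x_new):
    """Nearest-centroid prediction (squared Euclidean distance) on the selected genes."""
    dists = []
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [sum(X[i][g] for i in members) / len(members) for g in genes]
        dists.append(sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid)))
    return 0 if dists[0] <= dists[1] else 1


def loo_error(X, y, k, in_loop):
    """Leave-one-out error; in_loop controls where the top-k genes are selected."""
    all_idx = list(range(len(y)))
    # Out-of-loop: genes are chosen once from ALL samples, including every future test sample.
    fixed_genes = None if in_loop else rank_genes(X, y, all_idx)[:k]
    errors = 0
    for test in all_idx:
        train = [i for i in all_idx if i != test]
        # In-loop: genes are re-selected from this fold's training samples only.
        genes = rank_genes(X, y, train)[:k] if in_loop else fixed_genes
        errors += predict(X, y, train, genes, X[test]) != y[test]
    return errors / len(y)


# Pure-noise data: 40 samples x 1000 genes, labels independent of expression.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(40)]
y = [0] * 20 + [1] * 20

out_of_loop = loo_error(X, y, k=10, in_loop=False)
in_loop = loo_error(X, y, k=10, in_loop=True)
print(f"out-of-loop selection: {out_of_loop:.2f}  in-loop selection: {in_loop:.2f}")
```

Because the labels carry no information here, an out-of-loop estimate well below 0.5 can only come from the selection step having already seen each test sample. That is the feature selection bias named in the abstract.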
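
Nested-loop cross-validation, which combines model selection and performance evaluation, can be sketched in a similar setting. In this hypothetical example, the tuning parameter is the number k of filtered genes. An inner cross-validation on each outer training set picks k from a candidate grid, and only the outer test folds contribute to the error estimate. The toy data (five shifted genes out of 200), the candidate grid, the fold counts, the nearest-centroid classifier and all function names are illustrative assumptions, not taken from the paper.

```python
import random


def mean_var(values):
    """Sample mean and unbiased variance of a list of floats."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def rank_genes(X, y, idx):
    """Feature filtering: gene indices ordered by |Welch t|, using samples idx only."""
    idx0 = [i for i in idx if y[i] == 0]
    idx1 = [i for i in idx if y[i] == 1]
    scores = []
    for g in range(len(X[0])):
        m0, v0 = mean_var([X[i][g] for i in idx0])
        m1, v1 = mean_var([X[i][g] for i in idx1])
        scores.append((abs(m1 - m0) / (v0 / len(idx0) + v1 / len(idx1)) ** 0.5, g))
    return [g for _, g in sorted(scores, reverse=True)]


def predict(X, y, train, genes, x_new):
    """Nearest-centroid prediction using only the selected genes."""
    dists = []
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [sum(X[i][g] for i in members) / len(members) for g in genes]
        dists.append(sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid)))
    return 0 if dists[0] <= dists[1] else 1


def fold_errors(X, y, train, test, k):
    """Select the top-k genes on `train` only, then count misclassified `test` samples."""
    genes = rank_genes(X, y, train)[:k]
    return sum(predict(X, y, train, genes, X[i]) != y[i] for i in test)


def cv_error(X, y, idx, k, n_folds):
    """Plain K-fold CV error on samples idx (strided folds keep classes balanced here)."""
    errors = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        train = [i for i in idx if i not in test]
        errors += fold_errors(X, y, train, test, k)
    return errors / len(idx)


def nested_cv(X, y, candidates=(1, 5, 20, 50), outer_folds=5, inner_folds=4):
    """Outer loop estimates the error; inner loop tunes k on outer-training data only."""
    all_idx = list(range(len(y)))
    errors, chosen = 0, []
    for f in range(outer_folds):
        test = all_idx[f::outer_folds]
        train = [i for i in all_idx if i not in test]
        # Inner loop: the outer test fold is never seen while choosing k.
        best_k = min(candidates, key=lambda k: cv_error(X, y, train, k, inner_folds))
        chosen.append(best_k)
        errors += fold_errors(X, y, train, test, best_k)
    return errors / len(y), chosen


# Hypothetical toy data: 40 samples x 200 genes; genes 0-4 are shifted by 1.5 in class 1.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[rng.gauss(1.5 if (label == 1 and g < 5) else 0.0, 1.0) for g in range(200)]
     for label in y]

error, chosen = nested_cv(X, y)
print(f"nested-CV error estimate: {error:.2f}; k chosen per outer fold: {chosen}")
```

Because k is tuned separately in every outer fold, the chosen values may differ between folds. The reported error estimates the performance of the whole procedure, including the choice of k, rather than the performance of one fixed k.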

Methods of Information in Medicine, Schattauer

Print ISSN: 0026-1270
Volume: 44, 01/2005
Pages: 438 - 443
