Regression analysis of a response variable $Y$ requires careful selection of explanatory variables. The quality of a set of explanatory features $X = (X^{(1)}, \dots, X^{(d)})$ can be measured in terms of the minimum mean squared error
$$L^* = \min_f E\{(Y - f(X))^2\}.$$
This paper investigates methods for estimating $L^*$ from i.i.d. data. No estimate can converge rapidly for all distributions of $(X, Y)$. For a Lipschitz continuous regression function $E\{Y \mid X = x\}$, two estimators for $L^*$ are discussed: one fits a regression estimate to a subset of the data and assesses its mean residual sum of squares on the remaining samples; the other is a nearest neighbor cross-validation type estimate.
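The two estimators can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `split_estimate`, `nn_estimate`, and `demo` are hypothetical, the regression estimate fitted in the data-splitting approach is taken to be a k-nearest-neighbor average, and the nearest neighbor estimate is written in one plausible form, the mean of $Y_i^2$ minus the mean of $Y_i$ times the response of the nearest neighbor of $X_i$ among the other samples. The paper's actual definitions may differ.

```python
import numpy as np


def split_estimate(X, y, k=5):
    """Data-splitting estimate (illustrative assumption, not the paper's definition).

    Fits a k-NN regression estimate on the first half of the sample and
    returns its mean squared residual on the second half.
    """
    half = len(y) // 2
    X_fit, y_fit = X[:half], y[:half]
    X_test, y_test = X[half:], y[half:]
    # Squared Euclidean distances between every test point and every fit point.
    d2 = ((X_test[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest fit points for each test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    pred = y_fit[nn].mean(axis=1)
    return float(np.mean((y_test - pred) ** 2))


def nn_estimate(X, y):
    """Nearest neighbor type estimate (one plausible form, assumed here).

    Uses L* = E{Y^2} - E{m(X)^2} and approximates E{m(X)^2} by the
    average of Y_i times the response of X_i's nearest neighbor.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point may not be its own neighbor
    nn = d2.argmin(axis=1)
    return float(np.mean(y ** 2) - np.mean(y * y[nn]))


def demo(seed=0, n=2000):
    """Synthetic check: Lipschitz regression function, noise variance 0.25."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, 2))
    m = np.sin(2 * np.pi * X[:, 0]) + X[:, 1]  # hypothetical regression function
    y = m + 0.5 * rng.standard_normal(n)       # so L* = 0.25 in this setup
    return split_estimate(X, y), nn_estimate(X, y)


if __name__ == "__main__":
    split_val, nn_val = demo()
    print("data-splitting estimate:", split_val)
    print("nearest neighbor estimate:", nn_val)
```

In this synthetic setup the true value is 0.25 by construction. The data-splitting estimate should typically land somewhat above it, since it also includes the excess error of the fitted k-NN regressor, while the nearest neighbor form avoids fitting a regression estimate at all.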
Print ISSN: 0721-2631
Volume: 21, 01/2003
Pages: 015