1. Introduction
In this issue of
The points raised in Kilgarriff’s paper are various and important, and considerations of space do not allow me to address all of them in as great detail as they certainly deserve. I will therefore concentrate on only one particular aspect of the paper which I find, given my own research history and subjective interests, particularly important, namely the issue of statistical hypothesis testing. More precisely, I will address one of the central claims of Kilgarriff’s paper. Kilgarriff argues, apparently taking up issues from methodological discussion in many other disciplines (cf. section 2), that the efficiency of statistical null-hypothesis testing is often doubtful because (i) “[g]iven enough data, H0 is almost always rejected however arbitrary the data” and (ii) “true randomness is not possible at all”. In information-retrieval parlance, null-hypothesis significance testing when applied to large corpora yields too many false hits.
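Claim (i) can be made concrete with a minimal sketch. The figures below are invented for illustration and are not taken from Kilgarriff’s paper: a word is assumed to occur with a relative frequency of 1.00% in one corpus and 1.05% in another of equal size. The function names (`chi_square_2x2`, `p_value_df1`, `compare`) are likewise hypothetical. The sketch computes a Pearson chi-square test without continuity correction on the resulting 2×2 table and shows how the same small difference in relative frequency moves from non-significant to highly significant as the corpora grow.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # With df = 1 the statistic is a squared standard normal variable,
    # so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def compare(corpus_size, rate_a=0.0100, rate_b=0.0105):
    """Test one word's frequency in two equally sized corpora (invented rates)."""
    hits_a = round(corpus_size * rate_a)
    hits_b = round(corpus_size * rate_b)
    chi2 = chi_square_2x2(
        hits_a, corpus_size - hits_a,
        hits_b, corpus_size - hits_b,
    )
    return chi2, p_value_df1(chi2)


if __name__ == "__main__":
    # The relative frequencies stay fixed; only the corpus size changes.
    for size in (10_000, 1_000_000, 100_000_000):
        chi2, p = compare(size)
        print(f"corpus size {size:>11,}: chi2 = {chi2:10.2f}, p = {p:.3g}")
```

Because the relative frequencies are held constant, the chi-square statistic grows in proportion to corpus size. Under these assumed figures the difference is not significant at the 5% level with 10,000 words per corpus, but it is with 1,000,000 words per corpus, although the size of the difference itself has not changed.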
In this short discussion note I would like to do two things. First, I would like to make a few suggestions as to what I think are the most natural methodological consequences of Kilgarriff’s statement and several other points of critique concerning null-hypothesis significance testing raised in other disciplines. Second, I would like to revisit one of the examples Kilgarriff discusses in his paper to exemplify aspects of these proposals and show how the results bear on corpus-linguistic issues.
Print ISSN: 1613-7027
Volume: 1, 11/2005
Pages: 277 - 294