Science.Online
Publisher and Institutes
Akademie Verlag
Deutsches Institut für Urbanistik
Oldenbourg Wissenschaftsverlag
Walter de Gruyter
Schattauer
You are here: Home :: Area CULI :: Linguistics and literature :: Communication science
 
Adam Kilgarriff

Language is never, ever, ever, random

Language users never choose words randomly, and language is essentially non-random. Statistical hypothesis testing uses a null hypothesis, which posits randomness. Hence, when we look at linguistic phenomena in corpora, the null hypothesis will never be true. Moreover, where there is enough data, we shall (almost) always be able to establish that it is not true. In corpus studies, we frequently do have enough data, so the fact that a relation between two phenomena is demonstrably non-random, does not support the inference that it is not arbitrary. We present experimental evidence of how arbitrary associations between word frequencies and corpora are systematically non-random. We review literature in which hypothesis testing has been used, and show how it has often led to unhelpful or misleading results.

Corpus Linguistics and Lingustic Theory, Walter de Gruyter

Print ISSN: 1613-7027
Volume: 1, 11/2005
Pages: 263 - 276

Show full article (external site)

Show all available items of this journal