A robust statistical model of word frequencies

Date
2019-08
Authors
Ingelby, Michael
Sharoff, Serge
Publisher
Strathmore University
Abstract
For language teaching or automatic language processing, it is important to know how frequent a word is. However, simply counting the number of times a word occurs in a collection of texts produces many artefacts, because some words occur very often in a small number of texts, leading to frequency bursts. This paper introduces a statistical model that uses methods from robust statistics to estimate the frequencies of words in a collection of texts.
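The burst problem described in the abstract can be illustrated with a small sketch. The abstract does not say which robust estimator the paper uses, so the median of per-document rates below is an illustrative stand-in and not the authors' model. The toy corpus and the names `make_doc`, `pooled_rate` and `robust_rate` are hypothetical.

```python
from statistics import median


def make_doc(n_whale, n_house, length=100):
    """Build a toy document (hypothetical data) as a list of tokens."""
    filler = length - n_whale - n_house
    return ["whale"] * n_whale + ["house"] * n_house + ["filler"] * filler


def pooled_rate(docs, word):
    """Naive estimate: occurrences per 1,000 tokens over the pooled corpus."""
    total_tokens = sum(len(tokens) for tokens in docs)
    hits = sum(tokens.count(word) for tokens in docs)
    return 1000 * hits / total_tokens


def robust_rate(docs, word):
    """Illustrative robust estimate: median of the per-document rates.

    Assumption: the median is used only as a simple example of a robust
    statistic; the paper's actual estimator is not given in the abstract.
    """
    rates = [1000 * tokens.count(word) / len(tokens) for tokens in docs]
    return median(rates)


# One document has a burst of "whale"; "house" is spread evenly.
docs = [make_doc(40, 2)] + [make_doc(0, 2) for _ in range(4)]

for word in ("whale", "house"):
    print(word, pooled_rate(docs, word), robust_rate(docs, word))
```

In this toy corpus the pooled count ranks "whale" above "house" purely because of one bursty document, while the median of per-document rates ranks the evenly distributed "house" higher.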
Description
Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12-16 August 2019, Strathmore University, Nairobi, Kenya.
Keywords
Robust statistics, Word frequencies, Core lexicon