A robust statistical model of word frequencies

Authors

Ingelby, Michael
Sharoff, Serge

Publisher

Strathmore University

Abstract

For language teaching or automatic language processing, it is important to know how frequent a word is. However, simply counting the number of times a word occurs in a collection of texts produces artefacts, because some words occur very often in a small number of texts, causing frequency bursts. This paper introduces a statistical model that uses methods from robust statistics to estimate the frequencies of words in a collection of texts.
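
The frequency-burst problem can be illustrated with a minimal sketch. This is not the model proposed in the paper, which this record does not describe. It only assumes a toy corpus with invented per-document rates and compares the plain mean with two standard robust estimators (the median and a trimmed mean). The names `rates_per_1000`, `mean_rate`, `median_rate`, and `trimmed_mean_rate` are hypothetical.

```python
from statistics import mean, median

# Hypothetical data: occurrences of one word per 1,000 tokens in six documents.
# The values are invented for illustration; the last document is a "burst".
rates_per_1000 = [0.0, 1.0, 1.0, 2.0, 1.0, 60.0]


def mean_rate(rates):
    """Plain average of per-document rates; sensitive to a single burst."""
    return mean(rates)


def median_rate(rates):
    """Median of per-document rates; unaffected by a few extreme documents."""
    return median(rates)


def trimmed_mean_rate(rates, trim=0.2):
    """Mean after dropping a fraction `trim` of the values from each end."""
    ordered = sorted(rates)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


if __name__ == "__main__":
    print("mean:        ", mean_rate(rates_per_1000))
    print("median:      ", median_rate(rates_per_1000))
    print("trimmed mean:", trimmed_mean_rate(rates_per_1000))
```

In this toy corpus the single bursty document pulls the plain mean far above the rate seen in the other five documents, while the median and trimmed mean stay close to the typical rate.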

Description

Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12–16 August 2019, Strathmore University, Nairobi, Kenya.
