A robust statistical model of word frequencies

Ingelby, Michael; Sharoff, Serge

Files

A robust statistical model of word frequencies.pdf (89 KB)

Date

2019-08

Authors

Ingelby, Michael

Sharoff, Serge

Publisher

Strathmore University

Abstract

For the purposes of language teaching or automatic language processing, it is important to know how frequent a word is. However, simply counting the number of times a word occurs in a collection of texts leads to many unfortunate artefacts, because some words occur very often in a small number of texts, producing frequency bursts. Our task in this paper is to introduce a statistical model that uses methods from robust statistics to estimate the frequencies of words in a collection of texts.
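The burst problem described in the abstract can be illustrated with a small sketch. This is not the model proposed in the paper, which the abstract does not specify. It only contrasts a pooled count with the median of per-document relative frequencies, one simple robust estimator chosen here as an assumption. The toy corpus and the function names `per_document_rates` and `pooled_rate` are invented for illustration.

```python
from collections import Counter
from statistics import median


def per_document_rates(docs, word):
    """Relative frequency of `word` in each document (occurrences per token)."""
    rates = []
    for doc in docs:
        tokens = doc.lower().split()
        rates.append(Counter(tokens)[word] / len(tokens) if tokens else 0.0)
    return rates


def pooled_rate(docs, word):
    """Naive estimate: total occurrences divided by total tokens in the corpus."""
    tokens = [t for doc in docs for t in doc.lower().split()]
    return Counter(tokens)[word] / len(tokens)


# Invented toy corpus: "whale" bursts in one text, "the" is spread evenly.
docs = [
    "the whale and the whale met the whale",
    "the cat sat on the mat",
    "the dog ran to the park",
    "the sun rose over the hill",
    "the rain fell on the roof",
]

for word in ("whale", "the"):
    naive = pooled_rate(docs, word)
    robust = median(per_document_rates(docs, word))  # assumed robust estimator
    print(f"{word}: pooled={naive:.3f}, median per-document={robust:.3f}")
```

In this toy corpus the pooled count gives "whale" a non-zero frequency even though it appears in only one of five texts, while the per-document median is zero. An evenly spread word such as "the" gets similar values from both estimates.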

Description

Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12-16 August 2019, Strathmore University, Nairobi, Kenya.

Keywords

Robust statistics, Word frequencies, Core lexicon

URI

http://hdl.handle.net/11071/10467

Collections

SIMC 2019
