Putting the Corpus Into the Dictionary

Vinay Hegde¹ and T. Rangaswamy²

¹ Department of Computer Science & Engineering, RVCE, Bangalore, India.

² Department of IEM, RVCE, Bangalore, India.


A corpus is an arbitrary sample of language, whereas a dictionary aims to be a systematic account of the lexicon of a language. Children learn language by encountering arbitrary samples and using them to build systematic representations.

These banal observations suggest a relationship between corpus and dictionary in which the former is a provisional and dispensable resource used to develop the latter. In this paper we use this idea, first, to review the Word Sense Disambiguation (WSD) research paradigm and, second, to guide our current work on the Sketch Engine, a corpus query tool. We develop a model in which a database of mappings between collocations and meanings acts as an interface between corpus and dictionary.
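As a rough illustration of what a database of mappings between collocations and meanings might look like, the following Python sketch stores each collocation as a triple and links it to a dictionary sense. Everything in it is an assumption made for illustration: the class name `CollocationSenseDB`, the triple layout (headword, grammatical relation, collocate), and the sense labels such as `bank/1` are hypothetical and are not taken from the Sketch Engine or from any particular dictionary.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed representation: (headword, grammatical relation, collocate).
Collocation = Tuple[str, str, str]


class CollocationSenseDB:
    """Hypothetical two-way index between collocations and sense labels."""

    def __init__(self) -> None:
        # Corpus-facing direction: a collocation seen in text -> sense label.
        self._sense_of: Dict[Collocation, str] = {}
        # Dictionary-facing direction: sense label -> supporting collocations.
        self._collocations_of: Dict[str, List[Collocation]] = defaultdict(list)

    def add(self, collocation: Collocation, sense_id: str) -> None:
        """Map a collocation to a sense, replacing any earlier mapping."""
        previous = self._sense_of.get(collocation)
        if previous == sense_id:
            return
        if previous is not None:
            self._collocations_of[previous].remove(collocation)
        self._sense_of[collocation] = sense_id
        self._collocations_of[sense_id].append(collocation)

    def sense_for(self, collocation: Collocation) -> Optional[str]:
        """Return the sense mapped to a collocation, or None if unmapped."""
        return self._sense_of.get(collocation)

    def collocations_for(self, sense_id: str) -> List[Collocation]:
        """Return the collocations recorded as evidence for a sense."""
        return list(self._collocations_of.get(sense_id, []))


# Illustrative entries; the sense labels are invented for this example.
db = CollocationSenseDB()
db.add(("bank", "modifier", "river"), "bank/2")
db.add(("bank", "object_of", "rob"), "bank/1")
db.add(("bank", "modifier", "savings"), "bank/1")

print(db.sense_for(("bank", "modifier", "river")))
print(db.collocations_for("bank/1"))
```

Looking up a collocation found in a corpus line yields a candidate sense, while looking up a sense yields the collocations that support it, so the same structure can be consulted from either the corpus side or the dictionary side.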

