Corpus-Based Methods

This course covers corpus-based methods, that is, the large-scale study of written text and of spoken or signed utterances.

The course covers data, methods and evidence in different linguistic traditions. It also explores quantitative properties of language, such as word frequencies and n-grams.
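As a rough illustration of what frequencies and n-grams mean in practice, the following minimal Python sketch counts single words and word pairs in a toy sentence. The helper name `ngrams`, the example sentence and the whitespace-only splitting are assumptions made for this illustration; they are not taken from the course material.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy sentence, split on whitespace only (a deliberate simplification;
# real corpora need proper tokenisation).
tokens = "the cat sat on the mat and the cat slept".split()

# Frequency of each word form (unigrams).
unigram_counts = Counter(tokens)

# Frequency of each pair of adjacent words (bigrams).
bigram_counts = Counter(ngrams(tokens, 2))

# List the three most frequent words and bigrams.
print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The same function gives trigrams or longer sequences by changing `n`; if `n` is larger than the number of tokens, it returns an empty list.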

The course also gives an overview of computational linguistic methods for automatic segmentation and annotation of text – including tokenisation, part-of-speech tagging and syntactic analysis – and describes how to search corpora using regular expressions.
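To give a flavour of tokenisation and regular-expression search, here is a small Python sketch using only the standard `re` module. The sample text and both patterns are assumptions chosen for illustration; the tokeniser is deliberately naive and is not the method taught in the course.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = (
    "The course covers tokenisation, tagging and parsing. "
    "Tokenization is spelled two ways."
)

# Naive tokeniser: a token is either a run of word characters
# or a single punctuation mark (anything that is not a word
# character or whitespace).
tokens = re.findall(r"\w+|[^\w\s]", text)

# Corpus-style search: match both spellings of the word,
# ignoring case, and only as a whole word.
matches = re.findall(r"\btokeni[sz]ation\b", text, flags=re.IGNORECASE)

print(tokens)
print(matches)
```

The character class `[sz]` lets one pattern cover both the British and the American spelling, which is a common reason to search corpora with regular expressions rather than fixed strings.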

We will also analyse corpora based on occurrences and co-occurrences, and discuss the relationship between corpus material and research questions, as well as ethics, copyright, and licenses.



Teaching Format

The course is based on lectures and lab sessions.


Assessment

The course is examined through written exams and reports.

Examiner

The schedule will be available no later than one month before the start of the course. We do not recommend print-outs as changes can occur. At the start of the course, your department will advise where you can find your schedule during the course.


Note that the course literature can be changed up to two months before the start of the course.


Course reports are displayed for the three most recent course instances.






