Corpus-Based Methods

This course deals with corpus-based methods, that is, the large-scale study of written text and of spoken or signed utterances.

Contents: Data, methods and evidence in different linguistic traditions. Quantitative properties of language, frequencies, n-grams. Data collection for different types of corpora (including traditional sample corpora, monitor corpora and web corpora) and modalities (text, speech, signing). Representation of corpora in XML. Overview of computational linguistic methods for automatic segmentation and annotation of text, including tokenisation, part-of-speech tagging and syntactic analysis. Searching corpora using regular expressions. Analysis of corpora based on occurrences and co-occurrences. Relationship between corpus material and research questions. Ethics, copyright, licenses.

  • Course structure

    Teaching format

    Teaching consists of lectures and lab sessions.

    Assessment

    The course is assessed through written exams and reports.

    Examiner

    Mats Wirén

  • Schedule

    The schedule will be available no later than one month before the start of the course. We do not recommend print-outs as changes can occur. At the start of the course, your department will advise where you can find your schedule during the course.
  • Course literature

    Note that the course literature can be changed up to two months before the start of the course.
  • Contact

    Student Affairs Office, Department of Linguistics

    Södra huset, C 378
    Visiting hours for students
    Tuesdays 9.00-10.00
    Wednesdays 13.00-15.00
    Thursdays 9.00-11.00 and 13.00-16.00

    exp@ling.su.se
    +46 8 16 23 47

    Director of Studies

    Sofia Gustafson-Capková
    ma@ling.su.se
    +46 8 16 34 88