Stockholm University

Seminar: Miriam Hurtado Bodell, Linköping University

Wednesday 19 May 2021 13.00 – 14.00

Online

From Documents to Data: a Framework for Total Corpus Quality

Time: 19 May 2021, 13.00–14.00
Where: This seminar is held online. E-mail Dan Hedlin if you would like to attend.

Abstract

As large-scale digitized textual corpora and novel methodologies become increasingly available, researchers are rediscovering the potential of textual sources for inquiries into social and cultural phenomena. Yet while textual corpora show great promise for enriching our knowledge of the social world, empirical research faces the challenge of avoiding "garbage in, garbage out" problems: our scientific inferences are only as good as the quality of the data we analyze. This paper argues that evaluating the quality of a processed, machine-readable corpus is pivotal for subsequent social science inquiries. It proposes a framework of total corpus quality, which identifies three dimensions that affect the potential of using large corpora for research. This conceptual framework helps researchers diagnose and understand errors in studies based on large-scale textual analysis.
