PhD students build open infrastructure for register-data research

Two PhD students have developed a new open system designed to make it easier for researchers to understand, use, and replicate studies based on register data. The tools are already used by researchers and Swedish government agencies, and one of them, autolabel, is installed in Statistics Sweden's secure research environment MONA.

Jeffrey Clark and Jie Wen.

A new approach to making Nordic register data accessible

The RegiStream project was founded in 2024 by Jeffrey Clark, a PhD student in economics at Stockholm University, and Jie Wen, a PhD student in accounting at the Stockholm School of Economics. The infrastructure consists of a multilingual metadata catalog and two tools: autolabel and datamirror.

Jeffrey Clark explains that RegiStream acts as a software layer between register data and the researchers who use it.

“Instead of every researcher having to interpret on their own how different agencies document their variables, we have gathered and curated the metadata into a shared catalog,” he says. “Fundamentally, this is about accessibility, both for researchers who don’t work with the data daily and for international collaborators who don’t read Swedish.”

The catalog currently covers 64,367 unique variables from six Nordic statistical agencies. Using the accompanying tools, most notably autolabel, researchers can apply variable and value labels to an entire dataset with a single command, in either Swedish or English.

The second tool in RegiStream is the recently released datamirror. It addresses a structural gap in register-data research: studies based on restricted microdata currently cannot be replicated, because the data cannot be shared alongside the code.

A paper by Clark and Wen presenting autolabel is currently at the revise-and-resubmit stage at the Stata Journal. All of the software is open source and available at registream.org.

Last updated: 2026-05-04

Source: Department of Economics