
Research project DataLEASH: LEarning And SHaring under Privacy Constraints

With massive amounts of personal data being generated, privacy has become a great challenge. This project studies how machine learning can be used to share language models without the risk of sharing information that may identify individuals.

Genre photo: Multi-colored text on a computer screen. Photo: Shahadat Rahman/Unsplash.

The recent confluence of digitalization, increasingly data-heavy technologies, advances in machine learning, and legal regulations has turned privacy into a great challenge.

While regulations such as the GDPR serve as a major step toward protecting society, there is a lack of guidelines and technical specifications of what kind of privacy leakage is acceptable. Currently, this prevents data from being exploited and shared to the full extent possible.

To solve these problems, we need:
1. quantitative and legal privacy risk and utility assessments, and 
2. mechanisms for data transformation and learning that improve the results of these assessments.

Thus, the DataLEASH project will develop and test the methods that will lead to more open data. Participants in this project are Stockholm University, KTH and RISE.

HB Deid is a tool that has been developed for de-identification of texts.

Former members of this project are Hanna Berg and Mila Grancharova.

Recipients of this project are Charlotte Dingertz (City of Stockholm), Sven-Åke Lööv (Region Stockholm), Henrik Löf (Karolinska University Hospital), Marina Santini (RISE), and Peter Lundberg (Linköping University Hospital).

This project is financed within KTH’s 2019 digitalization initiative for IT and mobile communication (ICT TNG), through the government’s strategic research areas (SFO) for creating world-leading research.

Project description

In WP5, we will start with a problem formulation from existing projects where medical, municipal, and other data repositories have faced challenges with privacy, anonymization, pseudonymization, and similar issues.

We will perform a series of experiments on existing very large data sets from, e.g., the Stockholm County Council (medical records), Elekta (medical imaging data), and the City of Stockholm (data from numerous systems that are linked to many areas of the city), to investigate the possibilities and challenges of the mechanisms developed within DataLEASH.

WP5 will also create demo applications for demonstrating possibilities of DataLEASH mechanisms on different types of data. At Stockholm University, we have access to the research infrastructure Health Bank, the Swedish Health Record Research Bank that contains over two million electronic patient records from Karolinska University Hospital from the years 2007–2014.

The records are stored in a relational database with over 80 tables, where experiments can be carried out to study when and where anonymity can be preserved, for example by using privacy-preserving record linkage methods that go beyond regular pseudonymization. Experiments can be run, and data securely shared between partners, on the RISE ICE computer cluster.
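
As a point of reference for the regular pseudonymization mentioned above, the following is a minimal Python sketch of linking records from two tables through keyed pseudonyms. It is not a DataLEASH or Health Bank mechanism. The function names, the secret key, and the sample rows are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed pseudonym (HMAC-SHA256 hex digest) for an identifier."""
    # Normalize so that trivial formatting differences do not break the link.
    normalized = identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(table_a, table_b, secret_key):
    """Join two lists of (identifier, payload) rows on their pseudonyms.

    The result contains pseudonyms and payloads only, never raw identifiers.
    """
    index = {}
    for identifier, payload in table_a:
        index.setdefault(pseudonymize(identifier, secret_key), []).append(payload)

    linked = []
    for identifier, payload in table_b:
        pseudonym = pseudonymize(identifier, secret_key)
        for other_payload in index.get(pseudonym, []):
            linked.append((pseudonym, other_payload, payload))
    return linked


if __name__ == "__main__":
    # Hypothetical key and rows; a real key would be kept by a trusted party.
    key = b"hypothetical-secret-key"
    visits = [("patient-001", "visit A"), ("patient-002", "visit B")]
    labs = [("Patient-001 ", "lab result X"), ("patient-003", "lab result Y")]
    for pseudonym, visit, lab in link_records(visits, labs, key):
        print(pseudonym[:12], visit, lab)
```

Exact matching on keyed pseudonyms only links records whose identifiers agree after normalization, and the payloads themselves may still reveal who a person is. Limitations of this kind are one reason to study linkage methods beyond regular pseudonymization.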

Project members

Project managers

Hercules Dalianis

Professor

Department of Computer and Systems Sciences

Uno Fors

Professor

Department of Computer and Systems Sciences

Members

Thomas Vakili

PhD student

Department of Computer and Systems Sciences

Anastasios Lamproudis

Research assistant

Department of Computer and Systems Sciences

Publications

Berg, H., Henriksson, A., Fors, U. and Dalianis, H. (2020)

“De-identification of Clinical Text for Secondary Use: Research Issues”. Presented at the Healthcare Text Analytics Conference HealTAC 2020, April 23, London.