New algorithm finds valuable information in complex data
Data is available in abundance. However, to utilize this raw material we need to sort it so that patterns and trends become visible. Zed Lee develops algorithms that are useful for industry as well as healthcare.
Zed Lee’s research interest was sparked ten years ago. Back then, he worked for a car manufacturer in Korea. Machine learning and deep learning were hot topics.
“The company had lots of data that we tried to analyse. I became more and more interested in machine learning”, says Zed Lee.
“When I started working at the company, I thought I would stay there forever. But quite soon, I got bored. I was young and wanted to learn more about things that could affect my future career, whether it would be within the business sector or academia.”
Curiosity led Zed Lee to Stockholm, where he pursued a master’s degree at KTH. After graduation, his teacher Henrik Broström informed him that the Department of Computer and Systems Sciences (DSV) at Stockholm University was looking for doctoral students. Zed Lee was admitted, and in November 2023 he defended his PhD thesis.
The methods we use must be interpretable and understandable
“When I came to Sweden, I got to know all these weird algorithms. I thought: ‘Does it have to be so complex?’ If your car has a problem, it is not enough to know that there is a problem. You want to know why there is a problem. The methods we use must be interpretable and understandable.”
Interpretability is a guiding principle in his PhD thesis, which is largely theoretical. It concerns data mining – how we work with data extraction.
Hidden values in data
In traditional mining, one hacks or drills through rock walls to find precious metals and other valuable materials. Data mining works similarly: by processing massive amounts of data, valuable information can be extracted.
“My thesis is based on time and time series. It has to do with combining different types of data and identifying patterns”, says Zed Lee.
In his research, he has used data from both clothing manufacturers and dentists, but the principle remains the same. How can we collect qualitative and quantitative data generated over different time periods, analyse them, and draw conclusions that can be acted upon?
“A time series could, for example, be measurements of a person’s heart rate over six months. During the same period – or a part of it – we can also measure their blood pressure.”
Both measurements result in numbers. But the data could be more complicated than that. Suppose a doctor diagnoses a patient and prescribes a medication to be taken for two weeks. During five of those days, the patient experiences dizziness – a potential side effect of the drug. Or not? The doctor’s notes might contain valuable data to help our understanding of the problem.
“To draw conclusions and identify patterns, we need to integrate, interpret, and analyse data from different sources and time periods – data that change over time. We also need to compare with other patients who have the same or different diagnoses.”
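To make the example above concrete, here is a minimal sketch of how such mixed data could be represented and combined. It is not the algorithm from the thesis. All names (`Interval`, `mean_within`) and all values are invented for illustration: the measurements are stored as daily time series, while the medication period and the dizziness episode are stored as labelled intervals.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class Interval:
    """A labelled event that holds from start to end (both days inclusive)."""
    label: str
    start: date
    end: date

    def covers(self, day: date) -> bool:
        return self.start <= day <= self.end

    def contains(self, other: "Interval") -> bool:
        """True if the other interval lies entirely within this one."""
        return self.start <= other.start and other.end <= self.end


def mean_within(series: Dict[date, float], interval: Interval) -> Optional[float]:
    """Average the measurements that fall inside the interval, if there are any."""
    values = [value for day, value in series.items() if interval.covers(day)]
    return sum(values) / len(values) if values else None


# Invented data: heart rate for 30 days, blood pressure for only part of that period.
day0 = date(2023, 1, 1)
heart_rate = {day0 + timedelta(days=i): 70.0 + (i % 3) for i in range(30)}
blood_pressure = {day0 + timedelta(days=i): 120.0 + i for i in range(10, 20)}

# Invented events: two weeks of medication, five days of dizziness.
medication = Interval("medication", day0 + timedelta(days=5), day0 + timedelta(days=18))
dizziness = Interval("dizziness", day0 + timedelta(days=9), day0 + timedelta(days=13))

# Relate the sources: did the dizziness occur during the medication period,
# and what did the measurements look like while the patient was medicated?
dizzy_during_medication = medication.contains(dizziness)
hr_on_medication = mean_within(heart_rate, medication)
bp_on_medication = mean_within(blood_pressure, medication)

print(dizzy_during_medication, hr_on_medication, bp_on_medication)
```

The sketch only shows that numeric series and event intervals can be placed on a shared timeline and queried together. Finding recurring patterns across many patients is a far harder problem.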
In his thesis, Zed Lee develops a new algorithm capable of dealing with time series data from different sources.
“The data is complex, and the algorithm is quite complicated too. But the point is that the outcome is easy to understand.”
“The algorithm can be used by all sorts of organisations that have this type of data”, he says.
When Zed Lee joined DSV, he was happy to see that nine other PhD students started their journeys at the same time. Another advantage, according to Zed Lee, was that the topic of each thesis was more open than it usually is.
“The life of a PhD student can be lonely, but it hasn’t been for us. We started at the same time, took courses together, and had our mid-term seminars at roughly the same time. That has been a strength.”
“I had an interest in data mining when I started, even though I didn’t know exactly how to formulate my research questions. All of us who started at the same time have chosen to focus on different topics”, Zed Lee explains.
I like to explore new things
He emphasises that the journey of a doctoral student has been tough at times, for instance when his articles were not accepted or when multiple deadlines coincided. Nevertheless, he looks forward to continuing in academia.
“As an employee in a company, the level of freedom is not always high. I like to explore new things and take responsibility for my own projects”, says Zed Lee.
More about the research
Zed Lee successfully defended his PhD thesis on November 24, 2023, at the Department of Computer and Systems Sciences (DSV), Stockholm University.
The title of his thesis is “Z-Series: Mining and learning from complex sequential data”.
Zed Lee’s supervisor is Panagiotis Papapetrou, DSV, and his co-supervisor is Tony Lindgren, DSV.
Toon Calders, University of Antwerp, Belgium, was the external reviewer.
Text: Åse Karlén
Last updated: December 6, 2023
Source: Department of Computer and Systems Sciences, DSV