Ioanna Miliou, Senior Lecturer

Contact

Name and title: Ioanna Miliou, Senior Lecturer

ioanna.miliou@dsv.su.se

08-16 16 08

Orcid

0000-0002-1357-1967

Workplace: Department of Computer and Systems Sciences

Visiting address

Nodhuset, Borgarfjordsgatan 12

Postal address

Institutionen för data- och systemvetenskap

164 25 Kista

Research group

Data Science Research Group

The Data Science Research Group focuses on core data science research, as well as on applications where data science can provide insights for decision making. We formulate novel data science problems and develop algorithmic methods and methodological workflows.

Links

Personal Website

About me

I am Ioanna Miliou, a Senior Lecturer in Data Science. I am a member of the Data Science Research Group at the Department of Computer and Systems Sciences (DSV) of Stockholm University. I hold a Ph.D. in Computer Science from the University of Pisa, Italy, and a diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece.

Teaching

Current course at Stockholm University:
- FODS: Foundations of Data Science (Fall 2024-2025)
Past courses at Stockholm University:
- DATM: Data Mining (Spring 2024-2025)
- DSHI: Data Science for Health Informatics (Spring 2023-2024)
Supervisor at the graduate level in Data Science and Health Informatics

Research

Research overview

My research interests lie in the fields of Data Science for Social Good, Nowcasting, and Forecasting, with the use of Big Data Analytics, Data Mining, and Machine Learning. Using Big Data derived from everyday life as external proxies, it is possible to nowcast and forecast the evolution of phenomena whose study otherwise relies only on historical data or on data that arrive with a significant lag. I work mainly on epidemics, healthcare, peace, and sentiment.

PhD students

Current:

Franco Rugolon, Ph.D. student in Data Science (co-supervisor)
Lena Mondrejevski, Ph.D. student in Data Science (co-supervisor)

Graduated:

Maria Bampa, Ph.D. in Data Science (2024, co-supervisor)

Research projects

Publications

A selection from Stockholm University publication database

Glacier: guided locally constrained counterfactual explanations for time series classification

2024. Zhendong Wang (et al.). Machine Learning

Article

In machine learning applications, there is a need to obtain predictive models of high performance and, most importantly, to allow end-users and practitioners to understand and act on their predictions. One way to obtain such understanding is via counterfactuals, which provide sample-based explanations in the form of recommendations on which features need to be modified from a test example so that the classification outcome of a given classifier changes from an undesired outcome to a desired one. This paper focuses on the domain of time series classification, more specifically, on defining counterfactual explanations for univariate time series. We propose Glacier, a model-agnostic method for generating locally-constrained counterfactual explanations for time series classification using gradient search either on the original space or on a latent space that is learned through an auto-encoder. An additional flexibility of our method is the inclusion of constraints on the counterfactual generation process that favour applying changes to particular time series points or segments while discouraging changing others. The main purpose of these constraints is to ensure more reliable counterfactuals, while increasing the efficiency of the counterfactual generation process. Two particular types of constraints are considered, i.e., example-specific constraints and global constraints. We conduct extensive experiments on 40 datasets from the UCR archive, comparing different instantiations of Glacier against three competitors. Our findings suggest that Glacier outperforms the three competitors in terms of two common metrics for counterfactuals, i.e., proximity and compactness. Moreover, Glacier obtains counterfactual validity comparable to that of the best of the three competitors. Finally, when comparing the unconstrained variant of Glacier to the constraint-based variants, we conclude that the inclusion of example-specific and global constraints yields a good performance while demonstrating the trade-off between the different metrics.
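The idea of a constrained gradient search for a counterfactual can be illustrated with a small sketch. This is not the Glacier implementation: it assumes a plain logistic classifier in place of the model being explained, searches on the original space only, and uses a hypothetical per-time-step cost vector (`step_cost`) to discourage changes at selected points. All function and parameter names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def counterfactual(x, w, b, step_cost, target=0.5, lr=0.01, lam=0.1, max_iter=2000):
    """Gradient search for a series x_cf whose predicted probability reaches `target`.

    Minimises  -log p(x_cf) + lam * sum(step_cost * (x_cf - x)**2),
    where p is an assumed logistic classifier sigmoid(w @ x + b).
    A large entry in `step_cost` discourages changing that time step.
    """
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        p = sigmoid(w @ x_cf + b)
        if p >= target:  # desired class reached, stop early
            break
        grad_pred = -(1.0 - p) * w                      # gradient of -log p
        grad_dist = 2.0 * lam * step_cost * (x_cf - x)  # gradient of the weighted distance
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf


# Toy series of six points; the last three are expensive to change.
x = np.zeros(6)
w = np.ones(6)
b = -2.0
step_cost = np.array([1.0, 1.0, 1.0, 100.0, 100.0, 100.0])

x_cf = counterfactual(x, w, b, step_cost)
print("original probability:      ", sigmoid(w @ x + b))
print("counterfactual probability:", sigmoid(w @ x_cf + b))
print("change per time step:      ", np.round(x_cf - x, 3))
```

In this toy setting almost all of the change lands on the first three points, which is the kind of behaviour the constraints described in the abstract are meant to encourage.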

M-ClustEHR: A multimodal clustering approach for electronic health records

2024. Maria Bampa (et al.). Artificial Intelligence in Medicine 154

Article

Sepsis refers to a potentially life-threatening situation where the immune system of the human body has an extreme response to an infection. In the presence of underlying comorbidities, the situation can become even worse and result in death. Employing unsupervised machine learning techniques, such as clustering, can assist in providing a better understanding of patient phenotypes by unveiling subgroups characterized by distinct sepsis progression and treatment patterns. More concretely, this study introduces M-ClustEHR, a clustering approach that utilizes medical data of multiple modalities by employing a multimodal autoencoder for learning comprehensive sepsis patient representations. M-ClustEHR consistently outperforms traditional clustering approaches in terms of several internal clustering performance metrics, as well as cluster stability in identifying phenotypes in the sepsis cohort. The unveiled patterns, supported by existing medical literature and clinicians, highlight the importance of multimodal clustering for advancing personalized sepsis care.

The Impact of Climate Change on the Mental Health of Populations at Disproportionate Risk of Health Impacts and Inequities: A Rapid Scoping Review of Reviews

2024. Germán Andrés Alarcón Garavito (et al.). International Journal of Environmental Research and Public Health 21 (11)

Article

The impacts of climate change on mental health are starting to be recognized and may be exacerbated for populations at disproportionate risk of health impacts or inequalities, including some people living in low- and middle-income countries, children, indigenous populations, and people living in rural communities, among others. Here, we conduct a rapid scoping review of reviews to summarize the research to date on climate impacts on the mental health of populations at disproportionate risk. This review highlights the direct and indirect effects of climate change, the common mental health issues that have been studied related to climate events, and the populations that have been studied to date. This review outlines key gaps in the field and important research areas going forward. These include a need for more systematic methodologies, with before-and-after comparisons or exposure/non-exposure group comparisons and consistent mental health outcome measurements that are appropriately adapted for the populations being studied. Further research is also necessary in regard to the indirect effects of climate change and the climate effects on indigenous populations and populations with other protected and intersecting characteristics. This review highlights the key research areas to date and maps the critical future research necessary to develop future interventions.

Counterfactual Explanations for Time Series Forecasting

2024. Zhendong Wang (et al.). 2023 IEEE International Conference on Data Mining (ICDM), 1391-1396

Conference

Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. The perturbations are further guided by imposing constraints to the forecasted values. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. ForecastCF outperforms the baselines in terms of counterfactual validity and data manifold closeness, while generating meaningful and relevant counterfactuals for various forecasting tasks.

Ijuice: integer JUstIfied counterfactual explanations

2024. Alejandro Kuratomi Hernandez (et al.). Machine Learning

Article

Counterfactual explanations modify the feature values of an instance in order to alter its prediction from an undesired to a desired label. As such, they are highly useful for providing trustworthy interpretations of decision-making in domains where complex and opaque machine learning algorithms are utilized. To guarantee their quality and promote user trust, they need to satisfy the faithfulness desideratum, when supported by the data distribution. We hereby propose a counterfactual generation algorithm for mixed-feature spaces that prioritizes faithfulness through k-justification, a novel counterfactual property introduced in this paper. The proposed algorithm employs a graph representation of the search space and provides counterfactuals by solving an integer program. In addition, the algorithm is classifier-agnostic and is not dependent on the order in which the feature space is explored. In our empirical evaluation, we demonstrate that it guarantees k-justification while showing comparable performance to state-of-the-art methods in feasibility, sparsity, and proximity.

COMET: Constrained Counterfactual Explanations for Patient Glucose Multivariate Forecasting

2024. Zhendong Wang (et al.). Annual IEEE Symposium on Computer-Based Medical Systems, 502-507

Conference

Applying deep learning models for healthcare-related forecasting applications has been widely adopted, such as leveraging glucose monitoring data of diabetes patients to predict hyperglycaemic or hypoglycaemic events. However, most deep learning models are considered black-boxes; hence, the model predictions are not interpretable and may not offer actionable insights into medical practitioners’ decisions. Previous work has shown that counterfactual explanations can be applied in forecasting tasks by suggesting counterfactual changes in time series inputs to achieve the desired forecasting outcome. This study proposes a generalized multivariate forecasting setup of counterfactual generation by introducing a novel approach, COMET, which imposes three domain-specific constraint mechanisms to provide counterfactual explanations for glucose forecasting. Moreover, we conduct the experimental evaluation using two diabetes patient datasets to demonstrate the effectiveness of our proposed approach in generating realistic counterfactual changes in comparison with a baseline approach. Our qualitative analysis evaluates examples to validate that the counterfactual samples are clinically relevant and can effectively lead the patients to achieve a normal range of predicted glucose levels by suggesting changes to the treatment variables.

MASICU: A Multimodal Attention-based classifier for Sepsis mortality prediction in the ICU

2024. Lena Mondrejevski (et al.). 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS), 326-331

Conference

Sepsis poses a significant threat to public health, causing millions of deaths annually. While treatable with timely intervention, accurately identifying at-risk patients remains challenging due to the condition’s complexity. Traditional scoring systems have been utilized, but their effectiveness has waned over time. Recognizing the need for comprehensive assessment, we introduce MASICU, a novel machine learning model architecture tailored for predicting ICU sepsis mortality. MASICU is a novel multimodal, attention-based classification model that integrates interpretability within an ICU setting. Our model incorporates multiple modalities and multimodal fusion strategies and prioritizes interpretability through different attention mechanisms. By leveraging both static and temporal features, MASICU offers a holistic view of the patient’s clinical status, enhancing predictive accuracy while providing clinically relevant insights.

ORANGE: Opposite-label soRting for tANGent Explanations in heterogeneous spaces

2023. Alejandro Kuratomi Hernandez (et al.). 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), 1-10

Conference

Most real-world datasets have a heterogeneous feature space composed of binary, categorical, ordinal, and continuous features. However, the currently available local surrogate explainability algorithms do not consider this aspect, generating infeasible neighborhood centers which may provide erroneous explanations. To overcome this issue, we propose ORANGE, a local surrogate explainability algorithm that generates high-accuracy and high-fidelity explanations in heterogeneous spaces. ORANGE has three main components: (1) it searches for the closest feasible counterfactual point to a given instance of interest by considering feasible values in the features to ensure that the explanation is built around the closest feasible instance and not any, potentially non-existent instance in space; (2) it generates a set of neighboring points around this close feasible point based on the correlations among features to ensure that the relationship among features is preserved inside the neighborhood; and (3) the generated instances are weighted, firstly based on their distance to the decision boundary, and secondly based on the disagreement between the predicted labels of the global model and a surrogate model trained on the neighborhood. Our extensive experiments on synthetic and public datasets show that the performance achieved by ORANGE is best-in-class in both explanation accuracy and fidelity.

Early prediction of the risk of ICU mortality with Deep Federated Learning

2023. Korbinian Robert Randl (et al.). 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), 706-711

Conference

Intensive Care Units usually carry patients with a serious risk of mortality. Recent research has shown the ability of Machine Learning to indicate the patients’ mortality risk and point physicians toward individuals with a heightened need for care. Nevertheless, healthcare data is often subject to privacy regulations and can therefore not be easily shared in order to build Centralized Machine Learning models that use the combined data of multiple hospitals. Federated Learning is a Machine Learning framework designed for data privacy that can be used to circumvent this problem. In this study, we evaluate the ability of deep Federated Learning to predict the risk of Intensive Care Unit mortality at an early stage. We compare the predictive performance of Federated, Centralized, and Local Machine Learning in terms of AUPRC, F1-score, and AUROC. Our results show that Federated Learning performs equally well as the centralized approach (for 2, 4, and 8 clients) and is substantially better than the local approach, thus providing a viable solution for early Intensive Care Unit mortality prediction. In addition, we demonstrate that the prediction performance is higher when the patient history window is closer to discharge or death. Finally, we show that using the F1-score as an early stopping metric can stabilize and increase the performance of our approach for the task at hand.

FLICU: A Federated Learning Workflow for Intensive Care Unit Mortality Prediction

2022. Lena Mondrejevski (et al.). 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), 32-37

Conference

Although Machine Learning can be seen as a promising tool to improve clinical decision-making, it remains limited by access to healthcare data. Healthcare data is sensitive, requiring strict privacy practices, and typically stored in data silos, making traditional Machine Learning challenging. Federated Learning can counteract those limitations by training Machine Learning models over data silos while keeping the sensitive data localized. This study proposes a Federated Learning workflow for Intensive Care Unit mortality prediction. Hereby, the applicability of Federated Learning as an alternative to Centralized Machine Learning and Local Machine Learning is investigated by introducing Federated Learning to the binary classification problem of predicting Intensive Care Unit mortality. We extract multivariate time series data from the MIMIC-III database (lab values and vital signs), and benchmark the predictive performance of four deep sequential classifiers (FRNN, LSTM, GRU, and 1DCNN) varying the patient history window lengths (8h, 16h, 24h, and 48h) and the number of Federated Learning clients (2, 4, and 8). The experiments demonstrate that both Centralized Machine Learning and Federated Learning are comparable in terms of AUPRC and F1-score. Furthermore, the federated approach shows superior performance over Local Machine Learning. Thus, Federated Learning can be seen as a valid and privacy-preserving alternative to Centralized Machine Learning for classifying Intensive Care Unit mortality when the sharing of sensitive patient data between hospitals is not possible.
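Training over data silos while keeping the data local can be sketched with federated averaging. This is a minimal illustration, not the FLICU workflow: it assumes a logistic-regression model instead of the deep sequential classifiers in the paper, synthetic data instead of MIMIC-III, and size-weighted averaging of client weights as the aggregation rule. All names are illustrative.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """A few gradient steps of logistic regression on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def train_federated(clients, n_features, rounds=20):
    """Only model weights are exchanged; each (X, y) pair stays with its client."""
    w_global = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(w_global, X, y) for X, y in clients]
        w_global = fed_avg(updates, [len(y) for _, y in clients])
    return w_global


# Synthetic stand-in for three hospitals with 50 patients each (assumed data).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = (X @ true_w > 0).astype(float)
    clients.append((X, y))

w = train_federated(clients, n_features=2)
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
print("global weights:", w)
print("accuracy:", np.mean((X_all @ w > 0) == (y_all == 1.0)))
```

The centralized baseline mentioned in the abstract would correspond to running `local_update` once on the pooled data, and the local baseline to training each client on its own data without aggregation.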

Impact of Dimensionality on Nowcasting Seasonal Influenza with Environmental Factors

2022. Stefany Guarnizo, Ioanna Miliou, Panagiotis Papapetrou. Advances in Intelligent Data Analysis XX, 128-142

Conference

Seasonal influenza is an infectious disease of multi-causal etiology and a major cause of mortality worldwide that has been associated with environmental factors. In the attempt to model and predict future outbreaks of seasonal influenza with multiple environmental factors, we face the challenge of increased dimensionality that makes the models more complex and unstable. In this paper, we propose a nowcasting and forecasting framework that compares the theoretical approaches of Single Environmental Factor and Multiple Environmental Factors. We introduce seven solutions to minimize the weaknesses associated with the increased dimensionality when predicting seasonal influenza activity level using multiple environmental factors as external proxies. Our work provides evidence that using dimensionality reduction techniques as a strategy to combine multiple datasets improves seasonal influenza forecasting without the penalization of increased dimensionality.

Understanding peace through the world news

2022. Vasiliki Voukelatou (et al.). EPJ Data Science 11 (1)

Article

Peace is a principal dimension of well-being and is the way out of inequity and violence. Thus, its measurement has drawn the attention of researchers, policymakers, and peacekeepers. During the last years, novel digital data streams have drastically changed the research in this field. The current study exploits information extracted from a new digital database called Global Data on Events, Location, and Tone (GDELT) to capture peace through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use explainable AI techniques to obtain the most important variables that drive the predictions. This analysis highlights each country’s profile and provides explanations for the predictions, and particularly for the errors and the events that drive these errors. We believe that digital data exploited by researchers, policymakers, and peacekeepers, with data science tools as powerful as machine learning, could contribute to maximizing the societal benefits and minimizing the risks to peace.

Measuring objective and subjective well-being: dimensions and data sources

2021. Vasiliki Voukelatou (et al.). International journal of data science and analytics 11, 279-309

Article

Well-being is an important value for people’s lives, and it could be considered as an index of societal progress. Researchers have suggested two main approaches for the overall measurement of well-being, the objective and the subjective well-being. Both approaches, as well as their relevant dimensions, have been traditionally captured with surveys. During the last decades, new data sources have been suggested as an alternative or complement to traditional data. This paper aims to present the theoretical background of well-being, by distinguishing between objective and subjective approaches, their relevant dimensions, the new data sources used for their measurement and relevant studies. We also intend to shed light on still barely explored dimensions and data sources that could potentially contribute as a key for public policy and social development.

