Stockholm University

Ioanna Miliou, Senior Lecturer

About me

I am Ioanna Miliou, a Senior Lecturer in Data Science. I am a member of the Data Science Research Group at the Department of Computer and Systems Sciences (DSV) of Stockholm University. I hold a Ph.D. in Computer Science from the University of Pisa, Italy, and a diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece. 

Teaching

  • Courses:
    • FODS: Foundations of Data Science (Fall I 2024/25)
    • DAMI: Data Mining (Fall I 2023/24)
    • DSHI: Data Science for Health Informatics (Spring II 2023/24) 
  • Supervisor at the graduate level in Data Science and Health Informatics
  • PhD co-supervisor of:
    • Franco Rugolon, Ph.D. student in Data Science
    • Lena Mondrejevski, Ph.D. student in Data Science
    • Maria Bampa, Ph.D. student in Data Science

Research

My research interests lie in the fields of Data Science for Social Good, Nowcasting, and Forecasting, using Big Data Analytics, Data Mining, and Machine Learning. By using Big Data derived from everyday life as external proxies, it is possible to nowcast and forecast the evolution of phenomena whose study otherwise relies only on historical data or on data that become available with a significant lag. I work mainly on epidemics, healthcare, peace, and sentiment.


Publications

A selection from Stockholm University's publication database

  • Glacier: guided locally constrained counterfactual explanations for time series classification

    2024. Zhendong Wang (et al.). Machine Learning

    Article

    In machine learning applications, there is a need to obtain predictive models of high performance and, most importantly, to allow end-users and practitioners to understand and act on their predictions. One way to obtain such understanding is via counterfactuals, which provide sample-based explanations in the form of recommendations on which features of a test example need to be modified so that a given classifier's outcome changes from an undesired one to a desired one. This paper focuses on the domain of time series classification, more specifically, on defining counterfactual explanations for univariate time series. We propose Glacier, a model-agnostic method for generating locally constrained counterfactual explanations for time series classification using gradient search either in the original space or in a latent space learned through an auto-encoder. Our method additionally allows constraints on the counterfactual generation process that favour changes to particular time series points or segments while discouraging changes to others. The main purpose of these constraints is to ensure more reliable counterfactuals while also increasing the efficiency of the counterfactual generation process. Two types of constraints are considered: example-specific constraints and global constraints. We conduct extensive experiments on 40 datasets from the UCR archive, comparing different instantiations of Glacier against three competitors. Our findings suggest that Glacier outperforms the three competitors in terms of two common metrics for counterfactuals, namely proximity and compactness. Moreover, Glacier achieves counterfactual validity comparable to that of the best of the three competitors.

    Finally, comparing the unconstrained variant of Glacier to the constraint-based variants, we conclude that including example-specific and global constraints yields good performance while exposing the trade-off between the different metrics.
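
The general idea of constrained, gradient-based counterfactual search can be illustrated with a minimal Python sketch. It is not Glacier: it assumes a toy logistic-regression classifier on a short univariate series, searches only in the original space (no auto-encoder), and encodes a constraint as a hypothetical per-time-step `penalty` weight that discourages changes at protected points. The names `counterfactual`, `lam`, `lr`, and `steps` are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual(x, w, b, penalty, target=1.0, lam=0.1, lr=0.5, steps=500):
    """Gradient search for a series x' close to x that a logistic classifier
    (weights w, bias b) assigns to the `target` class.

    Loss = (p - target)^2 + lam * sum_t penalty[t] * (x'[t] - x[t])^2,
    so a larger penalty[t] discourages changing time step t.
    """
    xc = list(x)
    for _ in range(steps):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, xc)) + b)
        if (p >= 0.5) == (target >= 0.5):
            break  # the prediction has reached the desired class
        # Chain rule: d/dz of (sigmoid(z) - target)^2.
        dloss_dz = 2.0 * (p - target) * p * (1.0 - p)
        for t in range(len(xc)):
            grad = dloss_dz * w[t] + 2.0 * lam * penalty[t] * (xc[t] - x[t])
            xc[t] -= lr * grad
    return xc


# Toy example: protect the last two points of a four-step series.
x = [0.0, 0.0, 0.0, 0.0]
w, b = [1.0, 1.0, 1.0, 1.0], -1.0
cf = counterfactual(x, w, b, penalty=[0.0, 0.0, 10.0, 10.0])
```

In this toy run the prediction flips to the desired class mainly through changes to the first two, unpenalized time steps, while the protected last two move less. Raising `lam` strengthens the protection of penalized points.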

  • Counterfactual Explanations for Time Series Forecasting

    2024. Zhendong Wang (et al.). 2023 IEEE International Conference on Data Mining (ICDM), 1391-1396

    Conference

    Among recent developments in time series forecasting methods, deep forecasting models have gained popularity, as they can exploit hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, making their results challenging to interpret. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting and propose an algorithm, called ForecastCF, that solves it by applying gradient-based perturbations to the original time series. The perturbations are further guided by imposing constraints on the forecasted values. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare it to two baselines. ForecastCF outperforms the baselines in terms of counterfactual validity and data manifold closeness, while generating meaningful and relevant counterfactuals for various forecasting tasks.

  • Ijuice: integer JUstIfied counterfactual explanations

    2024. Alejandro Kuratomi Hernandez (et al.). Machine Learning

    Article

    Counterfactual explanations modify the feature values of an instance in order to change its prediction from an undesired to a desired label. As such, they are highly useful for providing trustworthy interpretations of decision-making in domains where complex and opaque machine learning algorithms are used. To guarantee their quality and promote user trust, they need to satisfy the faithfulness desideratum, i.e., be supported by the data distribution. Here, we propose a counterfactual generation algorithm for mixed-feature spaces that prioritizes faithfulness through k-justification, a novel counterfactual property introduced in this paper. The proposed algorithm employs a graph representation of the search space and provides counterfactuals by solving an integer program. In addition, the algorithm is classifier-agnostic and does not depend on the order in which the feature space is explored. In our empirical evaluation, we demonstrate that it guarantees k-justification while showing performance comparable to state-of-the-art methods in feasibility, sparsity, and proximity.

  • ORANGE: Opposite-label soRting for tANGent Explanations in heterogeneous spaces

    2023. Alejandro Kuratomi Hernandez (et al.). 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), 1-10

    Conference

    Most real-world datasets have a heterogeneous feature space composed of binary, categorical, ordinal, and continuous features. However, currently available local surrogate explainability algorithms do not consider this aspect and generate infeasible neighborhood centers, which may yield erroneous explanations. To overcome this issue, we propose ORANGE, a local surrogate explainability algorithm that generates high-accuracy and high-fidelity explanations in heterogeneous spaces. ORANGE has three main components: (1) it searches for the closest feasible counterfactual point to a given instance of interest, considering only feasible feature values, so that the explanation is built around the closest feasible instance rather than an arbitrary, potentially non-existent point in the space; (2) it generates a set of neighboring points around this closest feasible point based on the correlations among features, so that the relationships among features are preserved inside the neighborhood; and (3) it weights the generated instances, first by their distance to the decision boundary and second by the disagreement between the labels predicted by the global model and by a surrogate model trained on the neighborhood. Our extensive experiments on synthetic and public datasets show that ORANGE achieves best-in-class performance in both explanation accuracy and fidelity.

  • Early prediction of the risk of ICU mortality with Deep Federated Learning

    2023. Korbinian Robert Randl (et al.). 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), 706-711

    Conference

    Intensive Care Units typically treat patients at serious risk of mortality. Recent research has shown that Machine Learning can indicate patients' mortality risk and point physicians toward individuals with a heightened need for care. Nevertheless, healthcare data is often subject to privacy regulations and therefore cannot easily be shared to build Centralized Machine Learning models that use the combined data of multiple hospitals. Federated Learning is a Machine Learning framework designed for data privacy that can be used to circumvent this problem. In this study, we evaluate the ability of deep Federated Learning to predict the risk of Intensive Care Unit mortality at an early stage. We compare the predictive performance of Federated, Centralized, and Local Machine Learning in terms of AUPRC, F1-score, and AUROC. Our results show that Federated Learning performs as well as the centralized approach (for 2, 4, and 8 clients) and substantially better than the local approach, thus providing a viable solution for early Intensive Care Unit mortality prediction. In addition, we demonstrate that prediction performance is higher when the patient history window is closer to discharge or death. Finally, we show that using the F1-score as an early-stopping metric can stabilize and increase the performance of our approach for the task at hand.

  • Predicting Drug Treatment for Hospitalized Patients with Heart Failure

    2023. Linyi Zhou, Ioanna Miliou. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 275-290

    Conference

    Heart failure and acute heart failure (the sudden onset or worsening of symptoms related to heart failure) are leading causes of hospital admission in the elderly. Treatment of heart failure is a complex problem that must consider a combination of factors, such as the patient's clinical manifestation and comorbidities. Machine learning approaches exploiting patient data may improve disease management for heart failure patients. However, treatment prediction models for heart failure patients are lacking. Hence, in this study, we propose a workflow to stratify patients based on clinical features and predict the drug treatment for hospitalized patients with heart failure. First, we train the k-medoids and DBSCAN clustering methods on an extract from the MIMIC-III dataset. Subsequently, we perform multi-label treatment prediction by assigning new patients to the predefined clusters. The empirical evaluation shows that k-medoids and DBSCAN successfully identify patient subgroups, with different treatments in each subgroup. DBSCAN outperforms k-medoids in patient stratification, yet treatment prediction performance is similar for both algorithms. Therefore, our work suggests that clustering algorithms, specifically DBSCAN, have the potential to perform patient profiling and predict individualized drug treatment for patients with heart failure.
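
The two-stage workflow described above (cluster the training patients, then assign a new patient to a cluster and predict a set of drugs) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it uses a simple alternating (Voronoi-iteration) k-medoids with Euclidean distance, omits DBSCAN, and predicts every drug given to at least a hypothetical `threshold` fraction of the patients in the assigned cluster. The patient features and drug names are made up.

```python
import math
from collections import Counter


def nearest_medoid(p, points, medoids):
    """Index of the medoid closest to p (Euclidean distance)."""
    return min(medoids, key=lambda m: math.dist(p, points[m]))


def k_medoids(points, k, iters=20):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    medoids = list(range(k))  # naive initialisation: the first k points
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[nearest_medoid(p, points, medoids)].append(i)
        # New medoid = member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(math.dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids unchanged, so assignments are stable
        medoids = new_medoids
    return medoids


def predict_treatments(patient, points, medoids, labels, threshold=0.5):
    """Multi-label prediction: drugs given to >= threshold of the patient's cluster."""
    m = nearest_medoid(patient, points, medoids)
    members = [i for i, p in enumerate(points) if nearest_medoid(p, points, medoids) == m]
    counts = Counter(drug for i in members for drug in labels[i])
    return {drug for drug, n in counts.items() if n / len(members) >= threshold}


# Made-up two-feature patients and the (hypothetical) drugs each received.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [{"diuretic"}, {"diuretic", "beta_blocker"}, {"diuretic"},
          {"ace_inhibitor"}, {"ace_inhibitor"}, {"ace_inhibitor", "diuretic"}]
medoids = k_medoids(points, k=2)
print(predict_treatments((0.5, 0.5), points, medoids, labels))  # prints {'diuretic'}
```

The threshold turns cluster-level treatment frequencies into a multi-label prediction; lowering it returns more drugs per patient.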

  • JUICE: JUstIfied Counterfactual Explanations

    2022. Alejandro Kuratomi Hernandez (et al.). Discovery Science, 493-508

    Conference

    Complex, highly accurate machine learning algorithms support decision-making processes with large and intricate datasets. However, these models have low explainability. Counterfactual explanation is a technique that tries to find a set of feature changes on a given instance that modifies the model's prediction from an undesired to a desired class. To obtain better explanations, it is crucial to generate faithful counterfactuals, supported by and connected to observations and the knowledge constructed on them. In this study, we propose a novel counterfactual generation algorithm that provides faithfulness by justification, which may increase developers' and users' trust in the explanations by supporting the counterfactuals with a known observation. The proposed algorithm guarantees justification for mixed-feature spaces, and we show that it performs comparably to state-of-the-art algorithms on other metrics such as proximity, sparsity, and feasibility. Finally, we introduce the first model-agnostic algorithm to verify counterfactual justification in mixed-feature spaces.

  • FLICU: A Federated Learning Workflow for Intensive Care Unit Mortality Prediction

    2022. Lena Mondrejevski (et al.). 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), 32-37

    Conference

    Although Machine Learning can be seen as a promising tool to improve clinical decision-making, it remains limited by access to healthcare data. Healthcare data is sensitive, requires strict privacy practices, and is typically stored in data silos, making traditional Machine Learning challenging. Federated Learning can counteract these limitations by training Machine Learning models over data silos while keeping the sensitive data localized. This study proposes a Federated Learning workflow for Intensive Care Unit mortality prediction. Specifically, we investigate the applicability of Federated Learning as an alternative to Centralized and Local Machine Learning by applying it to the binary classification problem of predicting Intensive Care Unit mortality. We extract multivariate time series data (lab values and vital signs) from the MIMIC-III database and benchmark the predictive performance of four deep sequential classifiers (FRNN, LSTM, GRU, and 1DCNN), varying the patient history window length (8h, 16h, 24h, and 48h) and the number of Federated Learning clients (2, 4, and 8). The experiments demonstrate that Centralized Machine Learning and Federated Learning perform comparably in terms of AUPRC and F1-score. Furthermore, the federated approach outperforms Local Machine Learning. Thus, Federated Learning can be seen as a valid, privacy-preserving alternative to Centralized Machine Learning for classifying Intensive Care Unit mortality when sensitive patient data cannot be shared between hospitals.
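
Training over data silos while keeping data local can be illustrated with Federated Averaging (FedAvg), a standard aggregation scheme. The abstract does not state which aggregation rule FLICU uses, so FedAvg is an assumption here. In this minimal Python sketch, each hypothetical "hospital" fits a one-parameter linear model y ≈ w·x on its own data, and the server only sees the returned weights, which it averages in proportion to each client's sample count. A real setup would use deep sequential classifiers such as those listed in the abstract.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on squared error for y ≈ w * x.
    Runs on the client; `data` never leaves it."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(global_w, clients, rounds=20):
    """FedAvg: every round, each client trains locally from the current global
    weight, and the server averages the results weighted by local sample count."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_ws = [local_update(global_w, d) for d in clients]
        global_w = sum(len(d) * w for d, w in zip(clients, local_ws)) / total
    return global_w


# Two toy "hospitals" whose (made-up) data both follow y = 3x.
clients = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.0, 3.0)]]
w_global = fed_avg(0.0, clients)
```

Only model parameters cross the client boundary, which is what makes the scheme attractive when patient records cannot be pooled.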

  • Impact of Dimensionality on Nowcasting Seasonal Influenza with Environmental Factors

    2022. Stefany Guarnizo, Ioanna Miliou, Panagiotis Papapetrou. Advances in Intelligent Data Analysis XX, 128-142

    Conference

    Seasonal influenza is an infectious disease of multi-causal etiology and a major cause of mortality worldwide that has been associated with environmental factors. In the attempt to model and predict future outbreaks of seasonal influenza with multiple environmental factors, we face the challenge of increased dimensionality that makes the models more complex and unstable. In this paper, we propose a nowcasting and forecasting framework that compares the theoretical approaches of Single Environmental Factor and Multiple Environmental Factors. We introduce seven solutions to minimize the weaknesses associated with the increased dimensionality when predicting seasonal influenza activity level using multiple environmental factors as external proxies. Our work provides evidence that using dimensionality reduction techniques as a strategy to combine multiple datasets improves seasonal influenza forecasting without the penalization of increased dimensionality.

  • Understanding peace through the world news

    2022. Vasiliki Voukelatou (et al.). EPJ Data Science 11 (1)

    Article

    Peace is a principal dimension of well-being and the way out of inequity and violence. Thus, its measurement has drawn the attention of researchers, policymakers, and peacekeepers. In recent years, novel digital data streams have drastically changed research in this field. The current study exploits information extracted from a new digital database, the Global Data on Events, Location, and Tone (GDELT), to capture peace as measured by the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring the GPI at a monthly level. Additionally, we use explainable AI techniques to identify the most important variables driving the predictions. This analysis highlights each country's profile and provides explanations for the predictions, particularly for the errors and the events that drive them. We believe that digital data exploited by researchers, policymakers, and peacekeepers with powerful data science tools such as machine learning could help maximize the societal benefits and minimize the risks to peace.

  • Measuring objective and subjective well-being: dimensions and data sources

    2021. Vasiliki Voukelatou (et al.). International Journal of Data Science and Analytics 11, 279-309

    Article

    Well-being is an important value in people's lives, and it could be considered an index of societal progress. Researchers have suggested two main approaches to the overall measurement of well-being: objective and subjective well-being. Both approaches, as well as their relevant dimensions, have traditionally been captured with surveys. During the last decades, new data sources have been suggested as an alternative or complement to traditional data. This paper aims to present the theoretical background of well-being by distinguishing between the objective and subjective approaches, their relevant dimensions, the new data sources used for their measurement, and relevant studies. We also intend to shed light on still largely unexplored dimensions and data sources that could potentially be key to public policy and social development.

  • Sentiment Nowcasting during the COVID-19 Pandemic

    2021. Ioanna Miliou, Ioannis Pavlopoulos, Panagiotis Papapetrou. Discovery Science, 218-228

    Conference

    In response to the COVID-19 pandemic, governments around the world are taking a wide range of measures. Previous research on COVID-19 has focused on disease spreading, epidemic curves, containment measures, confirmed cases, and deaths. In this work, we explore another essential aspect of the pandemic: how people feel about and react to this reality, and its impact on their emotional well-being. To that end, we propose using epidemic indicators and government policy responses to estimate public sentiment as expressed on Twitter. We develop a nowcasting approach that exploits the time series of epidemic indicators and the measures taken in response to the COVID-19 outbreak in the United States of America to predict public sentiment at a daily frequency. Using machine learning models, we improve the short-term forecasting accuracy of autoregressive models, revealing the value of incorporating the additional data into the predictive models. We then provide explanations identifying the indicators and measures that drive the predictions for specific dates. Our work provides evidence that data on how COVID-19 evolves, along with the measures taken in response to the outbreak, can be used effectively to improve sentiment nowcasting and gain insights into people's current emotional state.
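
The claim that external indicators can improve on a purely autoregressive baseline can be illustrated with a linear toy model. The Python sketch below is an assumption, not the paper's method (which uses machine learning models): it fits an AR(1) model y_t = a·y_{t-1} and an ARX model y_t = a·y_{t-1} + c·x_t by least squares on a synthetic series, where x_t stands in for a hypothetical daily epidemic or policy indicator, and compares in-sample mean squared error.

```python
def fit_ar(y):
    """Least-squares fit of y_t = a * y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def fit_arx(y, x):
    """Least-squares fit of y_t = a * y_{t-1} + c * x_t via the 2x2 normal equations."""
    ts = range(1, len(y))
    s_yy = sum(y[t - 1] ** 2 for t in ts)
    s_yx = sum(y[t - 1] * x[t] for t in ts)
    s_xx = sum(x[t] ** 2 for t in ts)
    r_y = sum(y[t] * y[t - 1] for t in ts)
    r_x = sum(y[t] * x[t] for t in ts)
    det = s_yy * s_xx - s_yx ** 2  # nonzero unless lagged y is proportional to x
    a = (r_y * s_xx - s_yx * r_x) / det
    c = (s_yy * r_x - s_yx * r_y) / det
    return a, c


def mse(pred, actual):
    return sum((p - q) ** 2 for p, q in zip(pred, actual)) / len(actual)


# Synthetic series: "sentiment" driven by its own lag plus one external indicator.
x = [0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0, 2.0, 0.5]
y = [1.0]
for t in range(1, len(x)):
    y.append(0.5 * y[-1] + x[t])

a = fit_ar(y)
a2, c2 = fit_arx(y, x)
actual = y[1:]
mse_ar = mse([a * y[t - 1] for t in range(1, len(y))], actual)
mse_arx = mse([a2 * y[t - 1] + c2 * x[t] for t in range(1, len(y))], actual)
```

Because the synthetic series is generated exactly from y_t = 0.5·y_{t-1} + x_t, the ARX fit recovers a ≈ 0.5 and c ≈ 1 with near-zero error, while the AR-only model cannot. A real evaluation would compare out-of-sample forecasts rather than in-sample fit.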
