Jonathan Rebane
PhD student
About me
I focus on the development and application of AI methods for heterogeneous and temporally evolving data. My main application domain is healthcare, including the detection and prediction of adverse drug events from electronic health records (EHRs).
Predicting future medical events from patient EHR data presents a unique opportunity, as such data contain a wealth of longitudinal and varied information, including textual, categorical, and numerical medical features. How best to exploit this data to generate accurate predictions using AI is a core part of my research.
Particularly within healthcare, there is also a need for AI models that both perform well and are interpretable, so that they can explain their decisions in ways that stakeholders can trust and act on appropriately. A key part of my research is therefore to develop and investigate explainable AI methods.
Publications
A selection from the Stockholm University publication database
-
Exploiting complex medical data with interpretable deep learning for adverse drug event prediction
2020. Jonathan Rebane, Isak Samsten, Panagiotis Papapetrou. Artificial Intelligence in Medicine 109
Article
A variety of deep learning architectures have been developed for predictive modelling and knowledge extraction from medical records. Several models have placed strong emphasis on temporal attention mechanisms and decay factors as a means to include highly temporally relevant information regarding the recency of medical event occurrence while facilitating medical code-level interpretability. In this study we utilise such models with a large Electronic Patient Record (EPR) data set consisting of diagnoses, medication, and clinical text data for the purpose of adverse drug event (ADE) prediction. The first contribution of this work is an empirical evaluation of two state-of-the-art medical-code-based models in terms of objective performance metrics for ADE prediction on diagnosis and medication data. Secondly, as an extension of previous work, we augment an interpretable deep learning architecture to incorporate numerical risk and clinical text features, and demonstrate that this approach yields improved predictive performance compared to the other baselines. Finally, we assess the importance of attention mechanisms with regard to their usefulness for medical code-level and text-level interpretability, which may facilitate novel insights into the nature of ADE occurrence within the health care domain.
-
Locally and globally explainable time series tweaking
2020. Isak Karlsson (et al.). Knowledge and Information Systems 62 (5), 1671-1700
Article
Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on three instantiations of the problem using global and local transformations. In the former case, we investigate the k-nearest neighbor classifier and provide an algorithmic solution to the global time series tweaking problem. In the latter case, we investigate the random shapelet forest classifier and focus on two instantiations of the local time series tweaking problem, which we refer to as reversible and irreversible time series tweaking, and propose two algorithmic solutions for the two problems along with simple optimizations. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
-
SMILE
2020. Jonathan Rebane (et al.). Data Mining and Knowledge Discovery
Article
In this paper, we study the problem of classification of sequences of temporal intervals. Our main contribution is a novel framework, which we call SMILE, for extracting relevant features from interval sequences to construct classifiers. SMILE introduces the notion of random temporal abstraction features, which we define as e-lets, as a means to capture information pertaining to class-discriminatory events which occur across the span of complete interval sequences. Our empirical evaluation is applied to a wide array of benchmark data sets and fourteen novel datasets for adverse drug event detection. We demonstrate that simple sequential features, followed by progressively more complex features, each improve classification performance. Importantly, this investigation demonstrates that SMILE significantly improves AUC performance over the current state of the art. The investigation also reveals that the choice of the underlying classification algorithm is important for achieving superior predictive performance, and shows how the number of features influences the performance of our framework.
-
A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records
2019. Francesco Bagattini (et al.). BMC Medical Informatics and Decision Making 19
Article
Background: Adverse drug events (ADEs) as well as other preventable adverse events in the hospital setting incur a yearly monetary cost of approximately $3.5 billion in the United States alone. Therefore, it is of paramount importance to reduce the impact and prevalence of ADEs within the healthcare sector, not only because it will reduce human suffering, but also as a means to substantially reduce the economic strain on the healthcare system. One approach to mitigating this problem is to employ predictive models. While existing methods have focused on the exploitation of static features, limited attention has been given to temporal features.
Methods: In this paper, we present a novel classification framework for detecting ADEs in complex electronic health records (EHRs) by exploiting the temporality and sparsity of the underlying features. The proposed framework consists of three phases for transforming sparse and multi-variate time series features into a single-valued feature representation, which can then be used by any classifier. Moreover, we propose and evaluate three different strategies for leveraging feature sparsity by incorporating it into the new representation.
Results: A large-scale evaluation on 15 ADE datasets extracted from a real-world EHR system shows that the proposed framework achieves significantly improved predictive performance compared to the state of the art. Moreover, our framework can reveal features that are clinically consistent with medical findings on ADE detection.
Conclusions: Our study and experimental findings demonstrate that temporal multi-variate features of variable length and with high sparsity can be effectively utilized to predict ADEs from EHRs. Two key advantages of our framework are that it is method-agnostic, i.e., versatile, and of low computational cost, i.e., fast, hence providing an important building block for future exploitation within the domain of machine learning from EHRs.
-
Mining disproportional frequent arrangements of event intervals for investigating adverse drug events
2020. Zed Lee, Jonathan Rebane, Panagiotis Papapetrou. 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), 289-292
Conference
Adverse drug events are pervasive and costly medical conditions, for which novel research approaches are needed to investigate the nature of such events further and ultimately achieve early detection and prevention. In this paper, we seek to characterize patients who experience an adverse drug event, represented as a case group, by contrasting them with similar control group patients who do not experience such an event. To achieve this goal, we utilize an extensive electronic patient record database and apply a combination of frequent arrangement mining and disproportionality analysis. Our results identify how several adverse drug events are characterized in terms of frequent disproportional arrangements, and highlight how such arrangements can provide additional temporal information compared to similar approaches.
-
An Investigation of Interpretable Deep Learning for Adverse Drug Event Prediction
2019. Jonathan Rebane, Isak Karlsson, Panagiotis Papapetrou. 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems
Conference
A variety of deep learning architectures have been developed for predictive modelling with regard to detecting health diagnoses in medical records. Several models have placed strong emphasis on temporal attention mechanisms and decay factors as a means to include highly temporally relevant information regarding the recency of medical event occurrence while facilitating medical code-level interpretability. In this study we utilise such models with a novel Electronic Patient Record (EPR) data set consisting of both diagnoses and medication data for the purpose of Adverse Drug Event (ADE) prediction. As such, a main contribution of this work is an empirical evaluation of two state-of-the-art deep learning architectures in terms of objective performance metrics for ADE prediction. We also assess the importance of attention mechanisms with regard to their usefulness for medical code-level interpretability, which may facilitate novel insights into the nature of ADE occurrence within the health care domain.
-
Explainable time series tweaking via irreversible and reversible temporal transformations
2018. Isak Karlsson (et al.). 2018 IEEE International Conference on Data Mining (ICDM), 207-216
Conference
Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
-
Seq2Seq RNNs and ARIMA models for Cryptocurrency Prediction
2018. Jonathan Rebane (et al.). Proceedings of SIGKDD Workshop on Fintech (SIGKDD Fintech’18)
Conference
Cryptocurrency price prediction has recently become an alluring topic, attracting massive media and investor interest. Traditional models, such as Autoregressive Integrated Moving Average (ARIMA) models, and models of more recent popularity, such as Recurrent Neural Networks (RNNs), can be considered candidates for such financial prediction problems, with RNNs being capable of utilizing various endogenous and exogenous input sources. This study compares the performance of ARIMA to that of a seq2seq recurrent deep multi-layer neural network (seq2seq) utilizing a varied selection of input types. The results demonstrate superior performance of seq2seq over ARIMA for models generated throughout most of Bitcoin's price history, with additional data sources leading to better performance during less volatile price periods.
-
Learning from Administrative Health Registries
2017. Jonathan Rebane (et al.). SoGood 2017: Data Science for Social Good
Conference
Over the last decades the healthcare domain has seen a tremendous increase in interest in methods for making inferences about patient care using large quantities of medical data. Such data is often stored in electronic health records and administrative health registries. As these data sources have grown increasingly complex, with millions of patients represented by thousands of attributes, static or time-evolving, finding relevant and accurate patterns that can be used for predictive or descriptive modelling is impractical for human experts. In this paper, we concentrate our review on Swedish Administrative Health Registries (AHRs) and Electronic Health Records (EHRs) and provide an overview of recent and ongoing work in the area, with a focus on adverse drug events (ADEs) and heart failure.