Thashmee Karunaratne

Works at Department of Computer and Systems Sciences
Telephone 08-16 16 05
Visiting address Nodhuset, Borgarfjordsgatan 12
Postal address Institutionen för data- och systemvetenskap 164 07 Kista

About me

I am a project leader and associate professor at the Department of Computer and Systems Sciences (DSV), currently conducting research in learning analytics and educational data mining. At DSV I belong to the ICT for Development group. The main focus of my research is how log data from e-learning systems can serve as an additional dimension for student and course evaluation, mainly in online courses. I am also involved in investigating how ICT has helped bridge gaps in communication and collaboration. I manage a few projects within the eGov lab: the EU-funded DG CONNECT project SkillsMatch (coordinator) and the H2020 project Digital Europe for All (DE4A) (WP leader). Key projects I managed previously include the Sida-funded project Capacity building in higher education in Rwanda and the EU/EP7 funded H2020-MSCA-RISE-2014 project on Transitional Diaspora Entrepreneurship as a development link between home and resident countries. I also teach various courses and supervise students at DSV.



A selection from Stockholm University publication database
  • 2020. Sindre Gjøystdal, Thashmee Karunaratne. International Journal of Information Technology Project Management 11 (3), 95-106

    Building self-organizing teams in agile projects is considered an important job for project leaders. In practice, however, building self-organized teams lacks focus, as many leaders go back to managing tasks because doing so is more concrete and tangible. While a large number of studies show that developing self-organized teams contributes positively to project success, there is a lack of knowledge about the consequences of not doing so. This study therefore explores the impact that inadequate self-organizing teams have on agile project success. The results identify five failure areas in a self-organizing team that have a negative impact on three success factors in agile projects. Due to a weak direct link between success factors and success criteria, conclusions are limited to a universally applicable impact on success factors. Further research is recommended to generate a universal checklist for success criteria in agile projects that can have a direct link to the identified success factors.

  • 2020. Myrsini Glinos, Maria Petritsopoulou, Thashmee Karunaratne. ECEL 2020 19th European Conference on e-Learning, 406-414

    Non-cognitive skills (NCS) are essential for increasing employability according to contemporary studies, but improving NCS is given low priority in typical education curricula. Work-based training programs and other upskilling programs offered in lifelong learning settings compensate for the gap in NCS requirements. However, standardised learning opportunities are in demand to increase the chances of competing in the labour market, as highlighted by the Skills Agenda of the European Union. This article explores learners' perceptions of a purpose-built LMS to support upskilling with NCS. The LMS consists of structure and content to learn, assess, and map NCS to the learner's preferred occupations. This learning solution advances beyond current education systems by focusing strictly on job-oriented learning, which means training the learner in a specific set of skills aimed at a selected job profile. The learners' intention to learn NCS with the integrated LMS was tested with sixty-one purposively selected learners in three countries: Sweden, Spain, and Ireland. A baseline for the investigation was created based on the diffusion of innovations model. Mapping the relative advantage, compatibility, complexity, trialability and observability of the LMS provided a comprehensive picture of the maturity of the learning solution. A survey questionnaire was used to capture user perceptions. On average, 75% of the learners perceived the system as a learner-centric online digital solution that allows learning the definitions of NCS, as well as learning the NCS sets required by each occupation, categorised according to the European occupation frameworks. The study outcomes include insights that can be read as recommendations for a robust, learner-centric approach to learning NCS.

  • 2020. Nina Bergdahl (et al.). International journal of learning analytics and artificial intelligence for education 2 (2), 46-79

    The use of Learning Analytics (LA) approaches in Blended Learning (BL) research is becoming an established field. In light of previous critique of LA for not being grounded in theory, the General Data Protection Regulation, and a renewed focus on individuals’ integrity, this review aims to explore the use of theories and the methodological and analytic approaches in educational settings, along with surveying ethical and legal considerations. The review also maps and explores the outcomes and discusses the pitfalls and potentials currently seen in the field. Journal articles and conference papers were identified through a systematic search across relevant databases. Seventy papers met the inclusion criteria: they applied LA within a BL setting, were peer-reviewed full papers, and were written in English. The results reveal that the use of theoretical and methodological approaches was dispersed. We identified approaches to BL that are not covered by the categories in the existing BL literature and suggest that these may be referred to as hybrid blended learning. We also found that ethical considerations and legal requirements have often been overlooked. We highlight critical issues that help raise awareness and inform alignment for future research to ameliorate diffuse applications within the field of LA.

  • 2019. Thashmee Karunaratne, Pooyeh Mobini. Proceedings of the 18th European Conference on e-Learning, 276-283

    Compared to traditional work-based skills development programs, commissioned education is an attractive solution for working professionals to develop skills to better fit into their constantly reshaping job profiles while acquiring a higher educational qualification. Commissioned programs, on the other hand, typically require a high level of commitment, such as constant engagement in learning and rigorous assessment of knowledge. However, comparatively less light has been shed on the design and impact of commissioned education than on other lifelong learning and professional development methods. This paper therefore presents a systematic empirical study of the design, execution and evaluation of commissioned education, based on a master's-level program for middle and senior project managers offered for seven years at the Department of Computer and Systems Sciences, Stockholm University. The program is evaluated at the end of every academic year, and reforms are duly implemented. The impact of these reforms is systematically evaluated based on the perceptions of the program's teachers and students. A blended form of course delivery was voted best, in contrast to fully online and fully face-to-face forms, with classroom meetings at the beginning and end of a course and synchronised online meetings for formative assessments. Didactical indicators included problem-based learning approaches that tie workplace problems into course assignments and formative discussions. The need for pre-planning with adequate information about course workload and deadlines, increased communication between stakeholders, and flexibility and efficiency in the course offering are identified as essential success factors. Emotional support from the family is also recognized as an equally important factor in adult learning.
    According to the outcome of the study, a formal education qualification such as a master’s degree is a difficult goal to achieve in one step and should ideally be achieved by aggregating short-term goals such as certifications of shorter duration.

  • 2019. Mikko Apiola (et al.). 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

    Digital learning management systems (LMS) are revolutionizing learning in many areas, including computer science education (CSE). They are capable of tracking learners' characteristics, such as prior knowledge and learning habits, and may offer more personalized learning or guidance on useful learning practices. LMSs collect large amounts of data. Proper processing of such collected data can offer valuable insights about the learning process, support for higher quality education, insights on why some students drop out of courses, and so on. In this paper, we briefly review and discuss the global trends in digital learning and learning analytics (LA), specifically from the viewpoint of two LMSs and related LA research, one in Finland and one in Sweden. First, we address the context- and course-specific nature of LA by developing the idea of cross-country and cross-systems learning analytics. Second, we consider our research especially from an educational perspective to identify the most beneficial practices for teachers and students. Third, we discuss, based on findings from our projects, future avenues for research.

  • 2019. Pooyeh Mobini, Thashmee Karunaratne. Proceedings of the 18th European Conference on e-Learning, 396-405

    Non-cognitive skills (NCS) such as critical thinking, creativity, reliability, problem-solving, self-management, decision-making, and communication are keys to a successful life and career development in a knowledge society. In contrast to hard skills or cognitive ability, which are the mainstream of formal and informal education, NCS are yet to be recognised and measured in academic curricula, even though many studies have shown the importance of NCS in building a person's character. This study seeks the system requirements and design architecture for an ICT solution that can quantify and assess NCS. A Design Science Approach (DSA) is followed to systematically elicit the requirements and hence the functionalities and main modules of the ICT enabler. The resulting artefact includes the functionalities for a dashboard in which the personal profile and skills of users can be visualised, and a system that matches the NCS required for prospective occupations, together with a recommender system that can recommend courses for acquiring NCS. The requirements were elicited and refined, starting in a co-creation session with project stakeholders from six European countries. The elicited requirements from the co-creation session were validated by pilots consisting of 190 participants from three European countries. Finally, the results were transformed into wireframes, ranked based on the perceptions of the end-user focus group participants, and refined by experts in the field. The discovered system features include visualisation of the skills profiles of individual users, with the ability to upgrade the profile autonomously as they complete courses or assessment tests on the platform. The ICT enabler is designed under a project of the European Commission Directorate General for Communications Networks, Content & Technology (DG CONNECT), named SkillsMatch.

  • 2019. Jean Claude Byungura (et al.). Proceedings of the 18th European Conference on e-Learning, 109-118

    Plagiarism has been a critical concern for universities and research institutes worldwide in ensuring academic integrity. With the internet revolution, this form of academic dishonesty has become increasingly overwhelming, especially due to easy access to and use of online resources without acknowledging the original authors. Prior research has explored several perspectives on plagiarism, such as culture, language, internet technology, and policies in different institutional settings. However, little is known about plagiarism in the online Rwandan higher education context. The aim of this study is twofold: first, to understand the tendencies of plagiarism through internet-based resources, and second, to identify the contextual factors that contribute to plagiarism by students at the University of Rwanda (UR). Both undergraduate and master’s dissertations were randomly collected and analysed to ascertain the frequency of plagiarism tendencies from text-based similarity indexes. In addition, open-ended interviews were conducted with 15 teachers and 15 students from UR colleges. The similarity indexes were taken from Turnitin’s originality reports, which are produced through a computer-based text-matching process. Results indicated a highly critical rate of plagiarism tendencies for each type of plagiarism considered. Likewise, the findings showed that, for the sample of analysed documents, no thesis document could be deemed genuine enough to fulfil academic integrity. In addition, the frequency of similarity indexes for texts matched between the analysed thesis documents and online databases is closely similar for undergraduate and graduate students. Moreover, this study identified 17 factors that contribute most to plagiarising through easy access to internet resources at this university. Among them, six are related to the socio-cultural context, five to the institutional context, and the remaining six are attributed to individual factors.
    A holistic approach encompassing innovative, detective and preventive strategies is recommended in association with computer-supported tools to eradicate plagiarism at this institution. Further research can explore the adoption and use of computer-based text matching tools as an additional strategy for combating plagiarism in both public and private universities at national or regional level by comparing several institutions.

  • 2019. Thashmee Karunaratne (et al.). Proceedings of the 18th European Conference on e-Learning, 284-293

    Successful implementation of e-learning environments requires adequate teacher training. In this study, a solution is sought for how teacher training programmes on newly introduced Learning Management Systems (LMS) can be designed in such a way that they would be inclusive of different levels of digital competence, supportive of individual teacher development and flexible enough to be applied in different institutional environments. Following the soft design science research methodology, two co-creation and co-design sessions, as well as eight in-depth interviews, were used for designing and testing a model for the process of teacher training in using LMS. The outcome of the study indicated the need of an agile-based approach as well as two kinds of segmentation of the training process based on the level of competence and on the position of the training process on the timeline. Besides, the cascading of teacher training was found to be a key approach to make it cost- and time-efficient. Moreover, the introduction of agility to the recursive process of training was found to strengthen consistent knowledge building in teachers. This has been illustrated in the form of a spiral model of knowledge construction, with teachers beginning to learn the basics and incrementally progressing to acquire advanced skills. The proposed process model allows academic institutions to mobilize the teaching task force by equipping them with technological knowledge systematically. This way, the level of expertise in using LMS can be developed incrementally and in synergy with the Pyramid Model of Digital literacy. The level of knowledge acquisition by teachers would correspond to the levels in the pyramid, and thereby, it would be easy to see them “climb up” the pyramid as they progress. A more thorough evaluation of the model using different cases and a careful revision of its components are considered as further steps in this research.

  • 2018. Thashmee Karunaratne. Electronic Journal of e-Learning 16 (2), 79-90

    The thesis component of a degree program is vital since its quality contributes to the quality of the whole degree. Maintaining the quality of degree programs while handling the constantly increasing number of students entering higher education is a challenge for many higher educational institutions. This paper presents a study of how ICT can be used to improve the quality and effectiveness of thesis projects at Bachelor's and Master's levels. Further, it explores how the blended model of supervision helps solve the issues of managing supervisor time efficiently and providing quality guidance for thesis students. Supervisors’ perceptions of the ICT-enabled thesis process are captured via interviews. Statistics about the completed theses and the user log data of the ICT system are triangulated to complement supervisor perceptions. Results revealed that supervisors take advantage of the functions in the system to support improving the quality and the quantity of the theses, and that the blended supervision model adopted in the thesis process helps supervisors collaborate better with students.

  • 2018. Thashmee Karunaratne, Ranil Peiris Colombage, Henrik Hansson. ijEDict - International Journal of Education and Development using Information and Communication Technology 14 (1), 118-140

    ICT and development implementations in developing regions depend on many factors. This paper summarises experiences of efforts made by twenty individuals when implementing small-scale ICT development projects in their organizations located in seven developing countries. The main focus of these projects was the use of ICT in educational settings. Challenges encountered and the contributing factors for implementation success of the projects are systematically investigated using interviews and follow up surveys. Results show that the typical limitations of technology and infrastructure were the key obstacles. The commitment of individual project managers in the role of “change agents” and organizational support were the strengths behind the success of the projects. Based on the outcome of this study, professional development of the change agents is a key factor for the success of projects. IT and infrastructure limitations contributed to the failure of the majority of the ICT related projects.

  • 2017. Thashmee Karunaratne, Henrik Hansson, Sten Holmberg.

    At universities, students and supervisors strive for interesting and meaningful research problems to study in thesis projects. For students, finding a good topic for their thesis work is a lonely and tiresome process. A meaningful topic motivates supervisors to closely monitor and support students. Industry, the state and civil society, where the interesting research problems exist, on the other hand struggle to find appropriate expertise to solve their practical problems. Although connecting the right stakeholders fulfils the requirements of all the counterparts, such a collaboration is not as simple as it sounds. As Hansson et al. (2014) acknowledge, current efforts to connect the two sides are mostly ad hoc and driven by private and personal networks. Initiatives such as the Kista Science City project (Kista Science City, 2017), Urban ICT Arena (Urban ICT Arena, 2017), the Samverkancheck (Collaboration) project at Stockholm University (Stockholm University, 2017) and many small-scale collaboration initiatives of Ericsson and IBM with Swedish universities can be named as a few successful attempts. Silicon Valley in the United States is another popular example of similar collaboration. With the use of information systems to manage and support activities, research collaborations are empowered by platforms that search for and connect people for research and development purposes. The idea bank is a popular initiative for fulfilling this requirement. However, idea bank initiatives are limited in connecting interested people to experts from wider society. Yet there is a huge demand, especially from developing regions, for new ideas that can be used as thesis work in higher education. A large number of students in these countries still struggle to find interesting and meaningful ideas for their thesis work, while society, the state and industry are compelled to resort to expensive problem-solving approaches.
    Therefore, a generic and globally reachable approach is required to fill this gap by connecting the required expertise with the needs. The Global Idea Bank approach is motivated by exactly this requirement. The primary aim of the approach is to connect different counterparts in solving problems of general and practical interest. To that end, the following objectives are set:
    1. Developing an entry point for big and small ideas to be investigated by universities
    2. Creating supportive processes to facilitate communication among the stakeholders
    3. Developing functions to create, select, sort, improve, refine, and filter ideas
    4. Developing a gateway to connect the ideas to thesis support systems (e.g. SciPro (Hansson & Moberg, 2011)) at universities
    5. Facilitating the process with relevant information and online templates for reporting and finance management

  • 2017. Thashmee Karunaratne, Henrik Hansson, Naghmeh Aghaee. Assessment in education

    Improving the quality of Bachelor’s and Master’s theses while at the same time increasing the number of theses without expanding the existing resources proportionately is a huge challenge faced by higher educational institutions. The aim of this study is to investigate the effect of multiple change processes on Bachelors and Masters level thesis work in a selected higher educational institution. The following research questions were studied: (1) How has the thesis quality changed? (2) How has the number of completed theses changed? and, (3) How has the ratio of completed theses per supervisor changed? The change processes were introduced into the thesis process in the Department of Computer and Systems Sciences (DSV), Stockholm University during 2008–2014. The results show that the quality and the number of completed theses have significantly increased. The multiple change processes including a purpose built ICT system named SciPro, which was introduced and improved incrementally during 2010–2014 are discussed and evaluated in relation to these results.

  • 2017. Thashmee Karunaratne, Jean Claude Byungura. IST-Africa 2017 Conference Proceedings
  • 2016. Jean Claude Byungura (et al.). IST-Africa 2016 Conference

    Integrating technology in pedagogy is a step towards the ICT capacity building that higher education needs to meet its current demands. However, the integration of eLearning systems has been problematic, despite huge investments in ICT infrastructure. This study investigates teacher adoption of a newly upgraded eLearning platform being integrated at the University of Rwanda (UR). A six-construct model related to technology adoption was used to design the questionnaire and interviews. Closed and open-ended questions seeking perceptions of the UR eLearning environment were put to 87 purposively selected respondents. Findings indicate that although participants find the system useful, easy to use and trustworthy, the intention to adopt and use it is very low due to insufficient managerial and technical support. Gaps in policy synergy, incentives, basic infrastructure, and managerial and technical support were among the identified bottlenecks contributing to the low degree of teacher intention. The study concludes by proposing some remedies to address the above challenges.

  • 2016. Jean Claude Byungura (et al.). European Journal of Open, Distance and e-Learning 19 (2), 46-62

    With the development of technology in the 21st century, education systems attempt to integrate technology-based tools to improve experiences in pedagogy and administration. Building human and ICT infrastructure capacities at universities, from policy to implementation level, is becoming increasingly important. Using critical discourse analysis, this study investigates the articulation of ICT capacity building strategies in both national and institutional ICT policies in Rwanda, focusing on higher education. Eleven policy documents were collected and analyzed in depth to understand which claims about ICT capacity building are made. The analysis shows that strategies for building ICT capacities are evident in national-level policies but in only two institutional policies (KIST and NUR). Among the 25 components of ICT capacity building used, those related to human capacity are not plainly described. Additionally, neither national nor institutional policy documents include the creation of financial schemes for students to acquire ICT tools, even though learners are key stakeholders. Although there is some translation of ICT capacity building strategies from national to some institutional policies, planning for motivation and provision of incentives to innovators is not stated in any of the institutional policies, and this is key to effective technology integration.

  • 2016. Naghmeh Aghaee (et al.). The International Review of Research in Open and Distributed Learning 17 (3), 360-383

    Many research studies have highlighted the low completion rate and slow progress in PhD education. Universities strive to improve throughput and quality in their PhD education programs. In this study, the perceived problems of PhD education are investigated from PhD students' points of view, and how an Information and Communication Technology Support System (ICTSS) may alleviate these problems. Data were collected through an online open questionnaire sent to the PhD students at the Department of (the institution's name has been removed during the double-blind review) with a 59% response rate. The results revealed a number of problems in the PhD education and highlighted how online technology can support PhD education and facilitate interaction and communication, affect the PhD students' satisfaction, and have positive impacts on PhD students' stress. A system was prototyped, in order to facilitate different types of online interaction through accessing a set of online and structured resources and specific communication channels. Although the number of informants was not large, the result of the study provided some rudimentary ideas that refer to interaction problems and how an online ICTSS may facilitate PhD education by providing distance and collaborative learning, and PhD students' self-managed communication.

  • 2015. Nam Aghaee (et al.). Proceedings of E-Learn, 237-244

    The low completion rate and slow progress in PhD education have been highlighted in many studies. However, the interaction problems and communication gaps that PhD students encounter make this attempt even more challenging. The aim of this study is to investigate the peer interaction problems and ICT based solutions from PhD students’ perspectives. The data collection method was an online questionnaire and in-depth interviews were used to follow up. The target group for the survey was the PhD students in Computer Science at Stockholm University. The total number of respondents for the survey was 53 PhD students and eleven randomly selected PhD students for the interviews. The results reflected a lack of peer interaction as an important issue in the perspective of the students. Based on this, the study showed several ICT solutions that have the potential to reduce the interaction problems and thereby improve PhD students’ collaborative learning and research quality.

  • 2015. Jean Claude Byungura, Henrik Hansson, Thashmee Karunaratne. Expanding Learning Scenarios - Opening Out the Educational Landscape, 64-64
  • 2014. Naghmeh Aghaee (et al.). DSV writers hut 2014, 33-40
  • 2014. Thashmee M. Karunaratne (et al.).

    Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other objective concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which look for Pareto improvements in efficiency and effectiveness are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. 
    It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs.

  • 2013. Thashmee Karunaratne, Henrik Boström, Ulf Norinder. Intelligent Data Analysis 17 (2), 327-341

    Quantitative structure-activity relationship (QSAR) models have gained popularity in the pharmaceutical industry due to their potential to substantially decrease drug development costs by reducing expensive laboratory and clinical tests. QSAR modeling consists of two fundamental steps, namely, descriptor discovery and model building. Descriptor discovery methods are either based on chemical domain knowledge or purely data-driven. The former, chemoinformatics-based, and the latter, substructures-based, methods for QSAR modeling, have been developed quite independently. As a consequence, evaluations involving both types of descriptor discovery method are rarely seen. In this study, a comparative analysis of chemoinformatics-based and substructure-based approaches is presented. Two chemoinformatics-based approaches; ECFI and SELMA, are compared to five approaches for substructure discovery; CP, graphSig, MFI, MoFa and SUBDUE, using 18 QSAR datasets. The empirical investigation shows that one of the chemo-informatics-based approaches, ECFI, results in significantly more accurate models compared to all other methods, when used on their own. Results from combining descriptor sets are also presented, showing that the addition of ECFI descriptors to any other descriptor set leads to improved predictive performance for that set, while the use of ECFI descriptors in many cases also can be improved by adding descriptors generated by the other methods.

  • 2012. Thashmee Karunaratne, Henrik Boström. 11th International Conference on Machine Learning and Applications (ICMLA), 409-414

    Standard graph learning approaches are often challenged by the computational cost involved when learning from very large sets of graph data. One approach to overcome this problem is to transform the graphs into less complex structures that can be more efficiently handled. One obvious potential drawback of this approach is that it may degrade predictive performance due to loss of information caused by the transformations. An investigation of the tradeoff between efficiency and effectiveness of graph learning methods is presented, in which state-of-the-art graph mining approaches are compared to representing graphs by itemsets, using frequent itemset mining to discover features to use in prediction models. An empirical evaluation on 18 medicinal chemistry datasets is presented, showing that employing frequent itemset mining results in significant speedups, without sacrificing predictive performance for both classification and regression.

  • 2011. Thashmee Karunaratne. ECML/PKDD 2011: Workshop of Collective Learning and Inference on Structured Data

    Recent studies of pattern mining have paid more attention to discovering patterns that are interesting, significant, discriminative and so forth than to patterns that are simply frequent. Does this imply that frequent patterns are no longer useful? In this paper we carry out a survey of frequent pattern mining and, using an empirical study, show how useful frequent pattern mining is in building predictive models.

  • 2010. Thashmee Karunaratne, Henrik Boström, Ulf Norinder. Ninth International Conference on Machine Learning and Applications (ICMLA), 2010, 828-833

    Graph propositionalization methods can be used to transform structured and relational data into fixed-length feature vectors, enabling standard machine learning algorithms to be used for generating predictive models. It is however not clear how well different propositionalization methods work in conjunction with different standard machine learning algorithms. Three different graph propositionalization methods are investigated in conjunction with three standard learning algorithms: random forests, support vector machines and nearest neighbor classifiers. An experiment on 21 datasets from the domain of medicinal chemistry shows that the choice of propositionalization method may have a significant impact on the resulting accuracy. The empirical investigation further shows that for datasets from this domain, the use of the maximal frequent item set approach for propositionalization results in the most accurate classifiers, significantly outperforming the two other graph propositionalization methods considered in this study, SUBDUE and MOSS, for all three learning methods.

  • 2009. Thashmee Karunaratne, Henrik Boström. The Eighth International Conference on Machine Learning and Applications, 196-201

    Graph propositionalization methods transform structured and relational data into a fixed-length feature vector format that can be used by standard machine learning methods. However, the choice of propositionalization method may have a significant impact on the performance of the resulting classifier. Six different propositionalization methods are evaluated when used in conjunction with random forests. The empirical evaluation shows that the choice of propositionalization method has a significant impact on the resulting accuracy for structured data sets. The results furthermore show that the maximum frequent itemset approach and a combination of this approach and maximal common substructures turn out to be the most successful propositionalization methods for structured data, each significantly outperforming the four other considered methods.

  • 2008. Thashmee Karunaratne, Henrik Boström. Trends in Intelligent Systems and Computer Engineering, 141-153

    Typical machine learning systems often use a set of previous experiences (examples) to learn concepts, patterns, or relations hidden within the data [1]. Current machine learning approaches are challenged by the growing size of the data repositories and the growing complexity of those data [1, 2]. In order to accommodate the requirement of being able to learn from complex data, several methods have been introduced in the field of machine learning [2]. Based on the way the input and resulting hypotheses are represented, two main categories of such methods exist, namely, logic-based and graph-based methods [3]. The demarcation line between logic- and graph-based methods lies in the differences of their data representation methods, hypothesis formation, and testing as well as the form of the output produced.

    The main purpose of our study is to investigate the effect of incorporating background knowledge into graph learning methods. The ability of graph learning methods to obtain accurate theories with a minimum of background knowledge is of course a desirable property, but not being able to effectively utilize additional knowledge that is available and has been proven important is clearly a disadvantage. Therefore we examine how far additional, already available, background knowledge can be effectively used for increasing the performance of a graph learner. Another contribution of our study is that it establishes a neutral ground to compare classification accuracies of the two closely related approaches, making it possible to study whether graph learning methods actually would outperform ILP methods if the same background knowledge were utilized [9].

    The rest of this chapter is organized as follows. The next section discusses related work concerning the contribution of background knowledge when learning from complex data. Section 10.3 provides a description of the graph learning method that is used in our study. The experimental setup, empirical evaluation, and the results from the study are described in Sect. 10.4. Finally, Sect. 10.5 provides conclusions from the experiments and points out interesting extensions of the work reported in this study.

  • 2007. Thashmee Karunaratne, Henrik Boström. IMECS 2007, 153-157

    Incorporating background knowledge in the learning process has proven beneficial for numerous applications of logic-based learning methods. Yet the effect of background knowledge in graph-based learning has not been systematically explored. This paper describes and demonstrates the first step in this direction and elaborates on how additional relevant background knowledge could be used to improve the predictive performance of a graph learner. A case study in chemoinformatics is undertaken in this regard, in which various types of background knowledge are encoded in graphs that are given as input to a graph learner. It is shown that the type of background knowledge encoded indeed has an effect on the predictive performance, and it is concluded that encoding appropriate background knowledge can be more important than the choice of the graph learning algorithm.

  • 2006. Thashmee Karunaratne, Henrik Boström. Proceedings of World Academy of Science, Engineering and Technology 15, 49-51

    Logic-based methods for learning from structured data are limited with respect to handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system called DIFFER. The method is demonstrated to perform comparably to an existing state-of-the-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.

  • 2006. Thashmee Karunaratne, Henrik Boström. Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence (SCAI 2006), 120-126
  • 2006. Thashmee Karunaratne, Henrik Boström. Proceedings of the Second IASTED International Conference on Computational Intelligence

    Existing methods for learning from structured data are limited with respect to handling large or isolated substructures and also impose constraints on search depth and induced structure length. An approach to learning from structured data using a graph-based propositionalization method, called finger printing, is introduced that addresses the limitations of current methods. The method is implemented in a system called DIFFER, which is demonstrated to compare favorably to existing state-of-the-art methods on some benchmark data sets. It is shown that further improvements can be obtained by combining the features generated by finger printing with features generated by previous methods.


Last updated: May 4, 2021
