Doctoral defense: Sampath Deegalla


Date: Tuesday 5 March 2024

Time: 13.00–17.00

Venue: Lilla Hörsalen, DSV, Borgarfjordsgatan 12, Kista

Welcome to a doctoral defense at DSV! Sampath Deegalla presents his thesis, which is about processing data with a so-called "nearest neighbor" method.

On 5 March 2024, Sampath Deegalla presents his doctoral thesis at the Department of Computer and Systems Sciences (DSV), Stockholm University. The title is "Nearest Neighbor Classification in High Dimensions".

The defense takes place on DSV's premises in Kista, starting at 13.00.
Find your way to DSV

Download the thesis from Diva

Doctoral student: Sampath Deegalla, DSV
Opponent: Professor Slawomir Nowaczyk, Halmstad University
Main supervisor: Professor Henrik Boström, KTH
Supervisor: Professor Keerthi Walgama, University of Peradeniya, Sri Lanka

Contact details for Sampath Deegalla


Abstract (in English)

The simple k nearest neighbor (kNN) method can be used to learn from high dimensional data such as images and microarrays without any modification to the original version of the algorithm. However, studies show that kNN’s accuracy is often poor in high dimensions due to the curse of dimensionality; a large number of instances are required to maintain a given level of accuracy in high dimensions. Furthermore, distance measurements such as the Euclidean distance may be meaningless in high dimensions. As a result, dimensionality reduction could be used to assist nearest neighbor classifiers in overcoming the curse of dimensionality. Although there are success stories of employing dimensionality reduction methods, the choice of which methods to use remains an open problem. This includes understanding how they should be used to improve the effectiveness of the nearest neighbor algorithm.
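The core idea of this paragraph, reducing the dimensionality of the data before applying the nearest neighbor rule, can be sketched in plain NumPy. This is a minimal illustration only, not code from the thesis; the function names and the choice of PCA as the reduction method are this sketch's own assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain kNN: majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)      # distances to all training points
        nearest = y_train[np.argsort(d)[:k]]          # labels of the k closest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])         # majority vote
    return np.array(preds)

def pca_reduce(X_train, X_test, n_components=2):
    """PCA via SVD on the centered training data; both sets are projected
    with the directions learned from the training set only."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:n_components].T                           # top principal directions
    return (X_train - mean) @ W, (X_test - mean) @ W
```

Applying `knn_predict` to the output of `pca_reduce` rather than to the raw high-dimensional data is the basic pipeline the abstract refers to: distances are computed in a low-dimensional space where they are more likely to be meaningful.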

The thesis examines the research question of how to learn effectively with the nearest neighbor method in high dimensions. The research question was broken into three smaller questions. These were addressed by developing effective and efficient nearest neighbor algorithms that leveraged dimensionality reduction. The algorithm design was based on feature reduction and classification algorithms constructed using the reduced features to improve the accuracy of the nearest neighbor algorithm. Finally, forming nearest neighbor ensembles was investigated using dimensionality reduction.

A series of empirical studies were conducted to determine which dimensionality reduction techniques could be used to enhance the performance of the nearest neighbor algorithm in high dimensions. Based on the results of the initial studies, further empirical studies were conducted and they demonstrated that feature fusion and classifier fusion could be used to improve the accuracy further. Two feature and classifier fusion techniques were proposed, and the circumstances in which these techniques should be applied were examined. Furthermore, the choice of the dimensionality reduction method for feature and classifier fusion was investigated. The results indicate that feature fusion is sensitive to the selection of the dimensionality reduction method. Finally, the use of dimensionality reduction in nearest neighbor ensembles was investigated. The results demonstrate that data complexity measures such as the attribute-to-instance ratio and Fisher’s discriminant ratio can be used to select the nearest neighbor ensemble depending on the data type.
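The classifier-fusion idea mentioned above, combining nearest neighbor classifiers built on different reduced representations, can be illustrated generically. This sketch simply averages kNN class-probability estimates over several "views" of the data; it does not reproduce the specific fusion techniques proposed in the thesis, and all names here are this example's own.

```python
import numpy as np

def knn_proba(X_train, y_train, X_test, k=3, n_classes=2):
    """Class-probability estimates: fraction of each label among the k nearest neighbors."""
    out = np.zeros((len(X_test), n_classes))
    for i, x in enumerate(X_test):
        d = np.linalg.norm(X_train - x, axis=1)
        for label in y_train[np.argsort(d)[:k]]:
            out[i, label] += 1.0 / k
    return out

def fused_predict(views_train, y_train, views_test, k=3, n_classes=2):
    """Classifier fusion: sum (equivalently, average) kNN probability estimates
    from several reduced representations, then take the argmax."""
    proba = sum(knn_proba(Xtr, y_train, Xte, k, n_classes)
                for Xtr, Xte in zip(views_train, views_test))
    return proba.argmax(axis=1)
```

Each "view" might come from a different dimensionality reduction method or a different number of retained components; the fused prediction can outperform any single view when the views make partly independent errors.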

Keywords: Nearest Neighbor, High-Dimensional Data, Curse of Dimensionality, Dimensionality Reduction