Final colloquium Willem van der Linden

09 July 2024 14:00 till 15:00 - Location: ME-Hall K, 34.D-1-400 - By: DCSC

Title: Generative Adversarial Nets for generating synthetic Imaging Mass Spectrometry data

Supervisor: Dr. ing. Raf van de Plas

Abstract: Imaging Mass Spectrometry (IMS) is a technique that measures molecular mass distributions with respect to their spatial location. The resulting dataset contains a mass spectrum for every pixel. When the spectra are divided into classes, the number of spectra per class can vary significantly, which limits the performance of classifiers. As IMS is destructive, generating additional original samples is not possible. The data imbalance problem can therefore be counteracted by generating synthetic samples for the underrepresented classes. A commonly used technique for generating such samples is SMOTE (Synthetic Minority Over-sampling Technique).
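As an illustration of the SMOTE idea mentioned above, the following is a minimal sketch in Python with NumPy. It is not the implementation used in the thesis: the function name `smote_oversample`, its parameters, and the brute-force neighbour search are assumptions made for this example. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    X_min : array of shape (n_samples, n_features), minority class only
            (hypothetical input; at least two samples are required).
    k     : number of nearest minority neighbours to choose from.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Brute-force pairwise squared distances; fine for small illustrative data.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic sample is a convex combination of two existing samples, SMOTE never leaves the region already spanned by the minority class.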

Recently, generative adversarial nets (GANs) have been used instead of SMOTE to oversample minority classes. GAN-based oversampling can result in better-performing classifiers than SMOTE oversampling. A GAN is a machine learning method in which two functions are trained together to learn the distribution of the data. The first function (the generator) generates samples from noise, while the second function (the discriminator) aims to distinguish these generated samples from the original samples. By repeatedly updating the two functions, the generated samples should eventually become indistinguishable from the original data.
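The interplay of the two functions can be written as a two-player game. The abstract does not state an objective, so the formula below is the standard GAN formulation rather than the one used in the thesis. The symbols are assumptions introduced here: G is the generating function, D the distinguishing function, z the noise input, and x an original sample.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

D is rewarded for assigning high values to original samples and low values to generated ones, while G is rewarded when D(G(z)) is high, that is, when generated samples are mistaken for original ones.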

In this research, a conditional Wasserstein GAN with gradient penalty is implemented and tested in various ways on IMS data to oversample minority classes in a multiclass setting. Several experiments are conducted to investigate why working on full spectra is unsuccessful. When the number of features is limited through dimensionality reduction, the implemented GAN can generate data that is very similar to the original data, as judged by classifier testing.
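For reference, a common way to write the losses of a conditional Wasserstein GAN with gradient penalty is sketched below. The exact formulation, conditioning scheme, and hyperparameters used in the thesis are not given in this abstract, so everything here is an assumption: y denotes the class label, D the critic, G the generator, and \lambda the penalty weight.

```latex
\begin{aligned}
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z, y),
  \qquad \epsilon \sim U[0, 1] \\
L_{D} &= \mathbb{E}_{z,\,y}\left[ D(G(z, y), y) \right]
  - \mathbb{E}_{x,\,y}\left[ D(x, y) \right]
  + \lambda\, \mathbb{E}_{\hat{x},\,y}\left[
      \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}, y) \right\rVert_{2} - 1 \right)^{2}
    \right] \\
L_{G} &= -\,\mathbb{E}_{z,\,y}\left[ D(G(z, y), y) \right]
\end{aligned}
```

The penalty term pushes the gradient norm of the critic toward 1 at points interpolated between original and generated samples, which is how this variant enforces the Lipschitz constraint that the Wasserstein formulation requires.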

On the dataset used, and with a lower number of features, our GAN can slightly increase the accuracy of a spectral classifier (LDA) on minority classes, with the downside that the classifier overfits to the minority classes. SMOTE performs slightly better than the GAN, leading to the conclusion that using GANs to oversample minority classes in this IMS dataset is not useful. However, GANs might still hold great potential in other applications for IMS data, such as anomaly detection or classification.