Mitigating Bias in Skin Lesion Classification Models Using Variational Autoencoders

This page provides a summary of my bachelor's thesis, which is available in the corresponding repository. Also, check out my website, take a look at some of my other projects on my GitHub profile, or connect with me on LinkedIn.

Abstract

Leveraging deep learning for the early detection of skin cancer could help prevent deaths. However, current skin lesion classification algorithms exhibit biases and perform worse for patients with rarer skin features. An existing bias mitigation method uses a Variational Autoencoder to automatically detect rare skin features in a dataset and takes them into account when training a classifier. We propose an adaptation of this method that supports multiple classes. We show that the adaptation is effective in experimental setups similar to those in previous research: bias with respect to the patient's age and skin tone was reduced by more than 45%, with a significance of p < 0.0005. Further, we observe that transfer learning reduces bias on its own but diminishes the effect of the bias mitigation method. Lastly, we find that the method is not effective for a more complex multi-class skin lesion classification task. We discuss potential reasons and areas for future work.

Bias Mitigation Method

We applied an adapted version of the bias mitigation method from Amini et al. to a skin lesion classification task.

First, we train a Variational Autoencoder on images of different skin lesions. A Variational Autoencoder is a neural network that learns to encode each image into a probability distribution over a latent space and to decode points from that latent space back into images.
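The following minimal, framework-free Python sketch makes the encode/decode description more concrete. It shows the two computations specific to a Variational Autoencoder. The first is sampling a latent code with the reparameterization trick. The second is the loss, which combines a reconstruction error with a KL divergence towards a standard normal prior. This is not the thesis implementation. The encoder and decoder networks are omitted, the function names are hypothetical, and the squared-error reconstruction term and the beta weight are illustrative choices.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, exp(log_var)) per dimension via the reparameterization trick.

    Writing z = mu + sigma * eps with eps ~ N(0, 1) keeps the sampling step
    differentiable with respect to mu and log_var in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_divergence(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_error(x, x_hat):
    """Sum of squared errors between a flattened image and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus beta-weighted KL term (beta = 1 is the standard VAE)."""
    return reconstruction_error(x, x_hat) + beta * kl_divergence(mu, log_var)


# Illustrative usage with made-up encoder outputs for a 2-dimensional latent space.
rng = random.Random(0)
z = reparameterize([0.5, -1.0], [0.0, -2.0], rng)  # would be fed to the decoder
```

In a real model, mu and log_var come from the encoder network, and a deep learning framework handles the gradients. The KL term pulls each image's latent distribution towards a standard normal prior.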

The latent space is a lower-dimensional representation of the images, which we use to identify images with rare skin features. This is of particular interest because it allows us to detect rare skin features without any human intervention. Next, we train a classifier on the images of the dataset, sampling images with rare skin features more often. In doing so, the classifier learns to classify images with rare skin features better. We used the popular ResNet18 architecture as the classifier.
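The resampling step can be sketched as follows. As we understand the original method of Amini et al., the distribution of the latent codes is approximated with one histogram per latent dimension. Each image is then sampled with a probability proportional to the product, over all dimensions, of 1 / (histogram density + alpha). This means images whose latent values fall into sparsely populated bins are drawn more often. The sketch below is a minimal pure-Python illustration under those assumptions. It does not include the multi-class adaptation from the thesis. The function names and the default values for the number of bins and alpha are hypothetical.

```python
import math
import random


def histogram_densities(values, bins):
    """Return, for every value, the fraction of all values that fall into its bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant latent dimension
    idx = [min(int((v - lo) / width), bins - 1) for v in values]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    n = len(values)
    return [counts[i] / n for i in idx]


def sampling_probabilities(latents, bins=10, alpha=0.01):
    """Resampling probabilities that favour images with rare latent values.

    latents: one latent vector per image (e.g. the VAE encoder means).
    Each image gets weight prod_j 1 / (density_j + alpha), accumulated in log
    space to avoid underflow when the latent space has many dimensions.
    """
    n, d = len(latents), len(latents[0])
    log_w = [0.0] * n
    for j in range(d):
        dens = histogram_densities([z[j] for z in latents], bins)
        for k in range(n):
            log_w[k] -= math.log(dens[k] + alpha)
    m = max(log_w)  # subtract the maximum before exponentiating for stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def sample_batch(probs, batch_size, rng):
    """Draw image indices with replacement according to the resampling probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

A larger alpha flattens the weights towards uniform sampling, while a smaller alpha emphasizes rare images more strongly.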

Comparison to Related Work

We performed an experiment following a setup similar to that of Sauman Das. We consider a skin lesion classification task with two classes. To evaluate the bias mitigation effect, we trained the model twice, once with the bias mitigation method and once without. Then, we compared the accuracies and biases of the two models with respect to the patient's age and sex, as well as the attributes visible hair and skin tone.

As shown in the diagram below, we observe that the bias mitigation method improves the overall weighted accuracy by almost 7% on average, while also improving the weighted accuracy for every single attribute.

Additionally, we measured bias, which we define as the variance of the weighted accuracies across the different groups of an attribute (for example, the different age groups). We observe that there is little bias with respect to sex in the first place, which can be explained by the dataset being balanced with respect to this attribute. With bias mitigation applied, the amount of bias does not change significantly. For the attribute visible hair, we observe a slight decrease in bias; however, the decrease is not significant.
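The bias metric can be illustrated with a small Python sketch. The text does not define weighted accuracy precisely, so the sketch assumes it is the class-balanced accuracy, that is, the mean recall over the label classes. It computes bias as the population variance of that score across the groups of one attribute. Both choices, as well as the function names, are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def weighted_accuracy(y_true, y_pred):
    """Class-balanced accuracy: mean recall over the labels present in y_true.

    ASSUMPTION: "weighted accuracy" is read as balanced accuracy here.
    """
    per_label = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_label[t][0] += int(t == p)
        per_label[t][1] += 1
    return mean(correct / total for correct, total in per_label.values())


def attribute_bias(y_true, y_pred, groups):
    """Variance of the weighted accuracy across the groups of one attribute.

    groups[i] is the attribute value of sample i (e.g. an age bin or a skin tone).
    ASSUMPTION: population variance, with every group counted once.
    """
    by_group = defaultdict(lambda: ([], []))  # group -> (labels, predictions)
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    accs = [weighted_accuracy(t, p) for t, p in by_group.values()]
    return pvariance(accs)
```

For example, suppose one skin-tone group reaches a weighted accuracy of 1.0 and another reaches 0.5. The bias is then the variance of [1.0, 0.5], which is 0.0625. A bias of zero means that every group is classified equally well.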

The remaining two attributes, age and skin tone, are more interesting. We observe a significant decrease in bias for both attributes. This suggests that the bias mitigation method is indeed able to automatically detect images of patients from a rare age group or with a rare skin tone. It also shows that this information is successfully used to improve overall accuracy and reduce bias. Overall, this experiment shows that the bias mitigation method is effective in a setup similar to related work.

Effects of Using Transfer Learning

Transfer learning is used in many deep learning applications to improve model performance. Thus, we wanted to find out how using transfer learning affects the bias mitigation method.

To do so, we trained the classifier with transfer learning twice, once with additional bias mitigation and once without. As in the previous experiment, we evaluated weighted accuracies and biases.

As expected, using transfer learning improves the overall weighted accuracy in comparison to not using transfer learning.

On top of that, we observe that using transfer learning and bias mitigation improves the overall weighted accuracy even further.

Looking at the biases below, we observe that transfer learning alone leads to a decrease in bias. However, additionally applying bias mitigation no longer leads to a significant further decrease in bias.

To conclude, we showed that using transfer learning leads to reduced bias and improved weighted accuracy. However, in combination with transfer learning, the bias mitigation method is no longer as effective at reducing bias.

Bias Mitigation for a Complex Multi-Class Classification Task

Lastly, we applied the method to a more complex classification task with four classes. With this setup, the bias mitigation method was not able to improve the performance or reduce bias.

Potential reasons for this include the dataset being too small to extract meaningful information from the latent space of a Variational Autoencoder. Future work could investigate whether using a separate Variational Autoencoder for each class improves the bias mitigation effect. Another promising approach would be to perform thorough hyperparameter tuning.