Spectacular Superpixels for Manageable Medical Analysis: AI as a helpful tool to analyze CT scans more effectively

Published: 4 November 2022 | Last updated: 4 November 2022

Deep learning algorithms are AI algorithms trained on large amounts of labelled data. A paper on a deep learning algorithm that finds the shape of the pancreas with less labelled data will be presented at the BNAIC/Benelearn conference in Mechelen on 8 November 2022. The work is inspired by cognitive science, in particular how humans can learn without annotated data. The paper is co-authored by Sander van Donkelaar and Sharon Ong from Tilburg University, Lois Daamen and Paul Andel from UMC Utrecht, and Ralf Zoetekouw from Datacation B.V.

Deep learning has taken the field of AI by storm. It has proven very powerful in many domains, from natural language processing to medical image segmentation. These algorithms are trained on huge collections of labelled data, an approach known as supervised learning, and they get better the more data they see.

But gathering large amounts of data in the medical field is hard. Annotating medical data is very resource-intensive: it can take a radiologist several hours to label a set of CT scans. Moreover, there are privacy concerns: data is not easily shared, which further limits its availability. This holds back the development and adoption of AI in the medical domain. Although AI has shown great potential for diagnosing diseases, the lack of data can result in less powerful models.

In his thesis, supervised by dr. Sharon Ong, MA CSAI student Sander van Donkelaar developed an algorithm to tackle this problem. The algorithm uses self-supervision: it can learn the general structure of the data without needing labels from radiologists.

Instead of asking radiologists to label the data, the labels come from the images themselves. The methodology is inspired by how humans learn. Humans do not learn from labelled datasets; they learn from experience, exploring their environment to build up an understanding without ever needing class labels. Self-supervised learning works in much the same way.

The algorithm is trained on two tasks: a pre-training task followed by a target task. In the pre-training task, the algorithm groups the pixels of each image into superpixels: perceptual groupings of pixels with similar intensity or texture. It then corrupts the image by randomly replacing superpixel groups with noise. Without any labelled data, a computational model is trained to restore the corrupted image to its original. The parameters (weights) of this model are then used to initialise a deep learning model that locates the boundaries of the pancreas (segmentation) in CT images. Reconstructing the image forces the network to learn more contextual information about the image.
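The corruption step described above can be sketched in a few lines of numpy. This is an illustrative toy, not the thesis code: the function name is made up, the toy label map is a simple grid, and in practice the superpixel labels would come from a superpixel algorithm such as SLIC.

```python
import numpy as np

def corrupt_superpixels(image, labels, frac=0.25, rng=None):
    """Replace a random fraction of superpixel regions with Gaussian noise.

    image  : 2-D float array (e.g. one CT slice).
    labels : integer map of the same shape assigning each pixel to a
             superpixel (in practice produced by e.g. SLIC).
    frac   : fraction of superpixels to corrupt.
    """
    rng = np.random.default_rng(rng)
    corrupted = image.copy()
    ids = np.unique(labels)
    n_corrupt = max(1, int(frac * len(ids)))
    # Pick which superpixel groups to swap out for noise.
    chosen = rng.choice(ids, size=n_corrupt, replace=False)
    for sp in chosen:
        mask = labels == sp
        # Fill the region with noise matched to the image statistics.
        corrupted[mask] = rng.normal(image.mean(), image.std() + 1e-8,
                                     mask.sum())
    return corrupted

# Toy demo: a 4x4 "scan" partitioned into four 2x2 superpixels.
img = np.arange(16, dtype=float).reshape(4, 4)
labs = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0), 2, axis=1)
noisy = corrupt_superpixels(img, labs, frac=0.25, rng=0)
print((noisy != img).sum())  # pixels corrupted: one 2x2 superpixel, so 4
```

A reconstruction network would then be trained to map `noisy` back to `img`, which is what gives the model its label-free learning signal.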

Upon completion of the pre-training task, the algorithm is fine-tuned with labelled data on the target task: in this application, segmenting the pancreas in medical CT scans. Van Donkelaar's results indicate that pre-training with self-supervision improves performance on the target task. By leveraging unlabelled data, the problem of insufficient annotated data can be alleviated.
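The hand-over from pre-training to fine-tuning amounts to transferring the shared weights and re-initialising the task-specific head. A minimal sketch of that idea, with dictionaries standing in for real network parameters (all names here are illustrative, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights learned on the self-supervised reconstruction task.
pretrained = {
    "encoder/conv1": rng.normal(size=(3, 3)),
    "encoder/conv2": rng.normal(size=(3, 3)),
    "decoder/recon": rng.normal(size=(3, 3)),  # reconstruction head
}

# Freshly initialised target-task model with a new segmentation head.
segmentation_model = {
    "encoder/conv1": rng.normal(size=(3, 3)),
    "encoder/conv2": rng.normal(size=(3, 3)),
    "decoder/seg":   rng.normal(size=(3, 3)),  # segmentation head
}

# Transfer: copy every parameter whose name both models share
# (the encoder); the task-specific heads keep their own initialisation.
for name, weights in pretrained.items():
    if name in segmentation_model:
        segmentation_model[name] = weights.copy()
```

Fine-tuning then continues training `segmentation_model` on the (smaller) labelled pancreas dataset, starting from these transferred encoder weights instead of from scratch.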

Van Donkelaar’s thesis was carried out in collaboration with dr. Lois Daamen and dr. Paul Andel from UMC Utrecht and Ralf Zoetekouw at Datacation B.V. A paper based on this work has been accepted to the BNAIC/Benelearn 2022 conference and will be presented in Mechelen on 8 November 2022.

Van Donkelaar is a graduate of the Bachelor’s and Master’s programs in Cognitive Science and Artificial Intelligence at Tilburg University. He credits his studies, which blend cognitive science and AI, for enabling him to produce this publication. Tilburg University is one of the few universities in the Netherlands that offers such programs.