Can you believe your eyes? Visualization of data with machine learning
Imagine yourself as a medical expert in a bright white doctor's coat, looking through patient files full of fMRI and CT scans, blood measurements, ECG and EEG tests and much more. You want to see all of your patients at once on your screen. But each patient is described by so many measurements that this kind of data is called “high-dimensional data”, and there is no direct way to draw that many dimensions on a flat screen. Well, do not be scared: this is possible with machine learning algorithms (magic does not only happen at Hogwarts!). Such algorithms are called dimensionality reduction algorithms, and one of the most famous of them is t-Distributed Stochastic Neighbor Embedding (t-SNE). t-SNE places every patient as a single point on a two- or three-dimensional map, trying to keep patients that are similar in the original data close to each other on the map. Using t-SNE, we can visualize our data on our screens and see the patterns hidden in it. This is very good news, especially for those of us who turn on the lights as soon as we walk into a dark room, because t-SNE is the light we turn on when we want to see inside our big, dark room of data. But the question is: can we believe what we see once the lights are on?
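For readers who want to switch on this light themselves, here is a minimal sketch in Python. It assumes scikit-learn is installed and uses its bundled handwritten-digits dataset (64 pixel values per image) as a stand-in for high-dimensional patient data; the dataset, the sample size and the parameter values are illustrative choices, not something taken from the article.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Illustrative stand-in for patient records: 8x8 digit images, i.e. 64 numbers per sample
X, y = load_digits(return_X_y=True)
X = X[:500]  # keep the example fast

# Squeeze 64 dimensions down to 2, so every sample becomes one point on the screen
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(embedding.shape)  # one (x, y) position per sample
```

Plotting `embedding` as a scatter plot, coloured by `y[:500]`, shows groups of similar digits forming visible clusters.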
No algorithm is perfect, and t-SNE is no exception. Since our light is imperfect, we cannot blindly rely on the visualizations of our data that we see on our monitors; for example, the sizes of clusters and the distances between them in a t-SNE plot do not necessarily reflect the original data. Going back to where we started, we might be visualizing crucial medical data and making decisions based on what we see. The consequences of any mistakes stemming from the algorithm can be severe. It seems we had better not believe what we see. Well, what should we do then? Two researchers from the Department of Cognitive Science and Artificial Intelligence recently published an interesting article on how to measure the confidence, or trustworthiness, of t-SNE embeddings using machine learning, specifically a random forest. Their results show that it is in fact possible to estimate how reliable a t-SNE visualization is with considerable precision. Therefore, the question of whether or not you can believe what you see becomes far less pressing, because we can now measure how trustworthy it is. Thanks to their method, we can visualize our data and, at the same time, see where t-SNE may have made mistakes, so that we can make our decisions accordingly.
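The article does not spell out the researchers' exact procedure, so the following is only a simplified, hypothetical sketch of the general idea, assuming scikit-learn and the same digits stand-in data. It computes scikit-learn's global `trustworthiness` score, then a per-point "neighbour preservation" score (the fraction of each point's 10 nearest neighbours that are still its neighbours on the 2-D map; the function name `neighbor_preservation` is our own), and finally fits a random forest to predict that score from the original features. This is a swapped-in illustration of a learned confidence estimate, not the authors' method.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import TSNE, trustworthiness
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def neighbor_preservation(X_high, X_low, k=10):
    """Hypothetical per-point score: share of each point's k nearest
    neighbours in the original space that are also among its k nearest
    neighbours in the embedding (1.0 = neighbourhood fully preserved)."""
    # kneighbors() without a query excludes each point from its own neighbour list
    idx_high = NearestNeighbors(n_neighbors=k).fit(X_high).kneighbors(return_distance=False)
    idx_low = NearestNeighbors(n_neighbors=k).fit(X_low).kneighbors(return_distance=False)
    return np.array([len(set(h) & set(l)) / k for h, l in zip(idx_high, idx_low)])


X, _ = load_digits(return_X_y=True)
X = X[:500]
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

# Global score in [0, 1]: how well local neighbourhoods survive the embedding overall
tw = trustworthiness(X, X_2d, n_neighbors=10)
print(f"trustworthiness: {tw:.3f}")

# Per-point score: which individual points the map may be misplacing
scores = neighbor_preservation(X, X_2d, k=10)

# Hypothetical learned confidence model: a random forest that predicts the
# per-point score from the original features, evaluated on held-out points
X_train, X_test, s_train, s_test = train_test_split(X, scores, test_size=0.2, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, s_train)
predicted = forest.predict(X_test)

# Positions (within the test split) of the points the model trusts least
print("least trustworthy test points:", np.argsort(predicted)[:5])
```

Points with low predicted scores are the ones whose position on the map deserves a second look before anyone draws conclusions from it.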
I can hear you asking: “Well, okay, but what exactly will this bring us?” We are living in an age of data. Information gathered from data is vital for many technologies in fields ranging from health care to entertainment. Knowing how reliable a visualization of the data is gives us further insight into that data and helps us understand it much better. And with this better understanding, we can make better decisions in all of these fields.
For more information, please see:
Ozgode Yigin, Busra, and Gorkem Saygili. "Confidence estimation for t-SNE embeddings using random forest." International Journal of Machine Learning and Cybernetics (2022): 1-12.