News and events Tilburg University

TSHD have acquired new GPU servers allowing students to gain hands-on experience in the field of AI

Published: 13th October 2021 Last updated: 04th July 2022

As scientific fields such as Machine Learning, Deep Learning, Advanced Data Processing, Data Science and other AI subfields evolve at a fast pace, educational practices should evolve with them. With the GPU4EDU project, we aim to provide new generations of students with access to modern facilities that allow them to gain up-to-date knowledge and hands-on experience in the field of AI. With this project, we bring the technical equipment at the Tilburg School of Humanities and Digital Sciences (TSHD) of Tilburg University to a level that matches the requirements of current and future AI education.

This project expands the hardware fleet of the School of Humanities and Digital Sciences with high-end, multi-GPU (graphics processing unit) servers that can be accessed remotely and securely. The servers are dedicated to the educational needs of students in courses taught within the Department of Cognitive Science and Artificial Intelligence (CSAI).

At the moment we have acquired 4 servers with 2 high-end GPUs each. This will allow 24 to 48 students to work simultaneously. In this first stage of the project we aim to test the feasibility of the provided hardware and its usefulness for students’ needs. Based on the feedback received, we will determine how and to what extent to continue growing the GPU4EDU fleet in the future.

Why do we need these computers?

The current state-of-the-art in artificial intelligence is achieved through deep neural models. Deep neural networks are networks of artificial neurons which employ basic mathematical operations to jointly transform an input, e.g. text in language, into a desired output, e.g. text in another language carrying the same meaning as the input.

Artificial neural networks (ANNs) are organised in input, hidden and output layers, where each layer is typically connected to the preceding and following ones. The more hidden layers, the deeper the network, which leads to the commonly accepted terms deep neural networks (DNNs) and deep learning (DL) - machine learning using DNNs. The advantages of DL have been evident in many fields - machine translation, image recognition, audio recognition, voice synthesis, question answering and many others.
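
To make the layered structure concrete, here is a minimal sketch in plain Python of an input passing through two hidden layers and one output layer. The layer sizes, the random weights and the choice of ReLU and sigmoid activations are illustrative assumptions, not details of any model used in the project.

```python
import math
import random


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dense_layer(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(weights * inputs + bias)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Each artificial neuron computes a weighted sum of its inputs plus a bias.
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


def random_layer(n_inputs, n_outputs, rng):
    """Create randomly initialised weights and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def forward(inputs, layers):
    """Pass the input through every layer in order, from input to output."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


rng = random.Random(0)
# Illustrative sizes: 3 inputs -> two hidden layers of 4 neurons -> 1 output.
network = [
    random_layer(3, 4, rng) + (relu,),     # first hidden layer
    random_layer(4, 4, rng) + (relu,),     # second hidden layer
    random_layer(4, 1, rng) + (sigmoid,),  # output layer
]

prediction = forward([0.5, -1.2, 3.0], network)
print(prediction)
```

Adding more entries to the `network` list makes the network deeper; training, which is where the heavy computation happens, is not shown here.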

State-of-the-art networks contain hundreds of thousands to millions of parameters that need to be updated at training time. The specifics of DNN training algorithms make the use of current CPUs impractical. A graphics processing unit (GPU) is better suited to this workload, because it has a much larger number of cores than a CPU and a substantial amount of dedicated memory.
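
As a rough illustration of how quickly parameters add up, the sketch below counts the weights and biases of a small fully connected network. The layer sizes are hypothetical and chosen only for the example; the memory figure assumes 32-bit floats and ignores gradients and optimiser state, which need additional memory during training.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Hypothetical network: 784 inputs, two hidden layers of 512 neurons, 10 outputs.
sizes = [784, 512, 512, 10]
n_params = count_parameters(sizes)
megabytes_fp32 = n_params * 4 / 1e6  # 4 bytes per 32-bit float

print(n_params)        # 669706
print(megabytes_fp32)  # size of the parameters alone, in MB
```

Even this small example already has several hundred thousand parameters, all of which are updated at every training step.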

Interviews with contributors of the project

This project is funded by the Tilburg School of Humanities and Digital Sciences and the EDUiLAB. Partial funding has been received through an education innovation grant. The following are discussions with Dr. Dimitar Shterionov, Prof. dr. ir. Pieter Spronck and Mr. Lars Biemans, who are the main contributors to this project.

Dr. Dimitar Shterionov is an assistant professor at Tilburg University. He joined the CSAI department in August 2020 and started teaching a new bachelor course - Software Engineering for CSAI. While preparing this course, Dimitar realised that students do not have easy access to facilities for training and working with large deep learning models. This was the initial motivation behind the GPU4EDU project.

Prof. Pieter Spronck is a full professor at the CSAI department. A former head of the department (until July 2021), he strives to drive the department and its activities forward, ensuring state-of-the-art research as well as up-to-date education. His support of the project led to securing funding for all four servers at once.

Mr. Lars Biemans is a system administrator at Tilburg University. His experience with servers, networks, system setup and maintenance, as well as his knowledge of hardware and system configurations, was essential to the GPU4EDU project. Through his communication with Dell Technologies he ensured the best possible hardware solution for the needs of our students.

Details about the servers and how they are going to be accessed

What are the technical specifications of the machines?

We have 4 Dell PowerEdge R750 servers in total, each with 192 GB of RAM, 2 CPUs and 2 GPUs. We set up these machines together with experts at Dell for optimal performance. The 4 servers are distributed between 2 data centers.

The CPUs are Intel(R) Xeon(R) Gold 6346 CPUs @ 3.10GHz. Each CPU has 16 cores with 2 threads per core, which gives 32 threads per CPU and 64 threads for the 2 CPUs in a server.

The GPUs are NVIDIA A40 cards with 48 GB of memory each. We chose these cards for their high memory capacity. On the one hand, this allows large models to be run if a specific project requires it. On the other hand, it allows multiple users to run smaller models in parallel. For example, with a model that consumes up to 8 GB of VRAM, 6 users can work at the same time on 1 GPU, or 48 users in total across all 8 GPUs.
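
The sharing arithmetic above can be sketched in a few lines of Python. The hardware figures come from this article; the per-user memory footprints are example values, and the calculation assumes memory is the only limiting factor (it ignores framework overhead and compute contention).

```python
SERVERS = 4
GPUS_PER_SERVER = 2
VRAM_PER_GPU_GB = 48


def max_concurrent_users(vram_per_user_gb):
    """Return (users per GPU, users across the whole fleet) for a given footprint."""
    users_per_gpu = VRAM_PER_GPU_GB // vram_per_user_gb
    total_users = users_per_gpu * GPUS_PER_SERVER * SERVERS
    return users_per_gpu, total_users


print(max_concurrent_users(8))   # (6, 48): the example given in the text
print(max_concurrent_users(16))  # (3, 24): a larger, hypothetical footprint
```

With these example footprints the fleet-wide totals come out at 24 and 48 users, which is consistent with the range of simultaneous students mentioned earlier in the article.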

What environment do you need for the safe and optimal operation of these machines (e.g. cooling, power, networking)?

Each A40 card consumes at most 300 W under load and around 70 W when idle, so the two cards in a server can draw at most 600 W together. Each machine has a 1400 W power supply, which should cover all power requirements, plus a second 1400 W power supply for redundancy. Each server is connected to a UPS, which in turn is connected to an emergency power generator (EPG), so even if the “normal” power to the data centers fails, the servers should keep running for some time.
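
The power figures can be checked with a short calculation. This is only a sketch based on the numbers quoted in this article; the draw of the CPUs, memory, fans and other components is not given here, so it is left as unmodelled headroom rather than guessed.

```python
GPU_MAX_W = 300      # maximum draw per A40 card
GPU_IDLE_W = 70      # approximate idle draw per A40 card
GPUS_PER_SERVER = 2
PSU_W = 1400         # capacity of one power supply


def gpu_power(per_card_w, cards=GPUS_PER_SERVER):
    """Total draw of all GPUs in one server at a given per-card wattage."""
    return per_card_w * cards


peak_gpu_w = gpu_power(GPU_MAX_W)   # 600 W
idle_gpu_w = gpu_power(GPU_IDLE_W)  # 140 W
# Capacity of a single power supply left for everything that is not a GPU.
headroom_w = PSU_W - peak_gpu_w     # 800 W

print(peak_gpu_w, idle_gpu_w, headroom_w)
```

Because the second power supply is there for redundancy, the headroom is computed against a single 1400 W unit rather than the two combined.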

Since each server has only a couple of hundred GB of local storage for the operating system and similar data, the servers are connected to the back-end storage via a fiber connection. This storage is shared, which means that all user files are accessible from all servers.

The A40 GPUs are passively cooled cards, i.e. they are cooled by the server cooling system. The server rooms are also air-conditioned at all times to keep the temperature low enough for the optimal operation of our servers, at around 22 degrees Celsius.

The servers are connected to our TiU network via two copper cables: one for remote management, which allows us to access and manage the machines remotely, and one for the network connection, which allows users to access the machines and run their scripts.

How would students connect to these machines?

Using their own SSH key, students can connect to the portal at students.portal.tshd.uvt.nl. From this portal they can submit jobs using SLURM.

However, this setup is still to be tested and optimised for the convenience of the students.
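
As an illustration of what submitting a job could look like, the sketch below uses Python to generate a SLURM batch script that requests a GPU. The `#SBATCH` options shown are standard SLURM options, but everything else is an assumption: the job name, the time and memory limits, the `train.py` command, and whether GPUs on this cluster are requested through `--gres`. The actual configuration of the portal may differ.

```python
def build_slurm_script(job_name, command, gpus=1,
                       time_limit="01:00:00", memory="16G"):
    """Return the text of a SLURM batch script that requests GPUs.

    All default values are illustrative, not the portal's real limits.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",           # number of GPUs for the job
        f"#SBATCH --time={time_limit}",         # wall-clock limit, HH:MM:SS
        f"#SBATCH --mem={memory}",              # main memory (RAM), not VRAM
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: run a training script on one GPU.
script = build_slurm_script("train-demo", "python train.py")
print(script)
```

A student would typically save the generated text to a file such as `job.sh` on the portal, after logging in with `ssh <username>@students.portal.tshd.uvt.nl` (the username format is an assumption), and then submit it with `sbatch job.sh` and check its status with `squeue`.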