Tilburg University department Cognitive Science and Artificial Intelligence

The voice of the artist

Published: 7th October 2021
Last updated: 25th October 2021

There are two women on stage. Their faces look the same, they have the same haircut and wear the same glasses. Someone from the audience asks a question. One woman starts answering in French, closely followed by the other – but although their voices are the same, the second woman is speaking English. She is also two meters tall and has a clearly mechanical body. The first woman is ORLAN, a French contemporary artist. The second is ORLANOÏDE, her robot replica. In a collaboration with Science Gallery Dublin, they present a new addition to this work at Ars Electronica Festival 2021 – the robot speaks.

A voice can seem so simple, but behind this one lies the effort of Dr. Dimitar Shterionov, Assistant Professor in Cognitive Science and Artificial Intelligence at Tilburg University and a former researcher at the ADAPT Centre at Dublin City University. Most of his work is in machine translation of written text, so the speech translation necessary for this project was an entirely new challenge, especially because the work is intended to be performed live. “As scientists, we often think in terms of developing our own models for experiments. What this project taught me is that sometimes you need to be practical, use already available tools and adapt them to the situation,” says Dimitar. “Especially since we are often very focused on a specific domain of interest. When you want to build a system like this, it’s an engineering challenge; you have to deal with multiple systems and people at once.”

ORLANOÏDE uses Microsoft Cognitive Services to translate the words spoken by ORLAN from French to English, but the wrapper around it is custom. In a two-way streaming system, the audio input from the artist is recognized, translated, and then sent to a queue in the cloud, which is read by a script on the robot side. The robot utters the phrases, while the audio signal controls the movement of the lips. Additionally, the conversation is sent to a separate queue where a thermal printer records it on paper for posterity. Although the system was originally meant only to translate from French to English, further development allows it to generate different queues for different languages, and even to make them two-way, so two people can converse without their messages interfering.

ORLAN’s work focuses on altering and reinventing the body and the self. For the final text-to-speech synthesis, the team therefore collaborated with Acapela Group to develop a speech model using ORLAN’s own voice. After one and a half hours of recording the artist speaking in English, the model could be trained and connected to the system.
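
For readers curious how such a queue-based relay might fit together, here is a minimal sketch in Python. It is not the project's actual code: the `Relay` class, the `fake_translate` stand-in, and the `drain` helper are hypothetical names invented for illustration, and in-process `queue.Queue` objects take the place of the cloud queues. The real system calls Microsoft Cognitive Services for recognition and translation, which is not reproduced here. The sketch only shows the idea of keeping one set of queues per language direction, so that the robot and the printer each receive every phrase and the two directions do not interfere.

```python
import queue
from collections import defaultdict


def fake_translate(text, source, target):
    """Hypothetical stand-in for the cloud translation call (assumption)."""
    lookup = {("fr", "en", "bonjour"): "hello"}
    key = (source, target, text.strip().lower())
    # Fall back to a tagged copy of the input when no translation is known.
    return lookup.get(key, f"[{source}->{target}] {text}")


class Relay:
    """Translate recognised text and fan it out to per-direction queues."""

    def __init__(self, translate):
        self._translate = translate
        # (source, target) -> list of consumer queues for that direction
        self._queues = defaultdict(list)

    def subscribe(self, source, target):
        """Register a consumer (e.g. robot or printer) for one direction."""
        q = queue.Queue()
        self._queues[(source, target)].append(q)
        return q

    def publish(self, text, source, target):
        """Translate one phrase and put it on every queue for its direction."""
        translated = self._translate(text, source, target)
        for q in self._queues[(source, target)]:
            q.put(translated)
        return translated


def drain(q):
    """Return everything currently waiting on a queue, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


if __name__ == "__main__":
    relay = Relay(fake_translate)
    robot = relay.subscribe("fr", "en")    # read by the robot-side script
    printer = relay.subscribe("fr", "en")  # read by the thermal printer
    reply = relay.subscribe("en", "fr")    # the opposite direction

    relay.publish("Bonjour", "fr", "en")
    print("robot:", drain(robot))
    print("printer:", drain(printer))
    print("reply:", drain(reply))
```

In this sketch each consumer gets its own queue, so a slow printer would not hold up the robot's speech. Whether the real system is organised this way is an assumption.
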
With the voice model connected, ORLANOÏDE could replicate not only ORLAN’s face and body movements, but also her voice. In a live setting, a Raspberry Pi running the Python scripts and connecting input and output can take care of the job. For Ars Electronica, however, a recording was made that had quite an international character. “We did the first recording yesterday, without any preparation, we hadn’t done any tests on the wrapper or the connections. The artist doesn’t even have the system, so it was very funny, piping the audio in from France, running the system here in the Netherlands, and the robot script speaking from the queue in Dublin,” says a bemused Dimitar. “We never even tried the speech recognition with her. Luckily, she has a very deep and slow voice, which works very well. We were all really happy with the result.”

For Dimitar, this project has been interesting because of the technical challenges, but also because of the newly gained perspectives. “The artist was very open-minded, open for a lot of things. I have a background in industry, and projects can be very constrained sometimes.” ORLAN, who has extensively explored the relationship between artist and subject in her work, has inspired some new questions with this interactive experience. As Dimitar remarks: “I usually do not have human ‘subjects’, I build models and test them. I’m working directly with the AI. But I do wonder what people would want to learn from this robot, how humans interact with such a machine. Will they see it only as a fancy loudspeaker, or will they think about what it means to have a robot stand in for you, for example when you cannot physically be there?”

You can see ORLAN and ORLANOÏDE this weekend at Ars Electronica online.