Tilburg center for Cognition and Communication (TiCC)

We study how people communicate with each other and how computer systems can be taught to communicate with us.


Desmond Elliott

What: Multilingual Image Description with Neural Sequence Models
Where: DZ 4
When: Wednesday, 25 November 2015, 12:30 - 13:30 hours

We introduce multilingual image description, the task of generating descriptions of images given data in multiple languages. This can be viewed as visually grounded machine translation, allowing the image to play a role in disambiguating language. We present models for this task that combine neural machine translation and neural image description. Our multilingual image description models generate target-language sentences using features transferred from existing models: multimodal features from a monolingual source-language image description model and visual features from an object recognition model. In experiments on a dataset of images paired with English and German sentences, using BLEU and Meteor as metrics, our models substantially improve upon existing monolingual image description models.

(Desmond Elliott from the University of Amsterdam and Centrum Wiskunde & Informatica)

