Good robot journalists need more development
Huge amounts of data are available in our society, such as weather information, financial data, and sports statistics, that are not yet expressed in language. This kind of data could improve our lives by increasing our understanding of the world and helping us take action. But in many cases the available raw data is too voluminous and too unclear to understand. In his dissertation, for which he will receive his doctorate on Nov. 25, Chris van der Lee explored how to make that data more understandable for media.
Data-to-text systems are ideally suited as tools for processing raw data. These are computer programs that automatically convert data (for example, daily temperatures, amount of precipitation, wind force, etc. for a given location) into understandable, natural text (a weather report). The advantage of this type of system is that we can also use text to explain, for example, the background, context, and conditions of the numbers and statistics underlying it.
What is striking is that most of the data-to-text systems in use, mainly at media outlets, where they are called robot journalists, are not yet very technologically advanced. They use template texts combined with simple, handwritten rules for applying those templates.
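To give an impression of what such a template-based system looks like, here is a minimal sketch in Python. It is not taken from any real robot journalist: the field names, thresholds, wording, and the example record are all invented for illustration.

```python
# Minimal sketch of a template-based data-to-text system.
# All field names, thresholds and phrasings are hypothetical.

TEMPLATE = "{city}: {rain}, with {wind} and a high of {temp_max} degrees Celsius."


def describe_rain(precip_mm: float) -> str:
    """Handwritten rule: map a precipitation amount to a phrase."""
    if precip_mm == 0:
        return "It will stay dry"
    if precip_mm < 5:
        return "Expect a few showers"
    return "Expect heavy rain"


def describe_wind(force_bft: int) -> str:
    """Handwritten rule: map wind force (Beaufort scale) to a phrase."""
    if force_bft <= 2:
        return "a light breeze"
    if force_bft <= 5:
        return "a moderate wind"
    return "a strong wind"


def weather_report(record: dict) -> str:
    """Fill the fixed template with phrases chosen by the rules."""
    return TEMPLATE.format(
        city=record["city"],
        rain=describe_rain(record["precip_mm"]),
        wind=describe_wind(record["wind_bft"]),
        temp_max=record["temp_max_c"],
    )


# Invented example record, not real weather data.
record = {"city": "Amsterdam", "temp_max_c": 9, "precip_mm": 0.0, "wind_bft": 3}
print(weather_report(record))
# prints: Amsterdam: It will stay dry, with a moderate wind and a high of 9 degrees Celsius.
```

Every phrase and every threshold here had to be written by hand, which is exactly the step that machine learning models could, in theory, take over.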
With new machine learning models, it is theoretically possible to skip the step of manually writing rules and leave it to the computer. If the computer learns the rules competently, it may even capture the underlying logic of a text better than a programmer can with handwritten rules. This, in turn, would result in more natural texts.
Applications in industry
Yet these machine learning data-to-text systems are not being used in industry so far. One reason for this is the way these models learn: they need many examples of data paired with a corresponding text. Such pairs hardly occur in a natural setting, which means it takes a considerable investment of time and money to create enough examples.
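As an illustration of what one such example might look like, the sketch below pairs a data record with a human-written text. The class name, the field names, the records, and the way the data is flattened into a single string are assumptions made for this example, not the format used in the dissertation.

```python
from dataclasses import dataclass


@dataclass
class TrainingPair:
    """One training example: a data record and the text that describes it."""
    data: dict
    text: str


def linearize(data: dict) -> str:
    """Flatten a record into one string, as a model input might look.

    The "key=value | key=value" format is an assumption for illustration.
    """
    return " | ".join(f"{key}={data[key]}" for key in sorted(data))


# Invented pairs; a usable training set would need many more of these.
pairs = [
    TrainingPair(
        data={"city": "Amsterdam", "precip_mm": 0.0, "temp_max_c": 9},
        text="Amsterdam stays dry today, with temperatures up to 9 degrees.",
    ),
    TrainingPair(
        data={"city": "Rotterdam", "precip_mm": 12.0, "temp_max_c": 7},
        text="Rotterdam can expect heavy rain and a high of 7 degrees.",
    ),
]

for pair in pairs:
    print(linearize(pair.data), "->", pair.text)
```

Each text has to be written or checked by a person, which is why collecting enough pairs is costly.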
Chris van der Lee aims to address these problems of machine learning models to ensure that they become more easily usable in industry. For example, he has developed new machine learning data-to-text systems that provide more insight into how models learn, and a new method for efficiently collecting suitable examples that machine learning models can learn from.
Chris van der Lee will receive his doctorate on Nov. 25 at 1:30 p.m. in the university's auditorium, with a livestream. His dissertation is titled 'Next Steps in Data-to-Text Generation: Towards Better Data, Models, and Evaluation'. For more information, please contact scientific editor Tineke Bennema at email@example.com or tel. 013 4668998.