
Learning from sparse examples

Published: 10th November 2023 | Last updated: 10th November 2023

Traditional handwritten text recognition (HTR) approaches ignore the multimodal character of historical manuscripts and underestimate the problems of layout analysis, detection of the logical reading order on a scanned page, and recognition of textual and graphic patterns with a special meaning (headings, illuminated capitals, repetitive administrative forms, tables and other markers). Exploiting deep learning in these areas requires ‘humans in the loop’, who not only label characters and words and transcribe lines of text, but who also train the machine to analyse images in a multifaceted manner. The resulting approach will allow multimodal data mining on the basis of layout and graphics attributes.
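
As a rough illustration of the reading-order problem mentioned above, the sketch below orders detected text-line bounding boxes into a plausible reading sequence by grouping them into columns and reading each column top to bottom. The box data, coordinates and column threshold are hypothetical; real historical layouts are far less regular, which is precisely why learned, human-guided layout models are needed.

```python
# Minimal sketch: deriving a logical reading order from detected text-line
# bounding boxes. The boxes, column threshold and page layout are all
# hypothetical; real manuscripts need learned layout models with a human
# in the loop.

from dataclasses import dataclass

@dataclass
class TextLine:
    x: float        # left edge of the detected line (page coordinates)
    y: float        # top edge of the detected line
    text: str = ""  # transcription, if already available

def reading_order(lines, column_gap=200.0):
    """Group lines into columns by x-position, then read each column top-down."""
    columns = []
    for line in sorted(lines, key=lambda l: l.x):
        if columns and line.x - columns[-1][-1].x < column_gap:
            columns[-1].append(line)
        else:
            columns.append([line])
    ordered = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda l: l.y))
    return ordered

# Toy example: a two-column page.
page = [TextLine(520, 80, "right column, line 1"),
        TextLine(40, 300, "left column, line 2"),
        TextLine(40, 80, "left column, line 1"),
        TextLine(520, 300, "right column, line 2")]
for line in reading_order(page):
    print(line.text)
```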

Prof. Dr. Eric Postma is delighted to be involved in this project, which seeks to develop advanced machine-learning methods that can deal with sparse data. The sparsity may lie in the limited number of instances available or in the limited number of labels available. Dealing with sparse data will be achieved by means of (i) transfer learning, (ii) structured approaches such as Graph Convolutional Networks, and (iii) unsupervised, self-supervised and semi-supervised methods. In the context of DH applications, these advanced deep-learning methods will assist users in integrating their explicit domain knowledge with the implicit knowledge that deep learning models acquire from sparse data.
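
A minimal sketch of the first of these strategies, transfer learning, is given below: an ImageNet-pretrained backbone is adapted to a small, sparsely labelled manuscript task by freezing the feature extractor and training only a new classification head. The number of classes, the stand-in data and the hyperparameters are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal transfer-learning sketch (PyTorch / torchvision): adapt a backbone
# pretrained on natural images to a small, sparsely labelled manuscript task.
# The class count, data and hyperparameters are assumptions for illustration.

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 12  # e.g. a dozen graphic pattern types (headings, capitals, ...)

# Start from weights learned on a large generic image corpus.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor so the sparse labels only have to
# fit a small, newly added classification head.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One gradient step on a (small) batch of labelled manuscript crops."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for manuscript crops.
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, NUM_CLASSES, (4,))))
```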


An important consideration is that the time-varying labels and metadata associated with a collection need to be taken into account while preventing catastrophic forgetting. To optimise the training of continual machine-learning algorithms, we also aim to create feedback loops between engaged citizens and our output: visualisations of basic narrative components (e.g. place, circumstances of a historical event, number of persons involved, thematic aspects) as found in many multimodal archives. By allowing citizens to label the machine-generated excerpts (e.g. by mouse click or by voice), we aim to develop a ‘story visualiser’ that can also be used in the context of other multimodal collections.
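
One common way to keep a continually trained model from catastrophically forgetting earlier labels is rehearsal: a small buffer of past examples is replayed alongside each batch of newly arriving or re-labelled data. The sketch below illustrates the idea with a reservoir-sampled buffer; it is a generic technique under assumed interfaces, not the project's actual continual-learning method.

```python
# Minimal sketch of rehearsal-based continual learning: keep a small memory
# of earlier labelled examples and replay them alongside new ones, so updates
# on newly arriving (or re-labelled) data do not erase what was learned before.
# Buffer size, sampling and the model interface are illustrative assumptions.

import random

class ReplayBuffer:
    """Reservoir-sampled memory of past (example, label) pairs."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []
        self.seen = 0

    def add(self, example, label):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((example, label))
        else:
            # Reservoir sampling keeps each seen item with equal probability.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (example, label)

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def continual_update(model_step, new_batch, buffer, replay_k=8):
    """Train on the new batch mixed with replayed old examples."""
    mixed = list(new_batch) + buffer.sample(replay_k)
    model_step(mixed)                 # one optimisation step on the mixed batch
    for example, label in new_batch:  # remember the new data for later replay
        buffer.add(example, label)

# Toy usage with a stand-in "model step" that just reports the batch size.
buf = ReplayBuffer(capacity=100)
continual_update(lambda batch: print(f"step on {len(batch)} examples"),
                 [("page_42_line_3", "place"), ("page_42_line_4", "event")],
                 buf)
```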