The Project

INSIGHT is a research project that targets the digital assets of two museum clusters in Brussels: the Royal Museums of Fine Arts of Belgium and the Royal Museums of Art and History. The project aims to apply recent advances in Artificial Intelligence (language technology and computer vision in particular) to support the enrichment of these collections with descriptive metadata. An important focus of this project is the transfer of knowledge from open collections, such as The Rijksmuseum Dataset, to other players in the field. To this end, we investigate multimodality, i.e. how different information streams about digital heritage objects (e.g. in different languages, or across different media) can be modelled simultaneously. Apart from multimodality, multilinguality will be another crucial aspect of our research, which is especially important in the context of federal heritage collections in Belgium. The end goal of this project is to develop and release a series of practical Machine Learning tools for managing digital collections. A major outcome will be an export of the digital collections involved as a “Europeana-ready” linked open data set, which will contribute to the broader accessibility of these collections.

Cultural Heritage Collections

Many cultural heritage collections are currently going through a phase of mass digitization, whereby heritage objects are digitized, catalogued, and published at an unprecedented scale using computational means. This process is challenging because of the rapid pace at which it progresses: the digitization of cultural artefacts is itself an expensive and time-consuming process, and yet, in the end, it only yields low-level data (e.g. raw scans) that must be supplemented with descriptive metadata to become practically useful (e.g. assigning a period of composition to a painting, or describing the subject of a photograph). This process is known as (semantic) data enrichment. Such metadata is often assigned using thesauri, such as the Art and Architecture Thesaurus, which provide a standardized terminology (‘controlled vocabularies’) to characterize cultural artefacts. While crucial to both curation and research, such annotations remain expensive to obtain because they are provided manually by subject experts, who need to master the domain-specific language of individual thesauri.
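The enrichment step described above can be sketched, in its simplest form, as a lookup of free-text descriptors against a controlled vocabulary. The vocabulary fragment, synonym table, and `aat_demo` identifiers below are invented placeholders, not real Art and Architecture Thesaurus entries; a realistic system would involve far larger vocabularies and more robust matching:

```python
# Minimal sketch of semantic enrichment: mapping free-text descriptors onto
# controlled-vocabulary identifiers. All entries below are invented examples.

# A toy fragment of a thesaurus: preferred label -> concept identifier.
VOCABULARY = {
    "oil painting": "aat_demo:001",
    "portrait": "aat_demo:002",
    "engraving": "aat_demo:003",
}

# Alternate spellings and synonyms, mapped onto preferred labels.
SYNONYMS = {
    "oil on canvas": "oil painting",
    "portret": "portrait",  # e.g. a Dutch-language legacy record
}

def enrich(descriptor: str):
    """Return the controlled-vocabulary ID for a free-text descriptor, or None."""
    term = descriptor.strip().lower()
    term = SYNONYMS.get(term, term)  # normalize synonyms to preferred labels
    return VOCABULARY.get(term)

print(enrich("Oil on canvas"))  # aat_demo:001
print(enrich("unknown term"))   # None
```

In practice, the manual bottleneck lies precisely in terms that such an exact-match lookup misses, which is where the Machine Learning methods discussed below come in.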

The field of cultural heritage collections is characterized by an interesting discrepancy. On the one hand, there are the well-known collections of larger institutions, such as the Rijksmuseum or the British Museum, which enjoy high international visibility among the general public. Many of these institutions have taken the lead in collection digitization and are increasingly opening up their content to a wider audience in the public domain, using liberal licenses that encourage re-use. On the other hand, the cultural heritage sector abounds in smaller players that lack the funding or manpower to undertake such initiatives manually at the same pace or scale. These smaller players are increasingly experiencing difficulties in handling the incoming quantity of digitized materials. This gap between large, progressive digital collections and smaller, less advanced ones motivates the following research objective: it should be possible to automatically extract the knowledge captured in the larger data sets, which are readily available for re-use, to support data processing in up-and-coming collections.

The objective of this project is therefore to advance the application of automated algorithms from the field of Artificial Intelligence (AI) to support cultural heritage institutions in their ongoing annotation efforts for their expanding digital collections. In particular, we will focus on recent advances in Machine Learning, where the application of neural networks (Deep Learning) has led to significant breakthroughs, for instance in the fields of Natural Language Processing and Computer Vision. We will determine how state-of-the-art algorithms can be used to (semi-)automatically catalogue and describe digital objects, especially those for which little, incomplete, or no metadata is available. Importantly, our project aims to increase the interoperability of systems by providing ‘data-driven export filters’ that allow institutions to share their collections as linked open data, even if those collections have hitherto, for various practical reasons, used closed-vendor, ad hoc metadata systems, e.g. monolingual thesauri that are incompatible with international standards. Interestingly, many practitioners in the field of cultural heritage seem rather unaware of the significant progress that has recently been made in AI. We therefore aim to raise awareness of the capabilities of present-day AI and bring heritage management up to speed with recent advances in Machine Learning.
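As a rough illustration of what such an export filter might do, the sketch below maps an invented in-house record onto Dublin Core properties serialized as JSON-LD, one common format for linked open data (Dublin Core is among the vocabularies underlying Europeana's data model). The legacy field names and the mapping are assumptions for this example, not the project's actual pipeline:

```python
import json

# An ad hoc legacy record, as it might sit in a closed-vendor system
# (field names invented for this example; here in Dutch).
legacy_record = {
    "titel": "Zelfportret",
    "maker": "Unknown artist",
    "jaar": "1640",
}

def to_jsonld(record: dict) -> dict:
    """Map in-house field names onto Dublin Core properties in JSON-LD."""
    return {
        "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
        "dc:title": record["titel"],
        "dc:creator": record["maker"],
        "dc:date": record["jaar"],
    }

print(json.dumps(to_jsonld(legacy_record), indent=2))
```

The "data-driven" part of the envisaged filters lies in learning such mappings (and the alignment of local terms to standard vocabularies) from the collections themselves, rather than hand-writing them per institution as done here.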