EUFCC-340K : A faceted hierarchical dataset for metadata annotation in GLAM collections
Net, Francesc (Centre de Visió per Computador)
Folia, Marc (Nubilum)
Casals, Pep (Nubilum)
Bagdanov, Andrew (Università degli Studi di Firenze)
Gomez Bigorda, Lluis (Universitat Autònoma de Barcelona. Departament de Ciències de la Computació)
| Date: |
2025 |
| Abstract: |
In this paper, we address the challenges of automatic metadata annotation in the domain of Galleries, Libraries, Archives, and Museums (GLAMs) by introducing a novel dataset, EUFCC-340K, collected from the Europeana portal. Comprising over 340,000 images, the EUFCC-340K dataset is organized across multiple facets - Materials, Object Types, Disciplines, and Subjects - following a hierarchical structure based on the Art & Architecture Thesaurus (AAT). We developed several baseline models, incorporating multiple heads on a ConvNeXT backbone for multi-label image tagging on these facets, and fine-tuning a CLIP model with our image-text pairs. Our experiments to evaluate model robustness and generalization capabilities in two different test scenarios demonstrate the dataset's utility in improving multi-label classification tools that have the potential to alleviate cataloging tasks in the cultural heritage sector. The EUFCC-340K dataset is publicly available at https://github.com/cesc47/EUFCC-340K. |
| Grants: |
Ministerio de Ciencia e Innovación RYC2020-030777-I
|
| Note: |
Other funding: UAB transformative agreements |
| Rights: |
This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work, as well as the creation of derivative works, are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged. |
| Language: |
English |
| Document: |
Article ; research ; Published version |
| Subject: |
Automatic metadata annotation ;
Hierarchical datasets ;
Image tagging ;
Cultural heritage ;
GLAM |
| Published in: |
Multimedia tools and applications (January 2025), ISSN 1573-7721 |
DOI: 10.1007/s11042-024-20561-9
Record created 2025-03-21, last modified 2025-12-10