Pre-trained CNNs as Feature-Extraction Modules for Image Captioning: An Experimental Study

Al-Malla, Muhammad Abdelhadie; Jafar, Assef; Ghneim, Nada

doi:10.5565/rev/elcvia.1436

Permanent link: https://ddd.uab.cat/record/258876

Scopus: 8 citations

Al-Malla, Muhammad Abdelhadie (Higher Institute of Applied Science and Technology)
Jafar, Assef (Higher Institute for Applied Sciences and Technology, HIAST)
Ghneim, Nada (Arab International University)

Variant title:	Pre-trained CNNs as Feature-Extraction Modules for Image Captioning
Date:	2022
Abstract:	In this work, we present a thorough experimental study of feature extraction using Convolutional Neural Networks (CNNs) for the task of image captioning in the context of deep learning. We perform a set of 72 experiments on 12 image classification CNNs pre-trained on the ImageNet [29] dataset. The features are extracted from the last layer after removing the fully connected layer and are fed into the captioning model. We use a unified captioning model with a fixed vocabulary size across all experiments to study how changing the CNN feature extractor affects image captioning quality. The scores are calculated using the standard image captioning metrics. We find a strong relationship between the model structure and the image captioning dataset, and show that VGG models yield the lowest image captioning quality as feature extractors among the tested CNNs. Finally, we recommend a set of pre-trained CNNs for each image captioning evaluation metric one may want to optimise, and relate our results to previous work. To our knowledge, this work is the most comprehensive comparison of feature extractors for image captioning.
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language:	English
Document type:	Article; research; published version
Subject:	Convolutional Neural Network; Feature Extraction; Image Captioning; Deep Learning
Published in:	ELCVIA. Electronic Letters on Computer Vision and Image Analysis, Vol. 21, No. 1 (2022), pp. 1-16 (Regular Issue), ISSN 1577-5097

Original URL: https://elcvia.cvc.uab.cat/article/view/1436
Alternative URL: https://raco.cat/index.php/ELCVIA/article/view/980000001003

16 pages, 12.3 MB

Record created on 2022-05-14, last modified on 2025-11-14
