Learning to Represent Handwritten Shapes and Words for Matching and Recognition

Almazán, Jon

doi:10.5565/rev/elcvia.728

Bibliographic citation -- Permanent link: https://ddd.uab.cat/record/144985


Date:	2015
Abstract:	Writing is one of the most important forms of communication, and for centuries handwriting was the most reliable way to preserve knowledge. However, despite the more recent development of printing houses and electronic devices, handwriting is still widely used for taking notes, making annotations, or sketching ideas. A huge number of handwritten documents, some of them of incalculable cultural value, have recently been digitized so that they can be easily accessed. This has made it necessary to develop methods able to extract information from these document images. Transferring the ability to understand handwritten text or recognize handwritten shapes to computers has been the goal of much research because of its importance in many different fields. However, designing good representations for handwritten shapes, e.g., symbols or words, is a very challenging problem because of the large variability of these kinds of shapes. One consequence of working with handwritten shapes is that we need representations to be flexible, i.e., able to adapt to large intra-class variability. We need representations to be discriminative, i.e., able to learn the differences between classes. And we need representations to be efficient, i.e., able to be rapidly computed and compared. Unfortunately, current techniques for handwritten shape representation for matching and recognition do not fulfill some or all of these requirements. In this thesis we focus on the problem of learning to represent handwritten shapes for retrieval and recognition tasks. Concretely, in the first part of the thesis, we focus on the general problem of representing any kind of handwritten shape. We first present a novel shape descriptor based on a deformable grid that deals with large deformations by adapting to the shape, and whose cells can be used to extract different features.
Then, we propose to use this descriptor to learn statistical models, based on the Active Appearance Model, that jointly learn the variability in structure and texture of a given class. In the second part, we focus on a concrete application, the problem of representing handwritten words, for the tasks of word spotting, where the goal is to find all instances of a query word in a dataset of images, and recognition. First, we address the segmentation-free problem and propose an unsupervised, sliding-window-based approach that achieves state-of-the-art results on two public datasets. Second, we address the more challenging multi-writer problem, where the variability in words increases exponentially. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace, where those that represent the same word are close together. This is achieved by a combination of label embedding, attribute learning, and common subspace regression. This leads to a low-dimensional, unified representation of word images and strings, resulting in a method that allows one to perform both image and text searches, as well as image transcription, in a unified framework. We evaluate our methods on different public datasets of both handwritten documents and natural images, showing results comparable to or better than the state of the art on spotting and recognition tasks.
Note:	Advisors: Ernest Valveny and Alicia Fornés. 21 October 2014, Autonomous University of Barcelona
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language:	English
Document:	Other ; research ; Published version
Subject:	Pattern recognition ; Document analysis ; Shape extraction and representation
Published in:	ELCVIA. Electronic letters on computer vision and image analysis, Vol. 14 No. 3 (2015), p. 52-53 (Special Issue on Recent PhD Thesis Dissemination (2014)), ISSN 1577-5097

Original address: https://elcvia.cvc.uab.es/article/view/v14-n3-almazan
Alternative address: https://raco.cat/index.php/ELCVIA/article/view/v14-n3-almazan
Original address: https://elcvia.cvc.uab.cat/article/view/v14-n3-almazan
DOI: 10.5565/rev/elcvia.728

2 p, 529.7 KB


Record created 2015-12-24, last modified 2025-02-06
