Exploiting multimedia content : a machine learning based approach

Hassan, Ehtesham

doi:10.5565/rev/elcvia.598

Permanent link: https://ddd.uab.cat/record/119289



Date:	2014
Abstract:	This thesis explores the use of machine learning for multimedia content management involving single and multiple features, modalities and concepts. We introduce a shape-based feature for binary patterns and apply it to recognition and retrieval in single-feature and multiple-feature architectures. The multiple-feature recognition and retrieval frameworks are based on the theory of multiple kernel learning (MKL). A binary pattern recognition framework is presented that combines binary MKL classifiers using a decision directed acyclic graph. It is evaluated on Indian script character recognition and MPEG-7 shape symbol recognition. A word-image-based document indexing framework is presented using distance-based hashing (DBH) defined on learned pivot centres. We use a new multi-kernel learning scheme based on a genetic algorithm to develop a kernel DBH based document image retrieval system. The experimental evaluation covers document collections in Devanagari, Bengali and English scripts. Next, methods for document retrieval using multi-modal information fusion are presented. A text/graphics segmentation framework is presented for documents with complex layouts. We present a novel multi-modal document retrieval framework using the segmented regions; the approach is evaluated on English magazine pages. A document script identification framework is presented using decision-level aggregation of page-level, paragraph-level and word-level predictions. Latent Dirichlet Allocation based topic modelling with a modified edit distance is introduced for the retrieval of documents with recognition inaccuracies. A multi-modal indexing framework for such documents is presented, based on a learned combination of text-based and image-based properties. Experimental results are shown on Devanagari script documents. Finally, we investigate concept-based approaches for multimedia analysis. A multi-modal document retrieval framework is presented that combines generative and discriminative modelling to exploit the cross-modal correlation between modalities. The combination is also explored for semantic concept recognition using multi-modal components of the same document and of different documents across a collection. The framework is evaluated experimentally on semantic event detection in sports videos and on semantic labelling of components of multi-modal document images.
Note:	Advisors: Prof. M Gopal, Prof. Santanu Chaudhury. Date and location of PhD thesis defense: 10 September 2013, Indian Institute of Technology Delhi
Rights:	This document is subject to a Creative Commons use license. Total or partial reproduction and public communication of the work are permitted, provided that it is not for commercial purposes and that the authorship of the original work is acknowledged. The creation of derivative works is not permitted.
Language:	English
Document:	Other ; research ; Published version
Published in:	ELCVIA : Electronic Letters on Computer Vision and Image Analysis, Vol. 13, No. 2 (2014), p. 69, ISSN 1577-5097

Alternative address: https://raco.cat/index.php/ELCVIA/article/view/281643
Original address: https://elcvia.cvc.uab.es/article/view/v13-n3-hassan
DOI: 10.5565/rev/elcvia.598

335 p, 17.2 MB

1 p, 87.5 KB


Record created 2014-07-29, last modified 2024-02-23
