A Tale of Two Transcriptions : Machine-Assisted Transcription of Historical Sources

Thorvaldsen, Gunnar; Pujadas-Mora, Joana Maria; Andersen, Trygve; Eikvil, Line; Lladós, Josep; Fornes Bisquerra, Alicia; Cabré, Anna

Bibliographic citation -- Permanent link: https://ddd.uab.cat/record/165608

Thorvaldsen, Gunnar (University of Tromsø. Norwegian Historical Data Centre)
Pujadas-Mora, Joana Maria, 1977- (Centre d'Estudis Demogràfics)
Andersen, Trygve (University of Tromsø. Norwegian Historical Data Centre)
Eikvil, Line (Norwegian Computing Center)
Lladós, Josep (Centre de Visió per Computador (Bellaterra, Catalunya))
Fornes Bisquerra, Alicia (Universitat Autònoma de Barcelona. Departament de Ciències de la Computació)
Cabré, Anna, 1943- (Centre d'Estudis Demogràfics)

Date:	2015
Abstract:	This article explains how two projects implement semi-automated transcription routines: one for census sheets in Norway and one for marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area, one of the world's longest series of preserved vital records. In the project "Five Centuries of Marriages" (5CofM) at the Autonomous University of Barcelona's Center for Demographic Studies, the Barcelona Historical Marriage Database has been built from these books. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional: it is the 1891 census, recorded on one sheet per person. This format, together with the underlining of keywords for several variables, made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to recognize the marriage licenses automatically. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned images on the Internet, opening up the possibility of further international cooperation on automating the transcription of historical source materials. As in projects that digitize printed materials, the optimal solution for handwritten sources is also likely to be a combination of manual transcription and machine-assisted recognition.
Grants:	European Commission 20100407
Note:	This article is part of the "Norwegian Historical Population Register" project financed by the Norwegian Research Council (grant # 225950) and the Advanced Grant project "Five Centuries of Marriages" (2011-2016) funded by the European Research Council (# ERC 2010-AdG_20100407)
Rights:	This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, including for commercial purposes, provided that the authorship of the original work is acknowledged.
Language:	English
Document:	Article ; research ; Published version
Subject:	Nominative sources ; Census ; Vital records ; Computer vision ; Optical character recognition ; Word spotting
Published in:	Historical life course studies, Vol. 2 (January 2015), p. 1-19, ISSN 2352-6343

Alternative address: https://hdl.handle.net/10622/23526343-2015-0001?locatt=view:master

21 p, 2.1 MB


Record created 2016-10-13, last modified 2024-06-01
