A corpus-based study of Spanish L2 mispronunciations by Japanese speakers

Carranza, Mario; Cucchiarini, Catia; Llisterri, Joaquim; Machuca Ayuso, María Jesús; Ríos, Antonio

Permanent link: https://ddd.uab.cat/record/123311

Carranza, Mario (Universitat Autònoma de Barcelona)
Cucchiarini, Catia (Radboud Universiteit Nijmegen)
Llisterri, Joaquim (Universitat Autònoma de Barcelona)
Machuca Ayuso, María Jesús (Universitat Autònoma de Barcelona)
Ríos, Antonio (Universitat Autònoma de Barcelona)

Publication:	IATED Academy, 2014
Abstract:	In a companion paper (Carranza et al.) submitted to this conference, we discuss the importance of collecting specific L1-L2 speech corpora for developing effective Computer Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point in more depth by reporting on a study aimed at compiling and analysing such a corpus in order to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that makes use of Automatic Speech Recognition (ASR). In particular, we discuss some of the results obtained in the analyses of this corpus and some of the methodological issues we had to deal with. The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of Spanish L2. The speech data was segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic levels using Praat software [1]. For the phonemic transcription we adopted the SAMPA phonemic inventory adapted to Spanish [2], and for the narrow-phonetic transcription we added 11 new symbols and 7 diacritics taken from X-SAMPA [3]. Non-linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech ([4], [5]), as well as the error encoding system used in the project, will be addressed. Up to 13410 errors were segmented, aligned with the canonical-phonemic tier and the narrow-phonetic tier, and annotated following an encoding system that specifies the type of error (substitution, insertion or deletion), the affected phone, and the preceding and following phonemic contexts in which the error occurred. We then carried out additional analyses to check the accuracy of the transcriptions by asking two other annotators to transcribe a subset of the speech material, and calculated intertranscriber agreement coefficients.
The data was automatically retrieved with Praat scripts and statistically analysed with R. The frequency ratios obtained for the most frequent errors and their most frequent contexts of appearance were tested for statistical significance. We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to properly detect these errors. Furthermore, we suggest possible exercises that may be included in the training to remedy the errors identified.
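As an illustration of the error encoding described in the abstract (error type, affected phone, preceding and following phonemic context), the following minimal Python sketch shows how such annotations might be represented and turned into frequency ratios. It is not the authors' implementation, which used Praat scripts and R. The class name, field names, the `#` boundary symbol and the toy annotations are all assumptions made for this example, and the data are invented rather than taken from the corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PronunciationError:
    """One annotated error; the field names are illustrative assumptions."""
    error_type: str  # "substitution", "insertion" or "deletion"
    phone: str       # affected phone, written in SAMPA
    preceding: str   # preceding phonemic context ("#" = boundary, an assumption)
    following: str   # following phonemic context


def frequency_ratios(errors, key):
    """Return each category's share of all errors, most frequent first."""
    counts = Counter(key(e) for e in errors)
    total = sum(counts.values())
    return [(category, n / total) for category, n in counts.most_common()]


# Invented toy annotations, NOT data from the corpus.
toy_errors = [
    PronunciationError("substitution", "l", "a", "a"),
    PronunciationError("substitution", "l", "#", "a"),
    PronunciationError("substitution", "rr", "#", "o"),
    PronunciationError("insertion", "u", "s", "t"),
]

# Share of errors per error type, and per (type, phone) pair.
by_type = frequency_ratios(toy_errors, key=lambda e: e.error_type)
by_phone = frequency_ratios(toy_errors, key=lambda e: (e.error_type, e.phone))
```

Grouping by a key function keeps the same routine usable for contexts as well, for example `key=lambda e: (e.phone, e.preceding, e.following)`. Significance testing of the resulting ratios is left out of this sketch.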
Rights:	All rights reserved.
Language:	English
Document:	Book chapter
Subject:	ELE ; Error analysis ; Phonetics ; Pronunciation teaching ; Speech corpus
Published in:	Edulearn14 Proceedings. 6th International Conference on Education and New Learning Technologies, July 7th-9th, 2014 - Barcelona, Spain (pp. 3696-3705)

Alternative address: http://library.iated.org/

11 p, 600.4 KB


Record created 2014-09-30, last modified 2022-09-04
