1
00:00:07,400 --> 00:00:11,200
EASIT: Easy Access for Social Inclusion Training.

2
00:00:12,300 --> 00:00:15,550
Unit 3B. Easy to understand and audio description.

3
00:00:15,550 --> 00:00:17,400
Element 3. Technical aspects.

4
00:00:17,400 --> 00:00:20,300
Interview with professionals: Joel Snyder.

5
00:00:20,650 --> 00:00:24,350
In this video interview the AD professional Joel Snyder

6
00:00:24,350 --> 00:00:27,300
will discuss some aspects regarding AD.

7
00:00:27,300 --> 00:00:30,650
He will share his ideas on how these can contribute

8
00:00:30,650 --> 00:00:33,730
to making AD easier to understand.

9
00:00:35,750 --> 00:00:38,450
Hello, this is Joel Snyder.

10
00:00:38,450 --> 00:00:43,800
I am the president of Audio Description Associates,

11
00:00:43,800 --> 00:00:47,800
in the USA. I am the director and founder

12
00:00:47,800 --> 00:00:53,000
of the AD project for the American Council of the Blind.

13
00:00:54,500 --> 00:01:00,500
Are there specific technical aspects that could improve

14
00:01:00,500 --> 00:01:06,300
the comprehensibility and listenability of AD?

15
00:01:06,300 --> 00:01:10,350
That can make it usable also for people

16
00:01:10,350 --> 00:01:14,120
with cognitive, intellectual and learning disabilities,

17
00:01:14,120 --> 00:01:16,430
whose processing times are slower?

18
00:01:16,500 --> 00:01:22,800
I think so. There are technical and programmatic aspects

19
00:01:22,800 --> 00:01:28,600
or aspects of execution of AD that can accomplish that,

20
00:01:28,600 --> 00:01:32,700
that can increase comprehensibility and listenability

21
00:01:32,700 --> 00:01:37,600
for people with intellectual or learning disabilities.

22
00:01:38,500 --> 00:01:44,800
On a technical side, there is a way to adjust pace.

23
00:01:46,000 --> 00:01:52,000
That is a critical element of listening and listenability,

24
00:01:52,800 --> 00:01:56,900
especially for people with cognitive disabilities.

25
00:01:57,000 --> 00:02:00,000
Slowing down the pace.

26
00:02:00,000 --> 00:02:05,700
That can be accomplished in the AD through the voicing.

27
00:02:07,750 --> 00:02:12,750
For six years I produced and supervised the AD

28
00:02:12,750 --> 00:02:15,900
for the American children's show “Sesame Street”,

29
00:02:15,900 --> 00:02:20,200
which is geared towards three- to five-year-olds.

30
00:02:21,800 --> 00:02:27,800
That audience requires a vocabulary and reading level

31
00:02:27,800 --> 00:02:31,500
that is lower than the general public.

32
00:02:31,500 --> 00:02:35,500
We were aware of that in the writing of the description

33
00:02:35,500 --> 00:02:40,400
and I think the same principle would apply

34
00:02:40,400 --> 00:02:45,800
to making description that is more listenable for all people.

35
00:02:45,800 --> 00:02:49,300
That is a key point.

36
00:02:49,500 --> 00:02:55,400
By making audio description more applicable

37
00:02:55,400 --> 00:02:59,700
and more reachable by more people,

38
00:02:59,700 --> 00:03:03,400
I think we build the visibility of AD

39
00:03:03,400 --> 00:03:08,000
and increase the network of people who understand

40
00:03:08,000 --> 00:03:10,300
and appreciate audio description.

41
00:03:10,300 --> 00:03:15,800
The audience of people with learning disabilities

42
00:03:15,800 --> 00:03:18,200
is an important part of that.

43
00:03:18,200 --> 00:03:22,800
But the audience can also include the general public.

44
00:03:22,800 --> 00:03:26,000
Audio description can be great for them as well:

45
00:03:26,000 --> 00:03:28,000
they can be in the kitchen,

46
00:03:28,000 --> 00:03:31,700
while the TV is on in the living room, they do not miss a beat.

47
00:03:31,700 --> 00:03:36,000
I think there are technical and programmatic aspects,

48
00:03:36,000 --> 00:03:41,500
aspects of execution, that can make AD more listenable.

49
00:03:42,900 --> 00:03:46,800
Do you think that extended ADs could be a solution

50
00:03:46,800 --> 00:03:50,650
for people with intellectual or learning disabilities,

51
00:03:50,650 --> 00:03:53,000
whose processing times are slower?

52
00:03:53,000 --> 00:03:58,300
I do. Extended ADs, as long as the concept

53
00:03:58,300 --> 00:04:02,900
is done in a way that is easy to access and understand.

54
00:04:03,200 --> 00:04:07,800
Most extended AD involves clicking hyperlinks

55
00:04:07,800 --> 00:04:09,900
or switching to another screen

56
00:04:09,900 --> 00:04:13,000
or another audio file that supplements

57
00:04:13,000 --> 00:04:19,000
the AD that exists on a base level.

58
00:04:19,300 --> 00:04:24,800
It is important to remember that AD, describers,

59
00:04:25,000 --> 00:04:29,900
are about describing, not explaining.

60
00:04:29,900 --> 00:04:34,800
I think the explaining and the elaboration on a concept

61
00:04:34,800 --> 00:04:39,700
is best left to a teacher,

62
00:04:39,700 --> 00:04:43,200
someone who is working with a given student.

63
00:04:43,200 --> 00:04:48,900
The AD does not need to be charged with that aspect.

64
00:04:48,900 --> 00:04:53,600
It may provide additional description in the extended AD

65
00:04:53,600 --> 00:04:58,300
but it is not about explaining what something is.

66
00:04:58,300 --> 00:05:03,600
We do not explain, we describe. We do not tell, we show.

67
00:05:05,000 --> 00:05:10,300
Would TTS AD work better than human-voiced AD

68
00:05:10,300 --> 00:05:15,400
for the same audience? I do not think so.

69
00:05:15,400 --> 00:05:20,100
TTS (text-to-speech) is always improving

70
00:05:20,100 --> 00:05:23,100
but it is still not at a point,

71
00:05:23,100 --> 00:05:28,400
not today, November 2020,

72
00:05:28,400 --> 00:05:31,800
where it can make the distinctions,

73
00:05:31,800 --> 00:05:37,600
provide the nuance that increases listenability.

74
00:05:39,000 --> 00:05:42,600
Perhaps for some dry documentaries,

75
00:05:42,600 --> 00:05:48,600
the reading of a book or in a non-fiction text.

76
00:05:50,300 --> 00:05:55,000
But even there, we make meaning with our voices.

77
00:05:55,000 --> 00:05:59,300
I have not met a text-to-speech engine

78
00:05:59,300 --> 00:06:01,900
that can yet provide the same nuance

79
00:06:01,900 --> 00:06:04,800
that a trained audio describer can provide.

80
00:06:04,800 --> 00:06:08,600
When I train audio describers we cover vocal skills

81
00:06:08,600 --> 00:06:12,000
and we use one example, for instance:

82
00:06:12,000 --> 00:06:14,600
“Woman without her man is a savage”

83
00:06:14,600 --> 00:06:19,000
That sounds misogynistic, right?

84
00:06:19,000 --> 00:06:21,000
Who would agree with that?

85
00:06:21,000 --> 00:06:24,200
But say it back using your voice, no punctuation.

86
00:06:24,200 --> 00:06:28,300
An audio describer can insert the punctuation themselves

87
00:06:28,300 --> 00:06:31,200
because they make meaning with their voice.

88
00:06:31,200 --> 00:06:36,200
So it comes out: “Woman: without her, man is a savage”.

89
00:06:37,400 --> 00:06:40,900
I do not know that a text-to-speech engine can be taught

90
00:06:40,900 --> 00:06:45,300
to make that distinction and to bring that nuance.

91
00:06:45,300 --> 00:06:47,600
The other element is tone.

92
00:06:47,600 --> 00:06:53,500
The best audio describers employ what I call “consonance”:

93
00:06:53,500 --> 00:06:57,850
the tone of voice is in consonance with what is described.

94
00:06:57,850 --> 00:07:02,200
If it is a happy scene, there is an upbeat quality.

95
00:07:02,200 --> 00:07:06,600
If it is a serious scene, there is a sober quality to the voice.

96
00:07:06,600 --> 00:07:11,700
I am not sure that text-to-speech can be aware

97
00:07:11,700 --> 00:07:17,100
of what happens over the throughline of a video,

98
00:07:17,100 --> 00:07:22,100
to understand that “this part needs to be this way”

99
00:07:22,100 --> 00:07:26,250
because of what is on the screen and what comes later.

100
00:07:26,250 --> 00:07:30,200
That kind of nuance and subtlety is important

101
00:07:30,200 --> 00:07:33,600
and is something that a good writer of description

102
00:07:33,600 --> 00:07:37,000
and a good voicer of description can provide.

103
00:07:37,800 --> 00:07:40,500
This video was prepared by Elisa Perego,

104
00:07:40,500 --> 00:07:43,500
produced by Angelika De Markis, Laura Marini,

105
00:07:43,500 --> 00:07:46,700
Annalisa Navetta from the University of Trieste.

106
00:07:46,700 --> 00:07:48,700
Narrator: Annalisa Navetta.