My On-Air Duel with an AI Audiobook Narrator.

On Friday (Nov 19th), I was invited to speak on the Today programme (BBC Radio 4’s breakfast show) about whether audiobooks will all be narrated by AI voices in the future. As is often the case, the segment was squeezed into the final 5 minutes of the programme. There were two guests – myself and the founder of a company which creates AI voices.

Before the show, the producer had said there would be an on-air comparison between my voice and the AI, with both of us reading an extract from A Christmas Carol. The idea was to see if the audience could tell which was which. Here I was, a narrative gladiator, sent into the Colosseum to represent Team Humanity against the assembled Barbarian Automaton hordes.

The first sentence presented a problem. It’s three words long: “Marley was dead.” So I recorded a first take, listened back and had that dreadful feeling: ‘what if they can’t tell the difference? What if they actually prefer the AI?’ I then recorded two more takes: “Marley …. was dead,” followed by “Marley .. was .. dead.” Neither felt quite right. By now, part of me was thinking I should just go “Marley was DEEEAAAD!!!” People certainly wouldn’t think THAT was a computer – a terrible narrator maybe, but a human one nevertheless!

Eventually, I just realised I would have to read it as I normally would, and trust the (very) cultured ears of the BBC Radio 4 audience. And fortunately, my voice was generally recognised as the human one – no successful (and embarrassing) Turing test there!

The item caused a great deal of interest, and not just from fellow narrators and friends in the voiceover industry. The story appeared in The Times newspaper the following day, complete with quotes from yours truly.

The spectre of voice automation is something we as voice artists are having to come to terms with. The genie is out of the bottle, as is the cat from the bag and the ship which has sailed etc.

AI systems are getting better and better … at copying our voices. It is really important to remember this, and to understand what an extraordinary instrument the human voice is. The end goal is to reach a point where we can’t tell the difference – the Blade Runner conundrum. This of course raises important issues of ethics and legality.

The social media platform TikTok recently made an out-of-court settlement with the Canadian voice artist Bev Standing, after she discovered her voice was being used on its platform without her permission. The UK Government is currently consulting on AI and Intellectual Property, but has bizarrely decided not to include performers’ rights in the consultation. Its aim, apparently, is to “encourage, and not stifle innovation.” Hmmm. The example of the case above suggests that performers’ rights will be a fundamental part of these next steps. Indeed, many AI companies are already licensing human voices to be used on their systems.

So is the voiceover industry just burying its head in the sand and hoping these changes will go away? I think there is a level of acceptance that AI will become much more widespread and, in many cases, it can be a force for good – think of how virtual assistants have improved the lives of people who are visually impaired or have mobility issues. And maybe there will be a place for more functional voiceovers such as On-Hold telephone messages. You know the ones: “your call is important to us. Please hold the line while we try to connect you.”

But I was invited on to the airwaves to debate whether AI would replace human-read audiobooks. And here I think there is still a long way to go before the robot readers conquer the world.

Firstly, there is the fact that Audible, which controls around half of the audiobook market, doesn’t allow automated “text-to-speech” recordings on its platform. In fact, it seems to be expanding its “Audible Originals” productions, which would suggest it doesn’t have imminent plans to do so.

Secondly, let’s discuss the issue of cost. This has been one of the primary reasons cited by the AI companies, including my counterpart on the Today programme, as to why they will grab hold of this market. ‘Cut out the humans, let the AI take over,’ goes the argument. However, there are some wildly exaggerated – and just plain inaccurate – figures about the cost of producing an audiobook.

Speaking on the Today programme, my opposite number said the cost of producing an audiobook was “thousands of pounds.” Well, I’ll let you into a secret: it’s possible to get an audiobook narrated for free. That’s right – free!

A number of audiobook production platforms (ACX and Findaway Voices to name but two) operate a profit share system. The way it works is that a narrator agrees to record the book, and then shares the sales proceeds with the author. So that means there would be no upfront cost to the author.

There are other figures that have been bandied around – $10,000 for a narrator to complete an audiobook. My answer to that is: I wish!! Whenever a news article mentions the growth in the audiobook market, it always focuses on celebrity narrators, such as Stephen Fry or Lesley Manville, who narrated Richard Osman’s runaway success ‘The Thursday Murder Club.’

But that misses the most important point. The real growth of this sector has been driven by narrators with home studios, who record and edit audiobooks at home. We prepare, perform, produce, engineer, self-direct – almost everything. And I can tell you one thing – we ain’t being paid $10,000 a pop. In fact, audiobook narration is one of the least well-paid disciplines of voice acting. The people who narrate audiobooks do it because they love books, they love language and they are passionate about telling stories.

That passion also extends to the people who are downloading and listening to audiobooks in ever greater numbers. We as humans are storytellers – it’s hardwired in us to listen to, and tell, stories. And ironically, it’s the longer forms of narration, like audiobooks, which need that human connection to keep the listener hooked in. AI is starting to create and recognise emotions – but will it really ever be able to convey the whole range? What about the more subtle ones, such as bitterness, sarcasm and – this seems to be most commented upon – understated menace?

Returning to the extract of ‘A Christmas Carol’ which I was asked to read before appearing on the Today programme – the opening paragraphs include this gem of a passage:

“Old Marley was as dead as a door-nail. Mind! I don’t mean to say that I know, of my own knowledge, what there is particularly dead about a door-nail. I might have been inclined, myself, to regard a coffin-nail as the deadest piece of ironmongery in the trade.”

Could a robot ever understand the wit of this passage? Proper Victorian belly laughs from Dickens there, with his quizzical treatise on the nonsensical nature of the phrase ‘dead as a doornail.’

And take the sentence below as another example. I didn’t think there was any hiding my utter relish at being asked to read such a brilliant description of Scrooge.

“Oh! But he was a tight-fisted hand at the grindstone, Scrooge! A squeezing, wrenching, grasping, scraping, clutching, covetous, old sinner!”

As narrators we have a huge number of tricks up our sleeve – pacing, pitch, characterisation, proximity to the microphone, or that moment where we hold a pause just a tiny bit longer than normal. It’s not pre-planned, because it’s what we, at that moment, feel will most benefit the narrative. It’s such a privilege to be able to bring someone’s words to life, and interpret their ideas and vision as best we possibly can.

Can AI really capture those moments, and make them more special than a human voice can? As someone on social media said: “AI voices are the blow-up dolls of audiobook narration.” Crude maybe, but let’s say you were an author who was deciding – AI or human voice? You’ve put your heart and soul into writing a book, so wouldn’t you like to hire someone who puts their heart and soul into narrating it?

If you want to listen back to the whole interview, you can find it here.