Roxana Popescu | The San Diego Union-Tribune
If you’ve heard an audiobook or a narrated news article recently, there’s a chance it was not created by a human, but by AI software that imitates a human voice.
At some point in the near or distant future, artificial narrators and actors might be common and accepted without much hesitation, but for now there is room for discussion: What is lost and gained when a machine does the work of a human performer? Who should earn money when jobs like audiobook narration are given to AI?
The co-founder of a San Diego software company called Yembo is getting into this complicated situation with an unprecedented solution to an unprecedented problem. Voice actors in San Diego and beyond are observing this approach to compensating a human for AI-enhanced work with curiosity and concern.
The situation: Yembo’s co-founder wrote and independently published a book about AI, and an actor recorded the English audiobook last year and was compensated for that recording time. Now her AI-cloned voice is being used to narrate 15 translated versions of that audiobook.
The narrator doesn't speak Swedish, Ukrainian or Turkish, but her voice does.
“US English is narrated by the flesh-and-blood Hailey, (the) rest is AI in her likeness,” Zach Rattner, the book’s author and publisher, and Yembo’s co-founder, wrote in an email.
Hailey refers to Hailey Hansard, the actor whose voice is being cloned. Through her contract, Hansard will be paid royalties for audiobooks in her voice, even though she did not narrate the book in any of those languages.
While AI narration of audiobooks and articles is increasingly common, this may be the first case of royalty payment for AI-cloned translations in the audiobook world, a growing industry forecast to reach $39 billion globally by 2033, according to market research company market.us.
“As far as I know, this audiobook project is the first one where the narrator earns royalties on a product that uses their AI likeness, but they didn’t create,” Rattner said. “It’s the first that I know of, and it was enough that when I tried to figure it out, I couldn’t find anything. We had to figure it out from scratch. There weren’t templates we could find.”
Sandra Conde, a San Diego actor whose likeness has been scanned into a generative AI gaming project, reviewed details from the contract and said it addresses the interests of publisher and voice actor in an uncharted, fast-changing territory.
“It’s a new frontier kind of thing, where we don’t know what it’s going to look like, even like two years from now, or a year from now,” Conde said.
Robert Sciglimpaglia, a Connecticut-based voice actor and entertainment attorney, said the contract is notable because it is groundbreaking — touching upon audiobook narration, translation and AI.
“This is the wild, wild west,” he said. “The (actors’) union doesn’t have anything for (AI) translation that I know about.”
The contract is important because of what’s at stake: “This is a big issue in the audiobook world right now: whether you use human voices or use cloned voices. Because there are some audiobooks being done with AI, and narrators are trying to protect live narration — trying to protect their livelihood,” he said.
He said AI will definitely take work from human narrators; what remains uncertain, he added, is how much of the business it will come to dominate.
Tim Friedlander, president and co-founder of the National Association of Voice Actors, said this contract is significant, even if it’s just one example, because it allows human narration to be replaced or supplemented by AI-generated material.
Speaking from Los Angeles, Friedlander said the contract’s terms will matter more as synthetic content becomes normalized.
He added that human actors have one advantage over AI tools: their humanity, which lets them deliver nuanced readings informed by lived experience, culture and context. He likened the difference to Robert Frost’s line about poetry being what is lost in translation.
Rattner agrees; that is why he hired a human to record the English audiobook instead of using a cloned voice from the start. Only the translations are cloned. He said the audiobook carries inflections that AI would lose, though he acknowledged scenarios where AI makes sense.
Will it matter to listeners if a voice is human or synthetic?
That may depend on the book or the voice.
Help wanted: chromosomes optional
While actors have been paid for decades for projects that involve recording and recombining their voices (Siri’s 2011 debut came with its own foreboding backstory around consent and compensation), the use of generative AI to clone voices is newer and far more efficient, requiring only a small sample to create new material. Ten years ago, producing a synthetic voice meant hiring a voice actor for a month to record a large variety of sounds and words; now a three-minute sample is enough to generate speech for a range of applications in different languages.
The Atlantic magazine and inewsource, a San Diego investigative news outlet, use an AI narration plug-in. Additionally, there has been a proliferation of AI text-to-speech narration services such as ElevenLabs, Podcastle, Speechify, Murf AI, Revoicer, and Audiobook.ai.
Acting and audiobook narration were difficult to outsource before generative AI emerged. Labor might be cheap in some countries, but overseas workers couldn’t replicate a California cadence. AI offers a workaround: instead of outsourcing the work to people, it goes to machines.
For this reason, creative professionals see generative AI as a threat to their livelihoods. Sciglimpaglia said it was a driving cause of last year’s strikes, which pitted actors and writers against studios.
Should studios be permitted to scan actors' faces and create new material using those scans, or should they continue to hire humans, even though synthetic actors, which do not require breaks or pay, could replace them? If studios do scan an actor's features, should they only pay for the scan, or should they also pay for potential uses that the human actor would have fulfilled?
The agreement that ended the actors’ strike allows AI cloning but imposes restrictions on future uses and introduces compensation rules that better protect actors, according to Sciglimpaglia.
There are approximately 100,000 working voice actors in the U.S., a conservative estimate, and around 80 percent of voiceover work is nonunion, according to Friedlander.
Human and machine
Last summer, Rattner — who worked in software innovation at Qualcomm before co-founding Yembo — self-published a book called “Grow Up Fast: Lessons from an AI Startup.” It’s an entrepreneurship memoir about how he helped build Yembo, a company that uses a subset of AI called computer vision to make tools for the moving and insurance industries.
The book’s Spanish translation will be released this month, followed by Ukrainian and more than 10 other languages. All could come out within months — with time built in for tweaking and revising, Rattner said.
AI narration “definitely brings the barrier of entry down for people who wouldn’t have been able to get their message out,” he said.
He broke down the time and money costs of human and machine. The English audiobook took about four weeks to record. (Hansard could only record on weekends and her vocal cords needed breaks.) “Factoring in mastering, editing, QA listening, and retakes, I’d estimate the US English audiobook took about 65 man-hours of work across all parties to create,” he wrote.
Next, they used three hours of her book recording to train an AI tool called a speech synthesis model and used that model to create the other books in translation.
Not counting translation by humans (Rattner hired people to write translations, because “AI translation makes funky mistakes in unpredictable ways”), each AI audiobook narration takes five hours, with the bulk of that spent on quality assurance — weeding out mistakes like reading 2nd not as “second” but as “two-en-dee.”
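The QA step Rattner describes, catching misreads like “2nd” spoken as “two-en-dee,” is essentially text normalization, some of which can be automated before the text ever reaches the speech model. A minimal sketch in Python (hypothetical; the article does not say what tools Rattner’s team actually used):

```python
import re

# A tiny illustrative subset; real TTS pipelines use full
# text-normalization libraries covering many more cases.
ORDINALS = {
    1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth",
    6: "sixth", 7: "seventh", 8: "eighth", 9: "ninth", 10: "tenth",
}

def normalize_ordinals(text: str) -> str:
    """Replace tokens like '2nd' with 'second' so a TTS engine
    doesn't spell them out letter by letter ('two-en-dee')."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        # Leave ordinals we don't know how to spell out untouched.
        return ORDINALS.get(n, m.group(0))
    return re.sub(r"\b(\d+)(?:st|nd|rd|th)\b", repl, text)

print(normalize_ordinals("The 2nd chapter covers the 3rd quarter."))
# Output: The second chapter covers the third quarter.
```

A production pipeline would also have to handle currencies, dates, abbreviations and language-specific ordinal forms, which is why human QA listening still catches errors that automation misses.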
The dollar difference is more staggering: A human narrator might charge a few hundred dollars per recording session or perhaps $2,500 for an audiobook, he estimated. The voice synthesis software costs $22 a month.
A fair contract
The narrator didn’t have to do extra work to create 15 translated books, but the publisher didn’t have to go out and hire 15 other narrators. When part of an audiobook’s production is outsourced to AI, what payment is fair to creator and publisher?
This contract, which Rattner shared with the Union-Tribune, attempts to minimize losses to one human worker while maximizing the benefits of AI, which for audiobook translations include expanded access to information. Every time a translation of “Grow Up Fast” sells, the narrator will earn money — even though she never recorded in those other languages. So will the publisher, who used AI to narrate translations at a fraction of the cost of using a human actor.
—Hansard was paid $500 per four-hour day of studio recording and gets 10 percent royalties on translated works that use her cloned voice. Payments are quarterly over a 10-year term.
—Her cloned voice can only be used for this book’s translations. Other uses require a new license.
—The narrator has a month to review each product, including translations, and to request changes before it goes on sale.
—The publisher can sell the book at any price or give it away for free.
—One provision covers labeling: the use of AI must be disclosed in the product’s markings, so readers or listeners will know whether the audiobook was “Narrated by Hailey Hansard” or “In the voice of Hailey Hansard.”
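Under those terms, the narrator’s pay scales with sales rather than hours in the studio. A toy calculation of one quarter’s royalty payment (the sale price and unit count here are invented for illustration; the article gives neither):

```python
ROYALTY_RATE = 0.10  # 10 percent on translated works using the cloned voice

def quarterly_royalty(units_sold: int, net_price: float) -> float:
    """Royalty owed for one quarter, assuming the 10 percent rate
    applies to revenue from cloned-voice translations."""
    return units_sold * net_price * ROYALTY_RATE

# Hypothetical quarter: 300 translated-audiobook sales at $9.99 each.
print(f"${quarterly_royalty(300, 9.99):.2f}")  # $299.70
```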
Others who reviewed the contract’s main points called it “encouraging” and said it seems fair to both sides, though some had concerns.
Everyone agreed that the narrator should receive royalties. The publisher is making more money by using AI instead of human actors, and future narrators are losing potential earnings because of AI, according to Sciglimpaglia.
"They just have one person to read in one language and they can use a machine to convert it for nothing," he said.
Friedlander appreciates that the contract covers consent, control, payment, and transparency. But he mentioned that even a fair contract raises questions about setting precedents.
"This one voice actor gets to do all of these different languages," he said. He referred to the "damage it’s done to all of the other narrators who would have done this, in those different languages."
In the future, there might be "a handful of four or five narrators who become the voice of everything," he said. Audiobooks, in particular, are "one of the places that a lot of people get their start" in voice acting. If synthetic voices become the norm, how will new people get started, he asked.
Conde wondered why royalties end after 10 years. "Does the contract expire and her voice can be used anywhere?" she asked. "I would be concerned about what happens after the 10-year period."
Wendy Hovland, a San Diego voice and on-camera actor, said the time limit can help the narrator renegotiate. She also said the publisher "seems to be working openly with her, to tell her how it’s going to be used and find a way to compensate that works for both parties." Voice actors don’t always get that, she said.
"That is a big issue: voices being — I don’t know if ‘stolen’ is the right word, but used in a way that was not originally intended. Voice actors thought they were voicing one thing and found out that their voices are used for something else," she said.
Hansard feels "very protected" by the contract because it forbids other uses for her cloned voice without her OK.
"Like other actors and creators, I do worry about being exploited by AI. But this particular agreement was win-win. Zach was very receptive to taking care of all of the concerns I had," Hansard said.
AI audiobook as proof of concept
To understand why Rattner prioritized creating a fair contract with the narrator (what’s in it for Yembo), it helps to understand what Yembo sells. Yembo’s software scans the insides of homes and creates inventories and 3D models for moving, storage and insurance reconstruction estimates.
The biggest challenge to signing new customers has not been competitors but resistance to change, Rattner said. In an industry where using a typewriter is still feasible — as one moving company he encountered does — how will they trust new technology, whether or not it’s AI? If things have worked fine for decades, why risk it?
His plan is to demonstrate that AI can be used for more than just making money.
“I find the business arrangement just as fascinating as the book itself. I…believe it’s an intriguing story about how AI can be used for good, especially considering all the concern about AI actors,” Rattner said.
“AI enables economic value to be linked to the output produced, rather than the effort exerted (e.g., time for dollars),” he added.
Rattner said that were it not for AI, he wouldn’t have pursued the foreign-language narrations, since his job is to run a tech startup, not a publishing house. He found the narrator within Yembo’s ranks: Hansard, a product manager at Yembo and a former professional actor. She is SAG-eligible but not a union member.
“The only other option (to using AI) was nothing,” he said. By nothing, he meant no translations and no hiring of narrators in different languages.
In an interview from Los Angeles, Hansard described the strangeness of hearing her cloned voice. This is her first audiobook, in English and in translation.
“It’s almost startling to hear my voice speaking languages that I’ve never spoken before, but also remarkable that this possibility exists,” she said.
She felt at ease with the project because she was assured that it wasn’t taking work from others.
“I believe the best result would be that AI doesn’t substitute human actors or human voices,” Hansard said. “It only complements if it wouldn’t have been possible without it.”
She continued, “I think that’s where everyone will need to tap into their humanity to ensure that AI doesn’t replace humanity. It only enhances — filling the gap when something wouldn’t have been possible.”
©2024 The San Diego Union-Tribune. Visit sandiegouniontribune.com. Distributed by Tribune Content Agency, LLC.