ChatGPT creator OpenAI offered a sneak peek on Friday of a new artificial intelligence (AI) tool capable of producing “natural-sounding speech” and imitating human voices.
The tool, named Voice Engine, needs just “a single 15-second audio sample to create natural-sounding speech that closely mirrors the original speaker,” OpenAI stated in a blog post.
The AI startup noted that Voice Engine can assist with reading, translate content, and offer a voice to individuals who are nonverbal or have a speech condition. However, OpenAI admitted that the tool could pose “significant risks, which are particularly concerning in an election year.”
The company first developed Voice Engine in late 2022 and began private testing with a “small group of trusted partners” late last year.
OpenAI stressed that these partners have agreed to its usage policies, which require explicit and informed consent from the original speaker and forbid the impersonation of individuals without their consent.
The partners are also obligated to disclose that the voices are AI-generated, and any audio produced by Voice Engine includes watermarking to aid in tracing its origin, the company pointed out.
OpenAI stated that it believes any widespread deployment of such a tool should incorporate voice authentication to “confirm that the original speaker is knowingly adding their voice to the service,” as well as a “no-go voice list” to prevent the creation of voices resembling prominent figures.
The company also suggested that institutions phase out the use of voice-based authentication for accessing bank accounts and other sensitive information.
The company also appeared undecided about whether it will ultimately release the tool more widely.
“We hope to initiate a conversation about the responsible implementation of synthetic voices, and how society can adapt to these new capabilities,” OpenAI stated in the blog post. “Based on these discussions and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
The new voice technology arrives amid escalating concerns about the potential for AI-generated deepfakes to propagate election-related misinformation.
Earlier this year, a robocall imitating President Biden’s voice was sent to voters in New Hampshire ahead of the January primary election, urging them not to go to the polls.
Steve Kramer, a seasoned Democratic operative, later admitted to producing the false robocalls and stated he did so to draw attention to the risks of AI in politics.
A local Arizona newsletter similarly released an AI-generated deepfake video of Republican Senate candidate Kari Lake last month to caution readers about “just how advanced this technology is becoming.”