OpenAI voice cloning tech only needs 15 seconds of audio

OpenAI recently announced a small-scale preview of its new voice cloning technology, Voice Engine. This tool can mimic any speaker’s voice by analysing a 15-second audio sample, generating “natural-sounding speech” with “emotive and realistic voices.”

The technology is built on OpenAI’s existing text-to-speech API and has been in development since 2022. OpenAI has already used a version of it to power the preset voices in its current text-to-speech API and ChatGPT’s Read Aloud feature. Samples on OpenAI’s blog show how closely the technology can replicate a voice, and listening to them reveals possibilities that are both beneficial and concerning.

OpenAI envisions Voice Engine helping with reading assistance, language translation, and support for people with sudden or degenerative speech conditions. For example, a Brown University pilot program helped a patient with a speech impairment by creating a Voice Engine clone of the patient’s voice from audio recorded for a school project.

While the technology holds promise, bad actors could misuse it, particularly to create deepfake content. Voice Engine is not ready for full deployment yet, as serious privacy concerns must be addressed before a broad release.

OpenAI acknowledges these risks, especially in an election year, and is incorporating feedback from partners across various sectors to mitigate them. Preview testers must adhere to OpenAI’s usage policies, which prohibit impersonating another person without their consent or legal right. Users must also disclose to their audience that the voices are AI-generated.

Safety measures include watermarking to trace the origin of generated audio and proactive monitoring of how the system is used. When Voice Engine officially launches, a “no-go voice list” will detect and prevent the creation of AI-generated voices that closely resemble prominent figures.

OpenAI has not specified a rollout timeline, but its pricing appears to be competitive. Reported pricing details suggest Voice Engine could cost $15 per one million characters, or roughly 162,500 words, about the length of Stephen King’s “The Shining.” An “HD” version could cost twice as much, but details remain unclear.
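To put the reported figures in perspective, the short sketch below estimates what a job would cost at $15 per one million characters, with the “HD” tier assumed to be exactly twice that rate. Both rates are unconfirmed reports rather than official OpenAI pricing, and the function name `estimate_cost` is purely illustrative, not part of any OpenAI API.

```python
# Rough Voice Engine cost estimate based on reported, unconfirmed pricing.
# Assumption: $15 per one million characters for standard voices, and an
# "HD" tier at twice that rate. Neither figure is confirmed by OpenAI.

STANDARD_USD_PER_MILLION_CHARS = 15.0
HD_MULTIPLIER = 2.0


def estimate_cost(num_chars: int, hd: bool = False) -> float:
    """Return the estimated cost in US dollars for synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = STANDARD_USD_PER_MILLION_CHARS * (HD_MULTIPLIER if hd else 1.0)
    return num_chars / 1_000_000 * rate


# A text of one million characters (roughly 162,500 words):
print(estimate_cost(1_000_000))           # 15.0
print(estimate_cost(1_000_000, hd=True))  # 30.0
```

Because the reported price scales linearly with character count, half a million characters would cost half as much under the same assumptions.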

Separately, OpenAI and Microsoft are reportedly planning an AI supercomputer called “Stargate” that could cost around $100 billion, according to The Information. OpenAI’s strategic moves underscore its commitment to advancing AI technology.