OpenAI unveils voice cloning tool Voice Engine: All you need to know
Just months after launching its video generator Sora, ChatGPT-maker OpenAI has developed a voice cloning tool called ‘Voice Engine’. Although the tool is highly anticipated, the company will keep it under wraps for now, fearing malicious use amid rising cases of voice duplication and fake content online.
Here’s what we know about it:
About Voice Engine
Voice Engine uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
Voice Engine is still in its testing stage. OpenAI said in a blog post that partners testing the tool have agreed to rules that include obtaining explicit and informed consent from any person whose voice is duplicated. AI-generated voices must also be clearly disclosed to audiences, the company said.
Use cases for voice AI
OpenAI has been working with a group of partners to test use cases for this technology. Here are a few the company has identified so far:
Reading assistance: Voice Engine can help non-readers and children by generating natural-sounding, emotive voices that represent a wider range of speakers than would otherwise be available to learners and educational institutions.
Translating content: This lets creators and businesses that use media such as videos and podcasts reach more people around the world in their own voices. Voice Engine preserves the native accent of the original speaker: for example, generating English from an audio sample of a French speaker would produce speech with a French accent.
For non-verbal people: Voice Engine can be used in therapeutic applications for individuals with conditions that affect speech, and in educational enhancements for those with learning needs.
Danger lurks
OpenAI said it was “taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.”
In an election year, the use of AI to spread disinformation through deepfakes is rampant, a threat multiplied by the viral spread of generative AI technology.
Acknowledging this risk, the company said, “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year.”
“We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” it added.
Sora soars
This comes just months after the company unveiled Sora, a tool that generates minute-long videos from text prompts. ET had reported reactions to the “impressive” tool in February. “Sora is remarkable. I think the genie is out of the bottle and things won’t be the same again. The quality of videos it generates is so high that stock video generation agencies will feel an immediate threat,” Hemant Mohapatra, a partner at venture fund Lightspeed India, had told ET.
Sora’s rivals range from startups such as Runway (maker of Gen-2), Pika Labs and Stability AI, which offer dedicated AI video-generation models, to Lumiere, the latest model from search giant Google. But with its one-minute video length and realistic imagery, Sora set a new standard in the industry.