Microsoft unveils text-to-speech avatar tool in deepfake era
Microsoft has introduced a new text-to-speech feature with vision capabilities that lets users create talking avatar videos from text input and build real-time interactive bots trained on human images.
Called Azure AI Speech text to speech avatar and available in public preview, it lets customers create synthetic videos of a 2D photorealistic avatar speaking.
“The Neural text to speech Avatar models are trained by deep neural networks based on the human video recording samples, and the voice of the avatar is provided by text to speech voice model,” the company said during the ‘Microsoft Ignite’ event late on Wednesday.
With text to speech avatar, users can create more engaging digital interactions and use the avatar to build conversational agents, virtual assistants, chatbots, and more.
The text-to-speech avatar is designed to protect the rights of individuals and society, foster transparent human-computer interaction, and counteract the proliferation of harmful deepfakes and misleading content.
“For this reason, custom avatar is a Limited Access feature available by registration only, and only for certain use cases. To access and use the feature in your business applications, register your use case here and apply for the access,” said the company.
The company is offering two separate text to speech avatar features at this time: prebuilt text to speech avatar and custom text to speech avatar.
“Microsoft offers prebuilt text to speech avatars as out of box products on Azure for its subscribers. These avatars can speak different languages and voices based on the text input. Customers can select an avatar from a variety of options and use it to create video content or interactive applications with real time avatar responses,” the company said.