OpenAI expands AI capabilities with new audio models for voice agents

ChatGPT maker OpenAI has launched new speech-to-text and text-to-speech audio models in its API to enhance voice agents. OpenAI said the new models "set a new state-of-the-art benchmark, outperforming existing solutions in accuracy and reliability—especially in challenging scenarios involving accents, noisy environments, and varying speech speeds."

These enhancements improve transcription accuracy, making the models particularly effective for applications such as customer service call centres, meeting note-taking, and similar use cases.

Developers will now be able to instruct the text-to-speech model to speak in a specific way. To illustrate, OpenAI gave the example of a developer telling the voice agent to "talk like a sympathetic customer service agent." In its blog post, the Sam Altman-led company claimed that such instructions unlock a new level of customisation for voice agents.

Speech-to-Text audio models: What do we know

OpenAI has introduced the new GPT-4o Transcribe and GPT-4o Mini Transcribe models, which are said to offer a lower word error rate (WER), improved language recognition, and greater transcription accuracy than the original Whisper models.

The GPT-4o Transcribe model is claimed to deliver lower WER across multiple benchmarks, marking a significant advance in speech-to-text performance.

With these upgrades, the new models are said to be more effective at capturing speech nuances, minimising errors, and ensuring higher transcription reliability. OpenAI has claimed that they perform particularly well in challenging conditions, such as strong accents, background noise, and varying speech speeds. These models are now available through the speech-to-text API.
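
As a minimal sketch of what calling the new transcription models looks like in practice (assuming the official OpenAI Python SDK; the file name and client setup below are illustrative, not part of the announcement):

```python
# Minimal sketch: transcribing a recording with the new speech-to-text model.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set in the
# environment; "meeting.mp3" is a placeholder file name.
from openai import OpenAI

client = OpenAI()

with open("meeting.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe"
        file=audio_file,
    )

print(transcription.text)
```

The request shape mirrors the existing Whisper transcription endpoint, so swapping in the new model name is the main change for developers already using speech-to-text.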

Text-to-Speech audio model: What do we know

OpenAI has introduced the GPT-4o Mini TTS model, offering improved steerability in text-to-speech generation. For the first time, developers can guide the model not only on what to say but also on how to say it, allowing for more personalised and dynamic voice outputs. This advancement enhances applications such as customer service and creative storytelling.

The model is now accessible through the text-to-speech API. The blog post read, “Note that these text-to-speech models are limited to artificial, preset voices, which we monitor to ensure they consistently match synthetic presets.”
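
A minimal sketch of the steerability described above, using the OpenAI Python SDK's streaming speech endpoint (the voice name, sample text, and output file are placeholders, not from OpenAI's post):

```python
# Minimal sketch: steering delivery with the `instructions` parameter of the
# new text-to-speech model. Voice name and output path are placeholders.
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the preset voices
    input="Your refund has been processed. Is there anything else I can help with?",
    instructions="Talk like a sympathetic customer service agent.",
) as response:
    response.stream_to_file("reply.mp3")
```

The `input` field carries what to say, while `instructions` carries how to say it, which is the separation OpenAI describes as new with this model.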

All of the latest models are now accessible to developers via OpenAI's API. Additionally, OpenAI has integrated these models with its Agents SDK, streamlining the development process.
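
As a rough, hedged sketch of that integration: the Agents SDK's voice extension wraps an agent in a pipeline that transcribes incoming audio, runs the agent, and speaks the reply back. The agent definition and silent input buffer below are placeholders, and exact class names may vary by SDK version:

```python
# Hedged sketch: wrapping an agent in a voice pipeline with the Agents SDK
# (pip install "openai-agents[voice]"). The agent below is a placeholder.
import asyncio

import numpy as np

from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

agent = Agent(
    name="Support agent",
    instructions="You are a sympathetic customer service agent.",
)

async def main() -> None:
    # Speech-to-text feeds the agent; its reply is rendered via text-to-speech.
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    # Three seconds of silence stands in for real microphone audio (24 kHz mono).
    buffer = np.zeros(24_000 * 3, dtype=np.int16)
    result = await pipeline.run(AudioInput(buffer=buffer))
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            pass  # play or buffer event.data (PCM audio chunks) here

asyncio.run(main())
```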

For applications requiring real-time, low-latency speech-to-speech functionality, OpenAI recommends utilising its Realtime API for optimal performance.
