New AI from ChatGPT's creators generates fake audio from any voice; see how it works

OpenAI has presented the first tests of a new artificial intelligence feature capable of generating audio in a convincing human voice. Called Voice Engine, the text-to-speech technology has been previewed by around 10 developers, according to a company spokesperson.

The company has not yet decided whether to release the feature to all users. Even so, it has already drawn attention both as a new frontier for AI and for the new risks of deepfakes (fake content) it raises online, especially in a year with elections in the United States and Brazil.

How it works

Unlike OpenAI's existing audio-generation tools, Voice Engine can create speech that sounds like a specific person, imitating the particular cadence and intonation of their voice. The software needs only 15 seconds of recorded audio of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg heard audio of OpenAI CEO Sam Altman briefly explaining the technology in a voice that seemed indistinguishable from his real speech but was generated entirely by AI.

"If you have the right audio setup, it's basically a human voice," said Jeff Harris, product lead at OpenAI. "The technical quality is very impressive." However, Harris added: "There is obviously a lot of safety sensitivity around the ability to accurately imitate human speech."

One of OpenAI's current developer partners, the Norman Prince Neurosciences Institute, part of the nonprofit health system Lifespan, is using the technology to help patients recover their voices. According to the company, the tool was used to restore the voice of a young patient who had lost the ability to speak clearly because of a brain tumor, replicating her speech from a recording she had made for a school project.

OpenAI's custom speech model can also translate generated audio into other languages. The feature could be useful for companies such as Spotify, which has already used the technology in a pilot program to translate podcasts. OpenAI also highlighted other beneficial applications, such as creating a wider range of voices for children's educational content.

Risks

The company had planned to roll out the tool to up to 100 developers through an application process, according to a press briefing held earlier this month. However, it decided to postpone that rollout after receiving feedback from lawmakers, experts, educators and artists.

"We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year," the company wrote in a note on Friday (the 29th). "We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build [the technology]."

Other AI technologies have already been used to fake voices. In January, a fake but realistic-sounding phone call purporting to be from President Joe Biden urged people in New Hampshire not to vote in the state's primary, an incident that fueled fears about AI ahead of a critical election period.

In the testing program, OpenAI requires its partners to agree to its usage policies, obtain consent from the owner of a voice before using it, and disclose to listeners that the voices they are hearing are AI-generated. The company will also apply an inaudible audio watermark so that it is possible to tell whether a piece of audio was created with its tool.

When will it be released?

Before deciding whether to roll out the feature to everyone, OpenAI said it is soliciting input from more experts. “It’s important that people around the world understand where this technology is going, whether we actually launch it or not,” the company said.

OpenAI also stated that it hopes the demonstration will “motivate the need to bolster social resilience” against the challenges brought by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. The company also called for public education about misleading AI content and further development of techniques to detect whether audio content is real or generated by AI.

(With Bloomberg)
