In one of the biggest updates to ChatGPT yet, OpenAI has launched two new ways to interact with its viral app.
First, ChatGPT now has a voice. Pick from one of five lifelike synthetic voices and you can have a conversation with the chatbot as if you were making a call, getting responses to your spoken questions in real time.
ChatGPT also now answers questions about images. OpenAI teased this feature in March with its reveal of GPT-4 (the model that powers ChatGPT), but it has not been available to the wider public before. This means you can now upload images to the app and quiz it about what they show.
These updates join the announcement last week that DALL-E 3, the latest version of OpenAI's image-making model, will be connected to ChatGPT so that you can get the chatbot to generate pictures.
The ability to talk to ChatGPT draws on two separate models. Whisper, OpenAI's existing speech-to-text model, converts what you say into text, which is then fed to the chatbot. And a new text-to-speech model converts ChatGPT's responses into spoken words.
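That round trip can be sketched as a simple three-stage pipeline. The sketch below is illustrative only: the function and parameter names are assumptions, and the three models are passed in as callables rather than tied to any real API.

```python
def voice_turn(transcribe, chat, synthesize, audio_in):
    """One conversational turn: spoken audio in, spoken audio out.

    transcribe -- speech-to-text (the role Whisper plays)
    chat       -- text in, reply text out (the chatbot)
    synthesize -- text-to-speech (the new TTS model)
    """
    user_text = transcribe(audio_in)   # 1. convert the user's speech to text
    reply_text = chat(user_text)       # 2. feed that text to the chatbot
    return synthesize(reply_text)      # 3. speak the chatbot's reply aloud
```

With real model clients plugged in for the three callables, each spoken question would make this round trip in real time.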
In a demo the company gave me last week, Joanne Jang, a product manager, showed off ChatGPT's range of synthetic voices. These were created by training the text-to-speech model on the voices of actors that OpenAI had hired. In the future it might even allow users to create their own voices. "In crafting the voices, the number-one criterion was whether this is a voice you could listen to all day," she says.
They're chatty and enthusiastic but won't be to everyone's taste. "I've got a really great feeling about us teaming up," says one. "I just want to share how thrilled I am to work with you, and I can't wait to get started," says another. "What's the game plan?"
OpenAI is sharing this text-to-speech model with a handful of other companies, including Spotify. Spotify revealed today that it is using the same synthetic voice technology to translate celebrity podcasts (including Trevor Noah's new show, which launches later this year) into multiple languages that will be spoken with synthetic versions of the podcasters' own voices.
This grab bag of updates shows just how fast OpenAI is spinning its experimental models into desirable products. The company has spent much of the time since its surprise hit with ChatGPT last November polishing its technology and selling it to both private consumers and commercial partners.
ChatGPT Plus, the company's premium app, is now a slick one-stop shop for the best of OpenAI's models, rolling GPT-4 and DALL-E into a single smartphone app that rivals Apple's Siri, Google Assistant, and Amazon's Alexa.
What was available only to certain software developers a year ago is now available to anyone for $20 a month. "We're trying to make ChatGPT more useful and more helpful," says Jang.
In last week's demo, Raul Puri, a scientist who works on GPT-4, gave me a quick tour of the image recognition feature. He uploaded a photo of a child's math homework, circled a Sudoku-like puzzle on the screen, and asked ChatGPT how you were meant to solve it. ChatGPT replied with the correct steps.
Puri says he has also used the feature to help him fix his fiancée's computer, uploading screenshots of error messages and asking ChatGPT what he should do. "This was a very painful experience that it helped me get through," he says.
ChatGPT's image recognition ability has already been trialed by a company called Be My Eyes, which makes an app for people with impaired vision. Users can upload a photo of what's in front of them and ask human volunteers to tell them what it is. In a partnership with OpenAI, Be My Eyes gives its users the option of asking a chatbot instead.
"Sometimes my kitchen is a bit messy, or it's just very early Monday morning and I don't want to talk to a human being," Be My Eyes founder Hans Jørgen Wiberg, who uses the app himself, told me when I interviewed him at EmTech Digital in May. "Now you can ask the photo questions."
OpenAI is aware of the risks of releasing these updates to the public. Combining models brings whole new levels of complexity, says Puri. He says his team has spent months brainstorming possible misuses. You cannot ask questions about photos of private individuals, for example.
Jang gives another example: "Right now if you ask ChatGPT to make a bomb it will refuse," she says. "But instead of saying, 'Hey, tell me how to make a bomb,' what if you showed it a picture of a bomb and said, 'Can you tell me how to make this?'"
"You have all the problems of computer vision; you have all the problems of large language models. Voice fraud is a big problem," says Puri. "You have to consider not only our users, but also the people who aren't using the product."
The potential problems don't stop there. Adding voice recognition to the app could make ChatGPT less accessible for people who don't speak with mainstream accents, says Joel Fischer, who studies human-computer interaction at the University of Nottingham in the UK.
Synthetic voices also come with social and cultural baggage that can shape users' perceptions and expectations of the app, he says. That's an issue that still needs study.
But OpenAI claims it has addressed the worst problems and is confident that ChatGPT’s updates are protected enough to release. “It’s been a remarkably good learning experience getting all these sharp edges sorted out,” says Puri.