ChatGPT is getting some significant updates that will enable the chatbot to deal with voice commands and image-based queries. Users will be able to have a voice conversation with ChatGPT on Android and iOS and to feed images into it on all platforms. OpenAI is rolling out the features now. They’ll be available to Plus and Enterprise users at first, with other folks gaining access to the image-based features later.
You’ll need to opt in to voice conversations in the ChatGPT app (go to Settings then New Features) if you’d like to try them out. By tapping the microphone button, you’ll be able to choose from five different voices.
OpenAI says the back-and-forth voice conversations are powered by a new text-to-speech model that can generate “human-like audio from just text and a few seconds of sample speech.” It created the five voices with the help of professional actors. Going the other way, the company’s speech recognition system converts a user’s spoken words into text.
The image-based functions are intriguing too. OpenAI says you can, for instance, show the chatbot a photo of your grill and ask why it won’t start, get it to help plan a meal based on a snap of what’s in your fridge or prompt it to solve a math problem you take a picture of. As it happens, Microsoft highlighted similar capabilities for its Copilot AI in Windows during its Surface event last week.
OpenAI is using GPT-3.5 and GPT-4 to power the image recognition features. To use ChatGPT’s image-based functions, tap the photo button (you’ll need to tap the plus button first on iOS or Android) to take a snap or choose an existing image on your device. You can ask ChatGPT about multiple photos and use a drawing tool to focus on a specific part of the image.
In announcing the updates, OpenAI noted the potential for harm. It’s possible for bad actors to mimic the voices of public figures (and everyday folks) and perhaps commit fraud. That’s why OpenAI is focusing on ChatGPT voice conversations with this technology and working with select partners on other limited use cases (more on that in a moment).
As for images, OpenAI worked with the makers of a free app that blind and low-vision people can use to help them better understand their surroundings thanks to volunteers who hop into video calls with them. “Users have told us they find it valuable to have general conversations about images that happen to contain people in the background, like if someone appears on TV while you’re trying to figure out your remote control settings,” OpenAI said. The company noted that it has also limited how ChatGPT can analyze and make direct statements about people that appear in images, “since ChatGPT is not always accurate and these systems should respect individuals’ privacy.” It has published material on the safety properties of the image-based functionality, which it calls GPT-4 with vision.
ChatGPT is more effective at understanding English text in images than other languages. OpenAI says the chatbot “performs poorly” in other languages for the time being, particularly when it comes to those that use non-Roman scripts. As such, it suggests that non-English users avoid using ChatGPT to deal with text in images for now.
Meanwhile, Spotify has teamed up with OpenAI to use the voice-based technology for an interesting purpose. The former has announced a pilot of a tool called Voice Translation for podcasters. This can translate podcasts into different languages using the voices of the folks who appear on the show. Spotify says the tool can retain the speech characteristics of the original speaker after converting their voice into other languages.
To start with, Spotify is converting select English-based shows into a few languages. Spanish versions of some Armchair Expert and The Diary of a CEO with Steven Bartlett episodes are available now, with French and German variants to follow.