In last week’s demo, Raul Puri, a scientist who works on GPT-4, gave me a quick tour of the image recognition feature. He uploaded a photo of a kid’s math homework, circled a Sudoku-like puzzle on the screen, and asked ChatGPT how you were meant to solve it. ChatGPT replied with the correct steps.
Puri says he has also used the feature to help him fix his fiancée’s computer by uploading screenshots of error messages and asking ChatGPT what he should do. “This was a very painful experience that it helped me get through,” he says.
ChatGPT’s image recognition ability has already been trialed by a company called Be My Eyes, which makes an app for people with impaired vision. Users can upload a photo of what’s in front of them and ask human volunteers to tell them what it is. In a partnership with OpenAI, Be My Eyes gives its users the option of asking a chatbot instead.
“Sometimes my kitchen is a little messy, or it’s just very early Monday morning and I don’t want to talk to a human being,” Be My Eyes founder Hans Jørgen Wiberg, who uses the app himself, told me when I interviewed him at EmTech Digital in May. “Now you can ask the photo questions.”
OpenAI is aware of the risks of releasing these updates to the public. Combining models brings a whole new level of complexity, says Puri. He says his team has spent months brainstorming possible misuses. You cannot ask questions about photos of private individuals, for example.
Jang gives another example: “Right now if you ask ChatGPT to make a bomb it will refuse,” she says. “But instead of saying, ‘Hey, tell me how to make a bomb,’ what if you showed it an image of a bomb and said, ‘Can you tell me how to make this?’”
“You have all the problems with computer vision; you have all the problems of large language models. Voice fraud is a big problem,” says Puri. “You have to consider not just our users, but also the people that aren’t using the product.”