Researchers enhance peripheral vision in AI models

Peripheral vision enables humans to see shapes that aren’t directly in our line of sight, though with less detail. This ability expands our field of view and can be helpful in many situations, such as detecting a vehicle approaching our car from the side.

Unlike humans, AI doesn’t have peripheral vision. Equipping computer vision models with this ability could help them detect approaching hazards more effectively or predict whether a human driver would notice an oncoming object.

Taking a step in this direction, MIT researchers developed an image dataset that allows them to simulate peripheral vision in machine learning models. They found that training models with this dataset improved the models’ ability to detect objects in the visual periphery, although the models still performed worse than humans.

Their results also revealed that, unlike with humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the AI’s performance.

“There’s something fundamental happening here. We tested so many different models, and even when we train them, they get a little bit better, but they’re not quite like humans. So, the question is: What is missing in these models?” says Vasha DuTell, a postdoc and co-author of a paper detailing this study.

Answering that question may help researchers build machine learning models that can see the world more like humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to view.

Plus, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, adds lead author Anne Harrington MEng ’23.

“Modeling peripheral vision, if we can really capture the essence of what is represented in the periphery, can help us understand the features in a visual scene that make our eyes move to collect more information,” she explains.

Their co-authors include Mark Hamilton, an electrical engineering and computer science graduate student; Ayush Tewari, a postdoc; Simon Stent, research manager at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.

“Any time you have a human interacting with a machine — a car, a robot, a user interface — it is hugely important to understand what the person can see. Peripheral vision plays a critical role in that understanding,” Rosenholtz says.

Simulating peripheral vision

Extend your arm in front of you and put your thumb up — the small area around your thumbnail is seen by your fovea, the small depression in the middle of your retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and reliability as it moves farther from that sharp point of focus.

Many existing approaches to model peripheral vision in AI represent this deteriorating detail by blurring the edges of images, but the information loss that occurs in the optic nerve and visual cortex is far more complex.

For a more accurate approach, the MIT researchers began with a technique used to model peripheral vision in humans. Known as the texture tiling model, this method transforms images to represent a human’s visual information loss.

They modified this model so it could transform images similarly, but in a more flexible way that doesn’t require knowing in advance where the person or AI will point their eyes.

“That let us faithfully model peripheral vision the same way it is being done in human vision research,” says Harrington.

The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, to represent the loss of detail that occurs when a human looks deeper into the periphery.
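To build intuition for this kind of transform, here is a deliberately simplified sketch: it degrades an image by averaging over pixel windows that grow with distance from a fixation point, so detail falls off toward the periphery. This is an illustration only, not the researchers’ texture tiling model, which pools local texture statistics rather than raw pixel values; the function name and the linear `scale` parameter are assumptions made for the example.

```python
import numpy as np

def foveate(image, fixation, scale=0.25):
    """Crude eccentricity-dependent degradation: the pooling window
    grows linearly with distance from the fixation point, loosely
    mimicking how detail falls off in the visual periphery.
    (Illustrative stand-in only -- the actual texture tiling model
    pools local texture statistics, not raw pixel averages.)"""
    h, w = image.shape
    fy, fx = fixation
    out = np.empty_like(image, dtype=float)
    for y in range(h):
        for x in range(w):
            ecc = np.hypot(y - fy, x - fx)   # eccentricity in pixels
            r = int(scale * ecc)             # pooling radius grows with eccentricity
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = image[y0:y1, x0:x1].mean()  # average over the window
    return out

# Tiny demo: a sharp vertical edge stays sharp at fixation
# but smears out in the periphery.
img = np.zeros((33, 33))
img[:, 16:] = 1.0
fov = foveate(img, fixation=(16, 16))
```

Running this on the demo image, the edge at the fixation point is preserved exactly (the pooling radius there is zero), while the same edge far above fixation blends into intermediate gray values — the qualitative behavior a foveated transform is meant to capture.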

Then they used the dataset to train several computer vision models and compared their performance with that of humans on an object detection task.

“We had to be very clever in how we set up the experiment so we could also test it in the machine learning models. We didn’t want to have to retrain the models on a toy task that they weren’t meant to be doing,” she says.

Peculiar performance

Humans and models were shown pairs of transformed images that were identical, except that one image had a target object located in the periphery. Then, each participant was asked to pick the image with the target object.

“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We kept needing to use smaller and smaller objects,” Harrington adds.

The researchers found that training models from scratch with their dataset led to the greatest performance boosts, improving their ability to detect and recognize objects. Fine-tuning a model with their dataset, a process that involves tweaking a pretrained model so it can perform a new task, resulted in smaller performance gains.

But in every case, the machines weren’t as good as humans, and they were especially bad at detecting objects in the far periphery. Their performance also didn’t follow the same patterns as humans.

“That might suggest that the models aren’t using context in the same way as humans are to do these detection tasks. The strategy of the models might be different,” Harrington says.

The researchers plan to continue exploring these differences, with a goal of finding a model that can predict human performance in the visual periphery. This could enable AI systems that alert drivers to hazards they might not see, for instance. They also hope to encourage other researchers to conduct additional computer vision studies with their publicly available dataset.

“This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to limits in the number of photoreceptors we have, but rather, a representation that is optimized for us to perform tasks of real-world consequence,” says Justin Gardner, an associate professor in the Department of Psychology at Stanford University who was not involved with this work. “Moreover, the work shows that neural network models, despite their advancement in recent years, are unable to match human performance in this regard, which should lead to more AI research to learn from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”

This work is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.
