
Ensuring AI works with the appropriate dose of curiosity


It’s a dilemma as old as time. Friday night has rolled around, and you’re trying to pick a restaurant for dinner. Should you visit your beloved standby or try a new establishment, in the hopes of discovering something superior? Potentially, but that curiosity comes with a risk: If you explore the new option, the food could be worse. On the flip side, if you stick with what you know works well, you won’t grow out of your narrow pathway.

Curiosity drives artificial intelligence to explore the world, now in boundless use cases: autonomous navigation, robotic decision-making, optimizing health outcomes, and more. Machines, in some cases, use “reinforcement learning” to accomplish a goal, where an AI agent iteratively learns from being rewarded for good behavior and punished for bad. Just like the dilemma humans face in choosing a restaurant, these agents also struggle with balancing the time spent discovering better actions (exploration) against the time spent taking actions that led to high rewards in the past (exploitation). Too much curiosity can distract the agent from making good decisions, while too little means the agent will never discover good decisions.
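The exploration-exploitation trade-off described above can be sketched with a classic epsilon-greedy bandit. This is a simplified stand-in for illustration, not the researchers’ algorithm; the “restaurant” arms and their hidden quality values are invented:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy agent: explore with probability epsilon, else exploit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = true_means[arm] + rng.gauss(0, 0.1)              # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

# Three "restaurants" with hidden quality; the agent must find the best one.
estimates = epsilon_greedy_bandit([0.7, 0.5, 0.9], epsilon=0.1)
```

With epsilon too high the agent wastes meals on mediocre options; with epsilon at zero it can lock onto the first decent choice and never find the best one.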

In pursuit of building AI agents with just the right dose of curiosity, researchers from MIT’s Improbable AI Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL) created an algorithm that overcomes the problem of AI being too “curious” and getting distracted by the task at hand. Their algorithm automatically increases curiosity when it’s needed, and suppresses it if the agent gets enough supervision from the environment to know what to do.

When tested on over 60 video games, the algorithm succeeded at both hard and easy exploration tasks, where previous algorithms could tackle only a hard or an easy domain alone. With this method, AI agents use less data to learn decision-making rules that maximize incentives.

“If you master the exploration-exploitation trade-off well, you can learn the right decision-making rules faster — and anything less would require lots of data, which could mean suboptimal medical treatments, lesser profits for websites, and robots that don’t learn to do the right thing,” says Pulkit Agrawal, an assistant professor of electrical engineering and computer science (EECS) at MIT, director of the Improbable AI Lab, and CSAIL affiliate who supervised the research. “Imagine a website trying to figure out the design or layout of its content that will maximize sales. If one doesn’t perform exploration-exploitation well, converging to the right website design or layout will take a long time, which means profit loss. Or in a health care setting, like with Covid-19, there may be a sequence of decisions that need to be made to treat a patient, and if you want to use decision-making algorithms, they need to learn quickly and efficiently — you don’t want a suboptimal solution when treating a large number of patients. We hope this work will apply to real-world problems of that nature.”

It’s hard to capture the nuances of curiosity’s psychological underpinnings; the underlying neural correlates of challenge-seeking behavior remain poorly understood. Attempts to categorize the behavior have spanned studies of our impulses, deprivation sensitivities, and social and stress tolerances.

With reinforcement learning, this process is “pruned” emotionally and stripped down to the bare bones, but it’s complicated on the technical side. Essentially, the agent should only be curious when there’s not enough supervision available to try different things, and if there is supervision, it should dial curiosity down.
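One common way to express this idea is to add a weighted curiosity bonus to the task reward and adapt the weight over time. The sketch below is an illustrative assumption about how such adaptation could look, not the team’s published method; the update rule and learning rate are invented:

```python
def combined_reward(extrinsic, intrinsic, curiosity_weight):
    """Total reward = task (extrinsic) reward + weighted curiosity bonus."""
    return extrinsic + curiosity_weight * intrinsic

def update_curiosity_weight(weight, recent_extrinsic, lr=0.05):
    """Toy adaptation rule: raise curiosity when the environment is silent,
    lower it when extrinsic rewards are flowing."""
    if recent_extrinsic == 0:                  # sparse supervision: explore more
        return min(1.0, weight + lr)
    return max(0.0, weight - lr)               # dense supervision: exploit signal

w = 0.5
for _ in range(20):                            # dense phase: weight decays to 0
    w = update_curiosity_weight(w, recent_extrinsic=1.0)
for _ in range(20):                            # sparse phase: weight climbs back
    w = update_curiosity_weight(w, recent_extrinsic=0.0)
```

The key property is the direction of the feedback loop: supervision suppresses curiosity, and its absence revives it.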

Since a large subset of gaming consists of little agents running around fantastical environments looking for rewards and performing a long sequence of actions to achieve some goal, it seemed like the logical test bed for the researchers’ algorithm. In experiments, researchers divided games like “Mario Kart” and “Montezuma’s Revenge” into two different buckets: one where supervision was sparse, meaning the agent had less guidance, considered “hard” exploration games, and a second where supervision was denser, the “easy” exploration games.

Suppose in “Mario Kart,” for example, you remove all rewards, so you don’t know when an enemy eliminates you. You’re not given any reward when you collect a coin or jump over pipes. The agent is only told at the end how well it did. This would be a case of sparse supervision. Algorithms that incentivize curiosity do very well in this scenario.

But now, suppose the agent is provided dense supervision: a reward for jumping over pipes, collecting coins, and eliminating enemies. Here, an algorithm without curiosity performs very well, because it gets rewarded often. But if you instead take an algorithm that also uses curiosity, it learns slowly. That’s because the curious agent might try to run fast in different ways, dance around, or visit every part of the game screen: things that are interesting, but don’t help the agent succeed at the game. The team’s algorithm, however, consistently performed well, no matter what environment it was in.
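The sparse-versus-dense contrast in the two paragraphs above can be made concrete with two toy reward functions. The event names and reward values here are invented for illustration and do not come from the paper:

```python
def dense_reward(event):
    """Dense supervision: frequent feedback for individual in-game events."""
    return {"coin": 1.0, "pipe_jump": 0.5, "enemy_defeated": 2.0}.get(event, 0.0)

def sparse_reward(finished):
    """Sparse supervision: a single signal at the end of the episode."""
    return 10.0 if finished else 0.0

trajectory = ["coin", "pipe_jump", "wander", "coin", "enemy_defeated"]
dense_total = sum(dense_reward(e) for e in trajectory)  # feedback nearly every step
sparse_total = sparse_reward(finished=True)             # one reward at the end
```

Under the dense scheme, almost every action carries a learning signal; under the sparse scheme, the agent must rely on curiosity to keep exploring until the single end-of-episode reward arrives.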

Future work might involve circling back to the question that has delighted and plagued psychologists for years: an appropriate metric for curiosity. Nobody really knows the right way to mathematically define curiosity.

“Getting consistently good performance on a novel problem is extremely challenging — so by improving exploration algorithms, we can save the effort of tuning an algorithm for your problems of interest,” says Zhang-Wei Hong, an EECS PhD student, CSAIL affiliate, and co-lead author with Eric Chen ’20, MEng ’21 on a new paper about the work. “We need curiosity to solve extremely challenging problems, but on some problems it can hurt performance. We propose an algorithm that removes the burden of tuning the balance of exploration and exploitation. Previously, what took, for instance, a week to successfully solve a problem, with this new algorithm, we can get satisfactory results in a few hours.”

“One of the greatest challenges for current AI and cognitive science is how to balance exploration and exploitation — the search for information versus the search for reward. Children do this seamlessly, but it is computationally challenging,” notes Alison Gopnik, professor of psychology and affiliate professor of philosophy at the University of California at Berkeley, who was not involved with the project. “This paper uses impressive new techniques to accomplish this automatically, designing an agent that can systematically balance curiosity about the world and the desire for reward, [thus taking] another step towards making AI agents (almost) as smart as children.”

“Intrinsic rewards like curiosity are fundamental to guiding agents to discover useful, diverse behaviors, but this shouldn’t come at the cost of doing well on the given task. This is an important problem in AI, and the paper provides a way to balance that trade-off,” adds Deepak Pathak, an assistant professor at Carnegie Mellon University, who was also not involved in the work. “It would be interesting to see how such methods scale beyond games to real-world robotic agents.”

Chen, Hong, and Agrawal wrote the paper alongside Joni Pajarinen, assistant professor at Aalto University and research leader of the Intelligent Autonomous Systems Group at TU Darmstadt. The research was supported, in part, by the MIT-IBM Watson AI Lab, the DARPA Machine Common Sense Program, the Army Research Office, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator. The paper will be presented at Neural Information Processing Systems (NeurIPS) 2022.
