New tool helps people choose the right method for evaluating AI models

When machine-learning models are deployed in real-world situations, perhaps to flag potential disease in X-rays for a radiologist to review, human users have to know when to trust the model’s predictions.

But machine-learning models are so large and complex that even the scientists who design them don’t understand exactly how the models make predictions. So, they create techniques known as saliency methods that seek to explain model behavior.

With new methods being released all the time, researchers from MIT and IBM Research created a tool to help users choose the best saliency method for their particular task. They developed saliency cards, which provide standardized documentation of how a method operates, including its strengths and weaknesses and explanations to help users interpret it correctly.

They hope that, armed with this information, users can deliberately select an appropriate saliency method for both the type of machine-learning model they are using and the task that model is performing, explains co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Interviews with AI researchers and experts from other fields revealed that the cards help people quickly conduct a side-by-side comparison of different methods and pick a task-appropriate technique. Choosing the right method gives users a more accurate picture of how their model is behaving, so they are better equipped to correctly interpret its predictions.

“Saliency cards are designed to give a quick, glanceable summary of a saliency method and also break it down into the most critical, human-centric attributes. They are really designed for everyone, from machine-learning researchers to lay users who are trying to understand which method to use and choose one for the first time,” says Boggust.

Joining Boggust on the paper are co-lead author Harini Suresh, an MIT postdoc; Hendrik Strobelt, a senior research scientist at IBM Research; John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT; and senior author Arvind Satyanarayan, associate professor of computer science at MIT, who leads the Visualization Group in CSAIL. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

Picking the right method

Researchers have previously evaluated saliency methods using the notion of faithfulness. In this context, faithfulness captures how accurately a method reflects a model’s decision-making process.

But faithfulness is not black-and-white, Boggust explains. A method might perform well under one test of faithfulness but fail another. With so many saliency methods, and so many possible evaluations, users often settle on a method because it is popular or because a colleague has used it.

However, picking the “wrong” method can have serious consequences. For instance, one saliency method, known as integrated gradients, compares the importance of features in an image to a meaningless baseline. The features with the largest importance over the baseline are most meaningful to the model’s prediction. This method typically uses all 0s as the baseline, but when applied to images, all 0s is equivalent to the color black.

“It will tell you that any black pixels in your image aren’t important, even if they are, because they are identical to that meaningless baseline. This could be a huge deal if you are looking at X-rays since black could be meaningful to clinicians,” says Boggust.
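
To make that pitfall concrete, here is a minimal sketch of integrated gradients in PyTorch. It is not the paper’s or any library’s reference implementation; the function name, the number of interpolation steps, and the use of a single unbatched input are illustrative assumptions. Because each attribution is scaled by the difference between the input and the baseline, a pixel that already matches the all-zeros (black) baseline receives exactly zero attribution, which is the behavior Boggust describes.

import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    # Approximate integrated gradients for a single (unbatched) input tensor.
    # With no baseline supplied, an all-zeros tensor is used; for images,
    # that baseline is a black image.
    if baseline is None:
        baseline = torch.zeros_like(x)

    # Interpolate from the baseline to the input in `steps` increments,
    # producing a batch of intermediate inputs.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    interpolated = (baseline + alphas * (x - baseline)).requires_grad_(True)

    # Gradients of the model's summed output with respect to each intermediate input.
    output = model(interpolated).sum()
    grads = torch.autograd.grad(output, interpolated)[0]

    # Riemann approximation of the path integral, scaled by (x - baseline).
    # A pixel equal to the baseline has (x - baseline) == 0, so its attribution
    # is exactly zero no matter how large its gradients are.
    return (x - baseline) * grads.mean(dim=0)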

Saliency cards can help users avoid these types of problems by summarizing how a saliency method works in terms of 10 user-focused attributes. The attributes capture the way saliency is calculated, the relationship between the saliency method and the model, and how a user perceives its outputs.
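
A saliency card is structured documentation rather than code, but as a rough illustration, one card for integrated gradients might be sketched as data along those three dimensions. The field names and wording below are hypothetical placeholders, not the paper’s actual schema, and only attributes mentioned in this article are filled in.

# Hypothetical sketch of one saliency card as structured data; field names
# and values are illustrative placeholders, not the paper's exact attributes.
integrated_gradients_card = {
    "method": "integrated gradients",
    # how the saliency is calculated
    "hyperparameter_dependence": "high: results depend on the user-chosen baseline",
    # relationship between the saliency method and the model
    "model_applicability": "any differentiable model",
    # how a user perceives the outputs
    "output_interpretation": "per-feature attributions relative to the baseline",
    # ... the remaining user-focused attributes would follow
}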

For instance, one attribute is hyperparameter dependence, which measures how sensitive a saliency method is to user-specified parameters. A saliency card for integrated gradients would describe its parameters and how they affect its performance. With the card, a user could quickly see that the default parameters, a baseline of all 0s, might generate misleading results when evaluating X-rays.
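
That sensitivity to the baseline can be seen by running the sketch above with two different baselines on a toy input. The tiny model and the 4x4 “image” below are hypothetical stand-ins, not data from the study.

import torch

# Stand-ins: a tiny linear model and a toy image with an all-black region,
# reusing the integrated_gradients sketch defined earlier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 1))
image = torch.rand(1, 4, 4)
image[:, :2, :] = 0.0  # black pixels, like dark regions of an X-ray

attr_black = integrated_gradients(model, image)  # default all-0s (black) baseline
attr_gray = integrated_gradients(model, image, baseline=0.5 * torch.ones_like(image))

# Under the black baseline, the black region's attribution is exactly zero;
# under the gray baseline, the same pixels can receive nonzero attribution.
print(attr_black[:, :2, :].abs().sum().item(), attr_gray[:, :2, :].abs().sum().item())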

The cards could also be useful for scientists by exposing gaps in the research space. For instance, the MIT researchers were unable to find a saliency method that was computationally efficient but could also be applied to any machine-learning model.

“Can we fill that gap? Is there a saliency method that can do both things? Or maybe these two ideas are theoretically in conflict with one another,” Boggust says.

Showing their cards

Once they had created several cards, the team conducted a user study with eight domain experts, from computer scientists to a radiologist who was unfamiliar with machine learning. During interviews, all participants said the concise descriptions helped them prioritize attributes and compare methods. And although he was unfamiliar with machine learning, the radiologist was able to understand the cards and use them to take part in the process of choosing a saliency method, Boggust says.

The interviews also revealed a few surprises. Researchers often expect that clinicians want a method that is sharp, meaning it focuses on a specific object in a medical image. But the clinician in this study actually preferred some noise in medical images to help them attenuate uncertainty.

“As we broke it down into these different attributes and asked people, not a single person had the same priorities as anyone else in the study, even when they were in the same role,” she says.

Moving forward, the researchers want to explore some of the more under-evaluated attributes and perhaps design task-specific saliency methods. They also want to develop a better understanding of how people perceive saliency method outputs, which could lead to better visualizations. In addition, they are hosting their work in a public repository so others can provide feedback that will drive future work, Boggust says.

“We’re really hopeful that these will be living documents that grow as new saliency methods and evaluations are developed. Ultimately, this is really just the start of a larger conversation around what the attributes of a saliency method are and how those play into different tasks,” she says.

The research was supported, in part, by the MIT-IBM Watson AI Lab, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.
