Study: AI models fail to reproduce human judgements about rule violations

In an effort to improve fairness or reduce backlogs, machine-learning models are sometimes designed to mimic human decision making, such as deciding whether social media posts violate toxic content policies.

But researchers from MIT and elsewhere have found that these models often don't replicate human decisions about rule violations. If models are not trained with the right data, they are likely to make different, often harsher judgements than humans would.

In this case, the "right" data are those that were labeled by humans who were explicitly asked whether items violate a certain rule. Training involves showing a machine-learning model many thousands of examples of this "normative data" so it can learn a task.

But data used to train machine-learning models are typically labeled descriptively — meaning humans are asked to identify factual features, such as, say, the presence of fried food in a photo. If "descriptive data" are used to train models that judge rule violations, such as whether a meal violates a school policy that prohibits fried food, the models tend to over-predict rule violations.

This drop in accuracy could have serious implications in the real world. For instance, if a descriptive model is used to make decisions about whether an individual is likely to reoffend, the researchers' findings suggest it may cast stricter judgements than a human would, which could lead to higher bail amounts or longer criminal sentences.

"I think most artificial intelligence/machine-learning researchers assume that the human judgements in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments because the data they're being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine-learning systems in human processes," says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Ghassemi is senior author of a new paper detailing these findings, which was published today. Joining her on the paper are lead author Aparna Balagopalan, an electrical engineering and computer science graduate student; David Madras, a graduate student at the University of Toronto; David H. Yang, a former graduate student who is now co-founder of ML Estimation; Dylan Hadfield-Menell, an MIT assistant professor; and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.

Labeling discrepancy

This study grew out of a different project that explored how a machine-learning model can justify its predictions. As they gathered data for that study, the researchers noticed that humans sometimes give different answers if they are asked to provide descriptive or normative labels about the same data.

To gather descriptive labels, researchers ask labelers to identify factual features — does this text contain obscene language? To gather normative labels, researchers give labelers a rule and ask whether the data violate that rule — does this text violate the platform's explicit language policy?

Surprised by this finding, the researchers launched a user study to dig deeper. They gathered four datasets to mimic different policies, such as a dataset of dog images that could be in violation of an apartment's rule against aggressive breeds. Then they asked groups of participants to provide descriptive or normative labels.

In each case, the descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appears aggressive. Their responses were then used to craft judgements. (If a user said a photo contained an aggressive dog, then the policy was violated.) The labelers didn't know the pet policy. Normative labelers, on the other hand, were given the policy prohibiting aggressive dogs, and then asked whether each image violated it, and why.
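The conversion from descriptive labels to implied judgements described above can be sketched as follows. This is a minimal illustration, not the study's actual code, and the feature names and policy are hypothetical: any flagged prohibited feature implies a violation.

```python
# Hypothetical policy: these descriptive features, if present, break the rule.
PROHIBITED_FEATURES = {"aggressive", "unhygienic", "oversized"}

def implied_violation(descriptive_labels: dict[str, bool]) -> bool:
    """Return True if the labeler marked any prohibited feature as present."""
    return any(descriptive_labels.get(f, False) for f in PROHIBITED_FEATURES)

# A labeler flagged the dog as aggressive, so the policy counts as violated.
print(implied_violation({"aggressive": True, "oversized": False}))  # True
print(implied_violation({"aggressive": False}))                     # False
```

Note that the labeler never sees the policy here; the judgement is derived mechanically from factual feature labels, which is exactly the setup the study probes.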

The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting. The disparity, which they computed using the absolute difference in labels on average, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent for the dog images.
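The disparity metric described above — the average absolute difference between binary labels — can be computed as in this small sketch (the data here are toy values, not from the study):

```python
def label_disparity(descriptive: list[int], normative: list[int]) -> float:
    """Average absolute difference between two 0/1 label lists for the same items."""
    assert len(descriptive) == len(normative)
    return sum(abs(d - n) for d, n in zip(descriptive, normative)) / len(descriptive)

# Toy example: the two labeler groups disagree on 1 of 5 items.
print(label_disparity([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # 0.2, i.e. 20 percent
```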

"While we didn't explicitly test why this happens, one hypothesis is that maybe how people think about rule violations is different from how they think about descriptive data. Generally, normative decisions are more lenient," Balagopalan says.

Yet data are often gathered with descriptive labels to train a model for a particular machine-learning task. These data are often repurposed later to train different models that perform normative judgements, like rule violations.

Training troubles

To test the potential impacts of repurposing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model using descriptive data and the other using normative data, and then compared their performance.

They found that if descriptive data are used to train a model, it will underperform a model trained to perform the same judgements using normative data. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation. And the descriptive model's accuracy was even lower when classifying objects that human labelers disagreed about.
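The key failure mode in the comparison above is false positives: predicting a violation where the normative ground truth says there is none. A hedged illustration with toy labels and hypothetical model outputs:

```python
def false_positive_rate(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of true negatives (no violation) that the model flags anyway."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(negatives) / len(negatives) if negatives else 0.0

y_true          = [0, 0, 0, 0, 1, 1]  # normative ground truth (toy data)
descriptive_out = [1, 1, 0, 0, 1, 1]  # hypothetical descriptive-model predictions
normative_out   = [0, 1, 0, 0, 1, 1]  # hypothetical normative-model predictions

print(false_positive_rate(y_true, descriptive_out))  # 0.5
print(false_positive_rate(y_true, normative_out))    # 0.25
```

In this toy setup the descriptive model over-flags twice as often on non-violations, mirroring the pattern the researchers report.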

"This shows that the data really do matter. It is important to match the training context to the deployment context if you are training models to detect whether a rule has been violated," Balagopalan says.

It can be very difficult for users to determine how data have been gathered; this information can be buried in the appendix of a research paper or not revealed by a private company, Ghassemi says.

Improving dataset transparency is one way this problem could be mitigated. If researchers know how data were gathered, then they know how those data should be used. Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. This idea, known as transfer learning, is something the researchers want to explore in future work.
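As a rough sketch of that fine-tuning idea (pure Python, synthetic one-dimensional data; every detail here is an assumption for illustration, not the researchers' actual setup), one could pretrain a logistic model on plentiful descriptive labels, then continue gradient steps on a small normative set:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sgd_steps(w: float, b: float, data, lr: float = 0.1, epochs: int = 50):
    # Plain stochastic gradient descent on the logistic loss.
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

random.seed(0)
# Descriptive labels over-flag: anything above -0.5 counts as a "violation".
desc = [(x, int(x > -0.5)) for x in (random.gauss(0, 1) for _ in range(500))]
# Normative labels are stricter: only values above 0.5 are violations.
norm = [(x, int(x > 0.5)) for x in (random.gauss(0, 1) for _ in range(30))]

w0, b0 = sgd_steps(0.0, 0.0, desc)             # pretrain on descriptive data
w1, b1 = sgd_steps(w0, b0, norm, epochs=200)   # fine-tune on normative data

# The decision boundary (-b/w) should shift upward after fine-tuning,
# toward the stricter normative threshold.
print(round(-b0 / w0, 2), round(-b1 / w1, 2))
```

The point of the sketch is the shift in the decision boundary: a small amount of normative data pulls a descriptively trained model toward the harsher-to-lenient correction the article describes.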

They also want to conduct a similar study with expert labelers, like doctors or lawyers, to see whether it leads to the same label disparity.

"The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting. Otherwise, we are going to end up with systems that are going to have extremely harsh moderations, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don't," Ghassemi says.

This research was funded, in part, by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Chair.

