The complex math of counterfactuals could help Spotify pick your next favorite song

A new kind of machine-learning model built by a team of researchers at the music-streaming firm Spotify captures for the first time the complex math behind counterfactual analysis, a precise technique that can be used to identify the causes of past events and predict the effects of future ones.

The model, described earlier this year in the scientific journal Nature Machine Intelligence, could improve the accuracy of automated decision making, especially personalized recommendations, in a range of applications from finance to health care.

The basic idea behind counterfactuals is to ask what would have happened in a situation had certain things been different. It’s like rewinding the world, changing a few crucial details, and then hitting play to see what happens. By tweaking the right things, it’s possible to separate true causation from correlation and coincidence.

“Understanding cause and effect is super important for decision making,” says Ciaran Gilligan-Lee, leader of the Causal Inference Research Lab at Spotify, who co-developed the model. “You want to understand what impact a choice you take now will have on the future.”

In Spotify’s case, that might mean choosing what songs to show you or when artists should drop a new album. Spotify isn’t yet using counterfactuals, says Gilligan-Lee. “But they could help answer questions that we deal with every day.”

Counterfactuals are intuitive. People often make sense of the world by imagining how things would have played out if this had happened instead of that. But they’re monstrous to put into math.

“Counterfactuals are very strange-looking statistical objects,” says Gilligan-Lee. “They’re weird things to contemplate. You’re asking the likelihood of something occurring given that it didn’t occur.”

Gilligan-Lee and his coauthors began working together after reading about each other’s work in an MIT Technology Review story. They based their model on a theoretical framework for counterfactuals called twin networks.

Twin networks were invented in the 1990s by the computer scientists Andrew Balke and Judea Pearl. In 2011, Pearl won the Turing Award, computer science’s Nobel Prize, for his work on causal reasoning and artificial intelligence.

Pearl and Balke used twin networks to work through a handful of simple examples, says Gilligan-Lee. But applying the mathematical framework to larger and more complicated real-world cases by hand is hard.

That’s where machine learning comes in. Twin networks treat counterfactuals as a pair of probabilistic models: one representing the actual world, the other representing the fictional one. The models are linked in such a way that the model of the actual world constrains the model of the fictional one, keeping it the same in every way except for the facts you want to change.
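The linking of the two worlds can be sketched in a few lines of Python. This is a toy illustration only, not the neural network the Spotify team trained: the function names, the single hidden noise variable `u`, and the thresholds 0.2 and 0.6 are all assumptions made up for the example. Both worlds share the same hidden noise, and only the fact being changed (the treatment `x`) differs between them.

```python
import random


def outcome(x: int, u: float) -> int:
    """Toy structural equation (hypothetical): outcome from treatment x and hidden noise u."""
    # Invented mechanism: untreated units (x=0) get sick if u < 0.6,
    # treated units (x=1) get sick only if u < 0.2.
    threshold = 0.2 if x == 1 else 0.6
    return 1 if u < threshold else 0


def twin_counterfactual(x_fact: int, y_fact: int, x_cf: int,
                        n_samples: int = 100_000, seed: int = 0) -> float:
    """Estimate P(Y would be 1 under x_cf | we observed X=x_fact and Y=y_fact).

    Uses simple rejection sampling: draw the shared noise, keep only draws
    consistent with the actual world, then replay them in the fictional world.
    """
    rng = random.Random(seed)
    kept = 0
    cf_positive = 0
    for _ in range(n_samples):
        u = rng.random()  # hidden noise shared by both worlds
        if outcome(x_fact, u) != y_fact:
            continue  # this draw contradicts what actually happened
        kept += 1
        cf_positive += outcome(x_cf, u)  # same u, different treatment
    if kept == 0:
        raise ValueError("the observed facts have zero probability under this model")
    return cf_positive / kept
```

Calling `twin_counterfactual(x_fact=0, y_fact=1, x_cf=1)` asks: of the units that got sick without treatment, what share would still have been sick with it? Under these invented thresholds the answer is about one in three (0.2 / 0.6), which differs from the 20% sickness rate among treated units overall. That gap is what makes a counterfactual query different from an average-effect query.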

Gilligan-Lee and his colleagues used the framework of twin networks as a blueprint for a neural network and then trained it to make predictions about how events would play out in the fictional world. The result is a general-purpose computer program for doing counterfactual reasoning. “It allows you to answer any counterfactual question about a scenario that you want,” says Gilligan-Lee.

Dirty water

The Spotify team tested their model using several real-world case studies, including one looking at credit approval in Germany, one looking at an international clinical trial for stroke medication, and another looking at the safety of the water supply in Kenya.

In 2020 researchers investigated whether installing pipes and concrete containers to protect springs from bacterial contamination in a region of Kenya would reduce levels of childhood diarrhea. They found a positive effect. But you need to be sure what caused it, says Gilligan-Lee. Before installing concrete walls around wells across the country, you need to be sure that the drop in sickness was in fact caused by that intervention and not a side effect of it.

It’s possible that when researchers came in to do the study and install concrete walls around the wells, it made people more aware of the risks of contaminated water and they started boiling it at home. In that case, “education would be a cheaper way to scale up the intervention,” says Gilligan-Lee.

Gilligan-Lee and his colleagues ran this scenario through their model, asking whether children who got sick after drinking from an unprotected well in the actual world also got sick after drinking from a protected well in the fictional world. They found that changing just the detail of where the child drank and maintaining other conditions, such as how the water was treated at home, didn’t have a significant impact on the outcome, suggesting that the reduced levels of childhood diarrhea were not (directly) caused by installing pipes and concrete containers.

This replicates the results of the 2020 study, which also used counterfactual reasoning. But those researchers built a bespoke statistical model by hand just to ask that one question, says Gilligan-Lee. In contrast, the Spotify team’s machine-learning model is general purpose and can be used to ask multiple counterfactual questions about many different scenarios.

Spotify is not the only tech company racing to build machine-learning models that can reason about cause and effect. In the last few years, firms such as Meta, Amazon, LinkedIn, and TikTok’s owner ByteDance have also begun to develop the technology.

“Causal reasoning is critical for machine learning,” says Nailong Zhang, a software engineer at Meta. Meta is using causal inference in a machine-learning model that manages how many and what kinds of notifications Instagram should send its users to keep them coming back.

Romila Pradhan, a data scientist at Purdue University in Indiana, is using counterfactuals to make automated decision making more transparent. Organizations now use machine-learning models to choose who gets credit, jobs, parole, even housing (and who doesn’t). Regulators have started to require organizations to explain the outcome of many of these decisions to those affected by them. But reconstructing the steps made by a complex algorithm is hard.

Pradhan thinks counterfactuals can help. Let’s say a bank’s machine-learning model rejects your loan application and you want to know why. One way to answer that question is with counterfactuals. Given that the application was rejected in the actual world, would it have been rejected in a fictional world in which your credit history was different? What about if you had a different zip code, job, income, and so on? Building the ability to answer such questions into future loan approval programs, Pradhan says, would give banks a way to offer customers reasons rather than just a yes or no.
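A minimal sketch of that kind of question in Python, under loud assumptions: the `approve` rule below is an invented stand-in for a bank’s model, and the field names, thresholds, and example values are hypothetical. It is not Pradhan’s method, which the article does not describe. The sketch changes one detail of a rejected application at a time and reports which changes would have flipped the decision.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Application:
    credit_score: int
    income: int
    zip_code: str


def approve(app: Application) -> bool:
    """Hypothetical stand-in for a bank's model; the rule is invented for illustration."""
    return app.credit_score >= 650 and app.income >= 30_000


def counterfactual_reasons(app: Application, alternatives: dict) -> list:
    """Return the single-field changes that would have flipped the decision.

    Each entry is (field name, actual value, alternative value).
    """
    baseline = approve(app)
    flips = []
    for field, value in alternatives.items():
        changed = replace(app, **{field: value})  # copy with one field altered
        if approve(changed) != baseline:
            flips.append((field, getattr(app, field), value))
    return flips


# Made-up applicant and made-up alternatives, one per field.
applicant = Application(credit_score=600, income=45_000, zip_code="47907")
what_ifs = {"credit_score": 700, "income": 60_000, "zip_code": "10001"}
reasons = counterfactual_reasons(applicant, what_ifs)
```

Here `reasons` contains only the credit-score change: with this invented rule, a higher income or a different zip code would not have changed the outcome, so the credit history is the reason a customer could be given. A real system would also have to handle details that depend on one another, which is where the linked-worlds machinery described earlier comes in.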

Counterfactuals are important because it’s how people think about different outcomes, says Pradhan: “They’re a good way to capture explanations.”

They can also help firms predict people’s behavior. Because counterfactuals make it possible to infer what might happen in a particular situation, not just on average, tech platforms can use them to pigeonhole individuals with more precision than ever.

The same logic that can disentangle the effects of dirty water or lending decisions can be used to hone the impact of Spotify playlists, Instagram notifications, and ad targeting. If we play this song, will that user listen for longer? If we show this picture, will that person keep scrolling? “Companies want to understand how to give recommendations to specific users rather than the average user,” says Gilligan-Lee.

