Balancing Act: Addressing Popularity Bias in Recommendation Systems
Why is it important to measure popularity bias in recommendation systems?
How do we measure popularity bias?

Towards Data Science
Photo by Melanie Pongratz on Unsplash

You woke up one morning and decided to treat yourself to a brand new pair of shoes. You went to your favorite sneaker website and browsed the recommendations it gave you. One pair particularly caught your eye: you really liked the style and design. You bought them without hesitation, excited to wear your new kicks.

When the shoes arrived, you couldn’t wait to show them off. You decided to break them in at an upcoming concert you were going to. However, when you got to the venue you noticed at least 10 other people wearing the exact same shoes! What were the odds?

Suddenly you felt disappointed. Even though you initially loved the shoes, seeing so many others with the same pair made you feel like your purchase wasn’t so special after all. The shoes you thought would make you stand out ended up making you blend in.

In that moment you vowed never to buy from that sneaker website again. Even though their recommendation algorithm suggested an item you liked, it ultimately didn’t bring you the satisfaction and uniqueness you desired. So although you initially appreciated the recommended item, the overall experience left you unhappy.

This highlights a limitation of recommendation systems: suggesting a “good” product doesn’t guarantee it will lead to a positive and fulfilling experience for the shopper. So was it a good recommendation after all?

Popularity bias occurs when recommendation systems suggest a handful of globally popular items rather than personalized picks. This happens because the algorithms are often trained to maximize engagement by recommending content that is liked by many users.

While popular items can still be relevant, relying too heavily on popularity leads to a lack of personalization. The recommendations become generic and fail to account for individual interests. Many recommendation algorithms are optimized using metrics that reward overall popularity, and this systematic bias towards what is already popular can be problematic over time. It results in excessive promotion of items that are trending or viral rather than unique suggestions. On the business side, popularity bias can also lead to a situation where a company has a huge inventory of niche, lesser-known items that go undiscovered by users, making them difficult to sell.

Personalized recommendations that take a specific user’s preferences into account can bring tremendous value, especially for niche interests that differ from the mainstream. They help users discover new and unexpected items tailored just for them.

Ideally, a balance should be struck between popularity and personalization in recommendation systems. The goal should be to surface hidden gems that resonate with each user while also sprinkling in universally appealing content every now and then.

Average Recommendation Popularity

Average Recommendation Popularity (ARP) is a metric used to evaluate the popularity of recommended items in a list. It calculates the average popularity of the items based on the number of ratings they have received in the training set. Mathematically, ARP is calculated as follows:

ARP = (1 / |U_t|) Σ_u [ Σ_{i ∈ L_u} ϕ(i) / |L_u| ]

Where:

  • |U_t| is the number of users in the test set.
  • |L_u| is the number of items in the recommended list L_u for user u.
  • ϕ(i) is the number of times item i has been rated in the training set.

In simple terms, ARP measures the average popularity of items in the recommended lists: for each user, it sums the popularity (number of ratings) of the items in their list and divides by the list length, then averages this value across all users in the test set.

Example: Let’s say we have a test set with 100 users (|U_t| = 100) and each user receives a recommended list of two items (|L_u| = 2): item A, which has been rated 500 times in the training set (ϕ(A) = 500), and item B, which has been rated 300 times (ϕ(B) = 300). The ARP for these recommendations can be calculated as:

ARP = Σ ((500 + 300) / 2) / 100 = 400

In this example, the ARP value is 400, meaning that the items recommended to users have, on average, 400 ratings in the training set.
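For concreteness, here is a minimal Python sketch (not from the original article) of how ARP could be computed; the recommendations and item_rating_counts structures are hypothetical inputs.

```python
# Minimal sketch: Average Recommendation Popularity (ARP).
# `recommendations` maps each test user to their recommended list;
# `item_rating_counts` maps each item i to phi(i), its number of ratings
# in the training set. Both are illustrative, hypothetical structures.

def average_recommendation_popularity(recommendations, item_rating_counts):
    per_user = []
    for rec_list in recommendations.values():
        # Average popularity of the items in this user's list.
        list_popularity = sum(item_rating_counts.get(i, 0) for i in rec_list)
        per_user.append(list_popularity / len(rec_list))
    # Average across all users in the test set.
    return sum(per_user) / len(per_user)

# Toy usage mirroring the example above: 100 users, each recommended items A and B.
item_rating_counts = {"A": 500, "B": 300}
recommendations = {f"user_{n}": ["A", "B"] for n in range(100)}
print(average_recommendation_popularity(recommendations, item_rating_counts))  # 400.0
```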

The Average Percentage of Long Tail Items (APLT)

The Average Percentage of Long Tail Items (APLT) metric calculates the average proportion of long-tail items present in recommended lists. It is expressed as:

APLT = Σ_u ( |{i ∈ Lu : i ∈ Γ}| / |Lu| ) / |Ut|

Here:

  • |Ut| represents the total number of users.
  • u ∈ Ut signifies each user.
  • Lu represents the recommended list for user u.
  • Γ represents the set of long tail items.

In simpler terms, APLT quantifies the average percentage of less popular or niche items in the recommendations provided to users. A higher APLT indicates that the recommendations contain a larger portion of such long-tail items.

Example: Let’s say there are 100 users (|Ut| = 100). In each user’s recommendation list, 20 out of 50 items (|Lu| = 50) belong to the long tail set (Γ). Using the formula, the APLT would be:

APLT = Σ (20 / 50) / 100 = 0.4

So, the APLT in this scenario is 0.4 or 40%, implying that, on average, 40% of the items in the recommended lists come from the long tail set.
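A minimal sketch of APLT along the same lines, using the same hypothetical input structures as the ARP sketch above:

```python
# Minimal sketch: Average Percentage of Long Tail Items (APLT).
# `recommendations` maps each user to their recommended list and
# `long_tail_items` is the set of long-tail items (Γ).

def average_percentage_long_tail(recommendations, long_tail_items):
    shares = []
    for rec_list in recommendations.values():
        # Fraction of this user's list that comes from the long tail.
        n_tail = sum(1 for i in rec_list if i in long_tail_items)
        shares.append(n_tail / len(rec_list))
    return sum(shares) / len(shares)
```

With the numbers from the example (20 long-tail items in every 50-item list), this returns 0.4.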

The Average Coverage of Long Tail Items (ACLT)

The Average Coverage of Long Tail Items (ACLT) metric evaluates how many long-tail items actually make it into the recommendations. Unlike APLT, which is a per-list percentage, ACLT counts the long-tail items recommended to each user and averages this count across all users, assessing whether these items are effectively represented in the recommendations. It is defined as:

ACLT = Σ_u Σ_{i ∈ Lu} 1(i ∈ Γ) / |Ut|

Here:

  • |Ut| represents the total number of users.
  • u ∈ Ut signifies each user.
  • Lu represents the recommended list for user u.
  • Γ represents the set of long-tail items.
  • 1(i ∈ Γ) is an indicator function equal to 1 if item i is in the long tail set Γ, and 0 otherwise.

In simpler terms, ACLT measures, for each user, how many of the recommended items come from the long tail, and then averages this count across all users.

Example: Let’s say there are 100 users (|Ut| = 100) and a total of 500 long-tail items (|Γ| = 500). Across all users’ recommendation lists, long-tail items are recommended 150 times in total (Σ Σ 1(i ∈ Γ) = 150). Using the formula, the ACLT would be:

ACLT = 150 / 100 = 1.5

So, the ACLT in this scenario is 1.5, indicating that, on average, 1.5 long-tail items appear in each user’s recommended list. This metric helps assess how much exposure niche items get in the recommender system.
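And the corresponding sketch for ACLT, which averages a count per user rather than a percentage:

```python
# Minimal sketch: Average Coverage of Long Tail Items (ACLT).
# Same hypothetical inputs as the APLT sketch above.

def average_coverage_long_tail(recommendations, long_tail_items):
    counts = [
        sum(1 for i in rec_list if i in long_tail_items)
        for rec_list in recommendations.values()
    ]
    # Average number of long-tail items per recommended list.
    return sum(counts) / len(counts)
```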

How to reduce popularity bias in a recommendation system

Popularity Aware Learning

This idea takes inspiration from Position Aware Learning (PAL), where the approach is to ask your ML model to optimize both ranking relevancy and position impact at the same time. We can use the same approach with a popularity score, which can be any of the above-mentioned metrics, such as Average Recommendation Popularity.

  • At training time, you use item popularity as one of the input features.
  • At the prediction stage, you replace it with a constant value.
Image by Author
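Here is a minimal, illustrative sketch of the idea; the model, feature names, and constant below are assumptions for the sake of the example, not the article’s exact setup. Item popularity enters as a feature at training time and is swapped for a constant when scoring candidates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: one relevance feature plus item popularity
# (e.g. the rating count phi(i) used by ARP). Clicks are synthetic.
relevance = rng.random(1000)
item_popularity = rng.integers(1, 500, size=1000).astype(float)
clicked = (relevance + item_popularity / 500 > 1.0).astype(int)

X_train = np.column_stack([relevance, item_popularity])
model = LogisticRegression().fit(X_train, clicked)   # popularity is used at training time

# At prediction time, replace the popularity feature with a constant so that
# every candidate is scored as if it were equally popular.
POPULARITY_CONSTANT = float(np.median(item_popularity))
candidate_relevance = rng.random(5)
X_serve = np.column_stack([candidate_relevance, np.full(5, POPULARITY_CONSTANT)])
scores = model.predict_proba(X_serve)[:, 1]
print(scores)
```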

xQUAD Framework

One interesting way to address popularity bias is to use something called the xQuAD framework. It takes a long list of recommendations (R), together with probability/likelihood scores from your current model, and builds a new list (S) that is much more diverse, where |S| < |R|. The diversity of this new list is controlled by a hyper-parameter λ.

I have tried to summarize the logic of the framework below:

Image by Author

We calculate a score for every document in set R. We take the document with the maximum score, add it to set S, and at the same time remove it from set R.

Image by Author
Image by Author

To select the next item to add to S, we compute the score for every item in R \ S (R excluding S). Each time an item is added to S, the diversity term shrinks for the remaining items of the same category, so the chance of a non-popular (long-tail) item getting picked in a later step goes up.
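To make the greedy loop concrete, here is a simplified sketch of xQuAD-style re-ranking under a binary split into “head” and “long tail” categories; the function name, inputs, and the simple binary diversity term are my assumptions, not a faithful reproduction of the formulas in the images above.

```python
def xquad_rerank(candidates, base_scores, long_tail_items, p_tail, k=10, lam=0.5):
    """Greedily build S from R: score(v) = (1 - lam) * P(v|u) + lam * diversity(v, S)."""
    p_cat = {"tail": p_tail, "head": 1.0 - p_tail}   # P(category | user)
    remaining = list(candidates)                     # R \ S
    selected = []                                    # S
    while remaining and len(selected) < k:
        def score(item):
            cat = "tail" if item in long_tail_items else "head"
            # Binary variant: a category's diversity credit vanishes once an
            # item from that category is already in S.
            covered = any(("tail" if s in long_tail_items else "head") == cat
                          for s in selected)
            diversity = 0.0 if covered else p_cat[cat]
            return (1 - lam) * base_scores[item] + lam * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: the base model ranks head items a, b, c above long-tail items d, e.
base_scores = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.4, "e": 0.35}
print(xquad_rerank(base_scores, base_scores, {"d", "e"}, p_tail=0.3, k=3, lam=0.7))
# ['a', 'd', 'b'] -- the long-tail item d is promoted above b and c
```

Setting λ = 0 reproduces the base ranking, while larger values push more long-tail items into S.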

If you liked this content, find me on LinkedIn :).
