Dynamic Pricing with Contextual Bandits: Learning by Doing
From Multi-armed to Contextual Bandits

Adding context to your dynamic pricing problem opens up new opportunities as well as new challenges


In my previous article, I conducted a thorough evaluation of the most popular methods for tackling the dynamic pricing problem using simple Multi-armed Bandits. If you've come here from that piece, first of all, thank you. It's by no means an easy read, and I really appreciate your enthusiasm for the topic. Secondly, brace yourself, as this new article promises to be even more demanding. However, if this is your introduction to the subject, I strongly advise starting with the previous article: there I present the foundational concepts that I'll assume readers are familiar with in this discussion.

Anyway, a brief recap: the prior analysis aimed to simulate a dynamic pricing scenario. The main goal was to test various price points as quickly as possible in order to find the one yielding the highest cumulative reward. We explored four distinct algorithms: greedy, ε-greedy, Thompson Sampling, and UCB1, detailing the strengths and weaknesses of each. Although the methodology employed in that article is theoretically sound, it relies on oversimplifications that don't hold up in more complex, real-world situations. The most problematic of these simplifications is the assumption that the underlying process is stationary, meaning the optimal price stays constant regardless of the external environment. That is clearly not the case: consider, for instance, fluctuations in demand during holiday seasons, sudden shifts in competitor pricing, or changes in raw material costs. A minimal sketch of that stationary setting follows below.
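As a quick refresher on the stationary setting, here is a minimal sketch of one of those four algorithms, Thompson Sampling, choosing among a fixed grid of prices. The price grid and purchase probabilities are invented for illustration and are not taken from the original experiments.

```python
import numpy as np

# Illustrative, made-up setup: each candidate price has a fixed (stationary)
# purchase probability, and the reward is the revenue per visitor.
PRICES = np.array([9.99, 14.99, 19.99, 24.99])
BUY_PROB = np.array([0.50, 0.35, 0.20, 0.10])

rng = np.random.default_rng(7)
alpha = np.ones(len(PRICES))  # Beta posterior successes (purchases) per price
beta = np.ones(len(PRICES))   # Beta posterior failures (no purchase) per price

for t in range(10_000):
    # Thompson Sampling: draw a plausible conversion rate per price from the
    # posterior, then pick the price with the highest sampled expected revenue.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(PRICES * sampled))
    sold = rng.random() < BUY_PROB[arm]
    alpha[arm] += sold
    beta[arm] += 1 - sold

expected = PRICES * alpha / (alpha + beta)
print("Posterior expected revenue per price:", np.round(expected, 2))
print("Best price found:", PRICES[int(np.argmax(expected))])
```

Notice that the purchase probabilities never change, which is exactly the stationarity assumption that breaks down in practice.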

To address this issue, Contextual Bandits come into play. Contextual Bandits are an extension of the Multi-armed Bandit problem in which the decision-making agent not only receives a reward for each action (or "arm") but also has access to context, that is, environment-related information, before selecting an arm. The context can be any piece of information that may influence the outcome, such as customer demographics or external market conditions.
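To make this concrete, below is a minimal sketch of one widely used contextual algorithm, LinUCB (the "disjoint" variant), adapted to pricing: each candidate price gets its own linear model of reward as a function of the context. The class name, feature layout, and parameters are my own illustrative choices, not something defined in the article.

```python
import numpy as np

class LinUCBPricer:
    """Disjoint LinUCB: one linear reward model per candidate price (arm)."""

    def __init__(self, prices, n_features, alpha=1.0):
        self.prices = prices
        self.alpha = alpha  # exploration strength
        # Per-arm ridge-regression statistics: A = X^T X + I, b = X^T r
        self.A = [np.eye(n_features) for _ in prices]
        self.b = [np.zeros(n_features) for _ in prices]

    def select(self, context):
        """Pick the price with the highest upper confidence bound for this context."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                 # estimated coefficients
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)  # uncertainty bonus
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Fold the observed reward back into the chosen arm's statistics."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

Here the reward would again be the realized revenue, while the context vector could encode, for example, day of the week, season, or current competitor prices.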

Here's how they work: before deciding which arm to pull (or, in our case, which price to set), the agent observes the current…
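The overall interaction loop can be sketched as follows: observe the context, choose a price for that context, observe the reward, and update only the estimates tied to that context. Everything below (segment names, purchase probabilities) is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
PRICES = np.array([9.99, 14.99, 19.99, 24.99])
SEGMENTS = ["weekday", "weekend"]  # a toy, discrete context

# Made-up purchase probabilities: the best price differs by segment,
# which is exactly what a context-free bandit cannot capture.
BUY_PROB = {"weekday": np.array([0.50, 0.30, 0.15, 0.05]),
            "weekend": np.array([0.60, 0.55, 0.45, 0.30])}

# One tiny epsilon-greedy learner per observed context value.
counts = {s: np.zeros(len(PRICES)) for s in SEGMENTS}
means = {s: np.zeros(len(PRICES)) for s in SEGMENTS}
epsilon = 0.1

for t in range(20_000):
    seg = SEGMENTS[t % 2]                    # 1. observe the current context
    if rng.random() < epsilon:               # 2. choose a price for that context
        arm = rng.integers(len(PRICES))
    else:
        arm = int(np.argmax(means[seg]))
    reward = PRICES[arm] if rng.random() < BUY_PROB[seg][arm] else 0.0  # 3. observe reward
    counts[seg][arm] += 1                    # 4. update the model for that context
    means[seg][arm] += (reward - means[seg][arm]) / counts[seg][arm]

for seg in SEGMENTS:
    print(seg, "-> best price found:", PRICES[int(np.argmax(means[seg]))])
```

Keeping a separate learner per context value only works when the context is small and discrete; with richer feature vectors, a model such as the LinUCB sketch above generalizes across contexts instead of treating each one independently.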
