An introduction to Q-Learning with a practical Python example
- Introduction
- A primer on Reinforcement Learning
2.1 Key concepts
2.2 Q-function
2.3 Q-value
2.4 Q-Learning
2.5 The Bellman equation
2.6 Exploration vs. exploitation
2.7 Q-Table
- The Dynamic Pricing problem
3.1 Problem statement
3.2 Implementation
- Conclusions
- References
In this post, we introduce the core concepts of Reinforcement Learning and dive into Q-Learning, an approach that empowers intelligent agents to learn optimal policies by making informed decisions based on rewards and experiences.
We also share a practical Python example built from scratch. Specifically, we train an agent to master the art of pricing, a vital aspect of business, so that it learns how to maximize profit.
Without further ado, let us begin our journey.
2.1 Key concepts
Reinforcement Learning (RL) is an area of Machine Learning where an agent learns to perform a task by trial and error.
Briefly, the agent tries actions that are associated with positive or negative feedback through a reward mechanism. The agent adjusts its behavior to maximize the reward, thus learning the best course of action to achieve the final goal.
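This trial-and-error loop can be sketched in a few lines of Python. The two-action environment, its rewards, and the parameter values below are hypothetical placeholders chosen only to illustrate the feedback cycle, not part of the pricing example developed later.

```python
import random

# Hypothetical toy environment: two actions with fixed feedback.
ACTIONS = ["left", "right"]
REWARDS = {"left": -1.0, "right": 1.0}

# The agent keeps a running estimate of how good each action is.
values = {a: 0.0 for a in ACTIONS}
alpha = 0.1    # learning rate: how strongly new feedback adjusts the estimate
epsilon = 0.2  # fraction of moves spent trying actions at random

random.seed(0)
for step in range(500):
    # Trial: mostly pick the best-known action, sometimes explore at random.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(values, key=values.get)
    # Error signal: the environment returns a reward for the chosen action.
    reward = REWARDS[action]
    # Adjustment: nudge the estimate toward the observed reward.
    values[action] += alpha * (reward - values[action])

best = max(values, key=values.get)
print(best)  # the agent discovers that "right" pays off
```

After a few hundred iterations the estimate for "right" approaches its true reward of 1.0, so the agent settles on it; this is the same maximize-the-reward dynamic that Q-Learning formalizes in the sections below.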
Let us introduce the key concepts of RL through a practical example. Imagine a simplified arcade game, where a cat must navigate a maze to collect treasures — a glass of milk and a ball of yarn — while avoiding construction sites:
- The agent is the one selecting the course of actions. In our example, the agent is the player who controls the joystick, deciding the next move of the cat.
- The environment is the…