Convenient Reinforcement Learning With Stable-Baselines3

Reinforcement learning without the boilerplate code

Towards Data Science
Created by the author with Leonardo Ai.

In my previous articles about reinforcement learning, I have shown you how to implement (deep) Q-learning using nothing but a bit of numpy and TensorFlow. While this was an important step towards understanding how these algorithms work under the hood, the code tended to get lengthy, and I had implemented only one of the most basic versions of deep Q-learning.

Given the explanations in those articles, understanding the code should be fairly straightforward. However, if we really want to get things done, we should rely on well-documented, maintained, and optimized libraries. Just as we don’t want to implement linear regression over and over again, we don’t want to do the same for reinforcement learning.

In this article, I’ll show you the reinforcement learning library Stable-Baselines3, which is as easy to use as scikit-learn. Instead of training models to predict labels, though, we get trained agents that can navigate their environment well.

If you are not sure what (deep) Q-learning is about, I suggest reading my previous articles. At a high level, we want to train an agent that interacts with its environment with the goal of maximizing its total reward. The most important part of reinforcement learning is finding a good reward function for the agent.

I usually imagine a character in a game finding its way to the highest score, e.g., Mario running from start to finish without dying and, in the best case, as fast as possible.

Image by the author.
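To make the Mario example a bit more concrete, here is one hypothetical way to turn "reach the goal, don't die, be fast" into a per-step reward. The function name, its inputs, and every numeric value are my own illustrative assumptions, not part of any real Mario environment.

```python
def mario_reward(reached_flag: bool, died: bool, x_progress: float) -> float:
    """Hypothetical per-step reward for a Mario-like platformer."""
    reward = x_progress  # reward moving right, i.e. towards the goal
    reward -= 0.1        # small penalty every step, so finishing faster pays off
    if died:
        reward -= 100.0  # dying is heavily penalized
    if reached_flag:
        reward += 100.0  # large bonus for reaching the goal
    return reward


print(mario_reward(reached_flag=False, died=False, x_progress=1.0))
```

Each term encodes one part of the goal; how the terms are weighted against each other decides which behavior the agent ends up learning.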

To accomplish this, Q-learning learns quality values for every pair (s, a), where s is a state and a is an action the agent can take. Q(s, a) is the…
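For reference, here is a minimal numpy sketch of tabular Q-learning, where the quality values live in a table with one entry per (s, a) pair and are improved with the standard temporal-difference update Q(s, a) ← Q(s, a) + α(r + γ·max_a′ Q(s′, a′) − Q(s, a)). This is the textbook update, not the author's implementation; the table size, the learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes: 5 discrete states, 2 actions.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # one quality value per (s, a) pair


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update to Q in place and return it."""
    # Target: immediate reward plus the discounted best value of the next
    # state; no bootstrapping if the episode ended after this step.
    target = r if done else r + gamma * np.max(Q[s_next])
    # Move Q(s, a) a step of size alpha towards the target.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def greedy_action(Q, s):
    """Pick the action with the highest quality value in state s."""
    return int(np.argmax(Q[s]))


q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=False)
print(Q[0, 1])              # 0.1
print(greedy_action(Q, 0))  # 1
```

Deep Q-learning keeps the same target but replaces the table with a neural network that maps a state to one Q-value per action.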

