Reinforcement Learning Demystified: Markov Decision Processes (Part 1)

Episode 2: demystifying Markov Processes, Markov Reward Processes, the Bellman Equation, and Markov Decision Processes.

Mohammad Ashraf
1 min read · Apr 11, 2018

In the previous blog post we talked about reinforcement learning and its characteristics. We mentioned the process of the agent observing the environment output consisting of a reward and the next state, and then acting upon that. This whole process is a Markov Decision Process or an MDP for short.

This blog post is a bit mathy. Grab your coffee and a comfortable chair, and just dive in.

A Markov Decision Process

MDPs are meant to be a straightforward framing of the problem of learning from interaction to achieve a goal. The agent and the environment interact continually: the agent selects actions, and the environment responds to those actions by returning rewards and presenting new situations to the agent.
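This interaction loop can be sketched in a few lines of Python. Everything here is illustrative and not from the original post: the environment, its `reset`/`step` interface (which mirrors common RL APIs such as Gym), and the toy dynamics are all assumptions.

```python
import random

class ToyEnv:
    """A made-up 4-state corridor: the agent starts at state 0 and
    receives a reward of 1 when it reaches the goal state 3."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left; clamp to [0, 3].
        self.state = max(0, min(3, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 3 else 0.0
        done = self.state == 3
        return self.state, reward, done

random.seed(0)
env = ToyEnv()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random.choice([0, 1])          # the agent selects an action
    state, reward, done = env.step(action)  # the environment responds
    total_reward += reward
```

Each pass through the loop is one agent-environment exchange: an action goes out, and a reward plus the next state come back.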

Formally, an MDP is used to describe an environment for reinforcement learning, where the environment is fully observable. Almost all RL problems can be formalized as MDPs.
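Concretely, an MDP is often written as a tuple (S, A, P, R, γ): states, actions, transition probabilities, rewards, and a discount factor. A minimal sketch of that tuple, with made-up numbers chosen only for illustration:

```python
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9  # discount factor

# P[(s, a)] maps each possible next state to its transition probability.
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 1.0},
}

# R[(s, a)] is the expected immediate reward for taking action a in state s.
R = {
    ("s0", "stay"): 0.0,
    ("s0", "go"):   1.0,
    ("s1", "stay"): 2.0,
    ("s1", "go"):   0.0,
}

# The Markov property in code: the distribution over next states depends
# only on the current (state, action) pair, and each row sums to 1.
row_sums = {sa: sum(dist.values()) for sa, dist in P.items()}
```

Full observability means the agent sees the true state `s` directly, so this tuple is all that is needed to describe the environment.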

To continue reading this article, just follow this link to my new website “becomesentient.com” where I discuss all AI related topics. Thank you for your consideration.


Written by Mohammad Ashraf

An AI Research Engineer, enthusiastic about AI and reinforcement learning. Twitter: @MhmdElsersy, GitHub: Neo-47
