It has been less than a decade since technology began replacing human muscle to drive rapid economic progress, and we are fortunate enough to witness an era in which the human brain will soon be complemented by what we call “Artificial Intelligence”.

## Tags » Reinforcement Learning

#### Exhaustive search, Monte Carlo, DP, and TD

If we know the full model of the environment and have enough computational power, we can optimize the policy by computing the outcome of every possible scenario (exhaustive search). Alternatively, we can look just one step ahead and let cached value estimates summarize the rest of the horizon (dynamic programming).
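The contrast above can be sketched in code. Below is a minimal illustration of a Monte Carlo update (toward the full observed return) versus a TD(0) update (toward a one-step bootstrapped target); the state names, step size, and discount factor are illustrative assumptions, not from the post.

```python
def mc_update(V, state, G, alpha=0.1):
    """Monte Carlo: move the estimate toward the full observed return G."""
    V[state] += alpha * (G - V[state])

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """TD(0): move the estimate toward a one-step bootstrapped target."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

V = {"s0": 0.0, "s1": 0.0}
td0_update(V, "s0", reward=1.0, next_state="s1")
print(V["s0"])  # 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

TD can update after a single step, while Monte Carlo must wait until the episode ends and the full return is known.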

#### Dynamic Programming

DP is an optimization method (for policies) for sequential problems. It works well when the problem can be broken down into subproblems (optimal substructure) that recur repeatedly (overlapping subproblems), and the solutions to those subproblems can be cached for reuse (as value functions are) and combined to solve the original problem.
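A toy value-iteration sweep makes the caching idea concrete: the dictionary `V` holds the cached subproblem solutions, and each Bellman backup reuses them. The tiny deterministic MDP below is a made-up example, not from the post.

```python
GAMMA = 0.9
# state -> {action: (next_state, reward)}; "goal" is terminal
MDP = {
    "a": {"right": ("b", 0.0)},
    "b": {"right": ("goal", 1.0), "left": ("a", 0.0)},
    "goal": {},
}

V = {s: 0.0 for s in MDP}        # cached subproblem solutions (value function)
for _ in range(100):             # sweep until the values converge
    for s, actions in MDP.items():
        if actions:
            # Bellman backup: reuse the cached values of successor states
            V[s] = max(r + GAMMA * V[s2] for s2, r in actions.values())

print(V["b"], V["a"])  # 1.0 0.9
```

Each state's value is computed once per sweep and then reused by every predecessor, which is exactly the overlapping-subproblems structure DP exploits.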

#### Prediction vs. Control

There are two different problems that we need to solve: prediction and control. Prediction is evaluating the future (i.e., “what is going to happen if I keep this policy?”), while control is optimizing the future by improving the policy.
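The split can be sketched as two functions: one that evaluates a fixed policy (prediction) and one that acts greedily with respect to those values (control). The two-state MDP and the action names are assumptions for illustration only.

```python
GAMMA = 0.9
# state -> {action: (next_state, reward)}; "end" is terminal
MDP = {
    "s": {"safe": ("end", 1.0), "risky": ("end", 2.0)},
    "end": {},
}

def evaluate(policy, sweeps=50):
    """Prediction: what happens if I keep this policy?"""
    V = {s: 0.0 for s in MDP}
    for _ in range(sweeps):
        for s in MDP:
            if MDP[s]:
                s2, r = MDP[s][policy[s]]
                V[s] = r + GAMMA * V[s2]
    return V

def improve(V):
    """Control: pick the greedy action under the evaluated values."""
    return {s: max(MDP[s], key=lambda a: MDP[s][a][1] + GAMMA * V[MDP[s][a][0]])
            for s in MDP if MDP[s]}

policy = {"s": "safe"}
V = evaluate(policy)    # prediction: V["s"] == 1.0 under the current policy
policy = improve(V)     # control: the greedy step switches to "risky"
print(policy["s"])      # risky
```

Alternating these two steps is the classic policy-iteration loop: evaluate, improve, repeat.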

#### Reinforcement Learning vs. Planning

In reinforcement learning, the environment is initially unknown, so the agent must learn about it on the go by interacting with it. Because the environment is unknown, the trade-off between exploration and exploitation becomes an important problem in RL.
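One common way to handle that trade-off is epsilon-greedy action selection, sketched below on a made-up two-action value table (the technique is standard, but this example is not from the post).

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore a random action,
    otherwise exploit the current best estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))    # explore: try something new
    return max(q_values, key=q_values.get)   # exploit: use what we know

Q = {"left": 0.2, "right": 0.8}
action = epsilon_greedy(Q, epsilon=0.0)  # epsilon=0 means pure exploitation
print(action)  # right
```

In planning, by contrast, the model is given up front, so the agent can compute with the model directly instead of gambling on exploratory actions.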

#### The Components of RL Agents

An RL agent may include one or more of three components: a policy, a value function, and a model. A policy is a mapping from states to actions; it does not itself assign values to them.
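In the simplest tabular case, the three components are just three lookup tables; the state and action names below are illustrative assumptions.

```python
# Policy: state -> action (a mapping, not a valuation)
policy = {"s0": "go"}

# Value function: state -> expected cumulative reward
value_fn = {"s0": 0.9, "s1": 0.0}

# Model: (state, action) -> (next_state, reward), the agent's
# internal prediction of the environment's dynamics
model = {("s0", "go"): ("s1", 1.0)}

print(policy["s0"])  # go
```

Whole families of agents are named by which tables they keep: value-based, policy-based, and model-based methods.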

#### Characteristics of Reinforcement Learning

In RL, the goal of the agent is to maximize the expected cumulative reward (the reward hypothesis). An agent learns from delayed rewards, not from an answer key.
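The quantity being maximized can be written as a one-liner: the discounted sum of rewards along a trajectory. The reward sequence and discount factor below are made-up values for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum r_t * gamma**t over a trajectory of (possibly delayed) rewards."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# The reward arrives only at the final step (a delayed reward):
print(round(discounted_return([0.0, 0.0, 1.0]), 6))  # 0.81
```

The credit-assignment problem in RL is precisely that the agent must figure out which earlier actions were responsible for that late reward.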