### Value Function

### Optimal Policy

Learning via a value function is not possible when we don’t know the immediate rewards and the next state. If they are known, we can use dynamic programming methods described at… 285 more words

In this post, we will explore our first reinforcement learning methods for estimating value. It’s the first taste of real RL in this series. I bet you’ve heard the term… 1,832 more words

In this article, the multi-armed bandit problem and a few algorithms for solving it are going to be discussed. This problem appeared as a lab assignment in the edX course DAT257x: Reinforcement Learning Explained by Microsoft. 1,709 more words

I am interested in reinforcement learning.

It is difficult for me. @_@

I tried to implement a very simple and famous problem called ‘multi-armed bandit’.

Image from Wikipedia… 522 more words

Two years ago, I wrote an article about the computer Go player “AlphaGo” and talked about “Brain as a service” in the future. Because AlphaGo is so strong and can improve itself by… 708 more words

**Part Of**: Reinforcement Learning sequence

Sorry it’s been so long since my last post! I’ve been teaching a Deep Learning class, based on Andrew Ng’s… 55 more words

*A few updates before I move on to Space Invaders. I have updated my GitHub repo, updated DQN to support multiple layers, and also managed to fix some bugs.* 172 more words