Tags » Reinforcement Learning


Value Function

Optimal Policy

Learning via a value function is not possible when we don’t know the immediate rewards and the next state. If they are known, we can use the dynamic programming methods described at…

Reinforcement Learning

Monte Carlo method in Python

In this post, we will explore our first reinforcement learning methods for estimating value. It’s the first taste of real RL in this series. I bet you’ve heard the term…

Artificial Intelligence

Some Reinforcement Learning: The Greedy and Explore-Exploit Algorithms for the Multi-Armed Bandit Framework in Python

In this article, the multi-armed bandit problem and a few algorithms for solving it are going to be discussed. This problem appeared as a lab assignment in the edX course DAT257x: Reinforcement Learning Explained by Microsoft.
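
For readers who want a concrete picture of the greedy versus explore-exploit idea named in the title above, here is a minimal sketch of an epsilon-greedy agent on a Bernoulli bandit. It is not the code from the linked article or the edX lab; the function name `run_epsilon_greedy`, its parameters, and the example arm probabilities are all assumptions made for illustration. Setting `epsilon=0` gives the purely greedy agent.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy agent.

    true_means: hypothetical success probability of each arm (hidden from the agent).
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate, breaking ties at random.
            best = max(estimates)
            arm = rng.choice([a for a in range(n_arms) if estimates[a] == best])

        # Bernoulli reward drawn from the chosen arm's hidden probability.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = run_epsilon_greedy([0.2, 0.5, 0.8], epsilon=0.1, steps=5000)
    print("estimates:", [round(e, 3) for e in est])
    print("pull counts:", cnt)
    print("total reward:", total)
```

With a small positive `epsilon`, the agent keeps sampling every arm occasionally, so its estimates for all arms keep improving; a purely greedy agent can lock onto whichever arm happened to pay out first.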


Multi-armed Bandit problem

I am interested in reinforcement learning.
It is difficult for me. @_@
I tried to implement a very simple and famous problem called the ‘multi-armed bandit’.
Image from Wikipedia.


Let us consider "Brain as a service" again now!

Two years ago, I wrote an article about the computer Go player “AlphaGo” and talked about “Brain as a service” in the future. Because AlphaGo is so strong and can improve itself by…

Artificial Intelligence

[Video] An introduction to reinforcement learning

Part Of: Reinforcement Learning sequence

Sorry it’s been so long since my last post! I’ve been teaching a Deep Learning class, based on Andrew Ng’s…

Markov Decision Process

Playing Space Invaders using DQN, Julia and MXNet

A few updates before I move on to Space Invaders. I have updated my GitHub repo, updated the DQN to support multiple layers, and fixed some bugs.