“A Simple Alpha(Go) Zero Tutorial”

Bonobo · #1

http://web.stanford.edu/~surag/posts/alphazero.html

Quote:

This tutorial walks through a synchronous single-thread single-GPU (read malnourished) game-agnostic implementation of the recent AlphaGo Zero paper by DeepMind. It's a beautiful piece of work that trains an agent for the game of Go through pure self-play without any human knowledge except the rules of the game. The methods are fairly simple compared to previous papers by DeepMind, and AlphaGo Zero ends up convincingly beating AlphaGo (which was trained using data from expert games and beat the best human Go players). Recently, DeepMind published a preprint of AlphaZero on arXiv that extends the AlphaGo Zero methods to Chess and Shogi.

The aim of this post is to distil out the key ideas from the AlphaGo Zero paper and understand them concretely through code. It assumes basic familiarity with machine learning and reinforcement learning concepts, and should be accessible if you understand neural network basics and Monte Carlo Tree Search. Before starting out (or after finishing this tutorial), I would recommend reading the original paper. It's well-written, very readable and has beautiful illustrations!

AlphaGo Zero is trained by self-play reinforcement learning. It combines a neural network and Monte Carlo Tree Search in an elegant policy iteration framework to achieve stable learning. But that's just words; let's dive into the details straightaway.

[..]

(via Online Go on G+)

(Disclaimer: I understand nothing of this, just passing it on)

Sneegurd · #2

Thanks for posting. It will be some work, but I see a good chance of understanding it. Programming is 50% of my job and I know some university-grade information science maths, but have no clue about machine learning at all, and this is very welcome.

Charlie · #3

Sneegurd wrote:

... but have no clue about machine learning at all and this is very welcome.

David Silver's Reinforcement Learning course is free on YouTube. Before diving in there, I can recommend Andrew Ng's courses on neural networks (and, lately, his Coursera stuff covering CNNs) if you're looking for a way in to the "learning" field. For a free option, there's A. Karpathy's recorded Stanford lecture series (search for "CS231n Winter 2016"), which is knocking about in various places; the lectures were taken off YouTube for dubious reasons, so you'll need to dig a bit.
