All times are UTC - 8 hours [ DST ]




 Post subject: “A Simple Alpha(Go) Zero Tutorial”
Post #1 Posted: Sat Dec 30, 2017 10:02 am 
Oza
Posts: 2221
Location: Germany
Liked others: 8262
Was liked: 924
Rank: OGS 9k
OGS: trohde
Universal go server handle: trohde
http://web.stanford.edu/~surag/posts/alphazero.html

Quote:
This tutorial walks through a synchronous, single-thread, single-GPU (read: malnourished), game-agnostic implementation of the recent AlphaGo Zero paper by DeepMind. It's a beautiful piece of work that trains an agent for the game of Go through pure self-play without any human knowledge except the rules of the game. The methods are fairly simple compared to previous papers by DeepMind, and AlphaGo Zero ends up convincingly beating AlphaGo (which was trained on data from expert games and beat the best human Go players). Recently, DeepMind published a preprint of AlphaZero on arXiv that extends the AlphaGo Zero methods to Chess and Shogi.

The aim of this post is to distil out the key ideas from the AlphaGo Zero paper and understand them concretely through code. It assumes basic familiarity with machine learning and reinforcement learning concepts, and should be accessible if you understand neural network basics and Monte Carlo Tree Search. Before starting out (or after finishing this tutorial), I would recommend reading the original paper. It's well-written, very readable and has beautiful illustrations! AlphaGo Zero is trained by self-play reinforcement learning. It combines a neural network and Monte Carlo Tree Search in an elegant policy iteration framework to achieve stable learning. But that's just words; let's dive into the details straightaway.

[..]


(via Online Go on G+)

(Disclaimer: I understand nothing of this, just passing it on)

_________________
“The only difference between me and a madman is that I’m not mad.” — Salvador Dali ★ Play a slooooow correspondence game with me on OGS? :)


This post by Bonobo was liked by 3 people: Charlie, EdLee, emeraldemon
 Post subject: Re: “A Simple Alpha(Go) Zero Tutorial”
Post #2 Posted: Wed Jan 10, 2018 9:45 am 
Lives with ko

Posts: 129
Liked others: 20
Was liked: 17
Thanks for posting. It will be some work, but I see a good chance of understanding it. I'm a programmer for 50% of my job and know some university-level information-science maths, but have no clue about machine learning at all and this is very welcome.

 Post subject: Re: “A Simple Alpha(Go) Zero Tutorial”
Post #3 Posted: Thu Jan 11, 2018 7:16 am 
Lives in gote
Posts: 310
Location: Deutschland
Liked others: 272
Was liked: 126
Rank: EGF 4 kyu
Sneegurd wrote:
... but have no clue about machine learning at all and this is very welcome.


David Silver's Reinforcement Learning course is free on YouTube. Before diving in there, I can recommend Andrew Ng's courses on Neural Networks (and, lately, his Coursera stuff covering CNNs) if you're looking for a way into the "learning" field. For a free option, there's A. Karpathy's recorded Stanford lecture series (Google: "CS231n Winter 2016") knocking about in various places, but the lectures were taken off YouTube for dubious reasons, so you'll need to dig a bit.

I watched an interview in which Karpathy said he released the videos on the Internet because he wanted to "hand out spanners" to anybody willing to learn. I guess he handed out a lot of spanners before the lawyers stepped in.


This post by Charlie was liked by 3 people: Bonobo, ez4u, Sneegurd