AlphaGo Zero: Learning from scratch
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
-
jeromie
- Lives in sente
- Posts: 902
- Joined: Fri Jan 31, 2014 7:12 pm
- Rank: AGA 3k
- GD Posts: 0
- Universal go server handle: jeromie
- Location: Fort Collins, CO
- Has thanked: 319 times
- Been thanked: 287 times
Re: AlphaGo Zero: Learning from scratch
I got to this point and felt surprised...
AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.
and then I kept going and got to this point:
It also differs from previous versions in other notable ways.
- AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features.
- It uses one neural network rather than two. Earlier versions of AlphaGo used a “policy network” to select the next move to play and a ”value network” to predict the winner of the game from each position. These are combined in AlphaGo Zero, allowing it to be trained and evaluated more efficiently.
- AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.
Wow, this is so cool! I desperately hope they release self-play games from this version.
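The "one network, two heads" idea from the second quote can be sketched in a few lines. This is only a toy illustration with made-up sizes (a single tanh layer instead of AlphaGo Zero's deep residual tower, and a flat board vector instead of its stacked history planes); the point is just that one shared representation feeds both a move-probability head and a winner-prediction head:

```python
import numpy as np

rng = np.random.default_rng(0)

BOARD = 19 * 19   # one scalar per point; the real net uses stacked history planes
HIDDEN = 64       # toy size; AlphaGo Zero uses a deep residual tower

# Shared trunk weights plus two heads: a policy head (move probabilities)
# and a value head (predicted winner). All sizes here are illustrative only.
W_trunk = rng.normal(0, 0.1, (BOARD, HIDDEN))
W_policy = rng.normal(0, 0.1, (HIDDEN, BOARD + 1))  # +1 for pass
W_value = rng.normal(0, 0.1, (HIDDEN, 1))

def forward(board_vec):
    """Return (move_probs, value) computed from one shared representation."""
    h = np.tanh(board_vec @ W_trunk)       # shared trunk
    logits = h @ W_policy
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                   # softmax over 361 moves + pass
    value = float(np.tanh(h @ W_value)[0]) # scalar in (-1, 1)
    return probs, value

board = np.zeros(BOARD)                    # empty board: all zeros
probs, value = forward(board)
print(probs.shape, value)                  # → (362,) 0.0 (zero input, so uniform policy)
```

Because both heads share the trunk, one forward pass gives the search both a move prior and a position evaluation, which is what lets Zero drop rollouts entirely.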
-
jeromie
- Lives in sente
- Posts: 902
- Joined: Fri Jan 31, 2014 7:12 pm
- Rank: AGA 3k
- GD Posts: 0
- Universal go server handle: jeromie
- Location: Fort Collins, CO
- Has thanked: 319 times
- Been thanked: 287 times
Re: AlphaGo Zero: Learning from scratch
Sorry for double posting, but I just glanced at the freely available version of the paper (don't have time for more right now).
A couple interesting points:
AlphaGo learned and used several common joseki sequences during the course of learning. They show when in the training process it learned each one and which it preferred at various stages.
The full online version (only available with a Nature subscription) (edit: as Uberdude pointed out, this was wrong) includes the first 100 moves of several games at various points in the learning process.
Last edited by jeromie on Wed Oct 18, 2017 10:38 am, edited 1 time in total.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: AlphaGo Zero: Learning from scratch
jeromie wrote: The full online version (only available with a Nature subscription) includes the first 100 moves of several games at various points in the learning process.
https://deepmind.com/documents/119/agz_ ... nature.pdf has many game diagrams in the appendix.
EDIT: and Andrew Jackson just posted a zip of the sgfs on reddit:
https://www.nature.com/nature/journal/v ... 270-s2.zip
An example AG Zero beating AG Master:
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: AlphaGo Zero: Learning from scratch
Fun to watch the progression among the self-play games.
One of the earlier games:
And after a bit of learning from self-play:
be immersed
-
alphaville
- Dies with sente
- Posts: 101
- Joined: Sat Apr 22, 2017 10:28 pm
- GD Posts: 0
- Has thanked: 24 times
- Been thanked: 16 times
Re: AlphaGo Zero: Learning from scratch
Kirby wrote: Fun to watch the progression among the self-play games.
So the "20 block" self-play games are from various stages of training, while the "40 block" folder comes only from the strongest version?
That is confusing, I wish that they labeled the "in-training" games somehow, to be able to tell the strength.
-
alphaville
- Dies with sente
- Posts: 101
- Joined: Sat Apr 22, 2017 10:28 pm
- GD Posts: 0
- Has thanked: 24 times
- Been thanked: 16 times
Re: AlphaGo Zero: Learning from scratch
alphaville wrote: So the "20 block" self-play games are from various stages of training, while the "40 block" folder comes only from the strongest version? That is confusing, I wish that they labeled the "in-training" games somehow, to be able to tell the strength.
I think I got it now: both groups of self-play games show progression during training, according to Nature.
For the "20 block" folder:
"The 3-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls"
For the "40 block" folder:
"The 40-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls."
If the "20 periods" are divided equally by time, then the weakest game in the "40 block" folder matches a near-random-playing engine, the 2nd game matches an engine after about 2 more days of training, etc.
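Spelling out the arithmetic behind that reading (the equal-length-periods part is alphaville's assumption, not stated in the paper):

```python
# alphaville's reading of the Nature supplement: the 40-day training run was
# split into 20 equal periods, so released self-play game i (1-based) would
# come from a network with roughly i * 2 days of training. Whether the
# periods really were equal in wall-clock time is an assumption.
TRAINING_DAYS = 40
PERIODS = 20
days_per_period = TRAINING_DAYS / PERIODS   # 2.0 days per period

for game in (1, 10, 20):
    print(f"game {game}: ~{game * days_per_period:.0f} days of training")
# → game 1: ~2 days, game 10: ~20 days, game 20: ~40 days
```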
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: AlphaGo Zero: Learning from scratch
Marcel Grünauer wrote: A patriotic side note - I just learned that Julian Schrittwieser, one of the main authors of that paper, is from Austria and studied at the Technical University of Vienna. He has worked for Google since 2012 and switched to DeepMind when he heard Demis Hassabis talk about AlphaGo. His background is, naturally, in machine learning.
Yes, AlphaGo is an international effort and shows the remarkable success that comes from assembling the best talents from around the world. I really wonder if it would still be possible post-Brexit. Maybe so, as Google is a big rich name with admin staff to help sponsor through our kafkaesque visa process, but maybe not...
-
RobertJasiek
- Judan
- Posts: 6273
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: AlphaGo Zero: Learning from scratch
Uberdude wrote: the best talents
It is not as if all the best talents were in one place. Rather call it a "selection of some of the allegedly best talents".
***
Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.
***
Learning from scratch so fast and successfully is exceptionally impressive. However, it is still possible that AlphaGo Zero fails at expert positions that rarely occur in practical play. IOW, neural nets can err. Self-driving cars can kill. Self-replicating or war-fighting AI-(nano)-robots might cause extinction of mankind. We must never forget this, regardless however impressive an AI might seem.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: AlphaGo Zero: Learning from scratch
There was some welcome clarification about the different versions of AlphaGo in the Methods section, though no mention of the Ke Jie version (this paper was submitted before that match, so they were sitting on a big secret!). I think that version, and the one behind the 55 self-play games released afterwards, was basically the same as Master with minor incremental improvements. Not exactly the same, though: the 55-game self-play version does seem to like the early 3-3 invasion more than the online 60-game Master version did, and that comes so early in the game it can't be explained away, as the more chaotic style can, with "when it is winning against weak humans it simplifies, but against itself it plays 100%".
AlphaGo versions. We compare four distinct versions of AlphaGo:
1. AlphaGo Fan is the previously published program that played against Fan Hui in October 2015. This program was distributed over many machines using 176 GPUs.
2. AlphaGo Lee is the program that defeated Lee Sedol 4-1 in March, 2016. It was previously unpublished but is similar in most regards to AlphaGo Fan. However, we highlight several key differences to facilitate a fair comparison. First, the value network was trained from the outcomes of fast games of self-play by AlphaGo, rather than games of self-play by the policy network; this procedure was iterated several times - an initial step towards the tabula rasa algorithm presented in this paper. Second, the policy and value networks were larger than those described in the original paper - using 12 convolutional layers of 256 planes respectively - and were trained for more iterations. This player was also distributed over many machines using 48 TPUs, rather than GPUs, enabling it to evaluate neural networks faster during search.
3. AlphaGo Master is the program that defeated top human players by 60-0 in January, 2017. It was previously unpublished but uses the same neural network architecture, reinforcement learning algorithm, and MCTS algorithm as described in this paper. However, it uses the same handcrafted features and rollouts as AlphaGo Lee and training was initialised by supervised learning from human data.
4. AlphaGo Zero is the program described in this paper. It learns from self-play reinforcement learning, starting from random initial weights, without using rollouts, with no human supervision, and using only the raw board history as input features. It uses just a single machine in the Google Cloud with 4 TPUs (AlphaGo Zero could also be distributed but we chose to use the simplest possible search algorithm).
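The tabula-rasa loop the Methods text describes (self-play from random initial weights, then update the network from the game outcomes) can be caricatured in a few lines. Everything below is a stub of my own devising, not DeepMind's algorithm: the "network" is a single scalar weight and the "games" are random, where the real system plays MCTS-guided games and trains a deep residual network on (position, policy, outcome) examples:

```python
import random

random.seed(0)

def play_self_play_game(weights):
    """Stub 'game': a few random positions plus a random winner (+1 or -1).
    The real system would use the network (via MCTS) to choose every move."""
    positions = [random.random() for _ in range(5)]
    outcome = random.choice([+1, -1])
    return [(pos, outcome) for pos in positions]

def train_step(weights, examples, lr=0.01):
    """Nudge the scalar 'network' toward predicting each game's outcome."""
    for pos, outcome in examples:
        pred = weights * pos
        weights += lr * (outcome - pred) * pos
    return weights

weights = 0.0                        # random/blank initial 'network'
for iteration in range(100):         # alternate self-play and training
    examples = play_self_play_game(weights)
    weights = train_step(weights, examples)
print("trained weight:", round(weights, 3))
```

The point of the sketch is only the shape of the loop: no human games, no handcrafted features, just play, record outcomes, update, repeat.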
-
pookpooi
- Lives in sente
- Posts: 727
- Joined: Sat Aug 21, 2010 12:26 pm
- GD Posts: 10
- Has thanked: 44 times
- Been thanked: 218 times
Re: AlphaGo Zero: Learning from scratch
Aja Huang mentioned that it is the same version but slightly stronger (perhaps due to longer time setting?)
source: http://sports.sina.com.cn/go/2017-05-24 ... 9285.shtml
Also, on the DeepMind website there's an animated graph that says the Master version and the version that played 3 games with Ke Jie are the same version
source: https://storage.googleapis.com/deepmind ... 20Time.gif
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: AlphaGo Zero: Learning from scratch
So what new moves is AlphaGo Zero playing? One very noticeable pattern in the 20 games of AlphaGo Zero (40 blocks) [strongest version] vs AlphaGo Master is shown below. It happens after a low plus high double approach against a 4-4. This in itself is remarkable, as AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. AG Zero, however, seems to like pincering a lot more, often the 3-space low, or the 2-space high as below.

This corner sequence happened in 8 games of the 20, with AG Zero always being the one capturing the inside stones, and it won 7 of those 8 games. So it seems AG Zero thinks the result is even to good for it, and AG Master likewise thinks the sequence from the other side is even to good for it, but AG Zero is probably closer to the truth given the results/strengths. According to Waltheri this sequence has never happened before in pro games.

My initial feeling was that it looked like an interesting sacrifice for white compared to normally entering the corner after the attachment (maybe with the hane first), and white also gets some nice forcing moves on the outside with the cut aji. Set against that, black is solid and almost 100% alive, which AG tends to place a lot of value on, while white isn't: in some games the white group gets into trouble later (though in the one game AG Master won, the black group actually dies!).
-
tartaric
- Dies in gote
- Posts: 24
- Joined: Tue Aug 29, 2017 11:59 am
- GD Posts: 0
- KGS: 4 dan
- Has thanked: 1 time
Re: AlphaGo Zero: Learning from scratch
This version is the one which played Ke Jie, so it's not that strong, because Ke matched it equally during the first game, and AlphaGo Master also managed to win some games in the 20-game series which was released.