All times are UTC - 8 hours [ DST ]




 Post subject: AlphaGo Zero: Learning from scratch
Post #1 Posted: Wed Oct 18, 2017 10:12 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
https://deepmind.com/blog/alphago-zero- ... g-scratch/

Holy crap!!


This post by Uberdude was liked by 8 people: Bill Spight, Bonobo, ez4u, gamesorry, Gomoto, luigi, Monadology, toannguyenthanh
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #2 Posted: Wed Oct 18, 2017 10:19 am 
Lives in sente

Posts: 902
Location: Fort Collins, CO
Liked others: 319
Was liked: 287
Rank: AGA 3k
Universal go server handle: jeromie
I got to this point and felt surprised...

Quote:
AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.


and then I kept going and got to this point:


Quote:
It also differs from previous versions in other notable ways.
  • AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features.
  • It uses one neural network rather than two. Earlier versions of AlphaGo used a “policy network” to select the next move to play and a “value network” to predict the winner of the game from each position. These are combined in AlphaGo Zero, allowing it to be trained and evaluated more efficiently.
  • AlphaGo Zero does not use “rollouts” - fast, random games used by other Go programs to predict which player will win from the current board position. Instead, it relies on its high quality neural networks to evaluate positions.


:shock:

Wow, this is so cool! I desperately hope they release self-play games from this version.
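
For anyone wondering what "one neural network rather than two" can look like, here is a minimal sketch of a shared trunk feeding a policy head and a value head. It is a toy under loud assumptions, not DeepMind's architecture: the class name `TwoHeadedNet`, the single-layer trunk, the hidden width of 16, the one-value-per-intersection input encoding, and the untrained random weights are all made up for illustration.

```python
import math
import random

BOARD_POINTS = 19 * 19          # one input per intersection (simplified encoding, an assumption)
NUM_MOVES = BOARD_POINTS + 1    # every intersection plus "pass"
HIDDEN = 16                     # toy trunk width, far smaller than any real network


def make_matrix(rows, cols, rng):
    """Small random weights; a real network would be trained, not left random."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TwoHeadedNet:
    """Hypothetical toy: one shared trunk, two output heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.trunk = make_matrix(HIDDEN, BOARD_POINTS, rng)
        self.policy_head = make_matrix(NUM_MOVES, HIDDEN, rng)
        self.value_head = make_matrix(1, HIDDEN, rng)

    def forward(self, stones):
        """stones: 361 values, +1 own stone, -1 opponent stone, 0 empty."""
        # Shared features are computed once and reused by both heads.
        hidden = [max(0.0, h) for h in matvec(self.trunk, stones)]
        # Policy head: softmax over all moves.
        logits = matvec(self.policy_head, hidden)
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: a single number in [-1, 1] estimating the outcome.
        value = math.tanh(matvec(self.value_head, hidden)[0])
        return policy, value


net = TwoHeadedNet()
empty_board = [0.0] * BOARD_POINTS
policy, value = net.forward(empty_board)
```

The point of the design is that the expensive part (the trunk) is evaluated once per position, and both the move probabilities and the win estimate are read off the same features.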

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #3 Posted: Wed Oct 18, 2017 10:24 am 
Lives in sente

Posts: 902
Location: Fort Collins, CO
Liked others: 319
Was liked: 287
Rank: AGA 3k
Universal go server handle: jeromie
Sorry for double posting, but I just glanced at the freely available version of the paper (don't have time for more right now).

A couple interesting points:

AlphaGo learned and used several common joseki sequences during the course of learning. They show when in the training process it learned each one and which it preferred at various stages.

The full online version (only available with a Nature subscription) (edit: as Uberdude pointed out, this was wrong) includes the first 100 moves of several games at various points in the learning process.


Last edited by jeromie on Wed Oct 18, 2017 10:38 am, edited 1 time in total.
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #4 Posted: Wed Oct 18, 2017 10:32 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Here is the game they show after 70 hours:



This post by Uberdude was liked by: Bonobo
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #5 Posted: Wed Oct 18, 2017 10:35 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
jeromie wrote:
The full online version (only available with a Nature subscription) includes the first 100 moves of several games at various points in the learning process.

https://deepmind.com/documents/119/agz_ ... nature.pdf has many game diagrams in the appendix

EDIT: and Andrew Jackson just posted a zip of the sgfs on reddit:
https://www.nature.com/nature/journal/v ... 270-s2.zip

An example AG Zero beating AG Master:


This post by Uberdude was liked by 2 people: Bonobo, ez4u
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #6 Posted: Wed Oct 18, 2017 8:00 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Fun to watch the progression among the self-play games.

One of the earlier games:


And after a bit of learning from self-play:

_________________
be immersed

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #7 Posted: Wed Oct 18, 2017 9:20 pm 
Gosei

Posts: 1733
Location: Earth
Liked others: 621
Was liked: 310
Where can I buy shares?

Thank me later!

(I won't buy myself)

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #8 Posted: Wed Oct 18, 2017 10:00 pm 
Dies with sente

Posts: 101
Liked others: 24
Was liked: 16
Kirby wrote:
Fun to watch the progression among the self-play games.


So the "20 block" self-play games are from various stages of training, while the games in the "40 block" folder come only from the strongest version?
That is confusing. I wish they had labeled the "in-training" games somehow, so that one could tell the strength.

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #9 Posted: Wed Oct 18, 2017 10:22 pm 
Dies with sente

Posts: 101
Liked others: 24
Was liked: 16
alphaville wrote:
Kirby wrote:
Fun to watch the progression among the self-play games.


So the "20 block" self-play games are from various stages of training, while the games in the "40 block" folder come only from the strongest version?
That is confusing. I wish they had labeled the "in-training" games somehow, so that one could tell the strength.


I think I got it now: both groups of self-play games show progression during training, according to Nature.

For the "20 block" folder:
"The 3-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls"

For the "40 block" folder:
"The 40-day training run was subdivided into 20 periods. The best player from each period (as selected by the evaluator) played a single game against itself, with 2 h time controls."

If the 20 periods are divided equally by time, then each period of the 40-day run lasts 2 days: the weakest game in the "40 block" folder matches a random-playing engine, the 2nd game matches the engine after 2 days of training, etc.
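
The arithmetic behind that guess can be sketched as follows. The function name `period_starts` is made up for illustration, and the equal split by time is the assumption stated above; the paper quote does not say how the periods were cut.

```python
def period_starts(total_hours, periods=20):
    """Start time (in hours) of each period, assuming equal-length periods."""
    length = total_hours / periods
    return [index * length for index in range(periods)]


# 3-day run ("20 block" folder) and 40-day run ("40 block" folder).
short_run = period_starts(3 * 24)
long_run = period_starts(40 * 24)

# Length of one period in each run, in hours.
short_period_hours = short_run[1] - short_run[0]
long_period_hours = long_run[1] - long_run[0]
```

Under that assumption a period lasts 3.6 hours in the 3-day run and 48 hours (2 days) in the 40-day run.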


This post by alphaville was liked by: Kirby
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #10 Posted: Thu Oct 19, 2017 12:17 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Marcel Grünauer wrote:
A patriotic side note - I just learned that Julian Schrittwieser, one of the main authors of that paper, is from Austria and studied at the Technical University of Vienna. He has worked for Google since 2012 and switched to DeepMind when he heard Demis Hassabis talk about AlphaGo. His background is, naturally, in machine learning.


Yes, AlphaGo is an international effort and shows the remarkable success that comes from assembling the best talents from around the world. I really wonder if it would still be possible post-Brexit. Maybe so as Google is a big rich name with admin staff to help sponsor through our kafkaesque visa process, but maybe not...

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #11 Posted: Thu Oct 19, 2017 12:30 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Uberdude wrote:
the best talents


It is not as if all the best talents were in one place. Rather, call it a "selection of some of the allegedly best talents".

***

Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.

***

Learning from scratch so fast and successfully is exceptionally impressive. However, it is still possible that AlphaGo Zero fails at expert positions that rarely occur in practical play. IOW, neural nets can err. Self-driving cars can kill. Self-replicating or war-fighting AI-(nano)-robots might cause the extinction of mankind. We must never forget this, regardless of how impressive an AI might seem.

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #12 Posted: Thu Oct 19, 2017 4:36 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
The Methods section gives some welcome clarification about the different versions of AlphaGo, though it does not mention the Ke Jie version (this paper was submitted before that match, so they were sitting on a big secret!). I think that version, and the one that played the 55 self-play games released afterwards, were basically the same as Master with just minor incremental improvements. I don't think they are exactly the same, though: the 55 self-play version does seem to like the early 3-3 more than the online 60-game Master version, and that comes so early in the game that it can't be explained away the way the more chaotic style can, with "when it is winning against weak humans it simplifies, but against itself it plays 100%".

Quote:
AlphaGo versions. We compare three distinct versions of AlphaGo:

1. AlphaGo Fan is the previously published program that played against Fan Hui in October 2015. This program was distributed over many machines using 176 GPUs.

2. AlphaGo Lee is the program that defeated Lee Sedol 4–1 in March, 2016. It was previously unpublished but is similar in most regards to AlphaGo Fan. However, we highlight several key differences to facilitate a fair comparison. First, the value network was trained from the outcomes of fast games of self-play by AlphaGo, rather than games of self-play by the policy network; this procedure was iterated several times – an initial step towards the tabula rasa algorithm presented in this paper. Second, the policy and value networks were larger than those described in the original paper – using 12 convolutional layers of 256 planes respectively – and were trained for more iterations. This player was also distributed over many machines using 48 TPUs, rather than GPUs, enabling it to evaluate neural networks faster during search.

3. AlphaGo Master is the program that defeated top human players by 60–0 in January, 2017. It was previously unpublished but uses the same neural network architecture, reinforcement learning algorithm, and MCTS algorithm as described in this paper. However, it uses the same handcrafted features and rollouts as AlphaGo Lee and training was initialised by supervised learning from human data.

4. AlphaGo Zero is the program described in this paper. It learns from self-play reinforcement learning, starting from random initial weights, without using rollouts, with no human supervision, and using only the raw board history as input features. It uses just a single machine in the Google Cloud with 4 TPUs (AlphaGo Zero could also be distributed but we chose to use the simplest possible search algorithm).

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #13 Posted: Thu Oct 19, 2017 5:28 am 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
Aja Huang mentioned that it is the same version but slightly stronger (perhaps due to longer time setting?)

source: http://sports.sina.com.cn/go/2017-05-24 ... 9285.shtml

Also, on the DeepMind website there is an animated graph that says the Master version and the version that played the 3-game match with Ke Jie are the same version.
source: https://storage.googleapis.com/deepmind ... 20Time.gif

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #14 Posted: Thu Oct 19, 2017 7:07 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
So what new moves is AlphaGo Zero playing? One very noticeable pattern in the 20 games between AlphaGo Zero (40 blocks, the strongest version) and AlphaGo Master is shown below. It arises after a low plus high double approach against a 4-4. This in itself is remarkable, as AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. However, AG Zero seems to like pincering a lot more, often the 3-space low, or the 2-space high as below. This corner sequence happened in 8 of the 20 games, with AG Zero always being the one capturing the inside stones, and it won 7 of the 8. So it seems AG Zero thinks the result is even to good for it, and AG Master likewise thinks the result is even to good for its side, but AG Zero is probably closer to the truth given the results and strengths. According to waltheri this sequence has never happened before in pro games. My initial feeling was that it looked like an interesting sacrifice for white compared to the normal entry into the corner after the attachment (maybe with the hane first), and white also gets some nice forcing moves on the outside with the cut aji. Set against that, black is solid and almost 100% alive, which AG tends to place a lot of value on, and white isn't: in some games the white group gets into trouble later (though in the one game AG Master won, the black group does die!).

[go]$$W
$$ | . . . , . . . . . ,
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ | . 0 8 6 . . . . . .
$$ | . . 7 1 2 . . . . .
$$ | . 9 . 4 3 5 . . . .
$$ | . . . X . . . . X ,
$$ | . . . . . O . . . .
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ +--------------------[/go]


[go]$$Wm11
$$ | . . . , . . . . . ,
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ | . X X X . . . . . .
$$ | . . O O X . . . . .
$$ | . O . X O O . . . .
$$ | . 4 2 X . . . . X ,
$$ | . . 3 1 . O . . . .
$$ | . . . . . . . . . .
$$ | . . . . . . . . . .
$$ +--------------------[/go]

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #15 Posted: Thu Oct 19, 2017 7:22 am 
Dies in gote

Posts: 24
Liked others: 1
Was liked: 0
KGS: 4 dan
This version is the one which played Ke Jie, so it is not that strong, because Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the 20-game series that was released.

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #16 Posted: Thu Oct 19, 2017 7:39 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
tartaric wrote:
This version is the one which played Ke Jie, so it is not that strong, because Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the 20-game series that was released.

If by "this" you mean AlphaGo Zero you are wrong, Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game, the 0.5 score was a gift from AlphaGo to reduce the win margin, it was the 2nd game he played better and kept level for some time). But yes AG Master (which is much stronger than top humans) beat it 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".
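
To put "11 times out of 100" in rating terms, here is a small sketch that converts a match score into an Elo gap. It assumes the standard logistic Elo model and ignores sampling error; the function name `elo_gap` is made up for illustration, and the result is a rough reading of the match score, not a figure taken from the paper.

```python
import math


def elo_gap(wins, games):
    """Elo difference implied by a match score, under the logistic Elo model.

    Assumes 0 < wins < games; a clean sweep has no finite Elo gap.
    """
    score = wins / games
    return 400 * math.log10(score / (1 - score))


# AlphaGo Zero's side of the 100 games: 89 wins, 11 losses to AG Master.
gap = elo_gap(89, 100)
```

With 89 wins in 100 games the implied gap comes out a little over 360 Elo points in Zero's favour.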

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #17 Posted: Thu Oct 19, 2017 7:52 am 
Dies in gote

Posts: 24
Liked others: 1
Was liked: 0
KGS: 4 dan
Uberdude wrote:
tartaric wrote:
This version is the one which played Ke Jie, so it is not that strong, because Ke matched it evenly during the first game, and AlphaGo Master also managed to win some games in the 20-game series that was released.

If by "this" you mean AlphaGo Zero you are wrong, Ke Jie played a slightly stronger version of AlphaGo Master, not AlphaGo Zero (and he got stomped in the 1st game, the 0.5 score was a gift from AlphaGo to reduce the win margin, it was the 2nd game he played better and kept level for some time). But yes AG Master (which is much stronger than top humans) beat it 11 times out of 100, so if you want to call that "not that strong" you could do so, but others might find that an odd choice of words: I might say "not invincible" or "very strong but still beatable".


Thanks for your message :) It's clearer now. Maybe I am mixing it up with the AlphaGo vs AlphaGo series released recently, but there had already been talk of an AlphaGo trained without human data. I thought it was the one that played Ke Jie because they said it was stronger than AlphaGo Master.

 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #18 Posted: Thu Oct 19, 2017 10:00 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
AlphaGo Master so rarely pincered approaches to its 4-4s that such opportunities rarely arose. However, AG Zero seems to like pincering a lot more now, often the 3-space low, or 2-space high as below.


Yes. AlphaGo Master pincers at a low rate, compared to humans. Perhaps the more frequent pincering by AlphaGo Zero in these games has to do with the longer time limits. My impression with AlphaGo Master was that with longer time limits it tended to play more like humans. :) OC, there is not enough data to draw a conclusion, and "more like humans" is not well defined. (The frequency of pincers is well defined, OC. :))

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Thu Oct 19, 2017 12:41 pm, edited 1 time in total.
 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #19 Posted: Thu Oct 19, 2017 12:40 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
RobertJasiek wrote:
Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.


What do they mean by "learn the rules"? Certainly not the ability to quote the rules, and maybe not even the ability to handle the example positions that are published with the rules, or which are considered rules beasts. Rather, they mean that the program does not make an illegal move in thousands, perhaps millions, of games. They don't even have to know how to score.

Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized; that is enough.
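
As a rough illustration of how little is needed, here is a toy learner that is never told the rule "you cannot play on an occupied point" and only receives a penalty when it breaks it. Everything in it is an assumption made for the sketch: the one-dimensional board, the function names `train` and `pick_move`, the learning rate and the exploration rate. It models only the occupied-point rule, not suicide or ko, and it is not how AlphaGo Zero handles legality.

```python
import random


def train(episodes=2000, points=9, lr=0.1, explore=0.2, seed=1):
    """Learn one score per point type purely from penalties for illegal moves."""
    rng = random.Random(seed)
    score = {"empty": 0.0, "occupied": 0.0}
    for _ in range(episodes):
        # A random position: each point is independently empty or occupied.
        board = [rng.choice(["empty", "occupied"]) for _ in range(points)]
        if rng.random() < explore:
            move = rng.randrange(points)          # occasional random exploration
        else:
            best = max(score[state] for state in board)
            move = rng.choice([i for i, s in enumerate(board) if score[s] == best])
        # The only feedback: -1 for an illegal move, 0 otherwise.
        reward = -1.0 if board[move] == "occupied" else 0.0
        feature = board[move]
        score[feature] += lr * (reward - score[feature])
    return score


def pick_move(board, score):
    """Greedy move choice using the learned scores."""
    return max(range(len(board)), key=lambda i: score[board[i]])


learned = train()
chosen = pick_move(["occupied", "empty", "occupied"], learned)
```

After training, the score for occupied points is negative while the score for empty points stays at zero, so the greedy choice lands on an empty point whenever one exists.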


 Post subject: Re: AlphaGo Zero: Learning from scratch
Post #20 Posted: Fri Oct 20, 2017 5:59 am 
Lives in gote

Posts: 392
Liked others: 29
Was liked: 176
GD Posts: 1072
The AlphaGo blog at the beginning of this thread has an animated graph showing the (Elo) strength of AlphaGo Zero as a function of time. Two things struck me.

First, the strength of the engine that beat Lee Sedol is reached just as the graph starts to roll over. Clearly diminishing returns set in there.

Second, after about 15 days the rate of improvement is quite slow, as we might expect. Nevertheless, at two points, roughly days 33 and 36, there appear to be comparatively sharp jumps upward. We can only speculate about what the neural network learned to make those jumps, but I'd love to know what it was.
