AlphaGo Zero: Learning from scratch

fwiffo · **#21**

A really common technique in ML is to reduce the "learning rate" as a model starts to converge. And it produces bumps in the model performance exactly like that. So it probably didn't learn specific knowledge at that point, or anything more important than other parts of the learning process, it was just a momentary acceleration of learning.

For those confused, learning rate is a parameter used in gradient descent (the standard algorithm used for training machine learning models these days). It's a bit misleadingly named; it's the size of the steps that the model should make as it's progressively walking down the gradient of the loss function. There is usually an empirically discovered "optimal" learning rate for any given model that gets the best performance. Either a higher or lower learning rate results in slower learning. And the ideal learning rate usually decreases as the model learns.

For those who are now more confused, there is a number that you can tweak when training models to make them work better, and it causes bumps in graphs when you tweak it.

Pio2001 · **#22**

Bill Spight wrote:

Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.

The pictures posted says that the program already had the "basic rules" as input when it started learning.

Pippen · **#23**

RobertJasiek wrote:

Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.

IMO it is impossible to learn Go from scratch, i.e. just the board and stones. The program could learn the playing rules, but because there are countable infinitely ways to score and to determine an outcome of a game, it would get lost. The real interesting things will be if they apply their algorithm to real-life problems, because there you don't have precise rules, definite results and much more complexity (e.g. Go's sample space has about 10^170 elements, but soccer should have easily 10^10^170 even if you look at it discretely.)

alphaville · **#24**

Pippen wrote:

IMO it is impossible to learn Go from scratch, i.e. just the board and stones. The program could learn the playing rules, but because there are countable infinitely ways to score and to determine an outcome of a game, it would get lost.

What do you mean - AlphaGo Zero did just that, it learned Go from scratch (from just a board and stones location representation).

pookpooi · **#25**

alphaville wrote:

Pippen wrote:

IMO it is impossible to learn Go from scratch, i.e. just the board and stones. The program could learn the playing rules, but because there are countable infinitely ways to score and to determine an outcome of a game, it would get lost.

What do you mean - AlphaGo Zero did just that, it learned Go from scratch (from just a board and stones location representation).

I think the 'rule' was absent from 'IMO it is impossible to learn Go from scratch'

fwiffo · **#26**

Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training the network, the position on the board is given, along with a history of recent moves, and an estimated winning probability given various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns what board configurations are likely wins or losses, and how to get there. (this is a simplification)

There is still a part external to the neural network that has enough of the rules in order to handle capture, to be able to score the game, to handle ko rules, etc. The AI is not asked to reinvent the rules of go (any number of other games could be played on a go board too). It's more precise to say that there was no go strategy hardwired in and no human games to learn from.

So the network learns how to select moves that maximize the probability of arriving at a winning condition. It doesn't itself determine which side won the game. How its go knowledge is represented in the network (and whether it represents something like rules) is probably not interpretable.

How ko and other illegal moves is handled is not in the paper, but there are several ways to do it (e.g. simply masking out illegal moves from the network's predictions, or disqualifying the player that plays them and scoring it as a loss, or imposing a penalty of some kind.)

This is similar to Deepmind's Atari game demonstrations. The network is given raw pixels, and the score. It's not told the rules of Breakout or whatever, it just learns how to make moves to get to the highest score.

Pippen · **#27**

Obviously AG had to be given 1) the basic rules, 2) counting & scoring and 3) the goal of the game in advance. You cannot learn that as an AI. We has humans can because we can look outside of the game and learn 1)-3) from that "higher perspective", e.g. when we see multiple times that someone smiles with a trophy after having more points on the board surrounded than his opponent we can figure that in this game someone wins if he has more points. AG doesn't have this perspective, but soon it might have.

Recusant · **#28**

I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic

pookpooi · **#29**

Recusant wrote:

I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.

"The AI That Has Nothing to Learn From Humans" | The Atlantic

I read almost every article on Zero, this article has pro in that it include go history and very well-written, but Redmond, Shi Yue, Lockhart comments are actually toward Master version, not Zero. I'd take any expert in a non go-related field comments on Zero over old information anytime. But yes, this article was posted in reddit baduk and many people like it, and many people in lifein19x19 will like it too, that's for sure https://www.reddit.com/r/baduk/comments ... om_humans/

Bill Spight · **#30**

I found the following exchange on reddit interesting ( https://www.reddit.com/r/MachineLearnin ... ittwieser/ )

cassandra wrote:

Do you think that AlphaGo would be able to solve Igo Hatsuyôron's problem 120, the "most difficult problem ever", i. e. winning a given middle game position, or confirm an existing solution (e.g. http://igohatsuyoron120.de/2015/0039.htm)?

David_SilverDeepMind wrote:

We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!

As if Fan Hui would know!

Besides, Fan Hui is a cheerleader for AlphaGo. I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree. Chess programs are more search oriented than AlphaGo, and, despite their superhuman strength, sometimes miss plays that humans have found in actual play.

Garf · **#31**

Pippen wrote:

Obviously AG had to be given 1) the basic rules, 2) counting & scoring and 3) the goal of the game in advance. You cannot learn that as an AI. We has humans can because we can look outside of the game and learn 1)-3) from that "higher perspective", e.g. when we see multiple times that someone smiles with a trophy after having more points on the board surrounded than his opponent we can figure that in this game someone wins if he has more points. AG doesn't have this perspective, but soon it might have.

There's no need for AG to have this perspective, and it did not. It just needs someone to tell it at the end of the game "you won" or "you lost". It can then figure out who won by itself eventually - that's the value network.

Of course they just implemented that in the program rather have a human arbiter hand out trophies, but it's not an intrinsic part of the program. They could have made it pop up 2 buttons for Aja and have him count the final positions out :-)

HermanHiddema · **#32**

Bill Spight wrote:

I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.

I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.

Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.

RobertJasiek · **#33**

HermanHiddema wrote:

I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles

AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.

HermanHiddema · **#34**

RobertJasiek wrote:

HermanHiddema wrote:

I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles

AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.

Yes, I do mean those puzzles where the end of the game is beyond a reasonable horizon for tree search. The small ones it could well solve.

The term corridor is another good example of an abstraction that is not likely to be part of AlphaGo's repertoire. The advantage of such specific abstraction is far too small (1 point max for tedomari) to find it's way into a generic go playing neural net (though you could perhaps train one if for all the inputs the win depended on tedomari). It would be drowned out by other patterns.

Bill Spight · **#35**

HermanHiddema wrote:

Bill Spight wrote:

I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.

I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

Maybe so. But give it a few days and a few terabytes and who knows?

However, I do think that AlphaGo's machine learning algorithm could solve it by training on that specific problem. Maybe even in one day. Maybe even starting from scratch. (But starting from AlphaGo Zero would probably be quicker.)

Quote:

Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.

As you and Robert have pointed out, my puzzles are small enough and easy enough that any of the strong programs could probably solve them very quickly. But large problems like those in Mathematical Go are another story.

billyswong · **#36**

fwiffo wrote:

Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training the network, the position on the board is given, along with a history of recent moves, and an estimated winning probability given various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns what board configurations are likely wins or losses, and how to get there. (this is a simplification)

From what I read in the paper, AlphaGo Zero is not provided a winning probability estimation. Whether it acquire similar knowledge after many millions of self-play is another issue.

fwiffo wrote:

How ko and other illegal moves is handled is not in the paper, but there are several ways to do it (e.g. simply masking out illegal moves from the network's predictions, or disqualifying the player that plays them and scoring it as a loss, or imposing a penalty of some kind.)

The paper wrote that the external system provides a map of possible legal play to AlphaGo for each move.

Uberdude · **#37**

On Igo Hatsuyuron 120

HermanHiddema wrote:

Bill Spight wrote:

I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.

I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.

For a comically bad attempt at using LeelaZero to solve it, see here. White goes wrong on move 4, failing to cut black to make the temporary seki (forget about making the hanezeki, some other clueless person suggested putting a bunch of white stones in its path to cancel out the komi LZ expects) and losses by 60 points.

He only used 1 minute per move on I presume his home computer, so it was never going to be good, but I'm not sure I expected it to be that terrible.

Bill Spight · **#38**

Uberdude wrote:

On Igo Hatsuyuron 120

HermanHiddema wrote:

Bill Spight wrote:

I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.

I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.

For a comically bad attempt at using LeelaZero to solve it, see here. White goes wrong on move 4, failing to cut black to make the temporary seki (forget about making the hanezeki, some other clueless person suggested putting a bunch of white stones in its path to cancel out the komi LZ expects) and losses by 60 points.

He only used 1 minute per move on I presume his home computer, so it was never going to be good, but I'm not sure I expected it to be that terrible.

David_SilverDeepMind wrote:

We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!

To repeat myself:

Bill Spight wrote:

However, I do think that AlphaGo's machine learning algorithm could solve it by training on that specific problem. Maybe even in one day. Maybe even starting from scratch. (But starting from AlphaGo Zero would probably be quicker.)

You could train on that problem, with 0 komi, OC.

EDIT: And, if you want a somewhat more general problem solver, with 8 fold symmetry.

Bill Spight · **#39**

Quoting Uberdude from his Journal:

Uberdude wrote:

Addendum: However, as a warning against treating LZ's word as gospel, if I make it play the book's 1, then kosumi for white (#2) and next moves all top choices it gives black only 45% to continue at 7 as best move (on 1k playouts).

Click Here To Show Diagram Code: [go]$$B LZ blind spot setup? $$ +---------------------------------------+ $$ | . . . . . . . . . . . . . . . . . . . | $$ | . O O . a . . . . . . . . . . . . . . | $$ | . . X O . O . 1 . . . . . . . . . . . | $$ | . . X , . . . . . , . . . X . X . . . | $$ | . . . X . X . 3 . . . . . . . . X . . | $$ | . . . . . . . . . . . . . . . O O . . | $$ | . . . O . . . . . . . . . . . . . . . | $$ | . . . . 2 . . . . . . . . . . . . . . | $$ | . . X . . . . . . . . . . . . . O . . | $$ | . . . 4 . . . . . , . . . . . , . . . | $$ | . . 5 7 . . . . . . . . . . . X . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . 6 . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . O . . . . . , . . . . . , . . . | $$ | . . . . . . . . . O . . . . . X . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ +---------------------------------------+[/go]

However, if 7 is placement here it's happy to block (59%) but then if black "unexpectedly" pushes through white drops to 40%. It wants to connect at 6 and sacrifice the corner in a trade for pushing through outside rather than squeeze under and take the small gote life. So it seems like early in this variation LZ was willing to tenuki the checking extension the book recommends, but underestimated the severity of the peep later. Not sure if more playouts would solve this (I'm using network #139 if anyone else wants to try before I get my new computer).

Click Here To Show Diagram Code: [go]$$Bm7 LZ blind spot? $$ +---------------------------------------+ $$ | . . . . . . . . . . . . . . . . . . . | $$ | . O O 5 1 2 . . . . . . . . . . . . . | $$ | . . X O 3 O . X . . . . . . . . . . . | $$ | . . X , 4 a . . . , . . . X . X . . . | $$ | . . . X . X . X . . . . . . . . X . . | $$ | . . . . . . . . . . . . . . . O O . . | $$ | . . . O . . . . . . . . . . . . . . . | $$ | . . . . O . . . . . . . . . . . . . . | $$ | . . X . . . . . . . . . . . . . O . . | $$ | . . . O . . . . . , . . . . . , . . . | $$ | . . X . . . . . . . . . . . . X . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . O . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . O . . . . . , . . . . . , . . . | $$ | . . . . . . . . . O . . . . . X . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ | . . . . . . . . . . . . . . . . . . . | $$ +---------------------------------------+[/go]

(Emphasis mine.)

IIUC, Monte Carlo Tree Search can overcome this problem in infinite time.

It seems to me that this illustrates a problem with self play. Both you and your opponent probably have the same blind spots, because you are so similar. It's like, using the hill climbing metaphor, you are climbing the same hill. But there are other hills which may be more important in a given position. These other hills indicate other skills. Given enough exploration and enough time, self play will learn those skills. But they may be learned more quickly with adversarial play. That is, instead of playing against a version of yourself, play against a version of a player that has been trained to beat you. An adversary will probe your blind spots.

Now, if your aim is to produce a superhuman player, we have seen that self play from scratch is the most efficient way known to do that. However, if we are climbing a single finite hill we will reach the point of diminishing returns. (I think that happened in chess.) Training adversaries will initially take more time to reach the same level--in chess, 8 hours instead of 4?--, but may take less time to produce robust, well rounded players.

Another idea is to train for known go skills. For instance, train initially on classic go problems with 0 komi. Then train on the whole board with komi. Will the initial training be a help or a hindrance?

AlphaGo Zero: Learning from scratch

Who is online