AlphaGo Zero: Learning from scratch

For discussing go computing, software announcements, etc.
Garf
Dies in gote
Posts: 34
Joined: Mon Jun 06, 2016 1:22 am
GD Posts: 0
Been thanked: 10 times

Re: AlphaGo Zero: Learning from scratch

Post by Garf »

Pippen wrote:Obviously AG had to be given 1) the basic rules, 2) counting & scoring and 3) the goal of the game in advance. You cannot learn that as an AI. We as humans can, because we can look outside of the game and learn 1)-3) from that "higher perspective", e.g. when we see multiple times that someone smiles with a trophy after having surrounded more points on the board than his opponent, we can figure out that in this game someone wins if he has more points. AG doesn't have this perspective, but soon it might have.


There's no need for AG to have this perspective, and it did not. It just needs someone to tell it at the end of the game "you won" or "you lost". It can then eventually learn to judge who is winning by itself - that's the value network.

Of course they just implemented that in the program rather than have a human arbiter hand out trophies, but it's not an intrinsic part of the program. They could have made it pop up 2 buttons for Aja and have him count the final positions out :-)
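As a toy illustration of the point (a hypothetical, stdlib-only sketch, nothing like DeepMind's actual deep network): the only go-specific feedback the learner receives is a terminal "you won"/"you lost" label, and a simple value model trained on that signal alone learns to call the winner from the position.

```python
import math
import random

random.seed(0)
N = 16  # size of our fake "board" feature vector

def make_game():
    # A hidden rule decides the winner; the learner only ever sees the
    # terminal result z = +1 ("you won") or -1 ("you lost").
    s = [random.gauss(0, 1) for _ in range(N)]
    z = 1.0 if sum(s) > 0 else -1.0
    return s, z

# Toy "value network": linear weights with a tanh output in (-1, 1),
# trained by gradient descent on the squared error (v - z)^2.
w = [0.0] * N
for _ in range(3000):
    s, z = make_game()
    v = math.tanh(sum(wi * si for wi, si in zip(w, s)))
    g = 2 * (v - z) * (1 - v * v)  # gradient of (v - z)^2 w.r.t. the pre-activation
    for i in range(N):
        w[i] -= 0.05 * g * s[i]

# It now predicts the winner from the position alone.
games = [make_game() for _ in range(300)]
acc = sum((sum(wi * si for wi, si in zip(w, s)) > 0) == (z > 0)
          for s, z in games) / len(games)
```

The labels could just as well come from a human arbiter pressing one of two buttons; nothing about counting or scoring is baked into the learner itself.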
User avatar
HermanHiddema
Gosei
Posts: 2011
Joined: Tue Apr 20, 2010 10:08 am
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Location: Groningen, NL
Has thanked: 202 times
Been thanked: 1086 times

Re: AlphaGo Zero: Learning from scratch

Post by HermanHiddema »

Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.


I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.

Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.
RobertJasiek
Judan
Posts: 6273
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times
Contact:

Re: AlphaGo Zero: Learning from scratch

Post by RobertJasiek »

HermanHiddema wrote:I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles


AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.
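Robert's decreasing-order heuristic is easy to sanity-check on a toy model. A hedged sketch (hypothetical `minimax` and `greedy` helpers; it treats each region as an independent simple gote with no follow-up moves, which real corridors are not): greedy largest-first play matches full search.

```python
def minimax(regions, sign=1):
    """Exact net result (first player's points minus second's) by full search.
    regions: point values of independent simple gotes (first to play there gets it)."""
    if not regions:
        return 0
    outcomes = [sign * g + minimax(regions[:i] + regions[i + 1:], -sign)
                for i, g in enumerate(regions)]
    return max(outcomes) if sign > 0 else min(outcomes)

def greedy(regions):
    """Net result when both sides simply take the biggest remaining gote."""
    total, sign = 0, 1
    for g in sorted(regions, reverse=True):
        total += sign * g
        sign = -sign
    return total

# Greedy decreasing-order play is minimax-optimal for simple gotes:
for values in ([7, 4, 4, 3, 1], [5, 3, 2], [6, 6], [9]):
    assert minimax(values) == greedy(values)
```

This is exactly why a learned "play gotes in decreasing order" policy would sidestep the brute-force reading: the search tree over move orders is factorial, but the optimal strategy is a one-line sort.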
User avatar
HermanHiddema
Gosei
Posts: 2011
Joined: Tue Apr 20, 2010 10:08 am
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Location: Groningen, NL
Has thanked: 202 times
Been thanked: 1086 times

Re: AlphaGo Zero: Learning from scratch

Post by HermanHiddema »

RobertJasiek wrote:
HermanHiddema wrote:I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles


AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.


Yes, I do mean those puzzles where the end of the game is beyond a reasonable horizon for tree search. The small ones it could well solve.

The term corridor is another good example of an abstraction that is not likely to be part of AlphaGo's repertoire. The advantage of such a specific abstraction is far too small (1 point max for tedomari) to find its way into a generic go playing neural net (though you could perhaps train one if, for all the training inputs, the win depended on tedomari). It would be drowned out by other patterns.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

HermanHiddema wrote:
Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.


I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.


Maybe so. But give it a few days and a few terabytes and who knows? ;)

However, I do think that AlphaGo's machine learning algorithm could solve it by training on that specific problem. Maybe even in one day. Maybe even starting from scratch. (But starting from AlphaGo Zero would probably be quicker.)

Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.


As you and Robert have pointed out, my puzzles are small enough and easy enough that any of the strong programs could probably solve them very quickly. But large problems like those in Mathematical Go are another story. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
billyswong
Beginner
Posts: 18
Joined: Sat Jun 03, 2017 11:40 pm
GD Posts: 0
Been thanked: 2 times

Re: AlphaGo Zero: Learning from scratch

Post by billyswong »

fwiffo wrote:Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training the network, the position on the board is given, along with a history of recent moves, and an estimated winning probability given various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns what board configurations are likely wins or losses, and how to get there. (this is a simplification)

From what I read in the paper, AlphaGo Zero is not provided with a winning probability estimate. Whether it acquires similar knowledge after many millions of self-play games is another issue.
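For what it's worth, the Zero paper's training targets bear this out: the network f_θ(s) = (p, v) is trained toward its own search's visit distribution π and the actual game winner z, not toward any externally supplied win-probability estimate. The stated loss is

```latex
l = (z - v)^2 - \pi^{\mathsf T} \log \mathbf{p} + c \lVert \theta \rVert^2
```

where c is the L2 regularization coefficient. Any "estimated winning probability" the network uses is its own output v, bootstrapped from nothing but terminal results.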

fwiffo wrote:How ko and other illegal moves are handled is not in the paper, but there are several ways to do it (e.g. simply masking out illegal moves from the network's predictions, or disqualifying the player that plays them and scoring it as a loss, or imposing a penalty of some kind.)

The paper says that the external system provides AlphaGo with a map of the legal moves at each turn.
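One plausible way to use such a legality map (a hypothetical sketch; the paper does not spell out the mechanism beyond supplying the map) is to zero out illegal moves in the policy before renormalizing, so the search never spends probability on them:

```python
import math

def masked_policy(logits, legal):
    """Softmax over legal moves only; illegal moves get probability 0.
    (A real implementation would subtract the max logit for numerical stability.)"""
    exp = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exp)
    return [e / total for e in exp]

# Three candidate moves; the second (say, a ko recapture) is illegal.
probs = masked_policy([1.0, 2.0, 0.5], [True, False, True])
```

The illegal move ends up with exactly zero probability and the rest renormalize to 1, so the net never needs to learn the ko rule itself.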
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: AlphaGo Zero: Learning from scratch

Post by Uberdude »

On Igo Hatsuyuron 120
HermanHiddema wrote:
Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.
I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.
For a comically bad attempt at using LeelaZero to solve it, see here. White goes wrong on move 4, failing to cut black to make the temporary seki (forget about making the hanezeki; some other clueless person suggested putting a bunch of white stones in its path to cancel out the komi LZ expects) and loses by 60 points.

He only used 1 minute per move on what I presume was his home computer, so it was never going to be good, but I'm not sure I expected it to be that terrible.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

Uberdude wrote:On Igo Hatsuyuron 120
HermanHiddema wrote:
Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.
I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.

The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.
For a comically bad attempt at using LeelaZero to solve it, see here. White goes wrong on move 4, failing to cut black to make the temporary seki (forget about making the hanezeki; some other clueless person suggested putting a bunch of white stones in its path to cancel out the komi LZ expects) and loses by 60 points.

He only used 1 minute per move on what I presume was his home computer, so it was never going to be good, but I'm not sure I expected it to be that terrible.
David_SilverDeepMind wrote: We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!
:lol:

To repeat myself:
Bill Spight wrote:However, I do think that AlphaGo's machine learning algorithm could solve it by training on that specific problem. Maybe even in one day. Maybe even starting from scratch. (But starting from AlphaGo Zero would probably be quicker.)
You could train on that problem, with 0 komi, OC. :) EDIT: And, if you want a somewhat more general problem solver, with 8-fold symmetry.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: AlphaGo Zero: Learning from scratch

Post by Bill Spight »

Quoting Uberdude from his Journal:
Uberdude wrote:Addendum: However, as a warning against treating LZ's word as gospel, if I make it play the book's 1, then kosumi for white (#2) and next moves all top choices it gives black only 45% to continue at 7 as best move (on 1k playouts).
[go]$$B LZ blind spot setup?
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . O O . a . . . . . . . . . . . . . . |
$$ | . . X O . O . 1 . . . . . . . . . . . |
$$ | . . X , . . . . . , . . . X . X . . . |
$$ | . . . X . X . 3 . . . . . . . . X . . |
$$ | . . . . . . . . . . . . . . . O O . . |
$$ | . . . O . . . . . . . . . . . . . . . |
$$ | . . . . 2 . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . . O . . |
$$ | . . . 4 . . . . . , . . . . . , . . . |
$$ | . . 5 7 . . . . . . . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 6 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . O . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
However, if 7 is the placement here, it's happy to block (59%), but then if black "unexpectedly" pushes through, white drops to 40%. It wants to connect at 6 and sacrifice the corner in a trade for pushing through outside, rather than squeeze under and take the small gote life. So it seems that early in this variation LZ was willing to tenuki the checking extension the book recommends, but underestimated the severity of the peep later. Not sure if more playouts would solve this (I'm using network #139 if anyone else wants to try before I get my new computer).
[go]$$Bm7 LZ blind spot?
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . O O 5 1 2 . . . . . . . . . . . . . |
$$ | . . X O 3 O . X . . . . . . . . . . . |
$$ | . . X , 4 a . . . , . . . X . X . . . |
$$ | . . . X . X . X . . . . . . . . X . . |
$$ | . . . . . . . . . . . . . . . O O . . |
$$ | . . . O . . . . . . . . . . . . . . . |
$$ | . . . . O . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . . O . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . X . . . . . . . . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . O . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
(Emphasis mine.)

IIUC, Monte Carlo Tree Search can overcome this problem in infinite time. ;)
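The "infinite time" point comes from the selection rule itself. AlphaGo Zero's search picks children by a PUCT score Q + U, where the exploration bonus U decays as a move is visited, so the network's prior biases the search but never forbids a line; every move keeps being revisited in the limit. A minimal sketch (hypothetical stats layout, not DeepMind's code):

```python
import math

def puct_select(stats, c_puct=1.5):
    """Pick the child maximizing Q + U. stats: list of (visits N, total value W, prior P)."""
    total_n = sum(n for n, _, _ in stats)
    best, best_score = 0, -float("inf")
    for i, (n, w, p) in enumerate(stats):
        q = w / n if n else 0.0                        # mean value so far
        u = c_puct * p * math.sqrt(total_n) / (1 + n)  # exploration bonus, decays with n
        if q + u > best_score:
            best, best_score = i, q + u
    return best

# A "blind spot" move with a tiny prior but a high observed value:
stats = [(100, 55.0, 0.98), (1, 0.9, 0.02)]
choice = puct_select(stats)
```

Here the low-prior move wins the selection on the strength of its observed value once it has been visited at all, which is how enough search can eventually correct a blind spot in the net.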

It seems to me that this illustrates a problem with self play. Both you and your opponent probably have the same blind spots, because you are so similar. To use the hill climbing metaphor, you are both climbing the same hill. But there are other hills which may be more important in a given position; those other hills represent other skills. Given enough exploration and enough time, self play will learn those skills, but they may be learned more quickly with adversarial play. That is, instead of playing against a version of yourself, play against a version of a player that has been trained to beat you. An adversary will probe your blind spots.

Now, if your aim is to produce a superhuman player, we have seen that self play from scratch is the most efficient way known to do that. However, if we are climbing a single finite hill we will reach the point of diminishing returns. (I think that happened in chess.) Training adversaries will initially take more time to reach the same level--in chess, 8 hours instead of 4?--but may take less time to produce robust, well-rounded players.

Another idea is to train for known go skills. For instance, train initially on classic go problems with 0 komi. Then train on the whole board with komi. Will the initial training be a help or a hindrance?