This 'n' that

Talk about improving your game, resources you like, games you played, etc.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

I am tempted to wait for the speculation about neural nets to die down before talking about an AlphaGo game, but this seems pertinent to the present discussion. ;)
[go]$$Bc AlphaGo vs. AlphaGo Game 12
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . 2 . . . . . , . . . . . 1 . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 6 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . 4 . . . . . , . . . . . , 3 . . |
$$ | . . . . . 5 . . . . . 7 . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
:b7: is, I suppose, the newest move so far, but it was not an AlphaGo invention. What if its policy network had not been trained on human play, but had learned from scratch? What would its first seven plays look like?

The next play, however, is an AlphaGo innovation. You guessed it, the 3-3 invasion. ;)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: This 'n' that

Post by moha »

Uberdude wrote:Isn't the point of reinforcement learning that as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network: e.g. AlphaGo started by playing the slide to 2-4 after approaching a 4-4 like humans did in the initial training data, but after millions of games it found that tended to lead to poor results so now the policy network doesn't much like that move.
This sounds good in theory, but you may be underestimating the number of possible positions on 19x19, and the extent of "drying up" when you try to fill a whole-board NN with data.

Partly in reply to Bill's example: the opening is different, though. The number of reasonable moves is too high for search, while the number of positions is still not too high, so a whole-board net is useful at this stage (like an opening book in chess :)).
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

Traditional joseki

Now, the direct 3-3 invasion was frowned on for quite some time. Usually some sort of preparation was made, and in his 21st Century Go set, Go Seigen shows it a number of times, almost always after a light reduction. But this kind of 3-3 invasion by a strong player is a significant innovation by AlphaGo.

Coming along, I was never attracted to this invasion, mainly because I felt that, on a relatively empty board, Black's resulting thickness was too strong, after the usual joseki. We warn beginners against this invasion. I guess we are going to have to stop doing that. ;)
[go]$$Wcm8 AlphaGo game 12, Traditional joseki
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . 7 5 . . . . |
$$ | . . . . . . . . . . . . 8 6 4 3 1 . . |
$$ | . . . O . . . . . , . . . . . X 2 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
:b9: looks like the right side to block on, but you never know. Maybe we can think of the 3-3 invasion as a probe. :) Anyway, :b9: - :b15: follow the traditional joseki.
[go]$$Wcm16 Traditional joseki, continued
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O 3 . |
$$ | . . . O . . . . . , . . . . . X X 1 . |
$$ | . . . . . . . . . . . . . . . . . 2 . |
$$ | . . . . . . . . . . . . . . . . 4 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
After :w16: - :b19: White has sente, but I do not like White's chances. AlphaGo apparently agrees, because it does not play :w16:. It plays elsewhere. Even the great Go Seigen did not see that! I suppose that AlphaGo has pretty well killed :w16: as joseki. It may still be a situational move, OC. :)

Where do you think AlphaGo played? You can probably guess. :)
Last edited by Bill Spight on Thu Jun 22, 2017 10:52 pm, edited 2 times in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

AlphaGo Game 12
[go]$$Wcm16 Wedge
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . c . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . b . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . a . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
A wedge seems obvious, and :w16: looks like just the right spot. It has room to make a base with "a" or "b", and if White plays at "b", there is room for another extension to "c", if need be. The wedge is not too close to Black's thickness. It feels just right. :)
[go]$$Wcm16 How to approach the wedge?
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X a . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . 2 . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . 3 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . 4 , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
Approaching from the bottom has little appeal, despite the saying to drive your opponent's stones towards your thickness. (And is it really thickness, as John Fairbairn might ask? :)) Black would not have much of an attack that way. Many strong players of yore, perhaps even into the 20th century, would have had few qualms about approaching :w16: from the top, despite a bit of overconcentration, anticipating something like :w18: - :b19:, securing the corner. After :b17: a White hane at "a" would be bothersome, so they would probably descend to "a" first, with sente, before approaching the wedge.

Well, AlphaGo as Black did neither. The board is open. Where do you think it played? I would not have guessed right, BTW. I am hiding its move, in case you want to guess. More discussion later. :)
[go]$$Wcm16 Top attachment
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . 2 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
Kowabunga!
Last edited by Bill Spight on Thu Jun 22, 2017 10:53 pm, edited 2 times in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Baywa
Dies in gote
Posts: 39
Joined: Wed Feb 22, 2017 6:37 am
GD Posts: 0
Has thanked: 40 times
Been thanked: 10 times

Re: This 'n' that

Post by Baywa »

Bill Spight wrote: I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.
(This is also in reply to moha)

I think so, too, from what I've heard (and I really, really have to read the Nature paper - but it is quite condensed). The NNs, I think, try to emulate intuition, gut feeling coming from experience - that is, playing lots and lots of games. For the policy network you give it a 19x19 bitmap of colour values (black, white, blank). This input activates the network, and in the output layer one (or maybe more than one) neuron out of about 19x19 neurons is activated as a candidate for the next move or the hot spot. Now, in order to train such a network you don't have to feed it all possible 10^100 or so board positions. This is the whole point of NNs! Somehow (by magic, or rather by variants of the gradient descent method) it learns from a far smaller number of training examples.
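
To make that concrete, here is a minimal sketch in PyTorch of the kind of policy network described above: board planes in, one score per intersection out, trained by gradient descent on an expert move. It is an illustration only - the real AlphaGo policy network used many more input feature planes and a much deeper convolutional stack, and every name here is made up.
[code]
# Minimal policy-network sketch (illustrative only, not the real AlphaGo net).
import torch
import torch.nn as nn

class TinyPolicyNet(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # 3 input planes: black stones, white stones, empty points
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),   # one logit per point
        )

    def forward(self, planes):                # planes: (batch, 3, 19, 19)
        return self.body(planes).flatten(1)   # (batch, 361) move logits

# One supervised training step: raise the probability of the move the
# strong human actually played ("variants of the gradient descent method").
net = TinyPolicyNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
planes = torch.zeros(1, 3, 19, 19)            # a dummy board position
expert_move = torch.tensor([3 * 19 + 15])     # flattened index of a 4-4 point

opt.zero_grad()
loss = nn.functional.cross_entropy(net(planes), expert_move)
loss.backward()
opt.step()
[/code]
With enough (position, expert move) pairs, these same few lines scale up into the supervised phase described above.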

Compare that to skilled go players. They have played a lot of games and acquired that gut feeling. They have also learned rules (direction of play, proper distance, etc.). But I'm not so sure how important this rationalizing really is during actual play. Somehow they find the best spot and play immediately, or try a few or many variations to decide which point to choose.

So, to summarize: NNs reduce the complexity of the game, and that makes it possible to make a choice.
Couch Potato - I'm just watchin'!
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: This 'n' that

Post by Kirby »

Bill Spight wrote: I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.
Here's my understanding of how AlphaGo works, described in layman's terms - at least the Fan Hui version, on which the Nature paper was based:

Step 1.) Train a policy network to construct a function that is able to predict the moves a strong player would make. Initially, this is done with supervised learning - give it a bunch of high-dan games and adjust the neural network's weights so that you have a (non-linear) function that predicts the next move in a new high-dan game.
Step 2.) Improve the policy network through reinforcement learning. To do this, have the latest version of the policy network (A) play against an older version (B). See who wins. Update A's weights, positively for a win, negatively for a loss.
Step 3.) Train a value network, not with sample data as in Step 1, but directly by playing games as in Step 2: at a random board position, predict who will win the game. Use the policy network from Step 2 to play out the rest of the game, and see who won. Just like before, if the prediction was correct, adjust the weights positively; if the prediction was wrong, adjust them negatively.
Step 4.) Combine the policy network, the value network, and Monte Carlo Tree Search: a tree is constructed, starting at the root board state. At each node in the tree, the policy network gives a prior probability that any given move will be good (e.g. a 62% chance I should play move X). Then you can traverse the tree to search for the best outcome. The outcome is defined by a linear combination of the value given by the value network PLUS the outcome of a Monte Carlo simulation from that point in the tree (see the sketch after this list). How much weight to give to MCTS vs. the value network is not clear to me.
Step 5.) Profit (beat Lee Sedol, Ke Jie, earn millions, and start the robot revolution).
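
To make Step 4 concrete, here is a hedged sketch in Python of its two numeric ingredients: the leaf value that blends the value network with a rollout result, and the prior-guided selection rule (a PUCT variant) that uses the policy network's probability. The constants and function names are illustrative, not DeepMind's code; the published paper reports a mixing weight of 0.5 and a PUCT-style bonus of this general shape.
[code]
import math

LAMBDA = 0.5  # mixing weight between rollout result and value network

def blended_leaf_value(v_net, rollout_z, lam=LAMBDA):
    """Leaf evaluation: (1 - lam) * value-net estimate + lam * rollout result."""
    return (1 - lam) * v_net + lam * rollout_z

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.0):
    """Selection rule: mean value so far plus a prior-scaled exploration bonus.
    Moves the policy net likes (high prior) get explored first; the bonus
    decays as a child accumulates visits."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# Example: the value net says 55% win, but one fast rollout happened to lose.
print(blended_leaf_value(0.55, 0.0))                  # -> 0.275
# A move with prior 0.62 ("62% chance I should play move X"), rarely visited:
print(puct_score(q=0.50, prior=0.62, parent_visits=100, child_visits=2))
[/code]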

So anyway, this allows for generalization to occur, as Bill suggests. Fundamentally, the program still does a search. But the breadth of actions from a given state that must be considered is greatly reduced by the policy network (which has been trained first on training data, and then refined by playing against itself). And leaf evaluation combines the trained value network with Monte Carlo playouts from that position. The neural networks themselves basically produce non-linear functions with weights that have been adjusted through training. A totally new situation and board position can be fed to such a function to produce a result.

This is basically my understanding of how things work. Please feel free to correct any misunderstandings that I have, because I'm interested in learning more about it, too.
be immersed
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: This 'n' that

Post by moha »

Baywa wrote:Now, in order to train such a network you don't have to feed it all possible 10^100 or so board positions. This is the whole point of NNs! Somehow (by magic, or better by variants of the gradient descent method) it learns from a far smaller number of training examples.
The NN can only give answers based on the data it received. The point is, IMO this answer will not be "globally" correct (except for the opening); there is simply not enough data for this. It will contain generalizations, localizations, and simplifications similar to human intuition. For local shapes this can be quite accurate, because a distilled, generalized view can emerge for them (though it will still blunder from time to time without search), but for global strategy, the effect of one part of the board on another, sente, attacking maneuvers, etc., you need search. Even for local fights it can only give options to search on, not correct answers (at pro level). But you CAN search, if you have a NN for pruning! This is the real innovation here.
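
A back-of-the-envelope illustration of why "NN for pruning" is the real innovation (toy numbers, nothing measured): if the policy net can cut ~250 legal moves down to a handful of candidates per node, a deep search suddenly fits in a computable budget.
[code]
# Toy branching-factor arithmetic (made-up numbers, for illustration only).
full_moves, pruned_moves, depth = 250, 5, 8
print(f"unpruned tree:      {full_moves**depth:.2e} leaves")    # ~1.5e+19
print(f"NN-pruned to top-5: {pruned_moves**depth:.2e} leaves")  # ~3.9e+05
[/code]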

One example is the Lee Sedol match, game 3, from move 16 (IIRC) onward. This is one of the rare cases where AlphaGo "commits" itself to a line that could end the game on a misread. This is different from the flexible, souba style it normally plays. IMO such commitment is only possible with very deep search. This is also what Lee Sedol concluded after the match - feeling (his NN) is not enough; you need to read out everything. Playing at that level is simply not possible otherwise.

(A minor correction: it seems the value net is simply used to refine the evaluation, averaging into the MC result, not for optimization as I guessed. -- I now see Kirby already corrected that.)
Last edited by moha on Thu Jun 22, 2017 2:25 pm, edited 1 time in total.
EdLee
Honinbo
Posts: 8859
Joined: Sat Apr 24, 2010 6:49 pm
GD Posts: 312
Location: Santa Barbara, CA
Has thanked: 349 times
Been thanked: 2070 times

Post by EdLee »

Hi Kirby,
How much weight to give to MCTS vs. the value network is not clear to me.
My understanding is that anything not explicitly spelled out in their paper(s) -- i.e. anything that's implementation-dependent, or "user"-adjustable -- is part of DM's "secret sauce". That, plus the massive resource requirements (custom TPUs, other custom hardware, massive power supply, etc.), is why AG has a significant (?) lead over its nearest machine rivals (DeepZen, etc.).

Seems they'll continue to develop new and improved methods for other fields (eg. medicine).
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: This 'n' that

Post by moha »

Kirby wrote:How much weight to give to MCTS vs. the value network is not clear to me.
I recall seeing a weight of 0.5 mentioned somewhere.
Baywa
Dies in gote
Posts: 39
Joined: Wed Feb 22, 2017 6:37 am
GD Posts: 0
Has thanked: 40 times
Been thanked: 10 times

Re: This 'n' that

Post by Baywa »

moha wrote:
Baywa wrote:Now, in order to train such a network you don't have to feed it all possible 10^100 or so board positions. This is the whole point of NNs! Somehow (by magic, or better by variants of the gradient descent method) it learns from a far smaller number of training examples.
The NN can only give answers based on the data it received. The point is, IMO this answer will not be "globally" correct (except for the opening), there is simply not enough data for this.
The question whether a method is globally correct (that is, will always give the best answer) is pretty theoretical and practically irrelevant. You can only prove that a move is not optimal by finding a better move. How do you find such a move if the machine beats you every time? Especially for opening moves this is very difficult to determine. From a practical standpoint, the self-learning AlphaGo plays very good moves. But it is more than likely that AlphaGo still does not play the optimal move.
It will contain similar generalizations, localizations, simplifications to human intuition.
Of course! But that's the point of the AlphaGo architecture. However, by playing itself many, many times and learning from those games, it gains new intuition.
For local shapes, this can be quite accurate
Maybe just the opposite! Local situations with crosscuts, shortages of liberties, and the like require heavy reading.
because a distilled, generalized view can emerge for them
Sorry, I don't understand that. ...
but for global strategy, effect of one part of the board on another, sente, attacking maneuvers etc. you need search.
Well, of course you do. But what would you do without intuition? Search for a needle in a haystack...
Even for local fights it can only give options to search on, and not correct answers (pro level).
Forget your notion of correctness! (see above) Even local losses in many cases turn out to be globally good. AlphaGo and skilled human players show that in many games. Of course that requires good reading but also good intuition to even consider.
But you CAN search, if you have NN for pruning! This is the real innovation here.
Now you're with me. Did we go in circles? Sorry, I have to stop here. I still have to read up on that part about Lee Sedol's game 3.
Couch Potato - I'm just watchin'!
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re:

Post by Bill Spight »

EdLee wrote:My understanding is that anything not explicitly spelled out in their paper(s) -- i.e. anything that's implementation-dependent, or "user"-adjustable -- is part of DM's "secret sauce".
I have no quarrel with the AlphaGo team. However, I have a certain disappointment with computer science papers in general. I have written a couple of papers on the mathematics of go, which got published in computers and games collections. When I made use of a computer program, I included it in an appendix. My programs were in Prolog, and I wrote them to be human readable. :) Quite often computer science papers include pseudocode, which is also human readable. However, I often found that the pseudocode was not enough for me to verify the claims that papers made. I shrugged it off because I am not really a computer scientist. Recently, though, I have heard of some research where computer scientists other than the authors were also not able to verify the results in some papers. In at least some cases the authors stated that they had not, in fact, included the tweaks necessary to produce the results in their papers, and were not going to make them public. This practice seems to be quite common. Sorry, but irreproducible results do not science make, IMO. :(
Last edited by Bill Spight on Thu Jun 22, 2017 6:31 pm, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

moha wrote:
Kirby wrote:How much weight to give to MCTS vs. the value network is not clear to me.
I recall seeing a weight of 0.5 mentioned somewhere.
Yes, I have seen that, too. :)

Edit: To be clear, I believe that they average the MC playout results with the value network results, not the results of the MC tree search.
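
For reference, the leaf-evaluation formula in the Nature paper (Silver et al., 2016) is

[code]
V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L,    with lambda = 0.5
[/code]

where v_theta(s_L) is the value network's estimate at leaf s_L and z_L is the result of the fast-rollout playout from that leaf - i.e., the playout result is averaged with the value network at each leaf, inside the tree search.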
Last edited by Bill Spight on Thu Jun 22, 2017 6:23 pm, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

Kirby wrote:Step 3.) Train a value network, not with sample data as in Step 1, but directly by playing games as in Step 2: at a random board position, predict who will win the game. Use the policy network from Step 2 to play out the rest of the game, and see who won. Just like before, if the prediction was correct, adjust the weights positively; if the prediction was wrong, adjust them negatively.
Step 4.) Combine the policy network, the value network, and Monte Carlo Tree Search: a tree is constructed, starting at the root board state. At each node in the tree, the policy network gives a prior probability that any given move will be good (e.g. a 62% chance I should play move X). Then you can traverse the tree to search for the best outcome. The outcome is defined by a linear combination of the value given by the value network PLUS the outcome of a Monte Carlo simulation from that point in the tree. How much weight to give to MCTS vs. the value network is not clear to me.
As moha points out, I believe that they weight each equally. Since they both produce probabilities, I suppose that they use the geometric mean, but I don't know.

From what you say in 3), it sounds like the value network gives an estimate of the probability of winning, given AlphaGo vs. AlphaGo. As I have stated elsewhere, the MC probabilities are those of semi-random player vs. semi-random player, which is not the situation at hand. Why, then, use the MC probabilities at all, since they are inherently inaccurate? (I think I could get rich by betting against the MC probabilities. ;)) The value network may produce play that is too "honest", particularly if you are a bit behind. Then you need to play for errors, and using an intermediate value may help to do that. Another possible advantage is what moha points out, that the MC probabilities are based upon the actual position at hand, not some aggregate of more or less similar positions. :)
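
A toy comparison of the two ways of combining a value-net probability with an MC winrate (invented numbers; the paper's linear combination is the first one):
[code]
import math

v_net, mc = 0.90, 0.10    # value net optimistic, rollouts pessimistic
print((v_net + mc) / 2)        # 0.5 - arithmetic mean (the paper's blend)
print(math.sqrt(v_net * mc))   # 0.3 - geometric mean punishes disagreement
[/code]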
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
EdLee
Honinbo
Posts: 8859
Joined: Sat Apr 24, 2010 6:49 pm
GD Posts: 312
Location: Santa Barbara, CA
Has thanked: 349 times
Been thanked: 2070 times

Post by EdLee »

Hi Bill,
In at least some cases the authors stated that they had not, in fact, included the tweaks necessary to produce the results in their papers, and were not going to make them public. This practice seems to be quite common. Sorry, but irreproducible results do not science make, IMO. :(
An interesting point. Where to draw the line between "fundamental pure research" and commerce. Toward the end of 2015, before AG was made public, the top non-AG engines were about 4-5 stones from pro level (they lost to pros at 5 stones; I forget the name of the computer tourney; afterwards, exhibition matches between the top engine and pros). That was when many who didn't know about AG were still saying "not for at least another decade" for machines to beat top pros. After DM published their paper(s), the other engines jumped to near pro level (MLily: DeepZen v. human pros, 1-1). AG's results were consistent and reproducible (AG-Master's 60-0, and AG-2017's 3-0 v. Mr. Lee Sedol), and like Coca-Cola's secret formula, they were reproducible by the proprietor, just not (yet) by others.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: This 'n' that

Post by moha »

Bill Spight wrote:As I have stated elsewhere, the MC probabilities are those of semi-random player vs. semi-random player, which is not the situation at hand. Why, then, use the MC probabilities at all, since they are inherently inaccurate?
I think this is the essence of Alphago (& co).

A mostly random MC is next to useless, since it measures the outcome distribution over the legal moves from a node: if you have lots of bad options, that results in a low winrate. But in reality the number of bad options does not matter; only the outcome of the best option (or at most the number of good options) counts.
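
A small simulation of that point, with made-up numbers: suppose that from some node exactly 1 of 100 legal continuations wins and the other 99 lose. Uniform playouts report the position as nearly lost, while playouts guided by even a modest policy net report it as clearly won - and the true (minimax) value is a win, since only the best option matters.
[code]
import random

WINNING, LOSING = 1, 99    # hypothetical node: 1 winning move, 99 losers

def uniform_rollout():
    """'Mostly random MC': every legal move equally likely."""
    return random.random() < WINNING / (WINNING + LOSING)

def policy_rollout(p_find_best=0.8):
    """'Amateur dan' playout: a policy net finds the winning move 80% of the time."""
    return random.random() < p_find_best

n = 100_000
print(sum(uniform_rollout() for _ in range(n)) / n)   # ~0.01: looks lost
print(sum(policy_rollout() for _ in range(n)) / n)    # ~0.80: looks won
[/code]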

But with a NN, you can get the "amateur dan" level playouts mentioned earlier, which ARE informative for winrates. (For Baywa: the NN itself is weak without search, strategically as well as tactically; it has no hope of producing a pro-level game - this is what I meant by correctness.) Even if the MC winrate is just a rough estimate, the same is true of the value net. There may be positions where one works better than the other (such as early game vs. late game).