 Post subject: Re: This 'n' that
Post #301 Posted: Wed Jun 21, 2017 6:19 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 528
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
moha wrote:
AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan level by themselves),

This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).

 Post subject: Re: This 'n' that
Post #302 Posted: Wed Jun 21, 2017 6:41 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
dfan wrote:
moha wrote:
AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan level by themselves),

This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).

Even at the end of 2014, before AlphaGo was born and they were just training what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564

 Post subject: Re: This 'n' that
Post #303 Posted: Wed Jun 21, 2017 7:58 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
Uberdude wrote:
Even at the end of 2014, before AlphaGo was born and they were just training what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564

AFAIS that paper says its pro-move prediction accuracy was similar, not its playing strength. The latter was measured against GnuGo and other programs. Anyway, surely the NN has also improved since then, but there are upper bounds - I doubt a net without search could reasonably be expected to reach even the weakest pro levels. Strength depends not on the good moves, but on the frequency and size of the mistakes.

 Post subject: Re: This 'n' that
Post #304 Posted: Wed Jun 21, 2017 8:15 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
moha wrote:
Bill Spight wrote:
moha wrote:
I think human strategy and reasoning works, it just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at those levels there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match. :)

However, it seems clear that in most of the Master games this winter, Master took an early lead, too early for even deep reading to have much effect. The lead was based upon strategic superiority. The version of AlphaGo that played Lee Sedol last year was rather weaker than the current version. Humans have a lot to learn about strategy from AlphaGo (and probably other programs, as they get stronger). And I am confident that humans will learn a lot. :)

On the latter I agree. But why strategic superiority? I'm pretty sure its leads were based on whole-board minimaxing. This is where its strength lies, its main innovation: with NNs the tree can be pruned enough that deep minimaxing is possible, even in a non-local sense. And it makes MC effective for the first time. Even in the opening: reading dozens of moves ahead without significant oversights, always taking the biggest points and best approach moves, rarely losing sente, and even then leaving the best followups, etc. Whole-board minimaxing was never seen before, so most people underestimate its power.


Oh, I think that we have seen whole board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole board minimaxing, even back in the 1990s and probably before.

It is true that the policy network (originally trained on human move choice, BTW ;)) enables deeper game tree search by move choice, but that move choice does not depend upon search; it is strategic. The value network is also strategic. These are where AlphaGo's advances lie, and where humans can learn from AlphaGo. It is true that humans have to figure out the lessons by studying AlphaGo's play, but that is why I think that a scientific approach is useful. :)

Quote:
Fortunately, most of its leads could be explained by human reasoning as well - these are the cases we can learn from.


You just did apply human reasoning to AlphaGo's moves.
Quote:
always taking the biggest points and best approach moves, rarely losing sente, and even then leaving the best followups

Biggest points, approach moves, sente, and followups are all human concepts. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: This 'n' that
Post #305 Posted: Wed Jun 21, 2017 8:25 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
moha wrote:
I think human strategy and reasoning works, it just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at those levels there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match. :)


I don't disagree. I'm not arguing against reading.

The point Kasparov was trying to make, as I understand it, was that just accepting a computer's result (minimax, I think you referred to it as) is inferior to using human reasoning. Not because human reasoning brings about a better result, but rather, because human reasoning brings about understanding.

Think of it this way: I can calculate integrals on my calculator with greater accuracy and speed than by hand. But doing them by hand gives a deeper understanding of the result.

Besides, when a computer is wrong, there's no way to verify it if you always take the computer's result at face value.

_________________
be immersed

 Post subject: Re: This 'n' that
Post #306 Posted: Wed Jun 21, 2017 8:28 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
dfan wrote:
moha wrote:
AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan level by themselves),

This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap.


Well, first, the policy network guides search, so not doing search itself is no handicap. :)

Edit: Ah! I guess you mean playing with the policy network alone, doing no search is a handicap. OK. :) Sorry.

And my impression from the graph of AlphaGo's progress is that it is not close to peaking. It has retired, but there is plenty of room for other programs to forge ahead in the coming years. Now I have to wonder. Could God give Ke Jie 9 stones? (I doubt it, but maybe 6?)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Thu Jun 22, 2017 6:08 am, edited 1 time in total.
 Post subject: Re: This 'n' that
Post #307 Posted: Wed Jun 21, 2017 8:35 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Kirby wrote:
Besides, when a computer is wrong, there's no way to verify it if you always take the computer's result at face value.


Isaac Asimov wrote a short story based on exactly that point. IIRC, in it a man was chosen to go on a long space journey because he had taught himself to do arithmetic and was not solely dependent upon computers or calculators. As I recall, he was asked in an interview, "Is 3x4 always 12?" ;)

Ah! I DuckDuckGoed it. "The Feeling of Power".

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: This 'n' that
Post #308 Posted: Wed Jun 21, 2017 8:53 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
Bill Spight wrote:
moha wrote:
Whole-board minimaxing was never seen before, so most people underestimate its power.

Oh, I think that we have seen whole board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole board minimaxing, even back in the 1990s and probably before.

I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Nor does it work with poor pruning that often misses key moves.)

Quote:
Quote:
Fortunately, most of its leads could be explained by human reasoning as well - these are the cases we can learn from.

You just did apply human reasoning to AlphaGo's moves.
Quote:
always taking the biggest points and best approach moves, rarely losing sente, and even then leaving the best followups

Biggest points, approach moves, sente, and followups are all human concepts. :)

Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human, since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.

Think about evolution and biology. Survival does not require finding the best answer to a particular problem, but it does require finding a good enough answer to every problem. This is how human intelligence works: faced with a task, it searches for tools, concepts, generalizations. This will not give the best answer, but it will never produce a very bad answer either. Hence a decent strength in go as well - but no more.
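To make the earlier pruning point concrete, here is a toy sketch of policy-pruned lookahead: a depth-limited negamax that only expands the policy network's top k candidate moves at each node, so the tree stays narrow enough to read deeply. All the interfaces (the state methods, policy_net, value_net) are hypothetical stand-ins, and AlphaGo itself uses MCTS rather than fixed-depth negamax; this only illustrates why pruning quality matters.

[code]
# Toy sketch of policy-pruned lookahead. Hypothetical interfaces;
# AlphaGo actually uses MCTS, not fixed-depth negamax.

def negamax(state, depth, policy_net, value_net, k=5):
    """Depth-limited negamax that reads only the policy net's
    top-k moves at each node: a narrow but deep tree."""
    if depth == 0 or state.is_terminal():
        return value_net(state)                  # static score for side to move
    candidates = policy_net.top_moves(state, k)  # prune: k moves, not ~250
    return max(-negamax(state.play(m), depth - 1, policy_net, value_net, k)
               for m in candidates)

# With k = 5 and depth = 10 this visits about 5**10 ~ 10**7 positions;
# unpruned (~250 legal moves) the same depth would need ~250**10 ~ 10**24.
[/code]

And if the pruner misses a key move (the "poor pruning" case above), no amount of depth recovers it: the key line is simply never in the tree.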

 Post subject: Re: This 'n' that
Post #309 Posted: Wed Jun 21, 2017 9:38 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
moha wrote:
Bill Spight wrote:
moha wrote:
Whole-board minimaxing was never seen before, so most people underestimate its power.

Oh, I think that we have seen whole board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole board minimaxing, even back in the 1990s and probably before.

I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Nor does it work with poor pruning that often misses key moves.)

Quote:
Quote:
Fortunately, most of its leads could be explained by human reasoning as well - these are the cases we can learn from.

You just did apply human reasoning to AlphaGo's moves.
Quote:
always taking the biggest points and best approach moves, rarely losing sente, and even then leaving the best followups

Biggest points, approach moves, sente, and followups are all human concepts. :)

Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human, since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.


The point is that it is strategy, in the form of the neural networks, that makes AlphaGo's search effective.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: This 'n' that
Post #310 Posted: Thu Jun 22, 2017 6:26 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding"), the AlphaGo "style", the new moves, the shoulder hits, the awareness of influence and of weak groups vs. strong groups, the obsession with sente - that mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

 Post subject: Re: This 'n' that
Post #311 Posted: Thu Jun 22, 2017 6:45 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 528
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
moha wrote:
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding"), the AlphaGo "style", the new moves, the shoulder hits, the awareness of influence and of weak groups vs. strong groups, the obsession with sente - that mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics largely by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".
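For concreteness, this division of labour shows up in the PUCT-style selection rule described in the AlphaGo papers, where the policy prior decides which branches the tactical search gets to read at all. A minimal sketch (the Node fields and the constant c_puct = 1.5 are illustrative assumptions, not DeepMind's values):

[code]
import math
from dataclasses import dataclass

@dataclass
class Node:
    prior: float            # P(s,a): policy network's probability for this move
    visits: int = 0         # N(s,a): times the search has tried it
    value_sum: float = 0.0  # accumulated evaluations from below

    @property
    def q(self):            # Q(s,a): mean evaluation
        return self.value_sum / self.visits if self.visits else 0.0

def select(children, c_puct=1.5):
    """PUCT: pick the child maximizing Q + U. The policy prior biases
    exploration, so moves the network dislikes get almost no visits."""
    total = sum(ch.visits for ch in children)
    return max(children,
               key=lambda ch: ch.q + c_puct * ch.prior
                                   * math.sqrt(total) / (1 + ch.visits))

kids = [Node(0.6, visits=10, value_sum=5.0),
        Node(0.3, visits=2, value_sum=1.2),
        Node(0.1)]
print(select(kids).prior)  # -> 0.3: good value so far and still under-visited
[/code]

In this sense the networks set the strategic agenda and the search verifies the tactics along the branches they propose.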

 Post subject: Re: This 'n' that
Post #312 Posted: Thu Jun 22, 2017 6:54 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
moha wrote:
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding"), the AlphaGo "style", the new moves, the shoulder hits, the awareness of influence and of weak groups vs. strong groups, the obsession with sente - that mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).


Well, the value network, which produces a probability estimate for winning the game, has to be whole-board. As for the policy network, which proposes moves, I think that it does look at small regions of the board, but it looks at the whole board as well. As for producing something new, what does it do when it has not seen the whole-board position before? AlphaGo's famous 5th-line shoulder hit was new, but I am sure that the policy network had seen shoulder hits before, and maybe even some 5th-line shoulder hits.

I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Thu Jun 22, 2017 6:56 am, edited 1 time in total.
 Post subject: Re: This 'n' that
Post #313 Posted: Thu Jun 22, 2017 6:55 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Isn't the point of reinforcement learning that as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started out playing the slide to 2-4 after approaching a 4-4, like the humans in its initial training data, but after millions of games it found that this tended to lead to poor results, so now the policy network doesn't much like that move.
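A toy, self-contained illustration of that transfer (the move names and visit counts here are made up; the real system trains a deep network on whole-board positions, and whether the Master version used exactly this target is not public, though the later AlphaGo Zero paper made the recipe explicit): one cross-entropy gradient step pulls a softmax policy toward the visit distribution that search produced, so a move search learned to avoid, like the slide, loses probability.

[code]
import math

def softmax(logits):
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def train_toward_search(logits, visit_counts, lr=0.5):
    """Cross-entropy step toward normalized MCTS visit counts;
    for a softmax policy the gradient is simply (p - pi)."""
    total = sum(visit_counts.values())
    pi = {a: n / total for a, n in visit_counts.items()}
    p = softmax(logits)
    return {a: v - lr * (p[a] - pi.get(a, 0.0)) for a, v in logits.items()}

# Made-up numbers: search reads the slide out and rarely visits it.
logits = {"slide": 0.0, "tenuki": 0.0, "block": 0.0}
counts = {"slide": 2, "tenuki": 60, "block": 38}
for _ in range(200):
    logits = train_toward_search(logits, counts)
print(softmax(logits))  # the raw policy now mirrors what search preferred
[/code]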

 Post subject: Re: This 'n' that
Post #314 Posted: Thu Jun 22, 2017 7:09 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
dfan wrote:
moha wrote:
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding"), the AlphaGo "style", the new moves, the shoulder hits, the awareness of influence and of weak groups vs. strong groups, the obsession with sente - that mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics largely by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".

Well, I just wrote my personal opinion, so I could also be wrong. The tactics require search of course, but search is only possible with heavy pruning (via the policy net - which may be too error-prone tactically without search). But most of the strategic elements I mentioned are quite dynamic concepts that would seem pretty hard to handle with a static NN (sente, for example!). About MC: I think it is useful at leaf nodes (getting the estimated value of the end position - this also needs the policy net, for reasonable playouts). About the value net: I think it is used as a partial substitute for the costly MC, also near leaf nodes. But this is purely my guess, OC.

I think the high shoulder hit - shoulder hits in general - is present in the NN as a tactical shape, but that it is a good move in that particular situation, even on the 5th line, seems to be a search-based decision. Generally, I would think the NN is there to drop bad and meaningless moves, not to produce the best move - but maybe the best 10 or 20, which can then be searched (the slide into the corner is still searched, for example, at least in some positions, since AG did play it a few times as an attacking move).
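For what it is worth, the 2016 Nature paper describes exactly this combination at the leaves: each leaf is scored as a mix V(s) = (1 - lambda) * v(s) + lambda * z, averaging the value network's output v(s) with the result z of a fast policy-guided rollout, with lambda = 0.5 reported as working best. A minimal sketch, with both components stubbed out as hypothetical stand-ins:

[code]
import random

def value_net(state):
    # stub for the value network: a score in [-1, 1] for the side to move
    return 0.0

def fast_rollout(state):
    # stub for a quick playout to the end of the game with the small
    # rollout policy; returns +1 for a win, -1 for a loss
    return random.choice([+1, -1])

def evaluate_leaf(state, lam=0.5):
    """Mixed leaf evaluation as in the 2016 AlphaGo paper:
    V(s) = (1 - lambda) * v(s) + lambda * z."""
    return (1 - lam) * value_net(state) + lam * fast_rollout(state)
[/code]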


Last edited by moha on Thu Jun 22, 2017 7:19 am, edited 1 time in total.
 Post subject: Re: This 'n' that
Post #315 Posted: Thu Jun 22, 2017 7:09 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 528
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
Uberdude wrote:
Isn't the point of reinforcement learning that as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started out playing the slide to 2-4 after approaching a 4-4, like the humans in its initial training data, but after millions of games it found that this tended to lead to poor results, so now the policy network doesn't much like that move.

Yes, exactly.

 Post subject: Re: This 'n' that
Post #316 Posted: Thu Jun 22, 2017 7:14 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
I am tempted to wait for the speculation about neural nets to die down before talking about an AlphaGo game, but this seems pertinent to the present discussion. ;)

[go]$$Bc AlphaGo vs. AlphaGo Game 12
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . 2 . . . . . , . . . . . 1 . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 6 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . 4 . . . . . , . . . . . , 3 . . |
$$ | . . . . . 5 . . . . . 7 . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


:b7: is, I suppose, the newest move so far, but it was not an AlphaGo invention. What if its policy network had not been trained on human play, but from scratch? What would its first seven plays look like?

The next play, however, is an AlphaGo innovation. You guessed it, the 3-3 invasion. ;)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: This 'n' that
Post #317 Posted: Thu Jun 22, 2017 7:24 am 
Lives in gote

Posts: 311
Liked others: 0
Was liked: 45
Rank: 2d
Uberdude wrote:
Isn't the point of reinforcement learning that as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started out playing the slide to 2-4 after approaching a 4-4, like the humans in its initial training data, but after millions of games it found that this tended to lead to poor results, so now the policy network doesn't much like that move.

This sounds good in theory, but you may be underestimating the number of possible positions on 19x19, and the extent of "drying up" when you try to fill a whole-board NN with data.

Partly in reply to Bill's example: the opening is different, though. The number of reasonable moves is too high for search, while the number of positions is still not too high, so a whole-board net is useful at this stage (like an opening book in chess :)).
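Some rough arithmetic behind that point. Counting only distinct move sequences (ignoring captures, passes, and symmetry, so this overstates the number of distinct positions, but it shows the growth rate), the opening stops being enumerable almost immediately:

[code]
import math

# Ways to place the first n moves on distinct points:
# 361 * 360 * ... * (361 - n + 1)
for n in (4, 8, 12):
    print(n, "moves:", f"{math.perm(361, n):.1e}", "sequences")
# prints roughly 1.7e10, 2.7e20 and 4.1e30
[/code]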

 Post subject: Re: This 'n' that
Post #318 Posted: Thu Jun 22, 2017 8:51 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Traditional joseki

Now, the direct 3-3 invasion was frowned on for quite some time. Usually some sort of preparation was made, and in his 21st Century Go set, Go Seigen shows it a number of times, almost always after a light reduction. But this kind of 3-3 invasion by a strong player is a significant innovation by AlphaGo.

Coming up, I was never attracted to this invasion, mainly because I felt that, on a relatively empty board, Black's resulting thickness after the usual joseki was too strong. We warn beginners against this invasion. I guess we are going to have to stop doing that. ;)

[go]$$Wcm8 AlphaGo game 12, Traditional joseki
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . 7 5 . . . . |
$$ | . . . . . . . . . . . . 8 6 4 3 1 . . |
$$ | . . . O . . . . . , . . . . . X 2 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


:b9: looks like the right side to block on, but you never know. Maybe we can think of the 3-3 invasion as a probe. :) Anyway, :b9: - :b15: follow the traditional joseki.

[go]$$Wcm16 Traditional joseki, continued
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O 3 . |
$$ | . . . O . . . . . , . . . . . X X 1 . |
$$ | . . . . . . . . . . . . . . . . . 2 . |
$$ | . . . . . . . . . . . . . . . . 4 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


After :w16: - :b19: White has sente, but I do not like White's chances. AlphaGo apparently agrees, because it does not play :w16:. It plays elsewhere. Even the great Go Seigen did not see that! I suppose that AlphaGo has pretty well killed :w16: as joseki. It may still be a situational move, OC. :)

Where do you think AlphaGo played? You can probably guess. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Thu Jun 22, 2017 10:52 pm, edited 2 times in total.
 Post subject: Re: This 'n' that
Post #319 Posted: Thu Jun 22, 2017 9:22 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
AlphaGo Game 12

[go]$$Wcm16 Wedge
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . c . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . b . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . a . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


A wedge seems obvious, and :w16: looks like just the right spot. It has room to make a base with "a" or "b", and if White plays at "b", there is room for another extension to "c", if need be. The wedge is not too close to Black's thickness. It feels just right. :)

[go]$$Wcm16 How to approach the wedge?
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X a . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . 2 . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . 3 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . 4 , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]

Approaching from the bottom has little appeal, despite the saying to drive your opponent's stones towards your thickness. (And is it really thickness, as John Fairbairn might ask? :)) Black would not have much of an attack that way. Many strong players of yore, perhaps even into the 20th century, would have had few qualms about approaching :w16: from the top, despite a bit of overconcentration, anticipating something like :w18: - :b19:, securing the corner. After :b17: a White hane at "a" would be bothersome, so they would probably descend to "a" first, with sente, before approaching the wedge.

Well, AlphaGo as Black did neither. The board is open. Where do you think it played? I would not have guessed right, BTW. I am hiding its move if you want to guess. More discussion later. :)

[go]$$Wcm16 Top attachment
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . O O . . . . |
$$ | . . . . . . . . . . . . X X X O O . . |
$$ | . . . O . . . . . , . . . . . X X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . 2 1 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . , X . . |
$$ | . . . . . X . . . . . X . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


Kowabunga!

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Thu Jun 22, 2017 10:53 pm, edited 2 times in total.
 Post subject: Re: This 'n' that
Post #320 Posted: Thu Jun 22, 2017 12:44 pm 
Dies in gote

Posts: 39
Liked others: 40
Was liked: 10
Bill Spight wrote:
I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.
(This is also in reply to moha)

I think so too, from what I've heard (and I really, really have to read the Nature paper - but it is quite condensed). The NNs, I think, try to emulate intuition, a gut feeling that comes from experience - that is, from playing lots and lots of games. For the policy network you give it a 19x19 bitmap of colour values (black, white, blank). This input activates the network, and in the output layer one (or maybe more than one) neuron out of about 19x19 is activated as a candidate for the next move, the hot spot. Now, in order to train such a network you don't have to feed it all possible 10^100 or so board positions. This is the whole point of NNs! Somehow (by magic, or rather by variants of the gradient descent method) it learns from a far smaller number of training examples.

Compare that to skilled go players. They have played a lot of games and acquired that gut feeling. They have also learned rules (direction of play, proper distance, etc.). But I'm not so sure how important this rationalizing really is during the actual playing process. Somehow they find the best spot and play it immediately, or try a few or many variations to decide which point to choose.

So, to summarize: NNs reduce the complexity of the game and thereby make it possible to make a choice.
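To make the shapes concrete, here is a toy, numpy-only stand-in for the policy network described above: one linear layer with random, untrained weights instead of DeepMind's deep convolutional stack, but the same interface of a 19x19 board in and a probability for every point out.

[code]
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(361, 361))  # untrained toy weights

def policy(board):
    """board: 19x19 array with +1 black, -1 white, 0 empty.
    Returns a 19x19 array of move probabilities (softmax output)."""
    logits = W @ board.reshape(361)
    e = np.exp(logits - logits.max())
    return (e / e.sum()).reshape(19, 19)

board = np.zeros((19, 19))
board[3, 15] = 1.0      # a lone black stone near a corner star point
probs = policy(board)
print(probs.sum())      # ~1.0: one probability per board point
[/code]

Training, the gradient descent mentioned above, would adjust W so that the high-probability point tends to match the move actually played in the training positions.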

_________________
Couch Potato - I'm just watchin'!


This post by Baywa was liked by: Bill Spight
 