This 'n' that

-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:

Re: This 'n' that

moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: This 'n' that
moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

dfan wrote: This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).

Even at the end of 2014, before AlphaGo was born and they were just working on what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
Uberdude wrote: Even at the end of 2014, before AlphaGo was born and they were just working on what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564

AFAIS that paper says its pro prediction accuracy was similar, not its playing strength. The latter was measured against GnuGo and other programs. Anyway, surely the NN has also improved since then, but there are upper bounds - I doubt a net without search could reasonably be expected to reach even the weakest pro levels. Strength does not depend on the good moves, but on the frequency and size of the mistakes.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I think human strategy and reasoning works, just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at that level there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match.

Bill Spight wrote: However, it seems clear that in most of the Master games this winter, Master took an early lead, too early for even deep reading to have much effect. The lead was based upon strategic superiority. The version of AlphaGo that played Lee Sedol last year was rather weaker than the current version. Humans have a lot to learn about strategy from AlphaGo (and probably other programs, as they get stronger). And I am confident that humans will learn a lot.

moha wrote: On the latter I agree. But why strategic superiority? I'm pretty sure its leads were based on whole-board minimaxing. This is where its strength lies, its main innovation: with NNs the tree can be pruned enough that deep minimaxing is possible, even in a non-local sense. And it makes MC effective for the first time. Even in the opening: reading dozens of moves ahead, without significant oversights, always taking the biggest points, best approach moves, rarely losing sente, and even then leaving the best followups, etc. Whole-board minimaxing was never seen before, so most people underestimate its power.

Oh, I think that we have seen whole-board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole-board minimaxing, even back in the 1990s and probably before.

It is true that the policy network —originally trained on human move choice, BTW

moha wrote: Fortunately, most of its leads could be explained by human reasoning as well - these are the cases we can learn from.

You just did apply human reasoning to AlphaGo's moves.

moha wrote: always taking the biggest points, best approach moves, rarely losing sente, and even then leaving the best followups

Biggest points, approach moves, sente, and followups are all human concepts.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: This 'n' that
moha wrote: I think human strategy and reasoning works, just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at that level there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match.

I don't disagree. I'm not arguing against reading.
The point Kasparov was trying to make, as I understand it, was that just accepting a computer's result (minimax, I think you referred to it as) is inferior to using human reasoning. Not because human reasoning brings about a better result, but rather, because human reasoning brings about understanding.
Think of it this way: I can calculate integrals on my calculator with greater accuracy and speed than I can by hand. But doing them by hand gives a deeper understanding of the result.
Besides, when a computer is wrong, there's no way to verify if you always take the computer result at face value.
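To make the calculator analogy concrete, here is a minimal sketch in Python; the Simpson helper and the test integrand are illustrative choices, not anything from the thread. The machine produces a number quickly, but the antiderivative worked out by hand is what lets you verify it.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

machine_answer = simpson(lambda x: x ** 2, 0.0, 1.0)
by_hand = 1 / 3  # antiderivative of x^2 is x^3/3, evaluated from 0 to 1
assert abs(machine_answer - by_hand) < 1e-9  # the hand result checks the machine
```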
be immersed
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

dfan wrote: This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap.

Well, first, the policy network guides search, so not doing search itself is no handicap.

Edit: Ah! I guess you mean that playing with the policy network alone, doing no search, is a handicap. OK.
And my impression from the graph of AlphaGo's progress is that it is not close to peaking. It has retired, but there is plenty of room for other programs to forge ahead in the coming years. Now I have to wonder. Could God give Ke Jie 9 stones? (I doubt it, but maybe 6?)
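As a rough illustration of what "the policy network guides search" means mechanically, here is a minimal PUCT-style selection sketch of the kind described in the 2016 AlphaGo paper. The Node class, names, and constant are illustrative assumptions, not DeepMind's code:

```python
import math
from dataclasses import dataclass, field

C_PUCT = 1.0  # exploration constant; the real value is a tuned parameter

@dataclass
class Node:
    prior: float          # policy network's probability for the move leading here
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

def select_child(node: Node) -> Node:
    """PUCT-style selection: the policy prior scales the exploration
    bonus, so moves the policy network likes are searched first."""
    total = sum(c.visits for c in node.children)
    def score(c: Node) -> float:
        q = c.value_sum / c.visits if c.visits else 0.0  # mean value so far
        u = C_PUCT * c.prior * math.sqrt(total + 1) / (1 + c.visits)
        return q + u
    return max(node.children, key=score)
```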
Last edited by Bill Spight on Thu Jun 22, 2017 6:08 am, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
Kirby wrote: Besides, when a computer is wrong, there's no way to verify if you always take the computer result at face value.

Isaac Asimov wrote a short story based on exactly that point. IIRC, in it a man was chosen to go on a long space journey because he had taught himself to do arithmetic and was not solely dependent upon computers or calculators. As I recall, he was asked in an interview, "Is 3x4 always 12?"
Ah! I DuckDuckGoed it. "The Feeling of Power".
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
Bill Spight wrote: Oh, I think that we have seen whole-board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole-board minimaxing, even back in the 1990s and probably before.

I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Neither is poor pruning that often misses key moves.)

Bill Spight wrote: You just did apply human reasoning to AlphaGo's moves. Biggest points, approach moves, sente, and followups are all human concepts.

Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human - since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.

Think about evolution and biology. Survival does not require finding the best answer to a particular problem, but it does require finding a good enough answer for every problem. This is how human intelligence works: faced with a task, it searches for tools, concepts, generalizations. It will not give the best answer, but it will never produce a very bad answer either. Hence decent strength at go as well - but not more.
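A back-of-the-envelope sketch of the pruning arithmetic behind "without pruning this does not work"; the pruning width and depth below are made-up numbers, purely for illustration:

```python
# If the policy net keeps only the top k candidate moves at each node,
# a tree of depth d shrinks from roughly b**d to k**d positions.
b = 250      # very rough average number of legal moves in go
k, d = 10, 8  # hypothetical pruning width and search depth
print(f"unpruned: {b**d:.2e} positions")  # ~1.5e19, hopeless
print(f"pruned:   {k**d:.2e} positions")  # 1.0e8, tractable
```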
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Neither is poor pruning that often misses key moves.) Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human - since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.

The point is that it is strategy, in the form of the neural networks, that makes AlphaGo's search effective.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: This 'n' that
moha wrote: I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics are largely handled by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Well, the value network, which produces a probability estimate for winning the game, has to be whole board. As for the policy network, which proposes moves, I think that it does look at small regions of the board, but it looks at the whole board as well. As for producing something new, what does it do when it has not seen the whole-board position before? AlphaGo's famous 5th-line shoulder hit was new, but I am sure that the policy network had seen shoulder hits before, and maybe even some 5th-line shoulder hits.

I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.
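As a toy illustration of the whole-board point: even when the convolutional layers only see local patterns, a final readout layer pools the entire board into one probability. A minimal sketch assuming PyTorch; this tiny architecture is an assumption for illustration, nothing like AlphaGo's real, much deeper value network:

```python
import torch
import torch.nn as nn

class TinyValueNet(nn.Module):
    """Toy only: maps a whole-board input to a single win probability."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # local patterns
        self.head = nn.Linear(16 * 19 * 19, 1)                  # whole-board readout
    def forward(self, board):  # board: (N, 3, 19, 19) stone/turn feature planes
        x = torch.relu(self.conv(board))
        return torch.sigmoid(self.head(x.flatten(1)))  # (N, 1) in [0, 1]

p_win = TinyValueNet()(torch.zeros(1, 3, 19, 19))  # untrained, so roughly 0.5
```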
Last edited by Bill Spight on Thu Jun 22, 2017 6:56 am, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: This 'n' that
Isn't the point of reinforcement learning that, as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started by playing the slide to 2-4 after an approach to a 4-4, as the humans in its initial training data did, but after millions of games it found that the move tended to lead to poor results, so now the policy network doesn't much like it.
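A minimal sketch of the mechanism Uberdude describes, written the way the later AlphaGo Zero paper formalized it: train the policy toward the move distribution implied by search visit counts. The exact loss used for the Master version was not public at the time, and the shapes and names here are assumptions:

```python
import torch
import torch.nn.functional as F

def distill_loss(policy_logits, search_probs):
    """Cross-entropy between the policy network's output and a move
    distribution derived from tree-search visit counts. Minimizing it
    pushes search-derived preferences into the network weights."""
    return -(search_probs * F.log_softmax(policy_logits, dim=-1)).sum(-1).mean()

# hypothetical shapes: batch of 8 positions, 19*19 + 1 moves (incl. pass)
logits = torch.randn(8, 362, requires_grad=True)
target = torch.softmax(torch.randn(8, 362), dim=-1)  # stand-in for visit counts
distill_loss(logits, target).backward()  # gradients flow into the policy
```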
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
dfan wrote: Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics are largely handled by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".

Well, I just wrote my personal opinion, so I could also be wrong. The tactics require search, of course, but search is only possible with heavy pruning (via the policy net - which may itself be too error-prone tactically without search). But most of the strategic elements I mentioned are quite dynamic concepts that would seem pretty hard to handle with a static NN (sente, for example!). As for MC, I think it is useful at leaf nodes (getting the estimated value of the end position - this also needs the policy net, for reasonable playouts). As for the value net, I think it is used as a partial substitute for the costly MC, also near leaf nodes. But this is purely my guess, OC.

I think the high shoulder hit - shoulder hits in general - is present in the NN as a tactical shape, but that it is a good move in that particular situation, even on the 5th line, seems to be a search-based decision. Generally, I would think the NN is there to drop bad and meaningless moves, not to produce the best move - but perhaps the best 10 or 20, which can then be searched (the slide into the corner is still searched, for example, at least in some positions, since AG did play it a few times as an attacking move).
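moha's guess about the value net substituting for the costly MC matches what the 2016 AlphaGo paper reports: leaf positions were evaluated by mixing the value network's estimate with a fast rollout result, weighted by a constant lambda = 0.5. A minimal sketch (value scale simplified; the paper's values live on [-1, 1]):

```python
def leaf_value(v_net, rollout_z, lam=0.5):
    """Mixed leaf evaluation in the style of the 2016 AlphaGo paper:
    v_net     - value network's estimate for the leaf position
    rollout_z - outcome of a fast policy rollout from the leaf (+1 win, -1 loss)
    lam       - mixing weight; the paper reports lam = 0.5 worked best"""
    return (1 - lam) * v_net + lam * rollout_z

print(leaf_value(0.3, 1.0))  # 0.65: a won rollout pulls the estimate up
```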
Last edited by moha on Thu Jun 22, 2017 7:19 am, edited 1 time in total.
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: This 'n' that
Uberdude wrote: Isn't the point of reinforcement learning that, as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started by playing the slide to 2-4 after an approach to a 4-4, as the humans in its initial training data did, but after millions of games it found that the move tended to lead to poor results, so now the policy network doesn't much like it.

Yes, exactly.