This 'n' that

-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:

Re: This 'n' that

moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: This 'n' that
moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

dfan wrote: This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap. From the comments of Silver and Hassabis, it sounded like one of the big steps forward with the current version is that the policy network has been trained on millions of self-play games (that include search).

Even at the end of 2014, before AlphaGo was born and they were just working on what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
Uberdude wrote: Even at the end of 2014, before AlphaGo was born and they were just working on what would become the policy network, it was about as strong as Aja Huang (5 dan EGF): https://arxiv.org/abs/1412.6564

AFAIS that paper says its pro prediction accuracy was similar, not its playing strength. The latter was measured against GnuGo and other programs. Anyway, surely the NN has also improved since then, but there are upper bounds - I doubt a net without search could reasonably be expected to reach even the weakest pro levels. Strength does not depend on the good moves, but on the frequency and size of the mistakes.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I think human strategy and reasoning works, just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at that level there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match.

Bill Spight wrote: However, it seems clear that in most of the Master games this winter, Master took an early lead, too early for even deep reading to have much effect. The lead was based upon strategic superiority. The version of AlphaGo that played Lee Sedol last year was rather weaker than the current version. Humans have a lot to learn about strategy from AlphaGo (and probably other programs, as they get stronger). And I am confident that humans will learn a lot.

moha wrote: On the latter I agree. But why strategic superiority? I'm pretty sure its leads were based on whole-board minimaxing. This is where its strength lies, its main innovation: with NNs the tree can be pruned enough that deep minimaxing is possible, even in a non-local sense. And it makes MC effective for the first time. Even in the opening: reading dozens of moves ahead, without significant oversights, always taking the biggest points, best approach moves, rarely losing sente, and even then leaving the best followups, etc. Whole-board minimaxing was never seen before, so most people underestimate its power.

Oh, I think that we have seen whole-board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole-board minimaxing, even back in the 1990s and probably before.

It is true that the policy network —originally trained on human move choice, BTW

moha wrote: Fortunately, most of its leads could be explained by human reasoning as well - these are the cases we can learn from.

You just did apply human reasoning to AlphaGo's moves.

moha wrote: always taking the biggest points, best approach moves, rarely losing sente, and even then leaving the best followups

Biggest points, approach moves, sente, and followups are all human concepts.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: This 'n' that
moha wrote: I think human strategy and reasoning works, just does not always work, and will always lose to reading in the end. This is why it is less reliable in 9p / AlphaGo games - at that level there is simply no acceptable alternative to deep reading. Which is basically what Lee Sedol said after his match.

I don't disagree. I'm not arguing against reading.
The point Kasparov was trying to make, as I understand it, was that just accepting a computer's result (minimax, I think you referred to it as) is inferior to using human reasoning. Not because human reasoning brings about a better result, but rather, because human reasoning brings about understanding.
Think of it this way: I can calculate integrals on my calculator with greater accuracy and speed than I can by hand. But doing them by hand gives a deeper understanding of the result.
Besides, when a computer is wrong, there's no way to verify if you always take the computer result at face value.
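To make the calculator analogy concrete, here is a minimal sketch in Python; the Simpson helper and the test integrand are illustrative choices, not anything from the thread. The machine produces a number quickly, but the antiderivative worked out by hand is what lets you verify it.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

machine_answer = simpson(lambda x: x ** 2, 0.0, 1.0)
by_hand = 1 / 3  # antiderivative of x^2 is x^3/3, evaluated from 0 to 1
assert abs(machine_answer - by_hand) < 1e-9  # the hand result checks the machine
```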
be immersed
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: AlphaGo's lines are nowhere near optimal (IIRC the NNs are low dan levels by themselves),

dfan wrote: This was true in 2015, when they were first putting together the system. My understanding is that the policy network is much stronger now. I would be curious what level it is now, although of course the fact that it can't do any reading is a large handicap.

Well, first, the policy network guides search, so not doing search itself is no handicap.

Edit: Ah! I guess you mean that playing with the policy network alone, doing no search, is a handicap. OK.
And my impression from the graph of AlphaGo's progress is that it is not close to peaking. It has retired, but there is plenty of room for other programs to forge ahead in the coming years. Now I have to wonder. Could God give Ke Jie 9 stones? (I doubt it, but maybe 6?)
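As a rough illustration of what "the policy network guides search" means mechanically, here is a minimal PUCT-style selection sketch of the kind described in the 2016 AlphaGo paper. The Node class, names, and constant are illustrative assumptions, not DeepMind's code:

```python
import math
from dataclasses import dataclass, field

C_PUCT = 1.0  # exploration constant; the real value is a tuned parameter

@dataclass
class Node:
    prior: float          # policy network's probability for the move leading here
    visits: int = 0
    value_sum: float = 0.0
    children: list = field(default_factory=list)

def select_child(node: Node) -> Node:
    """PUCT-style selection: the policy prior scales the exploration
    bonus, so moves the policy network likes are searched first."""
    total = sum(c.visits for c in node.children)
    def score(c: Node) -> float:
        q = c.value_sum / c.visits if c.visits else 0.0  # mean value so far
        u = C_PUCT * c.prior * math.sqrt(total + 1) / (1 + c.visits)
        return q + u
    return max(node.children, key=score)
```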
Last edited by Bill Spight on Thu Jun 22, 2017 6:08 am, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
Kirby wrote: Besides, when a computer is wrong, there's no way to verify if you always take the computer result at face value.

Isaac Asimov wrote a short story based on exactly that point. IIRC, in it a man was chosen to go on a long space journey because he had taught himself to do arithmetic and was not solely dependent upon computers or calculators. As I recall, he was asked in an interview, "Is 3x4 always 12?"
Ah! I DuckDuckGoed it. "The Feeling of Power".
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
Bill Spight wrote: Oh, I think that we have seen whole-board minimaxing before, going back at least as far as Dosaku in the 17th century. And, as far as computer programs are concerned, I am unaware of any of the stronger programs of their time that did not use whole-board minimaxing, even back in the 1990s and probably before.

I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Neither is poor pruning that often misses key moves.)

Bill Spight wrote: You just did apply human reasoning to AlphaGo's moves. Biggest points, approach moves, sente, and followups are all human concepts.

Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human - since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.

Think about evolution and biology. Survival does not require finding the best answer to a particular problem, but it does require finding a good enough answer for every problem. This is how human intelligence works: faced with a task, it searches for tools, concepts, generalizations. It will not give the best answer, but it will never produce a very bad answer either. Hence decent strength at go as well - but not more.
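A back-of-the-envelope sketch of the pruning arithmetic behind "without pruning this does not work"; the pruning width and depth below are made-up numbers, purely for illustration:

```python
# If the policy net keeps only the top k candidate moves at each node,
# a tree of depth d shrinks from roughly b**d to k**d positions.
b = 250      # very rough average number of legal moves in go
k, d = 10, 8  # hypothetical pruning width and search depth
print(f"unpruned: {b**d:.2e} positions")  # ~1.5e19, hopeless
print(f"pruned:   {k**d:.2e} positions")  # 1.0e8, tractable
```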
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I meant effective and deep minimaxing. Without pruning this does not work; reading ahead just a few moves is not really useful. (Neither is poor pruning that often misses key moves.) Sure. As I wrote: human reasoning and strategy work - to an extent. And they are the only reasonable option for a human - since a game of pure minimaxing is not really a game meant for intelligent beings (but a tedious task for machines). But search is stronger and more accurate, and works even where strategy fails.

The point is that it is strategy, in the form of the neural networks, that makes AlphaGo's search effective.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: This 'n' that
moha wrote: I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics are largely handled by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: This 'n' that
moha wrote: I think the NNs have more to do with tactics, local fights, shapes, etc. What we see as whole-board strategy (Kasparov's "understanding") - the AlphaGo "style", the new moves, shoulder hits, the awareness of influence and of weak vs. strong groups, the obsession with sente - mostly comes from search, IMO. The NN could not produce anything new by itself (at least until the human bootstrap step is omitted).

Well, the value network, which produces a probability estimate for winning the game, has to be whole board. As for the policy network, which proposes moves, I think that it does look at small regions of the board, but it looks at the whole board as well. As for producing something new, what does it do when it has not seen the whole-board position before? AlphaGo's famous 5th-line shoulder hit was new, but I am sure that the policy network had seen shoulder hits before, and maybe even some 5th-line shoulder hits.

I may be wrong, but my impression is that neural networks generalize from what they are trained on, and so they can produce some new things from time to time.
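As a toy illustration of the whole-board point: even when the convolutional layers only see local patterns, a final readout layer pools the entire board into one probability. A minimal sketch assuming PyTorch; this tiny architecture is an assumption for illustration, nothing like AlphaGo's real, much deeper value network:

```python
import torch
import torch.nn as nn

class TinyValueNet(nn.Module):
    """Toy only: maps a whole-board input to a single win probability."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # local patterns
        self.head = nn.Linear(16 * 19 * 19, 1)                  # whole-board readout
    def forward(self, board):  # board: (N, 3, 19, 19) stone/turn feature planes
        x = torch.relu(self.conv(board))
        return torch.sigmoid(self.head(x.flatten(1)))  # (N, 1) in [0, 1]

p_win = TinyValueNet()(torch.zeros(1, 3, 19, 19))  # untrained, so roughly 0.5
```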
Last edited by Bill Spight on Thu Jun 22, 2017 6:56 am, edited 1 time in total.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: This 'n' that
Isn't the point of reinforcement learning that, as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started by playing the slide to 2-4 after an approach to a 4-4, as the humans in its initial training data did, but after millions of games it found that the move tended to lead to poor results, so now the policy network doesn't much like it.
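A minimal sketch of the mechanism Uberdude describes, written the way the later AlphaGo Zero paper formalized it: train the policy toward the move distribution implied by search visit counts. The exact loss used for the Master version was not public at the time, and the shapes and names here are assumptions:

```python
import torch
import torch.nn.functional as F

def distill_loss(policy_logits, search_probs):
    """Cross-entropy between the policy network's output and a move
    distribution derived from tree-search visit counts. Minimizing it
    pushes search-derived preferences into the network weights."""
    return -(search_probs * F.log_softmax(policy_logits, dim=-1)).sum(-1).mean()

# hypothetical shapes: batch of 8 positions, 19*19 + 1 moves (incl. pass)
logits = torch.randn(8, 362, requires_grad=True)
target = torch.softmax(torch.randn(8, 362), dim=-1)  # stand-in for visit counts
distill_loss(logits, target).backward()  # gradients flow into the policy
```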
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: This 'n' that
dfan wrote: Interesting! My impression is exactly the opposite: that the strategy is largely handled by the policy and value networks, and the tactics are largely handled by the MCTS. I don't think there is time to do enough playouts to perform effective whole-board strategy via tree search, especially at its level of "understanding".

Well, I just wrote my personal opinion, so I could also be wrong. The tactics require search, of course, but search is only possible with heavy pruning (via the policy net - which may itself be too error-prone tactically without search). But most of the strategic elements I mentioned are quite dynamic concepts that would seem pretty hard to handle with a static NN (sente, for example!). As for MC, I think it is useful at leaf nodes (getting the estimated value of the end position - this also needs the policy net, for reasonable playouts). As for the value net, I think it is used as a partial substitute for the costly MC, also near leaf nodes. But this is purely my guess, OC.

I think the high shoulder hit - shoulder hits in general - is present in the NN as a tactical shape, but that it is a good move in that particular situation, even on the 5th line, seems to be a search-based decision. Generally, I would think the NN is there to drop bad and meaningless moves, not to produce the best move - but perhaps the best 10 or 20, which can then be searched (the slide into the corner is still searched, for example, at least in some positions, since AG did play it a few times as an attacking move).
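moha's guess about the value net substituting for the costly MC matches what the 2016 AlphaGo paper reports: leaf positions were evaluated by mixing the value network's estimate with a fast rollout result, weighted by a constant lambda = 0.5. A minimal sketch (value scale simplified; the paper's values live on [-1, 1]):

```python
def leaf_value(v_net, rollout_z, lam=0.5):
    """Mixed leaf evaluation in the style of the 2016 AlphaGo paper:
    v_net     - value network's estimate for the leaf position
    rollout_z - outcome of a fast policy rollout from the leaf (+1 win, -1 loss)
    lam       - mixing weight; the paper reports lam = 0.5 worked best"""
    return (1 - lam) * v_net + lam * rollout_z

print(leaf_value(0.3, 1.0))  # 0.65: a won rollout pulls the estimate up
```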
Last edited by moha on Thu Jun 22, 2017 7:19 am, edited 1 time in total.
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: This 'n' that
Uberdude wrote: Isn't the point of reinforcement learning that, as you train the networks, you are essentially transferring skill that was initially derived from tree search into the weights of the neural network? E.g. AlphaGo started by playing the slide to 2-4 after an approach to a 4-4, as the humans in its initial training data did, but after millions of games it found that the move tended to lead to poor results, so now the policy network doesn't much like it.

Yes, exactly.