Kirby wrote:
Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference.
<snip>
Nonetheless, it appears that AlphaGo prefers this type of strategy. It prefers a state of greater certainty of winning the game, even if it means making point-losing plays.
I think seeing the wood for the trees might be a help in this thread.
We know that go in general cannot be solved by "brute force". On the other hand, certain endgame positions can be, by first filtering down to the candidate plays and then looking at all possible orders of play (to a first approximation). The trouble is that looking at all orders of play runs into a fast-growing function, the factorial. Anyone with a feel for these things knows that 20! is much more serious than 10!, for example.
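To put rough numbers on that, a quick back-of-the-envelope check (illustrative only):

```python
from math import factorial

# Orders of play for n plays grow as n!.
print(f"10! = {factorial(10):,}")   # 3,628,800
print(f"20! = {factorial(20):,}")   # 2,432,902,008,176,640,000
print(f"20!/10! = {factorial(20) // factorial(10):,}")  # about 6.7e11
```

Ten plays are searchable in a blink; twenty are already beyond exhaustive enumeration.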
So AlphaGo, in general, seems to have tamed the brute-force requirement well enough by some very sharp filtering of candidate plays and sampling of orders of play. The program can cope, in classy fashion, with different kinds of middlegame challenge, and that is the primary determinant of strength (not being the butter to your opponent's hot knife in fighting).
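For concreteness, here is a toy sketch of the "filter then sample" idea. It is nothing like AlphaGo's actual machinery (which, per the DeepMind paper, is Monte Carlo tree search guided by policy and value networks); the `policy_score` function and all the numbers are made up:

```python
import random

def filter_candidates(legal_moves, policy_score, k=8):
    """Sharp filtering: keep only the k plays the policy rates highest."""
    return sorted(legal_moves, key=policy_score, reverse=True)[:k]

def sample_orders(candidates, n_samples=1_000):
    """Sample random orders of play instead of enumerating all k! of them."""
    for _ in range(n_samples):
        order = candidates[:]
        random.shuffle(order)
        yield order

# Hypothetical usage: 8 candidates give 8! = 40,320 orders; a thousand
# sampled lines see only a fraction of them.
moves = list(range(20))                                      # stand-ins for legal plays
cands = filter_candidates(moves, policy_score=lambda m: -m)  # dummy policy
for line in sample_orders(cands, n_samples=3):
    print(line)
```

The point is only the shape of the computation: filtering beats the factorial down to k!, and sampling avoids even paying k! in full.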
Come the endgame, as far as we know, it does not change regime. Indeed, it would be dangerous to assume that life-and-death issues or ko are off the menu just because plays are supposedly smallish and generally local. Human players who switch off the shields at this point will lose some games memorably.
When it sees the shore, the program is going to swim to it as directly as it can. We could say this is "instinctive", because effectively its brain has been hardwired to do that.
Near the end of the game its sampling of lines will start getting somewhat closer to a complete view of ways to play. It seems quite possible that a constructed position could defeat that sampling: something a chess-player might call "problem-like", with a rather different resonance. In CGT jargon, "hidden secrets" are probably implicit throughout the game. The concept can be illustrated effectively in endgame positions; it doesn't mean that is their natural habitat.
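One way to see the "closer to a complete view" point: hold the sampling budget fixed and watch what fraction of the possible orders of play it can cover as the number of remaining plays shrinks (the budget of 100,000 lines is invented for illustration):

```python
from math import factorial

budget = 100_000  # hypothetical number of sampled lines
for n in (15, 10, 8, 6):
    total = factorial(n)
    coverage = min(budget / total, 1.0)  # upper bound: samples may repeat
    print(f"{n:2d} plays left: at most {coverage:.6%} of {total:,} orders")
```

With six or eight plays left a modest budget can in principle see every order; with fifteen it sees essentially nothing, which is where a "problem-like" constructed position might slip through.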
I don't think we know yet whether further training of the type already done will have much impact on the finer endgame points. It may not be so easy to "improve AlphaGo in the endgame" within the DeepMind paradigm.