possible to improve AlphaGo in endgame

For discussing go computing, software announcements, etc.
RobertJasiek
Judan
Posts: 6273
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: possible to improve AlphaGo in endgame

Post by RobertJasiek »

It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbour fights.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: possible to improve AlphaGo in endgame

Post by Kirby »

gowan wrote:I may be confused but I don't see how go is a zero-sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say, $1 and the loser loses the same amount, $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.


I disagree.

The payoff of the game is winning or losing.

Wikipedia wrote:In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which each participant's gain (or loss) of utility is exactly balanced by the losses (or gains) of the utility of the other participant(s).


The utility that you get from playing the game is the win. This is balanced with the negative utility you get from losing.

You can contrast this with the idea that getting more points in Go provides more utility than getting fewer points. If getting more points provided more utility than getting fewer points, then the goal would be to maximize points.

But this is not the goal. The goal is to maximize your utility, which is defined by winning that particular game.

You can think of it like you said as getting $1 if you win, and losing $1 if you lose.

You don't have to think of it in terms of dollars. Specifically, the payoff for winning a game of Go is +1 unit (the win), and the payoff for losing the game is -1 unit (the loss), so the two payoffs sum to zero.

Anyway, the point I want to express is that it's not the goal of a computer AI to maximize points, because that is not the utility of the game. The utility of the game is the win, which is worth 1 unit - not some point value you are trying to maximize.
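Kirby's +1/-1 framing can be made concrete with a toy sketch (my own illustration, nothing from AlphaGo's actual code): the winner's utility is +1 and the loser's is -1 regardless of the margin, so the two players' payoffs always sum to zero.

```python
# Toy illustration of go as a zero-sum game: utility is +1 to the winner
# and -1 to the loser, whatever the point margin turns out to be.

def payoffs(margin: float) -> tuple[int, int]:
    """Utility for (Black, White), given Black's winning margin in points."""
    if margin > 0:
        return (1, -1)   # Black wins
    return (-1, 1)       # White wins (ties excluded under half-point komi)

# A half-point win and a 20-point win yield identical utility...
assert payoffs(0.5) == payoffs(20.5) == (1, -1)
# ...and the two players' utilities always sum to zero.
assert sum(payoffs(-3.5)) == 0
```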
be immersed
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: possible to improve AlphaGo in endgame

Post by Kirby »

RobertJasiek wrote:It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbour fights.


I don't think it overlooked the move, but since it was an unusual variation, I don't believe as much computation had been put into finding a successful variation from that branch.
be immersed
zorq
Beginner
Posts: 15
Joined: Mon Nov 25, 2013 8:26 pm
GD Posts: 0
Been thanked: 3 times

Re: possible to improve AlphaGo in endgame

Post by zorq »

Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.

This is clearly false. If one is greedy, one may be punished.
Bill Spight wrote:Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly, particularly in the endgame.

They only look silly to entities equipped with a theorem prover, who can prove to themselves that certain moves are useless or inferior. AlphaGo is not equipped with a theorem prover.
Last edited by zorq on Wed Mar 16, 2016 4:27 pm, edited 1 time in total.
Temp
Dies in gote
Posts: 26
Joined: Sun Jan 24, 2016 4:50 pm
GD Posts: 0
Has thanked: 6 times
Been thanked: 3 times

Re: possible to improve AlphaGo in endgame

Post by Temp »

Seems it has problems calculating when stones are surrounded and in a state of semeai. We saw it with that endgame sequence in Game 2 in the upper right where it threw away about 5 points when it elected to capture those stones in the center. I think it did so because Lee and AlphaGo had two sets of 3 stones sort of surrounding each other. I don't think a computer would care or not care about losing points endgame. It should calculate and make the biggest move regardless of how far ahead it is. My guess is there is some sort of calculation error going on. There was also the obvious wedge in Game 4. Then Game 5 the sequence in the bottom right. All had a sort of semeai involved.

On a side note, does anyone know what is going on with AlphaGo? Will they work on it for a month or two and let Ke Jie play it or not? I'm kind of assuming they want to move onto other things if they want to build something similar for solving cancer or other problems, but at the same time I don't want them to. I want to see AlphaGo play more pros. Really all the top pros should have a chance to play it a few times. Not every game has to be a televised event.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: possible to improve AlphaGo in endgame

Post by Bill Spight »

zorq wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.

This is clearly false. If one is greedy, one may be punished.


If one is greedy and is punished, one has not maximized the territory difference. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: possible to improve AlphaGo in endgame

Post by Bill Spight »

Kirby wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.



In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.

If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation.


Is that what it means in the program to maximize the probability of winning?

So maximizing my chances of winning isn't necessarily about always maximizing the difference in score.


(Emphasis mine.)

That's not what I said, is it?
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
mitsun
Lives in gote
Posts: 553
Joined: Fri Apr 23, 2010 10:10 pm
Rank: AGA 5 dan
GD Posts: 0
Has thanked: 61 times
Been thanked: 250 times

Re: possible to improve AlphaGo in endgame

Post by mitsun »

Bill Spight wrote:
zorq wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.
This is clearly false. If one is greedy, one may be punished.
If one is greedy and is punished, one has not maximized the territory difference. :)
I presume AlphaGo can calculate, for an endgame position and for every possible move, two quantities: the probability that the move leads to a win, and the expected margin of victory. Are you really stating that the move which maximizes the second of these quantities will necessarily maximize the first? That seems clearly false to me. If no single move maximizes both of these quantities, which move do you think the computer should play?
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: possible to improve AlphaGo in endgame

Post by Kirby »

Bill Spight wrote:
Kirby wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.



In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.

If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation.


Is that what it means in the program to maximize the probability of winning?

So maximizing my chances of winning isn't necessarily about always maximizing the difference in score.


(Emphasis mine.)

That's not what I said, is it?



You did not specifically say that maximizing the territorial difference is always the best way to increase the probability of winning, but you suggested that it generally is. There may well be cases where maximizing the territorial difference leads to a greater chance of winning. Like you said, it allows leeway for making mistakes.

My argument is, rather, that a computer need not base its strategy in such a way to account for its mistakes. Rather, the computer can adopt a strategy to reduce its own uncertainty in how the game will progress. The more certain the computer is of how the rest of the game will proceed, the more easily it can make decisions on what to do later in the game.

This is my view of what the computer is doing. It makes plays that appear to be point-losing at times, but that result in less uncertainty about how the rest of the game will play out. The more certain the computer is about the rest of the game, the better positioned it is to make decisions that are likely to lead to a win.

Admittedly, this uncertainty-reducing strategy comes at a cost: if the result of making the game less uncertain leads to a losing board position, the computer has failed.

Nonetheless, it appears that AlphaGo prefers this type of strategy. It prefers a state of greater certainty of winning the game, even if it means making point-losing plays.
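A toy sketch of the idea above (my numbers and move names, not AlphaGo's actual criterion): among moves whose expected result is still a win, prefer the one whose outcome is most certain, even at the cost of some expected margin.

```python
# Invented evaluations: each move maps to (probability, final margin) outcomes.
candidates = {
    "simplify":   [(0.95, +2.5), (0.05, +0.5)],  # small but near-certain win
    "complicate": [(0.7, +12.5), (0.3, -6.5)],   # bigger on average, riskier
}

def win_prob(outcomes):
    return sum(p for p, margin in outcomes if margin > 0)

def margin_variance(outcomes):
    # probability-weighted variance of the final margin: a crude
    # stand-in for "uncertainty about how the game will play out"
    mean = sum(p * m for p, m in outcomes)
    return sum(p * (m - mean) ** 2 for p, m in outcomes)

# Keep only moves that still win on balance, then pick the most certain one.
winning = [m for m, o in candidates.items() if win_prob(o) > 0.5]
choice = min(winning, key=lambda m: margin_variance(candidates[m]))
assert choice == "simplify"   # the point-losing but low-variance line
```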
be immersed
gowan
Gosei
Posts: 1628
Joined: Thu Apr 29, 2010 4:40 am
Rank: senior player
GD Posts: 1000
Has thanked: 546 times
Been thanked: 450 times

Re: possible to improve AlphaGo in endgame

Post by gowan »

Does AlphaGo have a strategy when it plays other than a greedy algorithm of choosing the best move each time? For example can it decide to play a moyo game before the game starts?
Charles Matthews
Lives in gote
Posts: 450
Joined: Sun May 13, 2012 9:12 am
Rank: BGA 3 dan
GD Posts: 0
Has thanked: 5 times
Been thanked: 189 times

Re: possible to improve AlphaGo in endgame

Post by Charles Matthews »

Kirby wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.


<snip>

Nonetheless, it appears that AlphaGo prefers this type of strategy. It prefers a state of greater certainty of winning the game, even if it means making point-losing plays.


I think seeing the wood for the trees might be a help in this thread.

We know that go in general cannot be solved by "brute force". On the other hand, for certain endgame positions it can be, by filtering out candidate plays first and then looking at all possible orders of play (to a first approximation). The trouble is that looking at all orders of play runs into a fast-growing function, the factorial. Anyone with a feel for these things knows that 20! is much more serious than 10!, for example.
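The factorial growth is easy to check directly: going from 10 remaining plays to 20 multiplies the number of move orders by a factor of well over a hundred billion.

```python
import math

# Number of orders in which n independent plays can be made is n!.
assert math.factorial(10) == 3_628_800
assert math.factorial(20) == 2_432_902_008_176_640_000

# 20! is about 6.7e11 times larger than 10! -- "much more serious" indeed.
assert math.factorial(20) // math.factorial(10) == 670_442_572_800
```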

So, AlphaGo in general seems to have succeeded in dominating the brute force requirement, well enough, by some very sharp filtering and sampling of orders of play. The program can cope, in a classy fashion, with different kinds of middlegame challenges, which is the primary determinant of strength (not being the butter to your opponent's hot knife in fighting).

Come the endgame, as far as we know, it does not change regime. Indeed it would be dangerous to assume that life-and-death issues or ko are off the menu just because plays are supposedly smallish and generally local. Human players who switch off the shields at this point will lose some games memorably.

When it sees the shore, the program is going to swim to it as directly as it can. We could say this is "instinctive", because effectively its brain has been hardwired to do that.

Near the end of the game its sampling of lines will start getting somewhat closer to a complete view of ways to play. It seems quite possible that a constructed position could defeat that sampling: something a chess-player might call "problem-like", with a rather different resonance. In CGT jargon, "hidden secrets" are probably implicit throughout the game. The concept can be illustrated effectively in endgame positions; it doesn't mean that is their natural habitat.

I don't think we know yet whether further training of the type already done will have much impact on the finer endgame points. It may not be so easy to "improve AlphaGo in the endgame" within the DeepMind paradigm.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: possible to improve AlphaGo in endgame

Post by Bill Spight »

Bill Spight wrote:
zorq wrote:
Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.
This is clearly false. If one is greedy, one may be punished.
If one is greedy and is punished, one has not maximized the territory difference. :)


mitsun wrote:I presume AlphaGo can calculate, for an endgame position, for every possible move, two quantities: probability that this move will lead to a win, expected margin of win for this move.


Do you mean point margin? I think not.

Are you really stating that the move which maximizes the second of these quantities will necessarily maximize the first? That seems clearly false to me.


Me, too. :)

If no single move maximizes both of these quantities, which move do you think the computer should play?


For random rollouts, I know what they mean by probability of winning. For the evaluation network, I do not. My approach, when playing safe, would be to minimize the maximum error of my estimation of my chances of winning. I purposely avoid using the term, probability, because it is not a probability estimate, as commonly understood, for example, in a Bayesian or frequentist approach.

Generally speaking, which is what I am doing, increasing the territory difference provides a safety buffer against both misreading, now or later, and misestimating the probability of winning.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: possible to improve AlphaGo in endgame

Post by Bill Spight »

Charles Matthews wrote:I don't think we know yet whether further training of the type already done will have much impact on the finer endgame points. It may not be so easy to "improve AlphaGo in the endgame" within the DeepMind paradigm.


As someone has already pointed out, AlphaGo focuses on winning the game at hand. To do so it has its own heuristics. These are obviously different from the human developed heuristics of the endgame, such as evaluating the size of plays. We do know that in the vast majority of cases, playing the largest play is best, and we also know how to recognize some situations when that is not the case. One advantage of the human heuristics is that they apply in general, not just to the game at hand. So it is still worthwhile for humans to study them.

Could an AlphaGo type program be trained for general endgame play? I think so, but nobody has done so yet. Such a program could be a good endgame tutor for humans. It might even be a good add-on to AlphaGo for the endgame stage. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Charles Matthews
Lives in gote
Posts: 450
Joined: Sun May 13, 2012 9:12 am
Rank: BGA 3 dan
GD Posts: 0
Has thanked: 5 times
Been thanked: 189 times

Re: possible to improve AlphaGo in endgame

Post by Charles Matthews »

Bill Spight wrote:Could an AlphaGo type program be trained for general endgame play? I think so, but nobody has done so yet.


The standard Demis Hassabis lecture/stump speech is that all things become possible, as engineering matters, once "general artificial intelligence" comes onstream. Some crumbs would fall from the corporate table: this application would require a large body of training sequences.

I would actually make a program of this type to play "Archipelago". I'm not sure I have mentioned this go variant, ever.

As a training game for go players, it is conceived of as a multi-board version of go (disjunctive games) where you deal a dozen small graphs off the top of a pack. Then you just play with Tromp-Taylor style rules, with something done about komi per board.

I think the advantage over 19x19 monoboard go is that there would probably be more chance of bootstrapping the training up from simple examples. Clearly CGT principles are there to be learned, via disjunctive games with finite graphs.

Assuming that all makes sense (abandoning the homogeneity of the big board, and its fighting complexity, introducing disjunction consciously, allowing superko to rule some kinds of small-board incident) I think breeding up superhuman understanding of endgame theory becomes a feasible project.

Bill Spight wrote:Such a program could be a good endgame tutor for humans. It might even be a good add-on to AlphaGo for the endgame stage. :)


Yes, the whole deal with a genuine strong AI go player is that illustrative material can become a commodity.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: possible to improve AlphaGo in endgame

Post by Bill Spight »

Let me try to give an example of a possible approach. Not that I know this is a good heuristic without testing it, OC. ;)

Suppose that AlphaGo thinks that it is ahead and wishes to play safe. Then, instead of looking for a play that maximizes its estimated probability of winning, it vicariously switches sides and looks for a play by the opponent that maximizes its opponent's estimated probability of winning. Then it makes that play itself in the search tree and estimates its probability of winning in the resultant position. If it estimates that it is still ahead, that play becomes a good candidate move. This is a heuristic for playing prophylactically, to minimize the chances of the opponent making trouble, not for maximizing its own estimated probability of winning.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.