possible to improve AlphaGo in endgame

yamiyodare · Post by **yamiyodare** » Tue Mar 15, 2016 6:42 pm

The order of AlphaGo's yose moves does not look perfect when it is leading the game.

AlphaGo knows how to win, but it doesn't know how to win more.

There are multiple move orders that win (win rate 100%) in the endgame.
Some of them win by more and some of them win by less.

AlphaGo selects one way with a 100% win rate but doesn't know how much it can win by with that move.

possible solution:

In the training stage of the value network, it's possible to train different versions with different komi.

When AlphaGo sees multiple choices with a 100% win rate,

AlphaGo plays black: get answers from a different value network with higher komi (black must win by more)
AlphaGo plays white: get answers from a different value network with lower komi (white must win by more)

Then the original 100% win rate will decline to a value lower than 100%.
AlphaGo can select the move with the highest win rate, and the move order of yose should be better.
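
A minimal Python sketch of the tie-breaking idea above. Everything in it is hypothetical: `pick_endgame_move`, the two evaluator callables, and the `sure_win` threshold are illustrative names, not part of AlphaGo. The second evaluator stands in for a value network trained with komi shifted against the side to move.

```python
from typing import Callable, Hashable, Iterable

Move = Hashable
# Hypothetical evaluator: maps a candidate move to a win probability in [0, 1].
ValueFn = Callable[[Move], float]


def pick_endgame_move(
    moves: Iterable[Move],
    base_value: ValueFn,      # assumed: value network trained with the real komi
    stricter_value: ValueFn,  # assumed: value network trained with komi shifted against us
    sure_win: float = 0.99,   # assumed cutoff for "100% win rate"
) -> Move:
    """Use the stricter evaluator only to break ties among near-certain wins."""
    candidates = list(moves)
    base = {m: base_value(m) for m in candidates}
    sure = [m for m in candidates if base[m] >= sure_win]
    if len(sure) < 2:
        # No tie to break: fall back to the ordinary win-rate choice.
        return max(candidates, key=base.get)
    # Several moves look like certain wins: prefer the one that still
    # wins most often when the komi is made harder for us.
    return max(sure, key=stricter_value)


# Toy numbers standing in for network outputs.
base_rates = {"a": 1.0, "b": 1.0, "c": 0.8}
strict_rates = {"a": 0.7, "b": 0.9, "c": 0.3}
choice = pick_endgame_move(base_rates, base_rates.get, strict_rates.get)
```

With these toy numbers, moves "a" and "b" tie under the base evaluator and the stricter one separates them. The real win rate still takes priority, because the stricter evaluator is never consulted for moves below the `sure_win` cutoff.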

DrStraw · Post by **DrStraw** » Tue Mar 15, 2016 7:15 pm

Why does it need to win by more? It is not playing bangneki. Isn't it better to optimize the chance of a win, regardless of the margin?

yamiyodare · Post by **yamiyodare** » Tue Mar 15, 2016 8:24 pm

The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be further improved for both AI and pros.

Kirby · Post by **Kirby** » Tue Mar 15, 2016 8:51 pm

yamiyodare wrote:The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be further improved for both AI and pros.

I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.

Bill Spight · Post by **Bill Spight** » Tue Mar 15, 2016 11:15 pm

The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.

ez4u · Post by **ez4u** » Tue Mar 15, 2016 11:48 pm

Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.

RobertJasiek · Post by **RobertJasiek** » Wed Mar 16, 2016 12:28 am

If AlphaGo always played perfect endgame, it would not need to optimise the score. Since it plays imperfect endgame, Bill's reply applies.

zorq · Post by **zorq** » Wed Mar 16, 2016 8:32 am

yamiyodare wrote:AlphaGo selects one way with 100% win rate

I don't think so. As far as I can see, AlphaGo does not include a theorem prover, while the number of legal variations is astronomical, even in the endgame. So not all legal variations are investigated, and a win is never certain, only very likely.

mitsun · Post by **mitsun** » Wed Mar 16, 2016 12:04 pm

Bill Spight wrote:The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.

Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.

Bill Spight · Post by **Bill Spight** » Wed Mar 16, 2016 12:19 pm

mitsun wrote:
Bill Spight wrote:The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.

Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?
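
To make the question concrete, here is a rough Python sketch of a win probability defined by random rollouts, together with the sampling error of that estimate. It is not AlphaGo's actual evaluation. `rollout_win_rate` and `toy_play_out` are hypothetical names, and the toy playout just flips a biased coin instead of playing a game to the end.

```python
import math
import random
from typing import Callable, Tuple


def rollout_win_rate(
    play_out: Callable[[random.Random], bool],  # hypothetical: True if this rollout is a win
    n: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate a win probability from n rollouts and return (estimate, standard error)."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n) if play_out(rng))
    p_hat = wins / n
    # Binomial standard error of the estimate; it shrinks like 1/sqrt(n).
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err


def toy_play_out(rng: random.Random) -> bool:
    # Stand-in for a full random playout: wins 90% of the time by assumption.
    return rng.random() < 0.9


p_hat, std_err = rollout_win_rate(toy_play_out, n=1000)
```

The standard error only measures sampling noise. If every rollout happens to be a win, the formula reports an error of zero, yet that says nothing about lines the rollouts never visited, and nothing about whether random play resembles good play.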

Bill Spight · Post by **Bill Spight** » Wed Mar 16, 2016 12:36 pm

ez4u wrote:Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.

What is unusual is how the currently best computer programs do it, or claim to do it. Consider AlphaGo's play in the final game. It made a small misstep at move 262, which might have cost one point. In the opening it lost a few points on the right side. I suppose that the opening play would have reduced the uncertainty if it had been correct, by settling that region of the board, but in a way it was reckless, because if the play was incorrect, it would reduce the uncertainty along with possibly losing the advantage. And even if it only reduced the advantage, it would have made possible future mistakes more dangerous, arguably decreasing the probability of winning.

In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe; they look silly, particularly in the endgame.

Charles Matthews · Post by **Charles Matthews** » Wed Mar 16, 2016 12:43 pm

Bill Spight wrote:
mitsun wrote:
Bill Spight wrote:The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.
Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?

The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example. And likewise any type of play which "clarifies" a win in that fashion. Some noise allowed.

That may be what happened near the end of game 5, when it played a one point reverse sente, and Redmond commented that it was "small".

That, though, is likely an over-simplification, since there was a potential ko top left that we didn't see played out. AlphaGo doesn't manage its threats as a pro would; it assumes it can see enough in concrete variations (and so can be wrong) but nothing is bolted on to its assessments, when it comes down to it. But I think it may maximise its larger threats, as held in reserve, under some circumstances - it's an interesting issue.

The style is a sort of organic, holistic, fallible, conservative playing of the percentages. Not much self-doubt built in! But pretty good at "playing for money", I hazard. One way to define a pro, we shouldn't forget.

Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe; they look silly.

The aliens have landed, and they don't look in mirrors.

Consider that DeepMind started with a machine that learned to play Space Invaders, and their process could create a "pinball wizard". Cf. The Who, Tommy, lyrics

http://www.azlyrics.com/lyrics/who/goto ... orboy.html

Maybe if AlphaGo listens to you, Bill, it will found a new religion ...

gowan · Post by **gowan** » Wed Mar 16, 2016 12:49 pm

Kirby wrote:
yamiyodare wrote:The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be further improved for both AI and pros.
I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.

I may be confused but I don't see how go is a zero sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say $1, and the loser loses the same amount $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.
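
For reference, the usual game-theory definition can be written out as below. The symbols u_B, u_W and m are notation chosen here, not taken from the thread, and jigo (a drawn game) is ignored. Under this definition the payoffs do not have to be money; they only have to cancel between the two players.

```latex
% Zero-sum: for every final position s, the two players' payoffs cancel.
\[ u_B(s) + u_W(s) = 0 \]

% Win/loss payoffs (only the result matters):
\[
u_B(s) =
\begin{cases}
+1 & \text{if Black wins} \\
-1 & \text{if White wins}
\end{cases}
\qquad
u_W(s) = -u_B(s)
\]

% Margin payoffs (points matter), with m(s) Black's margin after komi:
\[ u_B(s) = m(s), \qquad u_W(s) = -m(s) \]
```

Both payoff choices satisfy the zero-sum condition. They differ in what a player should maximize: the first cares only about the sign of m(s), the second about its size, which is the distinction the thread is arguing about.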

Bill Spight · Post by **Bill Spight** » Wed Mar 16, 2016 1:11 pm

Charles Matthews wrote:The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example.

We could test it on positions from Mathematical Go.

Maybe if AlphaGo listens to you, Bill, it will found a new religion ...

I can see it now:

Some day AlphaGo will return!

Kirby · Post by **Kirby** » Wed Mar 16, 2016 1:36 pm

Bill Spight wrote:In general one maximizes the probability of winning by maximizing the territory difference.

I think this might not always match how computers see the situation.

Lee Sedol's move 78 from Game 4 against AlphaGo gives some insight. Apparently, the computer overlooked Lee Sedol's move, since it found that Lee Sedol's choice had only 1/10000 of a chance of being played. As a result, I guess it did some sort of simplified reading of the situation, and made the wrong move.

In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.

If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation. I'd want to have several clear and simple paths to victory, and eliminate these 1/10000-type moves that could lead to something I haven't really thought about.

So maximizing my chances of winning isn't necessarily about always maximizing the difference in score. If I'm ahead, rather, sometimes I'd like to simplify the situation into one where I know I won't encounter one of those 1/10000 type moves that lead to a situation where the path to victory is less clear to me.

If losing a few points here and there will get me on a simple and direct path to victory, I think it's the way to go.
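
The trade-off can be illustrated with a toy model in Python. The assumption, which is mine and not anything AlphaGo is known to compute, is that the final margin is normally distributed with a mean (expected lead in points) and a standard deviation (how messy the position is). `win_probability` is a hypothetical helper.

```python
import math


def win_probability(mean_margin: float, std_margin: float) -> float:
    """P(final margin > 0), assuming the margin is Normal(mean_margin, std_margin)."""
    # Standard normal CDF evaluated at mean/std, written with math.erf.
    return 0.5 * (1.0 + math.erf(mean_margin / (std_margin * math.sqrt(2.0))))


# Assumed numbers: a bigger lead in a complicated position ...
greedy = win_probability(mean_margin=5.0, std_margin=4.0)
# ... versus a smaller lead in a simplified one.
safe = win_probability(mean_margin=3.0, std_margin=1.0)

# With the same uncertainty, the larger lead is better.
same_risk_big = win_probability(mean_margin=5.0, std_margin=2.0)
same_risk_small = win_probability(mean_margin=3.0, std_margin=2.0)
```

In this model `safe` comes out higher than `greedy`, so giving up two points to cut the uncertainty raises the chance of winning. When the uncertainty is held equal, the larger margin wins more often, which is the sense in which maximizing the territory difference maximizes the probability of winning.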

Life In 19x19
