Life In 19x19
http://lifein19x19.com/

possible to improve AlphaGo in endgame
http://lifein19x19.com/viewtopic.php?f=18&t=12856

Author:  yamiyodare [ Tue Mar 15, 2016 6:42 pm ]
Post subject:  possible to improve AlphaGo in endgame

The order of AlphaGo's yose moves does not look perfect when it is leading the game.

AlphaGo knows how to win, but it doesn't know how to win more.

There are multiple choices of move orders to win (win rate 100%) at the endgame.
Some of them win more and some of them win less.

AlphaGo selects one way with 100% win rate but doesn't know how much it can win by with that move.

Possible solution:

In the training stage of the value network, it's possible to train different versions with different komi.

When AlphaGo sees multiple choices of 100% win rate,

AlphaGo plays black: get answers from a different value network with higher komi (black must win more)
AlphaGo plays white: get answers from a different value network with lower komi (white must win more)

Then the original 100% win rate will decline to a value lower than 100%.
AlphaGo can select the move with the highest win rate, and the move order of yose should be better.
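
As a rough illustration of the proposal above, here is a minimal Python sketch of breaking ties between "100%" moves by re-evaluating them at a shifted komi. It is not AlphaGo's method: there is no value network here, so each candidate move is represented by hypothetical sampled final margins, and the names `win_rate`, `pick_move`, `komi_step` and the numbers are all assumptions made for the example. Only the Black case (higher komi) is shown; for White the komi would be lowered instead.

```python
# Sketch only: AlphaGo's real value network is not available here, so each
# candidate move is represented by hypothetical sampled final margins
# (Black's score minus White's score on the board, before komi).

def win_rate(margins, komi):
    """Fraction of sampled outcomes in which Black wins after komi."""
    return sum(1 for m in margins if m - komi > 0) / len(margins)

def pick_move(candidates, komi=7.5, komi_step=2.0, max_steps=10):
    """Pick Black's move; raise the komi while the top win rates are tied."""
    best = list(candidates)
    for step in range(max_steps + 1):
        test_komi = komi + step * komi_step
        rates = {move: win_rate(candidates[move], test_komi) for move in best}
        top = max(rates.values())
        # Keep only the moves that share the best win rate at this komi.
        best = [move for move in best if rates[move] == top]
        if len(best) == 1 or top == 0.0:
            break
    return best[0]

candidates = {
    "A": [9, 10, 11, 10],   # wins every sample at komi 7.5, by a small margin
    "B": [12, 13, 12, 14],  # also wins every sample, by a larger margin
    "C": [6, 15, 16, 5],    # bigger upside but loses some samples
}
print(pick_move(candidates))  # prints "B"
```

At the real komi, A and B are tied at 100% and C is already worse. Raising the komi by one step separates A from B, which is the effect the post is hoping for.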

Author:  DrStraw [ Tue Mar 15, 2016 7:15 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Why does it need to win by more? It is not playing bangneki. Isn't it better to optimize the chance of a win, regardless of the margin?

Author:  yamiyodare [ Tue Mar 15, 2016 8:24 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see if there is anything that could be further improved for both AI and pros.

Author:  Kirby [ Tue Mar 15, 2016 8:51 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

yamiyodare wrote:
The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see if there is anything that could be further improved for both AI and pros.


I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.

Author:  Bill Spight [ Tue Mar 15, 2016 11:15 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
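
The point above can be put into numbers with a small Python sketch. The distribution of points lost to later mistakes is invented for illustration (it is not measured from AlphaGo or from any games), and `later_error` and `win_probability` are hypothetical names.

```python
# Hypothetical distribution of the total points lost to later mistakes.
# These numbers are invented for illustration, not measured from any games.
later_error = {0: 0.90, 1: 0.06, 2: 0.03, 4: 0.01}

def win_probability(margin, error_dist):
    """Probability that later errors stay smaller than the current margin."""
    return sum(p for loss, p in error_dist.items() if loss < margin)

for margin in (0.5, 1.5, 2.5, 4.5):
    print(margin, round(win_probability(margin, later_error), 2))
```

Under these assumed numbers, a half-point lead survives only if no further mistake is made, while each extra point of margin absorbs one more class of error.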

Author:  ez4u [ Tue Mar 15, 2016 11:48 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.

Author:  RobertJasiek [ Wed Mar 16, 2016 12:28 am ]
Post subject:  Re: possible to improve AlphaGo in endgame

If AlphaGo always played perfect endgame, it would not need to optimise the score. Since it plays imperfect endgame, Bill's reply applies.

Author:  zorq [ Wed Mar 16, 2016 8:32 am ]
Post subject:  Re: possible to improve AlphaGo in endgame

yamiyodare wrote:
AlphaGo selects one way with 100% win rate

I don't think so. As far as I can see, AlphaGo does not include a theorem prover, while the number of legal variations is astronomical, even in the endgame. So, not all legal variations are investigated, and a win is never certain, only very likely.

Author:  mitsun [ Wed Mar 16, 2016 12:04 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.

Author:  Bill Spight [ Wed Mar 16, 2016 12:19 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

mitsun wrote:
Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.


Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?
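
To give a feel for the size of that error, here is a minimal Python sketch that assumes the probability is simply the fraction of independent rollouts won. That is an assumption for the example, not a statement of how AlphaGo defines it, and `rollout_estimate` is a hypothetical name. It covers sampling error only; any bias from rollouts that play unlike real games is a separate issue.

```python
import math
import random

def rollout_estimate(true_win_prob, n_rollouts, rng):
    """Estimate a win probability from n simulated win/loss outcomes."""
    wins = sum(rng.random() < true_win_prob for _ in range(n_rollouts))
    p_hat = wins / n_rollouts
    # Binomial standard error of the estimated proportion.
    std_err = math.sqrt(p_hat * (1 - p_hat) / n_rollouts)
    return p_hat, std_err

rng = random.Random(0)
for n in (100, 1000, 10000):
    p_hat, std_err = rollout_estimate(0.9, n, rng)
    print(n, round(p_hat, 3), round(std_err, 4))
```

The standard error shrinks only with the square root of the number of rollouts, so two moves whose true win rates differ by a fraction of a percent are hard to tell apart this way.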

Author:  Bill Spight [ Wed Mar 16, 2016 12:36 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

ez4u wrote:
Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.


What is unusual is how the currently best computer programs do it, or claim to do it. Consider AlphaGo's play in the final game. It made a small misstep at move 262, which might have cost one point. In the opening it lost a few points on the right side. I suppose that the opening play would have reduced the uncertainty if it had been correct, by settling that region of the board, but in a way it was reckless, because if the play was incorrect, it would reduce the uncertainty along with possibly losing the advantage. And even if it only reduced the advantage, it would have made possible future mistakes more dangerous, arguably decreasing the probability of winning.

In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly, particularly in the endgame.

Author:  Charles Matthews [ Wed Mar 16, 2016 12:43 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Bill Spight wrote:
mitsun wrote:
Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.


Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?


The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example. And likewise any type of play which "clarifies" a win in that fashion. Some noise allowed.

That may be what happened near the end of game 5, when it played a one point reverse sente, and Redmond commented that it was "small".

That, though, is likely an over-simplification, since there was a potential ko top left that we didn't see played out. AlphaGo doesn't manage its threats as a pro would; it assumes it can see enough in concrete variations (and so can be wrong) but nothing is bolted on to its assessments, when it comes down to it. But I think it may maximise its larger threats, as held in reserve, under some circumstances - it's an interesting issue.

The style is a sort of organic, holistic, fallible, conservative playing of the percentages. Not much self-doubt built in! But pretty good at "playing for money", I hazard. One way to define a pro, we shouldn't forget.

Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly.


The aliens have landed, and they don't look in mirrors.

Consider that DeepMind started with a machine that learned to play Space Invaders, and their process could create a "pinball wizard". Cf. The Who, Tommy, lyrics

http://www.azlyrics.com/lyrics/who/goto ... orboy.html

Maybe if AlphaGo listens to you, Bill, it will found a new religion ...

Author:  gowan [ Wed Mar 16, 2016 12:49 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Kirby wrote:
yamiyodare wrote:
The benefit of a win-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see if there is anything that could be further improved for both AI and pros.


I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.


I may be confused but I don't see how go is a zero-sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say $1, and the loser loses the same amount $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.

Author:  Bill Spight [ Wed Mar 16, 2016 1:11 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Charles Matthews wrote:
The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example.


We could test it on positions from Mathematical Go. :)

Quote:
Maybe if AlphaGo listens to you, Bill, it will found a new religion ...


I can see it now:
Quote:
Some day AlphaGo will return!
;)

Author:  Kirby [ Wed Mar 16, 2016 1:36 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference.


I think this might not always match how computers see the situation.

Lee Sedol's move 78 from Game 4 against AlphaGo gives some insight. Apparently, the computer overlooked Lee Sedol's move, since it found that Lee Sedol's choice had only a 1/10000 chance of being played. As a result, I guess it did some sort of simplified reading of the situation, and made the wrong move.

In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.

If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation. I'd want to have several clear and simple paths to victory, and eliminate these 1/10000-type moves that could lead to something I haven't really thought about.

So maximizing my chances of winning isn't necessarily about always maximizing the difference in score. If I'm ahead, rather, sometimes I'd like to simplify the situation into one where I know I won't encounter one of those 1/10000 type moves that lead to a situation where the path to victory is less clear to me.

If losing a few points here and there will get me on a simple and direct path to victory, I think it's the way to go.
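
A toy Python model of that trade-off, under the loud assumption that the final margin of each line is normally distributed. The means and standard deviations are made up, and `win_probability` is a hypothetical helper, not anything AlphaGo computes.

```python
import math

def win_probability(mean_margin, std_dev):
    """P(final margin > 0) if the margin were normally distributed."""
    z = mean_margin / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical numbers: the "greedy" line keeps more points but stays complex,
# the "simple" line gives up two points to settle the position.
greedy = win_probability(mean_margin=5.0, std_dev=4.0)
simple = win_probability(mean_margin=3.0, std_dev=1.5)
print(round(greedy, 3), round(simple, 3))
```

With these numbers the simpler line wins more often even though it expects fewer points, which is the case being argued here. With a different spread the conclusion can flip, so the sketch shows the mechanism, not a rule.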

Author:  RobertJasiek [ Wed Mar 16, 2016 1:47 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbour fights.

Author:  Kirby [ Wed Mar 16, 2016 1:54 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

gowan wrote:
I may be confused but I don't see how go is a zero-sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say $1, and the loser loses the same amount $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.


I disagree.

The payoff of the game is winning or losing.

Wikipedia wrote:
In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which each participant's gain (or loss) of utility is exactly balanced by the losses (or gains) of the utility of the other participant(s).


The utility that you get from playing the game is the win. This is balanced with the negative utility you get from losing.

You can contrast this with the idea that getting more points in Go provides more utility than getting fewer points. If getting more points in Go provided more utility than getting fewer points, then the goal would be to maximize points.

But this is not the goal. The goal is to maximize your utility, which is defined by winning that particular game.

You can think of it like you said as getting $1 if you win, and losing $1 if you lose.

You don't have to think of it in terms of dollars. Specifically, the payoff for winning a game of Go is +1 unit (the win), and the payoff for losing the game is -1 unit (the loss).

Anyway, the point I want to express is that it's not the goal of a computer AI to maximize points, because that is not the utility of the game. The utility of the game is the win, which is worth 1 unit - not some point value you are trying to maximize.
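
The two ways of assigning utility can be compared in a short Python sketch. The function names are hypothetical, and the +1 / -1 values follow the $1 framing above. Both schemes sum to zero across the two players; the difference is that under win/loss utility a half-point win and a large win are worth the same.

```python
def win_loss_utility(black_margin):
    """Utility pair (Black, White) when only the result counts."""
    if black_margin > 0:
        return (1, -1)
    if black_margin < 0:
        return (-1, 1)
    return (0, 0)  # a tie, possible only with integer komi

def margin_utility(black_margin):
    """Utility pair (Black, White) if every point counted, as in betting."""
    return (black_margin, -black_margin)

for margin in (0.5, 12.5, -3.5):
    # Zero-sum: the two players' utilities cancel under either scheme.
    assert sum(win_loss_utility(margin)) == 0
    assert sum(margin_utility(margin)) == 0

print(win_loss_utility(0.5) == win_loss_utility(12.5))  # True
print(margin_utility(0.5) == margin_utility(12.5))      # False
```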

Author:  Kirby [ Wed Mar 16, 2016 1:55 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

RobertJasiek wrote:
It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbour fights.


I don't think it overlooked the move, but since it was an unusual variation, I don't believe as much computation had been put into getting a successful variation from that branch.

Author:  zorq [ Wed Mar 16, 2016 3:51 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference.

This is clearly false. If one is greedy, one may be punished.
Bill Spight wrote:
Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly, particularly in the endgame.

They only look silly to entities equipped with a theorem prover, who can prove to themselves that certain moves are useless or inferior. AlphaGo is not equipped with a theorem prover.

Author:  Temp [ Wed Mar 16, 2016 4:12 pm ]
Post subject:  Re: possible to improve AlphaGo in endgame

It seems to have problems calculating when stones are surrounded and in a state of semeai. We saw it with that endgame sequence in Game 2 in the upper right, where it threw away about 5 points when it elected to capture those stones in the center. I think it did so because Lee and AlphaGo had two sets of 3 stones sort of surrounding each other. I don't think a computer would care one way or the other about losing points in the endgame. It should calculate and make the biggest move regardless of how far ahead it is. My guess is there is some sort of calculation error going on. There was also the obvious wedge in Game 4. Then in Game 5 there was the sequence in the bottom right. All had a sort of semeai involved.

On a side note, does anyone know what is going on with AlphaGo? Will they work on it for a month or two and let Ke Jie play it or not? I'm kind of assuming they want to move on to other things if they want to build something similar for solving cancer or other problems, but at the same time I don't want them to. I want to see AlphaGo play more pros. Really all the top pros should have a chance to play it a few times. Not every game has to be a televised event.

All times are UTC - 8 hours [ DST ]