possible to improve AlphaGo in endgame
-
yamiyodare
- Beginner
- Posts: 2
- Joined: Tue Mar 15, 2016 6:05 pm
- GD Posts: 0
possible to improve AlphaGo in endgame
AlphaGo's order of yose moves does not look perfect when it is leading the game.
AlphaGo knows how to win, but it doesn't know how to win by more.
In the endgame there are often several move orders that all win (win rate 100%).
Some of them win by more and some by less.
AlphaGo selects one line with a 100% win rate but doesn't know how much it wins by with that move.
Possible solution:
In the training stage of the value network, it should be possible to train different versions with different komi.
When AlphaGo sees multiple choices with a 100% win rate:
AlphaGo plays Black: get answers from a different value network with higher komi (Black must win by more)
AlphaGo plays White: get answers from a different value network with lower komi (White must win by more)
Then the original 100% win rates will drop below 100%.
AlphaGo can then select the move with the highest win rate, and its yose move order should improve.
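The proposal above could be sketched roughly as follows. This is only a toy illustration, not AlphaGo's actual code: `toy_value` is a hypothetical stand-in for a komi-specific value network (it pretends each move has a known expected margin and uncertainty), and the names `pick_move`, `extra_komis` and `saturated` are made up for this example.

```python
import math
from typing import Dict, Sequence, Tuple


def toy_value(margin: float, sigma: float, extra_komi: float) -> float:
    """Hypothetical stand-in for a value network trained with shifted komi.

    Returns the chance that the mover still wins after giving away
    `extra_komi` points, assuming the final margin is normally
    distributed around `margin` with standard deviation `sigma`.
    """
    z = (margin - extra_komi) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pick_move(candidates: Dict[str, Tuple[float, float]],
              extra_komis: Sequence[float] = (0.0, 2.0, 4.0, 6.0),
              saturated: float = 0.99) -> str:
    """Pick a move, tightening komi against the mover while several
    candidates still look like certain wins.

    `candidates` maps a move name to (expected margin, sigma).
    `extra_komis` must be non-empty; its first entry is the real komi.
    """
    pool = list(candidates)
    rates: Dict[str, float] = {}
    for extra in extra_komis:
        rates = {m: toy_value(*candidates[m], extra) for m in pool}
        top = [m for m in pool if rates[m] >= saturated]
        if len(top) <= 1:
            # The win rates are now different enough to choose.
            break
        # Several moves still look like sure wins: keep only those
        # and ask the next, stricter "network".
        pool = top
    return max(pool, key=lambda m: rates[m])


# Moves "a" and "b" both look like 100% wins at the real komi,
# but "a" wins by more, so it survives the stricter komi.
moves = {"a": (10.0, 1.0), "b": (6.0, 1.0), "c": (1.0, 1.0)}
print(pick_move(moves))
```

Note that the extra komi is always applied against the side to move, which corresponds to higher komi when AlphaGo is Black and lower komi when it is White.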
-
DrStraw
- Oza
- Posts: 2180
- Joined: Tue Apr 27, 2010 4:09 am
- Rank: AGA 5d
- GD Posts: 4312
- Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
- Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
- Has thanked: 237 times
- Been thanked: 662 times
- Contact:
Re: possible to improve AlphaGo in endgame
Why does it need to win by more? It is not playing bangneki. Isn't it better to optimize the chance of a win, regardless of the margin?
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).
-
yamiyodare
- Beginner
- Posts: 2
- Joined: Tue Mar 15, 2016 6:05 pm
- GD Posts: 0
Re: possible to improve AlphaGo in endgame
The benefit of a win-more strategy (differentiating among 100% win rates) in the endgame may be this:
It could show AlphaGo's best yose move (the one that wins by the most), which could then be compared with pros' yose to see whether anything could be improved further, for both AI and pros.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: possible to improve AlphaGo in endgame
yamiyodare wrote: The benefit of win more strategy (differentiate 100% win rates) in endgame may be
It could show the best yose move (win the most) of AlphaGo and compare with Pro's yose to see if there is anything could be further improved for both AI and Pros.
I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.
If you are betting on the number of points to win by, etc., then it makes sense.
be immersed
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: possible to improve AlphaGo in endgame
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
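A rough way to put numbers on this point is sketched below. It is a toy model with assumptions that are not from the thread: each of the remaining moves independently has the same chance of being an error, every error costs the same number of points, and the game is won only if the total lost stays below the current margin. The function name and all the figures are hypothetical.

```python
from math import comb


def win_probability(margin: float, n_moves: int,
                    p_error: float, error_size: float) -> float:
    """Chance that the points lost to later errors stay below `margin`.

    Assumes (for illustration only) that each of `n_moves` remaining
    moves is an error with probability `p_error`, independently, and
    that every error costs exactly `error_size` points.
    """
    total = 0.0
    for k in range(n_moves + 1):
        if k * error_size < margin:
            # Binomial probability of making exactly k errors.
            total += comb(n_moves, k) * p_error**k * (1 - p_error)**(n_moves - k)
    return total


# Same chance of error per move, but a thinner lead leaves less room:
print(win_probability(margin=5.5, n_moves=10, p_error=0.1, error_size=2.0))
print(win_probability(margin=1.5, n_moves=10, p_error=0.1, error_size=2.0))
```

With a 5.5-point lead the player can absorb two such errors; with a 1.5-point lead a single one already loses the game, so the second figure is much lower.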
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: possible to improve AlphaGo in endgame
Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
-
RobertJasiek
- Judan
- Posts: 6273
- Joined: Tue Apr 27, 2010 8:54 pm
- GD Posts: 0
- Been thanked: 797 times
- Contact:
Re: possible to improve AlphaGo in endgame
If AlphaGo always played perfect endgame, it would not need to optimise the score. Since it plays imperfect endgame, Bill's reply applies.
Re: possible to improve AlphaGo in endgame
yamiyodare wrote: AlphaGo selects one way with 100% win rate
I don't think so. As far as I can see, AlphaGo does not include a theorem prover, while the number of legal variations is astronomical, also in the endgame. So, not all legal variations are investigated, and a win is never certain, only very likely.
-
mitsun
- Lives in gote
- Posts: 553
- Joined: Fri Apr 23, 2010 10:10 pm
- Rank: AGA 5 dan
- GD Posts: 0
- Has thanked: 61 times
- Been thanked: 250 times
Re: possible to improve AlphaGo in endgame
Bill Spight wrote: The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: possible to improve AlphaGo in endgame
Bill Spight wrote: The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
mitsun wrote: Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.
Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: possible to improve AlphaGo in endgame
ez4u wrote: Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.
What is unusual is how the currently best computer programs do it, or claim to do it. Consider AlphaGo's play in the final game. It made a small misstep at move 262, which might have cost one point. In the opening it lost a few points on the right side. I suppose that the opening play would have reduced the uncertainty if it had been correct, by settling that region of the board, but in a way it was reckless, because if the play was incorrect, it would reduce the uncertainty along with possibly losing the advantage. And even if it only reduced the advantage, it would have made possible future mistakes more dangerous, arguably decreasing the probability of winning.
In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe; they look silly, particularly in the endgame.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Charles Matthews
- Lives in gote
- Posts: 450
- Joined: Sun May 13, 2012 9:12 am
- Rank: BGA 3 dan
- GD Posts: 0
- Has thanked: 5 times
- Been thanked: 189 times
Re: possible to improve AlphaGo in endgame
Bill Spight wrote: The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
mitsun wrote: Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.
Bill Spight wrote: Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?
The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.
It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example. And likewise any type of play which "clarifies" a win in that fashion. Some noise allowed.
That may be what happened near the end of game 5, when it played a one point reverse sente, and Redmond commented that it was "small".
That, though, is likely an over-simplification, since there was a potential ko top left that we didn't see played out. AlphaGo doesn't manage its threats as a pro would; it assumes it can see enough in concrete variations (and so can be wrong) but nothing is bolted on to its assessments, when it comes down to it. But I think it may maximise its larger threats, as held in reserve, under some circumstances - it's an interesting issue.
The style is a sort of organic, holistic, fallible, conservative playing of the percentages. Not much self-doubt built in! But pretty good at "playing for money", I hazard. One way to define a pro, we shouldn't forget.
Bill Spight wrote: In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly.
The aliens have landed, and they don't look in mirrors.
Consider that DeepMind started with a machine that learned to play Space Invaders, and their process could create a "pinball wizard". Cf. The Who, Tommy, lyrics
http://www.azlyrics.com/lyrics/who/goto ... orboy.html
Maybe if AlphaGo listens to you, Bill, it will found a new religion ...
-
gowan
- Gosei
- Posts: 1628
- Joined: Thu Apr 29, 2010 4:40 am
- Rank: senior player
- GD Posts: 1000
- Has thanked: 546 times
- Been thanked: 450 times
Re: possible to improve AlphaGo in endgame
yamiyodare wrote: The benefit of win more strategy (differentiate 100% win rates) in endgame may be
It could show the best yose move (win the most) of AlphaGo and compare with Pro's yose to see if there is anything could be further improved for both AI and Pros.
Kirby wrote: I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.
If you are betting on the number of points to win by, etc., then it makes sense.
I may be confused but I don't see how go is a zero-sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say, $1 and the loser loses the same amount, $1, then it would be zero-sum, but except in gambling situations there are no payoffs.
Of course go is a game of "perfect information" but that is something other than zero-sum.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: possible to improve AlphaGo in endgame
Charles Matthews wrote: The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.
It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example.
We could test it on positions from Mathematical Go.
Charles Matthews wrote: Maybe if AlphaGo listens to you, Bill, it will found a new religion ...
I can see it now:
Some day AlphaGo will return!
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: possible to improve AlphaGo in endgame
Bill Spight wrote: In general one maximizes the probability of winning by maximizing the territory difference.
I think this might not always match how computers see the situation.
Lee Sedol's move 78 from Game 4 against AlphaGo gives some insight. Apparently, the computer overlooked Lee Sedol's move, since it found that Lee Sedol's choice had only 1/10000 of a chance of being played. As a result, I guess it did some sort of simplified reading of the situation, and made the wrong move.
In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.
If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation. I'd want to have several clear and simple paths to victory, and eliminate these 1/10000-type moves that could lead to something I haven't really thought about.
So maximizing my chances of winning isn't necessarily about always maximizing the difference in score. If I'm ahead, rather, sometimes I'd like to simplify the situation into one where I know I won't encounter one of those 1/10000 type moves that lead to a situation where the path to victory is less clear to me.
If losing a few points here and there will get me on a simple and direct path to victory, I think it's the way to go.
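To make the trade-off concrete, here is a small back-of-the-envelope sketch. The formula and every number in it are my own illustrative assumptions, not anything known about AlphaGo: a line is scored by the win rate as I read it, discounted by the chance that the opponent finds a move I never considered.

```python
def effective_win_rate(p_win_as_read: float,
                       p_surprise: float,
                       p_win_if_surprised: float) -> float:
    """Win rate after allowing for an unread move by the opponent.

    `p_surprise` is the (assumed) chance that the opponent plays
    something outside my reading; `p_win_if_surprised` is how often
    I still win when that happens.
    """
    return ((1.0 - p_surprise) * p_win_as_read
            + p_surprise * p_win_if_surprised)


# Complicated line: bigger margin as read, but more room for surprises.
complicated = effective_win_rate(0.99, p_surprise=0.05, p_win_if_surprised=0.4)
# Simple line: gives up a few points, but almost nothing is left unread.
simple = effective_win_rate(0.97, p_surprise=0.001, p_win_if_surprised=0.4)
print(complicated, simple)
```

With these made-up figures the simple line comes out ahead even though it looks slightly worse on paper, which is the point about giving up a few points for a clear path.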
be immersed