All times are UTC - 8 hours [ DST ]
 Post subject: possible to improve AlphaGo in endgame
Post #1 Posted: Tue Mar 15, 2016 6:42 pm 
Beginner

Posts: 2
Liked others: 0
Was liked: 0
The order of AlphaGo's yose moves does not look perfect when it is leading the game.

AlphaGo knows how to win, but it doesn't know how to win by more.

In the endgame there are multiple move orders that win (win rate 100%).
Some of them win by more and some of them win by less.

AlphaGo selects one way with 100% win rate but doesn't know how much it can win by with that move.

Possible solution:

In the training stage of the value network, it's possible to train different versions with different komi.

When AlphaGo sees multiple moves with a 100% win rate:

If AlphaGo plays black: get answers from a different value network with higher komi (black must win by more).
If AlphaGo plays white: get answers from a different value network with lower komi (white must win by more).

Then the original 100% win rates will decline to values lower than 100%.
AlphaGo can select the move with the highest win rate, and the move order of its yose should be better.
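
A minimal Python sketch of the tie-breaking idea above. Everything in it is hypothetical: `win_rate(move, komi)` stands in for a family of value networks trained at different komi, `toy_black_win_rate` is a made-up replacement that assumes each move leads to a known board margin, and the komi step, shift limit and 0.99 threshold are arbitrary. It is not a description of how AlphaGo actually works.

```python
import math
from typing import Callable, List


def pick_move(
    moves: List[str],
    win_rate: Callable[[str, float], float],
    plays_black: bool,
    base_komi: float = 7.5,
    komi_step: float = 1.0,
    max_shift: float = 20.0,
    saturated: float = 0.99,
) -> str:
    """Pick a move; if several look like certain wins, shift komi against
    ourselves until their win rates separate."""
    # Raising komi handicaps black; lowering it handicaps white.
    direction = 1.0 if plays_black else -1.0
    candidates = list(moves)
    shift = 0.0
    while True:
        komi = base_komi + direction * shift
        rates = {m: win_rate(m, komi) for m in candidates}
        still_certain = [m for m in candidates if rates[m] >= saturated]
        # Stop once at most one move still looks certain (or we give up).
        if len(still_certain) < 2 or shift >= max_shift:
            return max(candidates, key=lambda m: rates[m])
        candidates = still_certain
        shift += komi_step


# Toy stand-in for the komi-specific value networks (assumption):
# black's margin on the board, before komi, after each candidate move.
BLACK_BOARD_MARGIN = {"a": 12.0, "b": 9.0, "c": 15.0, "d": 3.0}


def toy_black_win_rate(move: str, komi: float) -> float:
    """Steep logistic in (board margin - komi), from black's point of view."""
    return 1.0 / (1.0 + math.exp(-2.0 * (BLACK_BOARD_MARGIN[move] - komi)))


print(pick_move(["a", "b", "c", "d"], toy_black_win_rate, plays_black=True))
```

With these toy numbers, moves a and c both look like certain wins at the normal komi. Raising the komi step by step separates them, and the move with the larger margin is chosen.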

 Post subject: Re: possible to improve AlphaGo in endgame
Post #2 Posted: Tue Mar 15, 2016 7:15 pm 
Oza

Posts: 2180
Location: ʍoquıɐɹ ǝɥʇ ɹǝʌo 'ǝɹǝɥʍǝɯos
Liked others: 237
Was liked: 662
Rank: AGA 5d
GD Posts: 4312
Online playing schedule: Every tenth February 29th from 20:00-20:01 (if time permits)
Why does it need to win by more? It is not playing bangneki. Isn't it better to optimize the chance of a win, regardless of the margin?

_________________
Still officially AGA 5d but I play so irregularly these days that I am probably only 3d or 4d over the board (but hopefully still 5d in terms of knowledge, theory and the ability to contribute).


This post by DrStraw was liked by: Kirby
 Post subject: Re: possible to improve AlphaGo in endgame
Post #3 Posted: Tue Mar 15, 2016 8:24 pm 
Beginner

Posts: 2
Liked others: 0
Was liked: 0
The benefit of a win-by-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be improved further for both AI and pros.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #4 Posted: Tue Mar 15, 2016 8:51 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
yamiyodare wrote:
The benefit of a win-by-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be improved further for both AI and pros.


I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.

_________________
be immersed

 Post subject: Re: possible to improve AlphaGo in endgame
Post #5 Posted: Tue Mar 15, 2016 11:15 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #6 Posted: Tue Mar 15, 2016 11:48 pm 
Oza

Posts: 2401
Location: Tokyo, Japan
Liked others: 2338
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


This post by ez4u was liked by: Kirby
 Post subject: Re: possible to improve AlphaGo in endgame
Post #7 Posted: Wed Mar 16, 2016 12:28 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
If AlphaGo always played perfect endgame, it would not need to optimise the score. Since it plays imperfect endgame, Bill's reply applies.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #8 Posted: Wed Mar 16, 2016 8:32 am 
Beginner

Posts: 15
Liked others: 0
Was liked: 3
yamiyodare wrote:
AlphaGo selects one way with 100% win rate

I don't think so. As far as I can see, AlphaGo does not include a theorem prover, while the number of legal variations is astronomical, even in the endgame. So not all legal variations are investigated, and a win is never certain, only very likely.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #9 Posted: Wed Mar 16, 2016 12:04 pm 
Lives in gote

Posts: 553
Liked others: 61
Was liked: 250
Rank: AGA 5 dan
Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #10 Posted: Wed Mar 16, 2016 12:19 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
mitsun wrote:
Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.


Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?
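
To make the point about estimation error concrete, here is a rough Python sketch of a win rate estimated from random playouts. The playout model (independent coin flips with a fixed `true_p`) is an assumption for illustration only; it says nothing about how AlphaGo's rollouts or value network actually behave.

```python
import math
import random


def estimate_win_rate(true_p: float, n_rollouts: int, rng: random.Random) -> float:
    """Fraction of simulated playouts won, each won with probability true_p."""
    wins = sum(1 for _ in range(n_rollouts) if rng.random() < true_p)
    return wins / n_rollouts


def standard_error(p: float, n_rollouts: int) -> float:
    """Standard error of a win rate estimated from n independent playouts."""
    return math.sqrt(p * (1.0 - p) / n_rollouts)


rng = random.Random(0)
for true_p in (0.95, 0.96):
    est = estimate_win_rate(true_p, 1000, rng)
    print(true_p, est, round(standard_error(true_p, 1000), 4))
```

Under this assumption, 1000 playouts at a true win rate of 0.95 give a standard error of about 0.007, so two moves whose true win rates differ by 0.01 are hard to tell apart.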

 Post subject: Re: possible to improve AlphaGo in endgame
Post #11 Posted: Wed Mar 16, 2016 12:36 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
ez4u wrote:
Pros may also play to reduce the uncertainty in a position rather than trying to maximize the point difference, when they judge that they are ahead. There is nothing unusual about it.


What is unusual is how the currently best computer programs do it, or claim to do it. Consider AlphaGo's play in the final game. It made a small misstep at move 262, which might have cost one point. In the opening it lost a few points on the right side. I suppose that the opening play would have reduced the uncertainty if it had been correct, by settling that region of the board, but in a way it was reckless, because if the play was incorrect, it would reduce the uncertainty along with possibly losing the advantage. And even if it only reduced the advantage, it would have made possible future mistakes more dangerous, arguably decreasing the probability of winning.

In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly, particularly in the endgame.
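
The trade-off between margin and safety can be put into a toy model. The Python sketch below assumes, purely for illustration, that the final margin is normally distributed around the current lead, with a spread `sigma` that stands for the size of possible later errors. Neither the normal model nor the numbers come from AlphaGo.

```python
import math


def win_probability(lead: float, sigma: float) -> float:
    """P(final margin > 0) when the final margin is Normal(lead, sigma)."""
    return 0.5 * (1.0 + math.erf(lead / (sigma * math.sqrt(2.0))))


# Same uncertainty: the larger lead wins more often.
print(win_probability(5.0, 3.0), win_probability(4.0, 3.0))

# Giving up a point can still pay off if it halves the uncertainty.
print(win_probability(4.0, 2.0), win_probability(5.0, 4.0))
```

With equal `sigma`, a bigger lead always gives a higher win probability, which matches the "in general" rule. A play that gives up one point but halves `sigma` comes out ahead in this model (roughly 0.98 against 0.89), which matches playing safe. A play that loses a point without reducing `sigma` only lowers the win probability.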

 Post subject: Re: possible to improve AlphaGo in endgame
Post #12 Posted: Wed Mar 16, 2016 12:43 pm 
Lives in gote

Posts: 448
Liked others: 5
Was liked: 187
Rank: BGA 3 dan
Bill Spight wrote:
mitsun wrote:
Bill Spight wrote:
The trouble with making smaller plays in a winning position is the possibility of making a later error that is larger than the smaller margin of victory.
Surely that is part of the calculation of risk, which the computer is minimizing, to the best of its ability.


Well, if it is estimating the probability of winning, that estimate has an error. Also, how is the probability defined? By random rollouts?


The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example. And likewise any type of play which "clarifies" a win in that fashion. Some noise allowed.

That may be what happened near the end of game 5, when it played a one point reverse sente, and Redmond commented that it was "small".

That, though, is likely an over-simplification, since there was a potential ko top left that we didn't see played out. AlphaGo doesn't manage its threats as a pro would; it assumes it can see enough in concrete variations (and so can be wrong) but nothing is bolted on to its assessments, when it comes down to it. But I think it may maximise its larger threats, as held in reserve, under some circumstances - it's an interesting issue.

The style is a sort of organic, holistic, fallible, conservative playing of the percentages. Not much self-doubt built in! But pretty good at "playing for money", I hazard. One way to define a pro, we shouldn't forget.

Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference. Against that, if one is ahead, one can often play safe. Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly.


The aliens have landed, and they don't look in mirrors.

Consider that DeepMind started with a machine that learned to play Space Invaders, and their process could create a "pinball wizard". Cf. The Who, Tommy, lyrics

http://www.azlyrics.com/lyrics/who/goto ... orboy.html

Maybe if AlphaGo listens to you, Bill, it will found a new religion ...

 Post subject: Re: possible to improve AlphaGo in endgame
Post #13 Posted: Wed Mar 16, 2016 12:49 pm 
Gosei

Posts: 1625
Liked others: 542
Was liked: 450
Rank: senior player
GD Posts: 1000
Kirby wrote:
yamiyodare wrote:
The benefit of a win-by-more strategy (differentiating between 100% win rates) in the endgame may be this:

It could show AlphaGo's best yose move (the one that wins by the most), which could be compared with pros' yose to see whether anything could be improved further for both AI and pros.


I agree with DrStraw. Correct yose depends on board position. And any sequence of plays that wins the game is sufficient, since Go is a zero-sum game.

If you are betting on the number of points to win by, etc., then it makes sense.


I may be confused but I don't see how go is a zero sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say $1, and the loser loses the same amount $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #14 Posted: Wed Mar 16, 2016 1:11 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Charles Matthews wrote:
The actual algorithm is a bit too stratified to be a comfortable thing to put into a few words. But towards the end of the game it is navigating towards a solid win.

It would probably detect tedomari just by pseudo-random rollouts, as you suggest, for example.


We could test it on positions from Mathematical Go. :)

Quote:
Maybe if AlphaGo listens to you, Bill, it will found a new religion ...


I can see it now:
Quote:
Some day AlphaGo will return!
;)

 Post subject: Re: possible to improve AlphaGo in endgame
Post #15 Posted: Wed Mar 16, 2016 1:36 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference.


I think this might not always match how computers see the situation.

Lee Sedol's move 78 from Game 4 against AlphaGo gives some insight. Apparently, the computer overlooked Lee Sedol's move, since it found that Lee Sedol's choice had only 1/10000 of a chance of being played. As a result, I guess it did some sort of simplified reading of the situation, and made the wrong move.

In short, the computer miscalculated the situation due to the complexity that was added by a very unusual move.

If I were the computer, and I wanted to increase my chances of winning, I would want to avoid this type of complexity that would result in my misreading of the situation. I'd want to have several clear and simple paths to victory, and eliminate these 1/10000-type moves that could lead to something I haven't really thought about.

So maximizing my chances of winning isn't necessarily about always maximizing the difference in score. If I'm ahead, rather, sometimes I'd like to simplify the situation into one where I know I won't encounter one of those 1/10000 type moves that lead to a situation where the path to victory is less clear to me.

If losing a few points here and there will get me on a simple and direct path to victory, I think it's the way to go.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #16 Posted: Wed Mar 16, 2016 1:47 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbouring fights.

 Post subject: Re: possible to improve AlphaGo in endgame
Post #17 Posted: Wed Mar 16, 2016 1:54 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
gowan wrote:
I may be confused but I don't see how go is a zero sum game. I think zero-sum means that what one player wins the other loses, or that the two players' payoffs sum to zero. If go had payoffs so that the winner wins, say $1, and the loser loses the same amount $1, then it would be zero-sum, but except in gambling situations there are no payoffs.

Of course go is a game of "perfect information" but that is something other than zero-sum.


I disagree.

The payoff of the game is winning or losing.

Wikipedia wrote:
In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which each participant's gain (or loss) of utility is exactly balanced by the losses (or gains) of the utility of the other participant(s).


The utility that you get from playing the game is the win. This is balanced with the negative utility you get from losing.

You can contrast this with the idea that getting more points in Go provides more utility than getting fewer points. If getting more points in Go provided more utility than getting fewer points, then the goal would be to maximize points.

But this is not the goal. The goal is to maximize your utility, which is defined by winning that particular game.

You can think of it like you said as getting $1 if you win, and losing $1 if you lose.

You don't have to think of it in terms of dollars. Specifically, the payoff for winning a game of Go is +1 unit (the win), and the payoff for losing the game is -1 unit (the loss).

Anyway, the point I want to express is that it's not the goal of a computer AI to maximize points, because that is not the utility of the game. The utility of the game is the win, which is worth 1 unit - not some point value you are trying to maximize.
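
The difference between the two utilities can be sketched in a few lines of Python. The two outcome tables below (`GREEDY` and `SAFE`) are invented numbers, used only to show that maximizing the expected margin and maximizing the win/loss utility can choose different moves.

```python
from typing import Dict

# Final margin -> probability of that margin (hypothetical numbers).
Outcomes = Dict[float, float]

GREEDY: Outcomes = {10.5: 0.90, -5.5: 0.10}  # bigger win, bigger risk
SAFE: Outcomes = {2.5: 0.99, -0.5: 0.01}     # small win, almost no risk


def expected_margin(outcomes: Outcomes) -> float:
    """Utility if every extra point counted."""
    return sum(margin * p for margin, p in outcomes.items())


def win_utility(outcomes: Outcomes) -> float:
    """Zero-sum utility: +1 for a win, -1 for a loss, whatever the margin."""
    return sum(p * (1.0 if margin > 0 else -1.0) for margin, p in outcomes.items())


for name, outcomes in (("greedy", GREEDY), ("safe", SAFE)):
    print(name, expected_margin(outcomes), win_utility(outcomes))
```

With these numbers the greedy line has the higher expected margin, while the safe line has the higher win/loss utility, so a player who only cares about the win prefers the safe line.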



This post by Kirby was liked by: luigi
 Post subject: Re: possible to improve AlphaGo in endgame
Post #18 Posted: Wed Mar 16, 2016 1:55 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
RobertJasiek wrote:
It is very unlikely that AlphaGo overlooked move 78. Instead, it would overlook the best sequence with the correct timing in the context of a longer sequence incorporating the neighbouring fights.


I don't think it overlooked the move, but since it was an unusual variation, I don't believe as much computation had been put into getting a successful variation from that branch.
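
One way to picture "less computation on an unusual branch" is a simplified PUCT-style selection rule, where the exploration bonus of a move is proportional to its prior probability. The Python sketch below is a toy with two moves and fixed, made-up evaluations. The 1/10000 prior is the figure quoted in this thread, and whether this is what actually happened in Game 4 is an assumption.

```python
import math
from typing import Dict


def run_search(
    priors: Dict[str, float],
    values: Dict[str, float],
    n_sims: int,
    c_puct: float = 5.0,
) -> Dict[str, int]:
    """Count visits per move when each simulation picks argmax of Q + U."""
    visits = {m: 0 for m in priors}
    total_value = {m: 0.0 for m in priors}
    for _ in range(n_sims):
        n_total = sum(visits.values())

        def score(m: str) -> float:
            # Q: mean evaluation so far (0 if never visited).
            q = total_value[m] / visits[m] if visits[m] else 0.0
            # U: exploration bonus, scaled by the prior.
            u = c_puct * priors[m] * math.sqrt(n_total) / (1 + visits[m])
            return q + u

        best = max(priors, key=score)
        visits[best] += 1
        total_value[best] += values[best]  # toy: fixed evaluation per move
    return visits


values = {"expected": 0.4, "rare": 0.9}  # the rare move is actually better
print(run_search({"expected": 0.9999, "rare": 0.0001}, values, 10_000))
print(run_search({"expected": 0.5, "rare": 0.5}, values, 1_000))
```

In this toy, the move with a 1/10000 prior is never visited in 10,000 simulations even though it would evaluate better, while with equal priors it quickly receives most of the visits.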

 Post subject: Re: possible to improve AlphaGo in endgame
Post #19 Posted: Wed Mar 16, 2016 3:51 pm 
Beginner

Posts: 15
Liked others: 0
Was liked: 3
Bill Spight wrote:
In general one maximizes the probability of winning by maximizing the territory difference.

This is clearly false. If one is greedy, one may be punished.
Bill Spight wrote:
Many of the plays that these programs make when ahead do not appear to be playing safe, they look silly, particularly in the endgame.

They only look silly to entities equipped with a theorem prover, who can prove to themselves that certain moves are useless or inferior. AlphaGo is not equipped with a theorem prover.


Last edited by zorq on Wed Mar 16, 2016 4:27 pm, edited 1 time in total.
 Post subject: Re: possible to improve AlphaGo in endgame
Post #20 Posted: Wed Mar 16, 2016 4:12 pm 
Dies in gote

Posts: 26
Liked others: 6
Was liked: 3
It seems to have problems calculating when stones are surrounded and in a state of semeai. We saw it with that endgame sequence in Game 2 in the upper right, where it threw away about 5 points when it elected to capture those stones in the center. I think it did so because Lee and AlphaGo had two sets of 3 stones sort of surrounding each other. I don't think a computer would care one way or the other about losing points in the endgame. It should calculate and make the biggest move regardless of how far ahead it is. My guess is there is some sort of calculation error going on. There was also the obvious wedge in Game 4, and then in Game 5 the sequence in the bottom right. All had a sort of semeai involved.

On a side note, does anyone know what is going on with AlphaGo? Will they work on it for a month or two and let Ke Jie play it or not? I'm kind of assuming they want to move on to other things if they want to build something similar for solving cancer or other problems, but at the same time I don't want them to. I want to see AlphaGo play more pros. Really all the top pros should have a chance to play it a few times. Not every game has to be a televised event.
