All times are UTC - 8 hours [ DST ]




 Post subject: 2019 China Securities Cup World AI Open
Post #1 Posted: Sat Aug 24, 2019 12:32 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
This AI competition took place this week. The final is today and tomorrow; FineArt has a 2-0 lead vs Golaxy. I think LeelaZero lost in the semi-final to Golaxy. There is commentary on the AGA Twitch hosted by xhu, at the moment with Ohashi Hirofumi 6p commentating. Schedule at https://www.reddit.com/r/baduk/comments ... 019_china/. Here's the 2nd game with LZ following along; FineArt is Black and won by resignation.



Attachment: world ai open final 2 golaxy fineart.sgf [17.55 KiB]

This post by Uberdude was liked by: Bonobo
 Post subject: Re: 2019 China Securities Cup World AI Open
Post #2 Posted: Sat Aug 24, 2019 5:04 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
:w72: LZ reckons that White's (Golaxy's) counterhane loses 4% (712 playouts) versus LZ's winrate evaluation at :b71:, while the hanging connection in the center gains 1% (982 playouts) versus the same estimate. Does this 5% difference reflect Golaxy's error as a player, or LZ's error as an analyst at fewer than 1k playouts? (Or both? The two are not mutually exclusive, OC. ;))

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Sat Aug 24, 2019 9:13 am, edited 2 times in total.
 Post subject: Re: 2019 China Securities Cup World AI Open
Post #3 Posted: Sat Aug 24, 2019 6:27 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Maybe the fault, dear Brutus, lies in our bots, not in ourselves. Or rather, in our use of our bots for analysis. See below.

White 308: LZ estimates White winning chances at 75% (405 playouts).

Black 309: makes the obvious (to humans) response. LZ now estimates White's winning chances to be 10½% worse, only 64½%. :o Did LZ not see that reply? Surely it did, but misevaluated its significance. Or is misevaluating the current position. Or both.

White 310: Golaxy plays a throw-in atari, which caters to a mistake by Black. According to LZ it loses 21½% (64 playouts), reducing White's chances of winning to 43%. :shock: Really? (No, not really, with only 64 playouts. ;))

Black 311: Fine Art makes the obvious capture of the throw-in stone, thereby losing 16% (313 playouts), to give it only a 40% chance of winning the game. According to LZ.

White 312: This is the theoretically largest play, gaining (on average) 1¾ pt. by area scoring. The alternative is to fill the ko in the bottom right, gaining on average 1⅔ pt.

Black 313: Nutso, by human standards. There is nothing to lose, and everything to gain, by taking the ko with sente instead of forcing White to fill the ko. Still, only an inaccuracy. According to LZ it gains 8½% (330 playouts) to make Black the slight favorite.

Edit: Perhaps I should not say nutso. It is true that Black loses nothing, either in theory or practice, by taking the ko. However, if Black takes the ko and White connects the dame, White is komaster of the remaining ko, and Black should fill it instead of playing the gote on the left side. Whether Black should answer White's ko threat at D-08 is not exactly obvious. If Black does answer and runs out of ko threats, then Black should play as FineArt did and play the "nutso" move before taking the gote on the left side to prevent White from getting the last dame and winning by ½ pt.

White 314: The obviously (to humans) correct reply, losing 7%, (647 playouts), according to LZ.

Black 315: The last play before the dame stage, gaining 1½ pts. of area. According to LZ it gains 8% (3.6k playouts), giving Black a 67% chance of winning. Really? With only dame left (and, as it turns out, 12 moves from the end), Black has only 2:1 odds of winning? This is not an error of fewer than 1k playouts, it is an error with almost 4k playouts. In its favor, FineArt was confident enough of a win to give up the advantage of taking the sente ko two moves before. OC, we do not know its evaluation. Edit: Actually, Black 313 is correct when White is komaster of the ko in the bottom right, to prevent White from getting the last dame at area scoring.

Elsewhere I have pointed out the lack of guidance to humans in using as analysts, bots trained as players. I rest my case. ;)

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #4 Posted: Sat Aug 24, 2019 9:20 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill, I wouldn't put much weight on the LZ percentages at such low playouts: I just have it pondering as I input the game. I only glanced at your thread about winrate errors, but as lightvector said: error vs what? Versus perfect play, the error is just the winrate itself or 100% minus it. Versus the bot's best understanding of the position, where we define that as what it says at near-infinite playouts, is a more tractable comparison and can be studied. So we can ask what the error in the winrate is at 100 or 1000 playouts when we use that to estimate what that bot would think at a billion playouts. Maybe a billion takes too long, so we could take 10 million as "a lot", as that's what DeepMind used for the AG teaching tool.

ps 72 involves a ladder so needs extra playouts.

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #5 Posted: Sat Aug 24, 2019 10:59 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
Bill, I wouldn't put much weight on the LZ percentages at such low playouts: I just have it pondering as I input the game.


I don't put much, if any, weight on them, which was really the point of my first post.
Bill Spight wrote:
Does this 5% difference reflect Golaxy's error as a player, or LZ's error as an analyst at fewer than 1k playouts? (Or both? The two are not mutually exclusive, OC. )
:)

Quote:
I only glanced at your thread about winrate errors, but as lightvector said: error vs what? Versus perfect play, the error is just the winrate itself or 100% minus it. Versus the bot's best understanding of the position, where we define that as what it says at near-infinite playouts, is a more tractable comparison and can be studied. So we can ask what the error in the winrate is at 100 or 1000 playouts when we use that to estimate what that bot would think at a billion playouts. Maybe a billion takes too long, so we could take 10 million as "a lot", as that's what DeepMind used for the AG teaching tool.


Unless we are talking about a limited region of play, or about the late endgame, we don't know what perfect play is, and neither do the bots. That's part of what makes go interesting. :) But players don't need accurate evaluations (winrate estimates) to play well, they only need good enough evaluations. Reviewers and analysts, however, need to consider the roads not taken. Those moves need good evaluations. And, as humans, we need to understand the evaluations that we rely upon. I submit that nobody understands them now.

I was not intending to post the second note, but with a close game, a ko with a potential komonster at area scoring, and low playouts, LZ was challenged in its endgame evaluations. Even so, the very strange swings in winrate estimates less than 20 moves from the end of the game underscore my doubts about how well the bots play the endgame. Maybe at that point the good enough evaluations don't have to be very good, I dunno. ;)

One more point. When the players whose game you are reviewing come up with plays that gain more than 1% according to the bot you are using for review, you need more playouts. ;)

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #6 Posted: Sun Aug 25, 2019 12:16 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
FineArt beat Golaxy 4-1 in the final.

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #7 Posted: Sun Aug 25, 2019 8:24 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
RE 72, LZ's view doesn't change much with about 100k playouts. The hanging connection gives White 56.4% (104k); the hane lets Black get 49.1% (91k) with the cut, i.e. White 50.9%, so roughly a 5% difference. Golaxy is generally stronger than LZ (though I don't know how many playouts each got in this competition; the time limits were 60 min + 10x40s byo), though not by so much that LZ couldn't be better than Golaxy in some situations; no idea if this is one of them. It would be interesting to know what FineArt thought.
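
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It only converts Black's winrate after the hane into White's and subtracts; the function name is made up for illustration, and the numbers are the ones quoted in this post.

```python
def white_winrate_from_black(black_pct: float) -> float:
    """Winrates are complementary: White's chance is 100 minus Black's."""
    return 100.0 - black_pct


# Figures quoted in the post (LZ estimates at roughly 100k playouts each).
hanging_connection_white = 56.4   # White's winrate after the hanging connection
hane_black = 49.1                 # Black's winrate after the hane and the cut

hane_white = white_winrate_from_black(hane_black)
difference = hanging_connection_white - hane_white

print(f"White after hane: {hane_white:.1f}%, difference: {difference:.1f}%")
```

The point is only that both numbers must be expressed from the same player's side before they are compared.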

Attachment: golaxy fineart lz connect.PNG [ 846.74 KiB ]

Attachment: golaxy fineart lz hane.PNG [ 900.9 KiB ]


This post by Uberdude was liked by: Bill Spight
 Post subject: Re: 2019 China Securities Cup World AI Open
Post #8 Posted: Sun Aug 25, 2019 8:41 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
RE 72, LZ's view doesn't change much with about 100k playouts. Hanging connection gives white 56.4% (104k), hane lets black get 49.1% (91k) with the cut ie white 50.9 so 5% difference.


Con rispetto, signore, the difference I am interested in is the one between the hanging connection with at least 100k playouts and the actual :w72: with at least 100k playouts. In the diagram White has a winrate estimate of 56½% with 104k playouts for the hanging connection, while for :w72: the winrate estimate is 49½% with only 580 playouts. That is not enough playouts for a fair comparison.

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #9 Posted: Sun Aug 25, 2019 9:31 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
The way to see what LZ thinks of the actual 72 hane with 100k playouts is to play it and wait for 100k playouts to happen, as shown in the 2nd picture. To wait for that 580 to turn into 100k on the first position would likely mean the 1st choice move has also gone up by a factor of 200, which is 20 million which would take ages.
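
A rough Python sketch of that estimate, under the loudly stated assumption that every move keeps its share of the playouts as the search grows (which a real search need not do). The variable names are illustrative only; the post rounds the factor up to 200.

```python
# Playout counts quoted in the thread.
hane_playouts_now = 580        # playouts the hane received in the first position
hane_playouts_wanted = 100_000
top_move_playouts_now = 104_000

# Assumption: playout shares stay constant while the search continues.
factor = hane_playouts_wanted / hane_playouts_now
top_move_playouts_needed = top_move_playouts_now * factor

print(f"scale factor: about {factor:.0f}")
print(f"playouts on the first choice: about {top_move_playouts_needed / 1e6:.0f} million")
```

Whether you use the exact factor or round it to 200, the first choice ends up needing playouts in the tens of millions.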

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #10 Posted: Sun Aug 25, 2019 10:01 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
The way to see what LZ thinks of the actual 72 hane with 100k playouts is to play it and wait for 100k playouts to happen, as shown in the 2nd picture. To wait for that 580 to turn into 100k on the first position would likely mean the 1st choice move has also gone up by a factor of 200, which is 20 million which would take ages.


Well, you know LZ, but that has not been my experience playing around with Deep Leela. Plays do not retain their relative playout ratios when you alter the game tree. IIUC, the main differences lie in the networks, not the search strategies.

Edit: In fact, if the ratios remained the same, you would still have a potentially unfair comparison. But if you had, say, 100k playouts for :w72: and 300k playouts for the hanging connection, that's not such an imbalance. :)

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #11 Posted: Sun Aug 25, 2019 4:26 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
The way to see what LZ thinks of the actual 72 hane with 100k playouts is to play it and wait for 100k playouts to happen, as shown in the 2nd picture. To wait for that 580 to turn into 100k on the first position would likely mean the 1st choice move has also gone up by a factor of 200, which is 20 million which would take ages.


Oh, I haven't been taking the second picture and comparing it with the first. What I have been doing with Deep Leela to get a direct comparison is this. After playing :w72: as in the game and generating the second picture, then back up and play :b71: again. That way DL compares the options for :w72: directly, utilizing the altered search tree which focuses more on the actual move in the game than the original tree. Generating the second picture alters the winrate estimates and number of playouts for the first picture. At least, that happens with DL. My guess is that it works that way with LZ as well.

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #12 Posted: Sun Aug 25, 2019 4:38 pm 
Lives in gote

Posts: 653
Location: Austin, Texas, USA
Liked others: 54
Was liked: 216
Bill Spight wrote:
Uberdude wrote:
The way to see what LZ thinks of the actual 72 hane with 100k playouts is to play it and wait for 100k playouts to happen, as shown in the 2nd picture. To wait for that 580 to turn into 100k on the first position would likely mean the 1st choice move has also gone up by a factor of 200, which is 20 million which would take ages.


Oh, I haven't been taking the second picture and making a direct comparison. What I have been doing with Deep Leela is this. After playing :w72: as in the game and generating the second picture, then backing up and playing :b71: again. That way DL compares the options for :w72: directly, utilizing the altered search tree which focuses more on the actual move in the game than the original tree. Generating the second picture alters the winrate estimate and number of playouts for the first picture. At least, that happens with DL. My guess is that it works that way with LZ as well.


This will sometimes work, but in a chaotic way. The internals go like this: After going to :w72: it will build a tree and analyze positions, and as a side effect, cache those positions. Then when you go back to :b71:, it will reset the tree search part, and start searching again. The search part will go normally, it does not know the result of the deep :w72: search. But all searches starting with :w72: will be in the cache. These will take a shortcut, bypassing the GPU. So even if :w72: normally starts off bad and only becomes good later, this shortcut might allow it to get enough visits to it to see that it's actually a good move.

So you can't really rely on this method, it's more reliable to compare moves by clicking into each one and noting the winrates.
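
To make the tree-versus-cache distinction concrete, here is a toy Python model. It is not Leela Zero's code: the class, the method names, the position labels, and the fake "network" are all hypothetical, and it only illustrates that search statistics are thrown away when you move to another position while cached evaluations are kept.

```python
from typing import Dict, List


class ToyEngine:
    """Toy model only: a per-root search tree plus a persistent evaluation cache."""

    def __init__(self) -> None:
        self.root = ""
        self.eval_cache: Dict[str, float] = {}   # survives navigation
        self.tree_visits: Dict[str, int] = {}    # discarded on each new root
        self.network_calls = 0                   # stands in for GPU work

    def _fake_network(self, position: str) -> float:
        # Placeholder evaluation; a real engine would run a neural net here.
        return (sum(map(ord, position)) % 100) / 100.0

    def evaluate(self, position: str) -> float:
        # Cached positions take the shortcut and skip the "network".
        if position not in self.eval_cache:
            self.network_calls += 1
            self.eval_cache[position] = self._fake_network(position)
        return self.eval_cache[position]

    def set_root(self, root: str) -> None:
        # Moving to another position resets the search statistics,
        # but leaves the cache untouched.
        self.root = root
        self.tree_visits = {}

    def search(self, positions: List[str]) -> None:
        for position in positions:
            self.evaluate(position)
            self.tree_visits[position] = self.tree_visits.get(position, 0) + 1


engine = ToyEngine()
engine.set_root("after W72")
engine.search(["W72 B73a", "W72 B73b"])
calls_after_deep_search = engine.network_calls

engine.set_root("after B71")                      # back up one move
visits_after_reset = dict(engine.tree_visits)     # the old tree is gone
engine.search(["W72 B73a", "hanging connection B73"])
calls_after_backing_up = engine.network_calls     # only the new line costs a call
```

In this toy, the second search starts with no visit counts at all, yet the line through move 72 is evaluated for free, which is the shortcut described above.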


This post by yoyoma was liked by: Bill Spight
 Post subject: Re: 2019 China Securities Cup World AI Open
Post #13 Posted: Sun Aug 25, 2019 7:11 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
yoyoma wrote:
Bill Spight wrote:
Uberdude wrote:
The way to see what LZ thinks of the actual 72 hane with 100k playouts is to play it and wait for 100k playouts to happen, as shown in the 2nd picture. To wait for that 580 to turn into 100k on the first position would likely mean the 1st choice move has also gone up by a factor of 200, which is 20 million which would take ages.


Oh, I haven't been taking the second picture and making a direct comparison. What I have been doing with Deep Leela is this. After playing :w72: as in the game and generating the second picture, then backing up and playing :b71: again. That way DL compares the options for :w72: directly, utilizing the altered search tree which focuses more on the actual move in the game than the original tree. Generating the second picture alters the winrate estimate and number of playouts for the first picture. At least, that happens with DL. My guess is that it works that way with LZ as well.


This will sometimes work, but in a chaotic way. The internals go like this: After going to :w72: it will build a tree and analyze positions, and as a side effect, cache those positions. Then when you go back to :b71:, it will reset the tree search part, and start searching again. The search part will go normally, it does not know the result of the deep :w72: search. But all searches starting with :w72: will be in the cache. These will take a shortcut, bypassing the GPU. So even if :w72: normally starts off bad and only becomes good later, this shortcut might allow it to get enough visits to it to see that it's actually a good move.

So you can't really rely on this method, it's more reliable to compare moves by clicking into each one and noting the winrates.


Let me see if I understand you. The best way to compare play A and play B in terms of winrates is to make each play and observe the winrate estimate of the bot's choice for the opponent's reply to each.

I tried that approach with Deep Leela (faute de mieux, au moment) and iterated attempts until the winrate estimates converged to the same 0.1%. That yielded these pictures:

Attachment: hanging connection.png [ 240.48 KiB ]


Black's reply to the hanging connection has a winrate estimate of 47.9%.

Attachment: White 72.png [ 219.19 KiB ]


Black's reply to Golaxy's :w72: has a winrate estimate of 43.7%.

So, FWIW, DL prefers Golaxy's actual play over the hanging connection by more than 4%, even though the actual play was not on its radar after :b71:, and still is not. Right?
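
The comparison can be written out as a small Python sketch. It assumes, as described above, that each candidate is played on the board and the bot's winrate for the opponent's best reply is read off; the function name and the move labels are illustrative only.

```python
from typing import Dict


def mover_winrates(opponent_reply_pct: Dict[str, float]) -> Dict[str, float]:
    """Convert the opponent's winrate after each candidate into the mover's winrate."""
    return {move: 100.0 - pct for move, pct in opponent_reply_pct.items()}


# Black's winrate estimates after each White option, as read from the screenshots.
black_reply_pct = {"hanging connection": 47.9, "game move 72": 43.7}

white_pct = mover_winrates(black_reply_pct)
preferred = max(white_pct, key=white_pct.get)
margin = white_pct["game move 72"] - white_pct["hanging connection"]

print(preferred, round(margin, 1))
```

The lower Black's estimate after a White move, the better that move is for White, so the candidate with the smallest reply winrate comes out on top.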

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #14 Posted: Mon Aug 26, 2019 8:18 am 
Lives in gote

Posts: 653
Location: Austin, Texas, USA
Liked others: 54
Was liked: 216
Bill Spight wrote:
Black's reply to the hanging connection has a winrate estimate of 47.9%.

Black's reply to Golaxy's :w72: has a winrate estimate of 43.7%.

So, FWIW, DL prefers Golaxy's actual play over the hanging connection by more than 4%, even though the actual play was not on its radar after :b71:, and still is not. Right?


Yes, I think we're on the same page now. Before actually playing move 72, LZ doesn't spend enough time considering Golaxy's play to realize it's good. Also, the LZ code isn't smart enough to reuse results from a forced deeper search of 72 in the way you tried, going forward and then back.

I just realized you're talking about the original Leela with deep learning, not Leela Zero? Are those screenshots from original Leela's built in GUI? I was talking about Leela Zero, but I think the same logic applies to both.

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #15 Posted: Mon Aug 26, 2019 9:33 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
yoyoma wrote:
Bill Spight wrote:
Black's reply to the hanging connection has a winrate estimate of 47.9%.

Black's reply to Golaxy's :w72: has a winrate estimate of 43.7%.

So, FWIW, DL prefers Golaxy's actual play over the hanging connection by more than 4%, even though the actual play was not on its radar after :b71:, and still is not. Right?


Yes, I think we're on the same page now. Before actually playing move 72, LZ doesn't spend enough time considering Golaxy's play to realize it's good. Also, the LZ code isn't smart enough to reuse results from a forced deeper search of 72 in the way you tried, going forward and then back.

I just realized you're talking about the original Leela with deep learning, not Leela Zero? Are those screenshots from original Leela's built in GUI? I was talking about Leela Zero, but I think the same logic applies to both.


Yeah, I am planning to buy a new desktop later this year. Meanwhile I am making occasional use of Deep Leela ( https://www.deepleela.com ). The screenshots are from there. The web site claims to use Leela Zero, but I am pretty sure they still are using Leela 11 right now. It still likes the slide to the 4-2, underneath the 4-4 stone, instead of the jump attachment on the 4-3, for instance.

Thanks for the suggestion. Relying upon the estimated winrates of the opponent's replies works much quicker than backing up. :) Given the propensity of LZ to make more visits to plays it thinks are better (probably unavoidable with any kind of best-first search), backing up is probably inferior for direct comparisons, anyway. ;)
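
To illustrate how lopsided the visit counts can get, here is a simplified Python sketch of a PUCT-style selection rule of the kind used by AlphaZero-family programs. It is a toy under stated assumptions, not LZ's code: the move values are held fixed instead of being updated by search, the priors are equal, and `c_puct = 1.0` and the two values are arbitrary illustrative choices.

```python
import math
from typing import List


def allocate_visits(values: List[float], priors: List[float],
                    total_visits: int, c_puct: float = 1.0) -> List[int]:
    """Hand out visits one at a time to the move with the highest PUCT-style score."""
    visits = [0] * len(values)
    for _ in range(total_visits):
        parent_visits = sum(visits)
        scores = [
            # Exploitation term (fixed value) plus an exploration bonus
            # that shrinks as the move accumulates visits.
            values[i] + c_puct * priors[i] * math.sqrt(parent_visits + 1) / (1 + visits[i])
            for i in range(len(values))
        ]
        best = max(range(len(values)), key=lambda i: scores[i])
        visits[best] += 1
    return visits


# Two candidate moves whose values differ by five percentage points.
visits = allocate_visits(values=[0.56, 0.51], priors=[0.5, 0.5], total_visits=1000)
print(visits)
```

Even with equal priors, the move with the higher value ends up with the clear majority of the visits, so the second choice is always judged on a thinner search.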

 Post subject: Re: 2019 China Securities Cup World AI Open
Post #16 Posted: Tue Aug 27, 2019 8:55 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Results of the tournament including links to games are here: https://www.reddit.com/r/baduk/comments ... 9_summary/
