It is currently Sat Oct 19, 2019 1:22 am

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 82 posts ]  Go to page Previous  1, 2, 3, 4, 5  Next
Author Message
 Post subject: Re: On the accuracy of winrates
Post #61 Posted: Fri Aug 17, 2018 4:55 pm 
Lives with ko

Posts: 247
Liked others: 0
Was liked: 31
Rank: 2d
Vance wrote:
Collect samples, have the bot play them out, compare actual results with the estimates.
There should be no need to play out, just make a detailed/parametrized correlation table from LZ's last million of selfplay games. Except they don't seem to record winrate estimates (only visit counts) in the training data, and selfplay sgfs are not annotated AFAIK (if kept at all). :scratch:
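A minimal sketch of the kind of correlation table suggested above, assuming one had (estimate, result) pairs for many positions. The sample data and the function name `calibration_table` are made up for illustration; as noted, the LZ training data does not seem to record winrate estimates.

```python
from collections import defaultdict

def calibration_table(samples, n_bins=10):
    """Group (estimated_winrate, won) pairs into equal-width bins.

    Returns (bin_low, bin_high, count, mean_estimate, actual_winrate) for
    every non-empty bin. `samples` is hypothetical data: the estimate is a
    float in [0, 1] and `won` is 1 if that side won the game, else 0.
    """
    bins = defaultdict(list)
    for estimate, won in samples:
        index = min(int(estimate * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        bins[index].append((estimate, won))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        count = len(rows)
        mean_estimate = sum(e for e, _ in rows) / count
        actual = sum(w for _, w in rows) / count
        table.append((index / n_bins, (index + 1) / n_bins, count, mean_estimate, actual))
    return table

# Invented example data, for illustration only.
samples = [(0.05, 0), (0.08, 1), (0.52, 1), (0.55, 0), (0.93, 1), (0.97, 1)]
for low, high, count, mean_estimate, actual in calibration_table(samples):
    print(f"{low:.1f}-{high:.1f}  n={count}  estimate={mean_estimate:.2f}  actual={actual:.2f}")
```

If the estimates were well calibrated, the `estimate` and `actual` columns would roughly agree in every bin once the bins hold enough games.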

 Post subject: Re: On the accuracy of winrates
Post #62 Posted: Sun Aug 19, 2018 6:44 am 
Lives with ko

Posts: 247
Liked others: 0
Was liked: 31
Rank: 2d
On second thought there is at least one particular problem with bot winrates, if used for anything else outside the bot's search. Because of the way pure MCTS works (averaging), winrates can only change slowly (the later in the search the slower potential changes are).

In peaceful positions this should be ok, just refining raw NN evaluations. But suppose there are two candidates A and B, with estimates 70% and 65% after a few thousand visits. Then suddenly a tesuji/refutation move is found below A (so B is actually the only move). Now with further search A's winrate starts to decrease (as losing lines start to average into it) but will only move towards 30% (its correct value) slowly. (Depending on the remaining analysis limit the bot may even still play A knowing its refutation, and even after it has fallen below B in value, if further visits are not enough to also overcome its visit disadvantage.)

So the returned value MAY be in the middle of a slow but significant change, lagging behind current knowledge (thus quite random), but there is no obvious indication of this. Maybe UIs could display small red/green down/up arrows (like stock prices) beside moves under such reconsideration. It could also be possible to compare estimate distribution to visit distribution, and guess the stability of current eval from this.
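The lag can be illustrated with a little arithmetic on a running average. This is a simplified sketch, not LZ's actual search: it assumes A has 3000 visits at 70% when the refutation is found (the post only says "a few thousand") and that every later visit under A returns 30%. The function name is hypothetical.

```python
def visits_until_below(initial_visits, initial_mean, new_value, threshold):
    """Count extra visits needed for a running average to fall below threshold.

    Simplified model (an assumption): after the refutation is found, every
    new visit under the move returns `new_value`. A real search mixes both
    kinds of lines, so the average would move even more slowly.
    """
    total = initial_visits * initial_mean
    visits = initial_visits
    extra = 0
    while total / visits >= threshold:
        total += new_value
        visits += 1
        extra += 1
    return extra

# Extra visits before A's averaged winrate drops below B's 65%.
extra = visits_until_below(initial_visits=3000, initial_mean=0.70,
                           new_value=0.30, threshold=0.65)
print(extra)
```

Under these assumptions A needs 429 more visits, all of them on the refutation, just to fall below B, and its reported winrate is then still far from 30%.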

 Post subject: Re: On the accuracy of winrates
Post #63 Posted: Mon Aug 20, 2018 4:47 am 
Dies in gote
User avatar

Posts: 22
Location: Vienna, Austria
Liked others: 571
Was liked: 11
moha wrote:
On second thought there is at least one particular problem with bot winrates, if used for anything else outside the bot's search. Because of the way pure MCTS works (averaging), winrates can only change slowly (the later in the search the slower potential changes are).

In peaceful positions this should be ok, just refining raw NN evaluations. But suppose there are two candidates A and B, with estimates 70% and 65% after a few thousand visits. Then suddenly a tesuji/refutation move is found below A (so B is actually the only move). Now with further search A's winrate starts to decrease (as losing lines start to average into it) but will only move towards 30% (its correct value) slowly. (Depending on the remaining analysis limit the bot may even still play A knowing its refutation, and even after it has fallen below B in value, if further visits are not enough to also overcome its visit disadvantage.)

So the returned value MAY be in the middle of a slow but significant change, lagging behind current knowledge (thus quite random), but there is no obvious indication of this. Maybe UIs could display small red/green down/up arrows (like stock prices) beside moves under such reconsideration. It could also be possible to compare estimate distribution to visit distribution, and guess the stability of current eval from this.


Building the search tree and selecting the move to play are two different tasks that could be treated differently.

MCTS seems to work fine for guiding the search. But to avoid the problem of insensitivity to sudden changes at the leaf nodes, one could use a version of the alpha-beta algorithm to select the move to play in the root position.

Of course one must be careful not to go to the other extreme: propagating the values from the leaf nodes back to the root with alpha-beta could make the move selection too sensitive to a single wrong evaluation. A possible solution could consider only "reliable" leaf nodes, i.e. nodes with at least a certain number of evaluated child nodes below them.
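A rough sketch of this idea, under stated assumptions: the tree has already been built by MCTS, each node stores its averaged winrate from the root player's point of view, and "reliable" is approximated by a visit-count threshold. It uses plain minimax without pruning, and the `Node` class, the function names and all numbers are hypothetical, not any engine's API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Search-tree node; `value` is the averaged winrate for the root player."""
    move: str
    visits: int
    value: float
    children: list = field(default_factory=list)

def minimax_value(node, maximizing, min_visits):
    """Back up values by minimax, trusting only children with enough visits.

    A node without sufficiently visited children falls back to its own
    averaged value, so a single noisy leaf cannot swing the result.
    """
    reliable = [c for c in node.children if c.visits >= min_visits]
    if not reliable:
        return node.value
    values = [minimax_value(c, not maximizing, min_visits) for c in reliable]
    return max(values) if maximizing else min(values)

def select_root_move(root, min_visits):
    """Pick the root move with the best minimax value (opponent replies next)."""
    candidates = [c for c in root.children if c.visits >= min_visits]
    return max(candidates, key=lambda c: minimax_value(c, False, min_visits)).move

# A still averages 62% but has a well-searched refutation worth 30%.
a = Node("A", 3000, 0.62, [Node("normal reply", 2500, 0.70),
                           Node("refutation", 500, 0.30)])
# B has one solid reply and one barely visited, probably misjudged leaf.
b = Node("B", 2000, 0.65, [Node("main reply", 1997, 0.65),
                           Node("noisy leaf", 3, 0.05)])
root = Node("root", 5000, 0.64, [a, b])

print(select_root_move(root, min_visits=100))  # threshold filters the noisy leaf
print(select_root_move(root, min_visits=1))    # no filtering
```

With the threshold the selection switches to B even though A has more visits; without it, the three-visit leaf under B dominates, which is the oversensitivity the post warns about.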

 Post subject: Re: On the accuracy of winrates
Post #64 Posted: Tue Aug 21, 2018 10:52 am 
Dies in gote

Posts: 23
Liked others: 7
Was liked: 3
LZ tried to escape a ladder and failed.
Game between AQ-GO (w) and LZ (b).
Both are Android versions running on the same phone. There are threads on how to set up both in this forum.
LZ is set to 10 sec/move
Attachment:
Screenshot_2018-08-22-01-10-19-838_cn.ezandroid.aqgo.png [ 1012.19 KiB | Viewed 2739 times ]


10 sec/move on Android is not a lot of computing power for tree search, but that is the sort of timing that a human will tolerate when playing against a computer.
AQ has menu options that say "show ladder capture" and "show ladder escape", which means AQ has built-in logic for ladders.

I am convinced that LZ needs similar control. I don't play a lot as I am more interested in the algorithm. But I have already encountered enough failed ladders in LZ.


This post by chut was liked by: Bill Spight
 Post subject: Re: On the accuracy of winrates
Post #65 Posted: Thu Aug 30, 2018 2:18 am 
Judan

Posts: 6119
Location: Cambridge, UK
Liked others: 350
Was liked: 3293
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill et al,
Here is a GitHub thread in which people are making Elf or LZ play against itself from the same position (ones posted by the Russian Go Fed twitter with extreme Elf viewpoints after a human joseki) to test the accuracy of winrates. Quick summary: Elf v1 gave a position 4%, but in a 50-game match at 1.6k visits it won 22%. In the one we discussed here, for which Elf v1 gave Black 1%, the latest LZ 20b gave 25% and in a 170-game match won 22%, much closer. Of course it's possible Elf's win% would be closer to a match result if the match were played with Elf's engine rather than LZ's, and at a gazillion playouts per move.
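To get a feel for how much of these gaps could be sampling noise, here is a small sketch using the Wilson score interval for a binomial proportion. The win counts are assumptions back-calculated from the quoted percentages (11 of 50 and 37 of 170 both round to 22%); the actual counts are in the linked thread.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a binomial win proportion."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half

# Assumed counts, derived from the rounded percentages in the post.
low50, high50 = wilson_interval(11, 50)
low170, high170 = wilson_interval(37, 170)
print(f"50 games:  {low50:.3f} to {high50:.3f}")
print(f"170 games: {low170:.3f} to {high170:.3f}")
```

Under these assumed counts, a 4% estimate lies well below the interval for the 50-game match, while 25% lies inside the interval for the 170-game match. The interval only covers sampling error; it says nothing about whether self-play at 1.6k visits is the right thing to compare against.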


This post by Uberdude was liked by 2 people: Bill Spight, dfan
 Post subject: Re: On the accuracy of winrates
Post #66 Posted: Thu Aug 30, 2018 2:36 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
Uberdude wrote:
Bill et al,
Here is a GitHub thread in which people are making Elf or LZ play against itself from the same position (ones posted by the Russian Go Fed twitter with extreme Elf viewpoints after a human joseki) to test the accuracy of winrates. Quick summary: Elf v1 gave a position 4%, but in a 50-game match at 1.6k visits it won 22%. In the one we discussed here, for which Elf v1 gave Black 1%, the latest LZ 20b gave 25% and in a 170-game match won 22%, much closer. Of course it's possible Elf's win% would be closer to a match result if the match were played with Elf's engine rather than LZ's, and at a gazillion playouts per move.


Many thanks. :D

OC, playouts per move matter. I would want at least 10k visits, myself. But I doubt if that would overcome an 18% difference. ;)

Another thing at work is the statistical phenomenon of regression to the mean. That is, we should expect positions chosen because they are extreme to produce less extreme results. Still, the degree of regression is quite shocking. :shock:

Edit: And testing Elf's projections, based upon its own self play should not have used Leela's self play. As dfan pointed out somewhere recently, since Leela is weaker than Elf, Leela's self play results should be closer to 50%.
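The regression-to-the-mean effect mentioned above can be reproduced with a toy simulation. Everything here is an assumption for illustration: true winrates are drawn uniformly, the estimate is the true value plus Gaussian noise, and only positions with an extreme estimate (5% or less) are picked, much as the tested positions were picked for their extreme Elf values.

```python
import random

def extreme_selection_demo(n=20000, noise=0.10, cutoff=0.05, seed=1):
    """Select positions by extreme noisy estimate and compare with the truth.

    Returns (number_picked, mean_estimate, mean_true_winrate) for the
    positions whose estimate is at or below `cutoff`.
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(n):
        true_rate = rng.random()
        estimate = min(1.0, max(0.0, true_rate + rng.gauss(0.0, noise)))
        if estimate <= cutoff:
            picked.append((estimate, true_rate))
    mean_estimate = sum(e for e, _ in picked) / len(picked)
    mean_true = sum(t for _, t in picked) / len(picked)
    return len(picked), mean_estimate, mean_true

count, mean_estimate, mean_true = extreme_selection_demo()
print(count, round(mean_estimate, 3), round(mean_true, 3))
```

In this toy model the picked positions have a true winrate noticeably higher than their estimates, even though the estimator is unbiased over all positions. How large the effect is for real bots depends on the actual noise, which this sketch does not know.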

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

I think it's a great idea to talk during sex, as long as it's about snooker.

— Steve Davis

 Post subject: Re: On the accuracy of winrates
Post #67 Posted: Thu Aug 30, 2018 2:46 am 
Judan

Posts: 6119
Location: Cambridge, UK
Liked others: 350
Was liked: 3293
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill Spight wrote:
Edit: And testing Elf's projections, based upon its own self play should not have used Leela's self play.

They did use Elf's network, converted for use in the LZ engine. The source code for the Elf engine is available but apparently it's really hard to compile; I don't know if anyone has managed yet. How much difference the LZ vs Elf engine makes is an open question to me (but it should at least be less than that of the weights!).


This post by Uberdude was liked by: Bill Spight
 Post subject: Re: On the accuracy of winrates
Post #68 Posted: Thu Aug 30, 2018 3:20 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
Uberdude wrote:
Bill Spight wrote:
Edit: And testing Elf's projections, based upon its own self play should not have used Leela's self play.

They did use Elf's network, converted for use in the LZ engine. The source code for the Elf engine is available but apparently it's really hard to compile; I don't know if anyone has managed yet. How much difference the LZ vs Elf engine makes is an open question to me (but it should at least be less than that of the weights!).


OIC. Thanks. :)

 Post subject: Re: On the accuracy of winrates
Post #69 Posted: Thu Aug 30, 2018 3:25 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
As far as the number of visits goes, my preliminary results suggest that with a setting of 100k, Leela Zero's margin of error is at least 3%. With only 1600 visits, God only knows!

 Post subject: Re: On the accuracy of winrates
Post #70 Posted: Thu Aug 30, 2018 3:32 am 
Lives with ko

Posts: 247
Liked others: 0
Was liked: 31
Rank: 2d
Elf value head is known to be much sharper, more sensitive to slight advantages or disadvantages. Would these results still seem incorrect if we interpret Elf estimate as "probability that a perfect player would win against a perfect player" from here? That value cannot be determined but can only be 0 or 1.

Actual LZ or even Elf play have some randomness in their moves, so such practical win percentages are expected to be closer to 50%. On the above interpretation if the engine estimate is close to practical win percentage it may even mean it is less accurate. (No point for it to include this huge random factor - btw it would also be interesting to do the playout test with such randomness disabled.)

The only problem with this viewpoint is that AFAIK neither Elf nor other bot training targeted the perfect solution. :) And it seems hard to imagine how that direction would be possible indirectly, without direct training data (maybe at price of huge training slowdown - or maybe simply using less randomness in selfplay could have slightly similar effect).

In any case, directly comparing bot selfplay winrates to bot estimates seems incorrect - these are two completely different things. At the very least the former depends heavily on bot move randomness configuration, so comparing to it has reduced meaning.
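A toy model of why move randomness pulls practical win percentages toward 50%. This is only an illustration under strong assumptions, not how any bot works: the final margin is a fixed advantage plus independent Gaussian error from each remaining move, and whoever ends with a positive margin wins. The function and parameter names are hypothetical.

```python
import math

def practical_winrate(advantage, move_noise, moves_left):
    """P(win) when the final margin is advantage plus summed per-move noise.

    Toy model (an assumption): each remaining move adds independent Gaussian
    error with standard deviation `move_noise` points, so the total error has
    standard deviation move_noise * sqrt(moves_left).
    """
    if move_noise == 0:
        # Perfect play: the outcome is decided, so the value is 0, 1 or a tie.
        return 1.0 if advantage > 0 else 0.0 if advantage < 0 else 0.5
    sigma = move_noise * math.sqrt(moves_left)
    return 0.5 * (1 + math.erf(advantage / (sigma * math.sqrt(2))))

# A side that is 3 points behind with 200 moves left, at three noise levels.
for noise in (0.0, 0.2, 0.5):
    print(noise, round(practical_winrate(-3.0, noise, 200), 3))
```

With no noise the trailing side never wins, which matches the "0 or 1" value under perfect play; the more randomness per move, the closer its practical win percentage gets to 50%.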

 Post subject: Re: On the accuracy of winrates
Post #71 Posted: Thu Aug 30, 2018 6:03 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
moha wrote:
Elf value head is known to be much sharper, more sensitive to slight advantages or disadvantages. Would these results still seem incorrect if we interpret Elf estimate as "probability that a perfect player would win against a perfect player" from here?


That depends upon the meaning of probability.

Quote:
That value cannot be determined but can only be 0 or 1.


If that is your meaning of probability, then your proposed interpretation is impossible. In that case, better interpret winrates as assuming errors. The question is, whose errors. From what I hear they are the bot's errors in self play.

 Post subject: Re: On the accuracy of winrates
Post #72 Posted: Thu Aug 30, 2018 6:13 am 
Lives with ko

Posts: 247
Liked others: 0
Was liked: 31
Rank: 2d
I meant that the actual, observable (if that were possible) value for "winner with perfect play" is 0 or 1.

The point is, predicting/estimating this is completely different to predicting selfplay results (which are almost always much closer to 50%).

With "probability" I meant the best guess from all available information.

 Post subject: Re: On the accuracy of winrates
Post #73 Posted: Thu Aug 30, 2018 7:16 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
IMO, the people who produce winrates should define the term.

 Post subject: Re: On the accuracy of winrates
Post #74 Posted: Thu Aug 30, 2018 9:24 am 
Lives in gote

Posts: 426
Liked others: 1
Was liked: 119
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Bill Spight wrote:
As far as the number of visits goes, my preliminary results suggest that with a setting of 100k, Leela Zero's margin of error is at least 3%. With only 1600 visits, God only knows!


Actually, it should be more accurate the closer you are to the training parameters.

Quote:
IMO, the people who produce winrates should define the term.


It's "just" a metric of who's ahead. As this experiment shows, it's not exactly the probability of winning. But that would be really hard to calculate, and wouldn't be much more useful than what we actually have. "The probability that this exact network wins against itself at x visits in this exact position" is not much more interesting than what we have now. LZ's winrate seems close enough (the difference between 22% or 26% is kinda irrelevant).

 Post subject: Re: On the accuracy of winrates
Post #75 Posted: Thu Aug 30, 2018 9:50 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
Tryss wrote:
Bill Spight wrote:
IMO, the people who produce winrates should define the term.


It's "just" a metric of who's ahead.


That's what I thought when I started this thread, but apparently for the Zero bots it actually is an estimate of the winning percentage from the current position. But this seems to be a matter of dispute, at least here.

Quote:
As this experiment shows, it's not exactly the probability of winning. But that would be really hard to calculate, and wouldn't be much more useful than what we actually have. "The probability that this exact network wins against itself at x visits in this exact position" is not much more interesting than what we have now.


Except that humans want to use winrate differences to say whether certain plays are likely errors, and how bad the errors are.

Quote:
LZ winrate seems close enough (the difference between 22% or 26% is kinda irrelevant)


Vanitas vanitatum, omnia vanitas.

 Post subject: Re: On the accuracy of winrates
Post #76 Posted: Thu Aug 30, 2018 3:08 pm 
Lives in gote

Posts: 426
Liked others: 1
Was liked: 119
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Quote:
That's what I thought when I started this thread, but apparently for the Zero bots it actually is an estimate of the winning percentage from the current position.


This metric is derived from game results. But it's interpolated data. You feed the self-play positions and the results to the network, and it tries to fit itself to these data.

One thing that may have an impact: the network is trained on games played by older networks. But it is hard to say how much impact that has.

Bill Spight wrote:
Except that humans want to use winrate differences to say whether certain plays are likely errors, and how bad the errors are.


We can already do that, and that's how LZ uses these winrates too. They don't need to be truly accurate for this, just monotonic and consistent enough (and "nice" enough).

For example, if LZ's winrate as a function of the true winrate looks something like this:

Image

Then it's perfectly usable by humans players.
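A small sketch of the monotonicity argument. The image is not preserved here, so the curve below is purely hypothetical: it stretches the true winrate on the logit scale, which is one simple way to model a "sharper" value head. The move names and numbers are invented.

```python
import math

def distorted_winrate(true_winrate, sharpness=3.0):
    """Hypothetical monotonic distortion that exaggerates leads and deficits."""
    logit = math.log(true_winrate / (1 - true_winrate))
    return 1 / (1 + math.exp(-sharpness * logit))

# Invented true winrates for four candidate moves.
true_rates = {"A": 0.42, "B": 0.55, "C": 0.48, "D": 0.30}
reported = {move: distorted_winrate(rate) for move, rate in true_rates.items()}

rank_true = sorted(true_rates, key=true_rates.get, reverse=True)
rank_reported = sorted(reported, key=reported.get, reverse=True)
print(rank_true, rank_reported)
print({move: round(value, 3) for move, value in reported.items()})
```

Because the distortion is strictly increasing, the ordering of the moves is unchanged, so the best move is still the best move. What does change is the size of the gaps: a given drop in reported winrate does not correspond to the same drop in true winrate.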

 Post subject: Re: On the accuracy of winrates
Post #77 Posted: Thu Aug 30, 2018 4:12 pm 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
Well, I'm old enough to believe in reality testing. ;)

 Post subject: Re: On the accuracy of winrates
Post #78 Posted: Thu Aug 30, 2018 9:38 pm 
Tengen

Posts: 4803
Liked others: 0
Was liked: 650
When AIs suggest interesting alternatives, or identify blunders or overlooked relevant tactical variations, it is good to learn them. However, we should not over-interpret percentages. A next program version or other AIs might already produce different ones.

 Post subject: Re: On the accuracy of winrates
Post #79 Posted: Sat Oct 06, 2018 9:51 pm 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
Here is a file comparing winrate estimates of Leela Elf at settings of 100K and 200K. Edit: Graciously provided by Ales Cieply here. viewtopic.php?p=234293#p234293 My working hypothesis is that the differences reflect possible errors at the 100K setting.

I expect to discuss these findings, which can only be preliminary, later. :)

Attachment:
Metta-Ben David Workbook1 Sheet1.pdf [36.41 KiB]
Downloaded 49 times


Edit: I am unfamiliar with Excel, and ended up with a printout that lost some characters. I apologize and am attaching a more readable file. Please note that ∆ in this file refers to the difference between Leela Elf's winrate estimates for the same play at the 100K and 200K settings (whether they made the same choice or not). It does not refer to the estimated gain or loss in winrate for a player's choice.

 Post subject: Re: On the accuracy of winrates
Post #80 Posted: Sun Oct 07, 2018 7:33 am 
Honinbo

Posts: 8898
Liked others: 2664
Was liked: 3023
My working hypothesis is that Leela Elf with the 200K setting is better than it is with the 100K setting. (N.B. These playout numbers are not actually observed in the files.) So the observed ∆s are not random noise, but indicate likely errors with the 100K setting. The sign changes in the ∆s in the game record support that hypothesis.

The median ∆ is -0.03. If we subtract that amount from each ∆ we get 137 ∆s with one 0. Ignoring that ∆ we have a sequence of 136 signed ∆s, half with a + sign, half with a - sign. Our expected number of sign changes in the sequence (ignoring the 0) is 136/2 = 68. We get only 50 sign changes, too few for a random sequence.

This lack of randomness is more obvious when we look at sequences of signs of the same kind, called runs. The expected random run length is 2. The average run length for the game is 2.7. What mainly skews the result is two runs of length 12. :o (One of these contains the median, so is 13 moves long.) The first long run (13 moves) begins at the position after Black 67, based upon Leela Elf's choice for White 68. (So it shows up starting at move 68 in the chart.) The second long run (12 moves) begins at the position after Black 147 (move 148 in the chart). One explanation for these long runs is that there are persistent features of the board in each that Leela Elf misevaluates at a setting of 100K and evaluates better at a setting of 200K. During the first run it underestimates Black's chances, and during the second run it overestimates them.
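The sign-change count above can be checked with the standard Wald-Wolfowitz runs test, which the post does not name but which formalizes the same reasoning. The sketch takes the figures from the post (68 plus signs, 68 minus signs, 50 sign changes, i.e. 51 runs) and uses the usual normal approximation; the function names are hypothetical.

```python
import math

def count_runs(signs):
    """Count runs in a non-empty sequence of symbols (sign changes + 1)."""
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def runs_test(n_plus, n_minus, observed_runs):
    """Wald-Wolfowitz runs test: expected runs, standard deviation, z-score."""
    n = n_plus + n_minus
    expected = 2 * n_plus * n_minus / n + 1
    variance = (2 * n_plus * n_minus * (2 * n_plus * n_minus - n)) / (n * n * (n - 1))
    sd = math.sqrt(variance)
    return expected, sd, (observed_runs - expected) / sd

# Figures from the post: 136 signed deltas, 50 sign changes -> 51 runs.
expected, sd, z = runs_test(68, 68, observed_runs=50 + 1)
print(f"expected runs {expected:.1f}, sd {sd:.2f}, z {z:.2f}")
print(f"average run length {136 / 51:.1f}")
```

The expected 69 runs correspond to the 68 expected sign changes in the post, and the z-score of about -3.1 supports the conclusion that there are too few sign changes for a random sequence.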
