On the accuracy of winrates

Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: On the accuracy of winrates

Post by Bill Spight »

Well, I'm old enough to believe in reality testing. ;)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
RobertJasiek
Judan
Posts: 6273
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: On the accuracy of winrates

Post by RobertJasiek »

When AIs suggest interesting alternatives, or identify blunders or overlooked tactical variations, it is good to learn from them. However, we should not over-interpret percentages. The next program version, or another AI, might already produce different ones.

Re: On the accuracy of winrates

Post by Bill Spight »

Here is a file comparing winrate estimates of Leela Elf at settings of 100K and 200K. (Edit: The data were graciously provided by Ales Cieply here: viewtopic.php?p=234293#p234293) My working hypothesis is that the differences reflect possible errors at the 100K setting.

I expect to discuss these findings, which can only be preliminary, later. :)
Attachment: Metta-Ben David Workbook1 Sheet1.pdf (36.41 KiB)
Edit: I am unfamiliar with Excel, and ended up with a printout that lost some characters. I apologize and am attaching a more readable file. Please note that ∆ in this file refers to the difference between Leela Elf's winrate estimates for the same play at the 100K and 200K settings (whether they made the same choice or not). It does not refer to the estimated gain or loss in winrate for a player's choice.

Re: On the accuracy of winrates

Post by Bill Spight »

My working hypothesis is that Leela Elf with the 200K setting is better than it is with the 100K setting. (N.B. These playout numbers are not actually observed in the files.) So the observed ∆s are not random noise, but indicate likely errors with the 100K setting. The sign changes in the ∆s in the game record support that hypothesis.

The median ∆ is -0.03. If we subtract that amount from each ∆ we get 137 ∆s with one 0. Ignoring that ∆ we have a sequence of 136 signed ∆s, half with a + sign, half with a - sign. Our expected number of sign changes in the sequence (ignoring the 0) is 136/2 = 68. We get only 50 sign changes, too few for a random sequence.
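
For anyone who wants to check the sign-change count on their own data, here is a minimal sketch in Python. The function names are made up for illustration, and the z-score uses the standard Wald-Wolfowitz runs test with a normal approximation, which is an addition here and not something the post itself computes. Only the three figures quoted above (68 plus signs, 68 minus signs, 50 sign changes) come from the post; the per-move ∆s are not reproduced.

```python
import math
from statistics import median


def median_centred_signs(deltas):
    """Return +1/-1 for each delta above/below the median, dropping exact ties."""
    m = median(deltas)
    return [1 if d > m else -1 for d in deltas if d != m]


def count_sign_changes(signs):
    """Count positions where consecutive signs differ."""
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def runs_test(n_pos, n_neg, sign_changes):
    """Wald-Wolfowitz runs test (normal approximation).

    Assumes both signs occur at least once. Returns the expected number
    of sign changes for a random arrangement and the z-score of the
    observed count.
    """
    n = n_pos + n_neg
    runs = sign_changes + 1  # a sequence with k sign changes has k + 1 runs
    expected_runs = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n * n * (n - 1))
    z = (runs - expected_runs) / math.sqrt(variance)
    return expected_runs - 1, z


# Figures quoted in the post: 68 plus signs, 68 minus signs, 50 sign changes.
expected_changes, z = runs_test(68, 68, 50)
print(f"expected sign changes: {expected_changes:.0f}, z = {z:.2f}")
```

With those figures the expected count comes out at 68, matching the 136/2 above, and the observed 50 lies roughly three standard deviations below it under this approximation.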

This lack of randomness is more obvious when we look at sequences of signs of the same kind, called runs. The expected random run length is 2. The average run length for the game is 2.7. What mainly skews the result is two runs of length 12. :o (One of these contains the median, so is 13 moves long.) The first long run (13 moves) begins at the position after Black 67, based upon Leela Elf's choice for White 68. (So it shows up starting at move 68 in the chart.) The second long run (12 moves) begins at the position after Black 147 (move 148 in the chart). One explanation for these long runs is that there are persistent features of the board in each that Leela Elf misevaluates at a setting of 100K and evaluates better at a setting of 200K. During the first run it underestimates Black's chances, and during the second run it overestimates them.
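
Finding the runs and picking out the long ones can be sketched like this in Python. The helper names are hypothetical and the example signs are made up for illustration; they are not the game's data.

```python
from itertools import groupby


def sign_runs(signs):
    """Return (sign, start_index, length) for each maximal run of equal signs."""
    runs = []
    start = 0
    for sign, group in groupby(signs):
        length = len(list(group))
        runs.append((sign, start, length))
        start += length
    return runs


def long_runs(signs, min_length):
    """Keep only the runs that are at least min_length signs long."""
    return [run for run in sign_runs(signs) if run[2] >= min_length]


def mean_run_length(signs):
    """Average run length; assumes a non-empty sequence of signs."""
    return len(signs) / len(sign_runs(signs))


# Made-up signs for illustration only, not the game's data.
example = [1, -1, -1, -1, -1, 1, 1, -1]
print(sign_runs(example))
print(long_runs(example, min_length=3))
print(mean_run_length(example))
```

The start index of a long run points at the first position in it, which is how a run can be tied back to a move number in the chart. As a cross-check on the figures in this thread, 136 signs with 50 sign changes make 51 runs, and 136/51 is about 2.7.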

Re: On the accuracy of winrates

Post by Bill Spight »

Some comments on the first run. :)

Go to Black 67.



Edit: Added a few comments on the second long run. Go to move 147.

Re: On the accuracy of winrates

Post by Bill Spight »

In terms of learning something about go, this exercise was not worth much. However, I do think the methodology can help to pinpoint positions that a bot is likely to be misevaluating, to what degree, and in whose favor, even though we cannot say what is best play. :)

I think it also bolsters my rule of thumb, already stated, that with Leela Elf 100K we should not worry about possible errors of less than 3%. OC, this is only one game, one bot, and one setting. More research on other games, other bots, and other settings could well be of value. But the main takeaway, IMHO, is this:

Don't sweat the small stuff.

:cool: