Here is an example of what I am talking about. It is taken from the Elf GoGoD commentaries. Other programs may and almost surely do differ from Elf, but I think my point remains the same.
[go]$$Wcm24 Kim Kiheon, 7 dan (W) vs. Jimmy Cha, 5 dan, 2018-07-18m
$$ ---------------------------------------
$$ | . . . . . . . . . . . O a . X . X X . |
$$ | . . . . . . X . . X X X O b O X O X . |
$$ | . . O X . X O X X X O O . . O X . . . |
$$ | . . X X . X O O O O O O . . O X X . . |
$$ | . . . . . . X O O . X X O 1 O X . X . |
$$ | . . . . . . X X . O . . O X O O X . X |
$$ | . O . . . . . X O O X X 5 X X O O X . |
$$ | X X X X X X X O . O X . . . . X O 3 4 |
$$ | O X X O X O X O O X X . O X X X X X 2 |
$$ | O O O O O O O . . O X X X X O , X O . |
$$ | . X O . . . . O . O X O O X O . O . O |
$$ | . . X O O O . O O X O O O . O O . O . |
$$ | . X . . X O . X X X X X O O O . O . . |
$$ | . O X X . O X X . . . X X O X X O . . |
$$ | O . O . X . . O X . . X O O X O . O O |
$$ | . O O O X . O O O X X X O O X O O O X |
$$ | . . X . O O . O . O X . X X . X O X X |
$$ | . . . . . . . O . O X X . . X . X . X |
$$ | . . . . . . . . O . O . . . . . . X . |
$$ ---------------------------------------[/go]
This is the game record of moves 224 - 228 (marked 1 - 5 in the diagram code).
After White 224 Elf estimates Black's winrate as 92.6% with 12,886 rollouts. This is apparently Elf's second choice, with the also-rans not reported, as they got fewer than 1500 rollouts. Elf's top choice is at a, yielding a Black winrate estimate of 87.2% with 13,450 rollouts. That play is 5.4% better for White, both estimates based on around 13k rollouts.
After Black 225 Black's winrate estimate is 81.4% with 39 rollouts. That's a drop of 11.2%, but how accurate is an estimate based on only 39 rollouts?
I don't know what other programs do, but Elf inherits that estimate from its top choice for White 226. That estimate is 81.4% with 17,451 rollouts, a respectable number.
OK, we have a drop of 11.2% between White 224 and Black 225 (actually, between White 224 and Elf's top choice for White 226). As I say, I don't know what other programs do. If they use an estimate based on only 39 rollouts, all I can do is roll my eyes.
But what about the comparison between Black 225 and Elf's top choice for Black 225, which is b? After Black b, Black's winrate estimate is 93.5% with 41,491 rollouts, a substantial number. The difference between that and the estimate for Black 225 is 12.1%, around 1% more than the drop between positions. A minor difference.
Moving on. After White 226 we get a Black winrate estimate of 87.2% with 575 rollouts. In this case, since the estimate for the previous position was inherited from Elf's top choice for White 226, there is no difference between the drop of 5.8% (for White) between successive positions and the winrate difference between plays. However, the fact that one estimate is based on only 575 rollouts while the other is based on 17,451 rollouts raises a question. Elf does not inherit estimates if there are 500 or more rollouts. But if it did, it would inherit from Elf's top choice for Black 227 (also at b) an estimate of 93.5% with 28,501 rollouts. That's a substantial difference, both in the winrate estimate and the number of rollouts. Wouldn't it be better for the analyst program to make the human play de novo, and rely upon neither a comparatively low number of rollouts nor an estimate inherited from a different play?
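To make the inheritance rule concrete, here is a minimal Python sketch of it as I have described it above. The names `Estimate` and `reported` are my own, not anything from Elf or the GoGoD files, and the 1000-rollout threshold in the second call is purely hypothetical, just to show what inheriting would do to the White 226 figure.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    winrate: float  # Black's winrate in percent
    rollouts: int   # rollouts behind that estimate


def reported(own: Estimate, next_top: Estimate, min_rollouts: int = 500) -> Estimate:
    """Keep the played move's own estimate if it has enough rollouts;
    otherwise inherit the estimate of the top choice in the next position.
    (Illustrative rule only, based on the description in this post.)"""
    return own if own.rollouts >= min_rollouts else next_top


# Figures quoted above for White 226 and for Elf's top choice for Black 227.
w226_own = Estimate(87.2, 575)
b227_top = Estimate(93.5, 28501)

kept = reported(w226_own, b227_top)                          # 575 >= 500: no inheritance
inherited = reported(w226_own, b227_top, min_rollouts=1000)  # hypothetical threshold
```

With the 500 cutoff the estimate stays at 87.2% on 575 rollouts; with the hypothetical higher cutoff it would jump to 93.5% on 28,501 rollouts, which is the discontinuity I am complaining about.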
After Black 227 Black's winrate estimate is 78.7% with 301 rollouts. Since this is fewer than 500 rollouts, the estimate is inherited from the estimate after White 228, which is also Elf's top choice, with 25,793 rollouts.
The drop between positions is 8.5%. However, the difference between the estimate for Black 227 and Black b is 14.8%. Is Black 227 a minor error, costing 8 or 9%, or a substantial error, costing 15%?
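For anyone who wants to check the arithmetic, here is a small Python sketch that computes both measures for the three plays discussed, using only the Black winrate figures quoted in this post. The function and variable names are my own invention; the sign flip for White's move is just so that a positive number always means a loss for the player who moved.

```python
# Black's winrate (%) as quoted in the post:
# (move, mover is Black?, estimate before the move,
#  estimate after the move, estimate after Elf's top choice for that move)
MOVES = [
    ("Black 225", True, 92.6, 81.4, 93.5),
    ("White 226", False, 81.4, 87.2, 81.4),
    ("Black 227", True, 87.2, 78.7, 93.5),
]


def losses(mover_is_black, before, after, after_top):
    """Return (drop between successive positions, difference from top choice),
    both from the mover's point of view, rounded to one decimal place."""
    sign = 1 if mover_is_black else -1
    drop_between_positions = round(sign * (before - after), 1)
    diff_from_top_choice = round(sign * (after_top - after), 1)
    return drop_between_positions, diff_from_top_choice


results = {name: losses(black, before, after, top)
           for name, black, before, after, top in MOVES}

for name, (drop, diff) in results.items():
    print(f"{name}: drop between positions {drop}%, vs. top choice {diff}%")
```

The two measures agree for White 226 but not for the two Black plays, and the gap is widest for Black 227.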
When you have winrate estimates based upon vastly different numbers of rollouts you can easily get these discrepancies. Also, you can get discrepancies between successive estimates based upon the horizon effect. Better, IMO, to have the analyst program make the human moves de novo for comparison with its top choice (unless both moves are the same, OC).
----
Edit: More dramatic examples are possible, for instance, where the human play is better than Elf's play. But I wanted to show an ordinary position where these questions arise.