pnprog wrote:
Hi Bill!
Bill Spight wrote:
So to find a good delta we don't want to do what Go Review Partner does. It's OK for casual review, but not for scientific purposes. We want to start from the same place, and we want to have an equal number of playouts for each play we are comparing. With Go Review Partner I think we can do that by making each play we are comparing and then running the bot for a certain number of rollouts, or for a certain length of time. That way we are comparing apples with apples.
There is no direct way to ask Leela (or any other bot) to evaluate one specific move. So do you mean something like this:
For one given position:
- Check out the move "A1" played in the actual game (let's imagine D16)
- Check out what move, "B1", would have been played by the bot (let's imagine D17)
- Ask the bot for its best counter move "A2" to the move "A1" (let's imagine C14)
- Ask the bot for its best counter move "B2" to the move "B1" (let's imagine C15)
Then, if W(X) is the win rate of the move at X:
delta = W(B2) - W(A2)
The thinking parameters (time and playouts) should be the same when asking the bot to come up with "B1", "A2" and "B2".
Is that what you mean?
By the way, we could ask the Leela Zero team if they can come up with a specific GTP command to evaluate one particular move. Maybe it's not that hard to implement.
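pnprog's procedure can be sketched in Python as below. It is only a sketch under stated assumptions: `engine(moves, budget)` is a hypothetical stand-in for "set up the position after `moves`, search with a fixed budget, and return the top choice and its winrate" (no such single call is being claimed for Leela or for any real GTP interface), W is assumed to be reported for the player who chose A1 or B1, and the numbers in `TABLE` are made-up placeholders, not Leela output.

```python
from typing import Callable, Dict, Tuple

Moves = Tuple[str, ...]
# Hypothetical engine: (moves played so far, playout budget) -> (best move, winrate in %).
Engine = Callable[[Moves, int], Tuple[str, float]]


def pnprog_delta(engine: Engine, history: Moves, a1: str, budget: int) -> float:
    """delta = W(B2) - W(A2), with every engine call using the same budget."""
    b1, _ = engine(history, budget)            # B1: the bot's own choice
    _, w_a2 = engine(history + (a1,), budget)  # A2: best counter to the game move A1
    _, w_b2 = engine(history + (b1,), budget)  # B2: best counter to the bot's move B1
    # Assumes W(X) is expressed for the player who chose A1/B1,
    # so a positive delta means A1 lost that much winrate compared with B1.
    return w_b2 - w_a2


def toy_engine(table: Dict[Moves, Tuple[str, float]]) -> Engine:
    """Stand-in for a real bot: looks results up instead of searching."""
    def engine(moves: Moves, budget: int) -> Tuple[str, float]:
        return table[moves]  # a real engine would search `budget` playouts here
    return engine


# Made-up placeholder values, using pnprog's example coordinates.
TABLE: Dict[Moves, Tuple[str, float]] = {
    (): ("D17", 50.0),        # B1 = D17
    ("D16",): ("C14", 47.0),  # A1 = D16, A2 = C14
    ("D17",): ("C15", 48.5),  # B1 = D17, B2 = C15
}

if __name__ == "__main__":
    print(pnprog_delta(toy_engine(TABLE), (), "D16", budget=1000))  # prints 1.5
```

Because the same `budget` goes to all three calls, the thinking parameters stay equal, as pnprog suggests. B1 still comes from a search one level higher in the tree than A2 and B2.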
Here's what I am talking about. Let's look at White 14 and Black 15 in the Metta-Ben David game.
$$Wcm14
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . . . . . |
$$ | . . . X X X O . . , . . . . . , . . . |
$$ | . . X O O . O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
Leela evaluates nine different replies to White 14. Its top choice is the keima.
$$Wcm14 Keima
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . 2 . . . |
$$ | . . . X X X O . . , . . . . . , . . . |
$$ | . . X O O . O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
It evaluates this as 55.90% for Black with 222084 playouts.
$$Wcm14 De
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . . . . . |
$$ | . . . X X X O . . , . . . . . , . . . |
$$ | . . X O O 2 O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
Its second choice is the de, which it evaluates as 54.72% for Black with 136882 playouts. That is fewer playouts, but in the same ballpark, and good enough, I think, for a winrate difference of 1.2% (55.90% - 54.72%).
$$Wcm14 Two space extension
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . 2 . . |
$$ | . . X . . . . . . . . . . . . . . . . |
$$ | . . . X X X O . . , . . . . . , . . . |
$$ | . . X O O . O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
Leela's third choice is the two-space extension, which it evaluates as 54.37% for Black with 41693 playouts. With roughly a fifth as many playouts as the keima, the two winrates are not all that comparable, but they are good enough for a winrate difference of 1.5%.
$$Wcm14 Two space high pincer
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . X . . . . . . . . . . . . . . . . |
$$ | . . . X X X O . . , 2 . . . . , . . . |
$$ | . . X O O . O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
Leela's sixth choice is the two-space high pincer, with only 782 playouts. With so few playouts, it is not worth computing a winrate difference.
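The winrate differences above are plain subtractions from Leela's top choice. A minimal Python sketch using the figures quoted in this post for the replies to White 14: the high pincer's winrate was not given, so it is `None`, and `MIN_PLAYOUTS = 10_000` is a hypothetical cut-off standing in for "too few playouts to bother". Treating the most-visited reply as the top choice is an assumption, though it matches the order reported here.

```python
from typing import Dict, Optional, Tuple

# Replies to White 14: (winrate for Black in %, playouts), as reported in the post.
CANDIDATES: Dict[str, Tuple[Optional[float], int]] = {
    "keima": (55.90, 222084),
    "de": (54.72, 136882),
    "two-space extension": (54.37, 41693),
    "two-space high pincer": (None, 782),  # winrate not quoted
}

MIN_PLAYOUTS = 10_000  # hypothetical threshold, not a Leela setting


def winrate_differences(cands, min_playouts=MIN_PLAYOUTS):
    """Winrate of the most-visited reply minus each other usable reply."""
    top = max(cands, key=lambda name: cands[name][1])  # most playouts
    top_wr = cands[top][0]
    diffs = {}
    for name, (wr, playouts) in cands.items():
        if name == top or wr is None or playouts < min_playouts:
            continue  # skip the top choice itself and under-explored replies
        diffs[name] = round(top_wr - wr, 2)
    return top, diffs


if __name__ == "__main__":
    print(winrate_differences(CANDIDATES))
    # ('keima', {'de': 1.18, 'two-space extension': 1.53})
```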
Now let's look at Black 15, the two-space extension actually played in the game.
$$Wcm14 Two space extension
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . . O . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . 2 . . |
$$ | . . X . . . . . . . . . . . . . . . . |
$$ | . . . X X X O . . , . . . . . , . . . |
$$ | . . X O O . O . . . . . . 1 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------
Leela evaluates it as 55.01% for Black with around 341,000 playouts (I did not add them up exactly). With many more playouts the winrate is more precise, and presumably more accurate. The delta is 0.9% (55.90% - 55.01%) instead of a winrate difference of 1.5%. That is not a big difference in this case, but bigger differences have been observed.
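The winrate difference and the delta use the same top choice and differ only in which Leela run supplies the figure for the game move. A small Python sketch with this post's numbers; the `Evaluation` record and the `compare` helper are hypothetical illustrations, and 341,000 is Bill's rough total.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    move: str
    winrate: float    # % for Black
    playouts: int
    analyzed_at: int  # move number whose Leela run produced this figure


keima = Evaluation("keima", 55.90, 222_084, analyzed_at=14)
ext_at_14 = Evaluation("two-space extension", 54.37, 41_693, analyzed_at=14)
ext_at_15 = Evaluation("two-space extension", 55.01, 341_000, analyzed_at=15)  # approx.


def compare(best: Evaluation, played: Evaluation) -> dict:
    """Winrate lost by the played move, plus the conditions behind the number."""
    return {
        "loss": round(best.winrate - played.winrate, 2),
        "same_level": best.analyzed_at == played.analyzed_at,
        "playout_ratio": round(best.playouts / played.playouts, 1),
    }


if __name__ == "__main__":
    print(compare(keima, ext_at_14))  # winrate difference: same run, unequal playouts
    print(compare(keima, ext_at_15))  # delta: different levels, comparable playouts
```

Neither comparison has both equal playouts and the same level in the tree, which is the problem discussed next.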
The first comparison, between Leela's first and third choices, is at the same depth of the tree, but with quite different numbers of playouts. IIUC, it is not easy to equalize the number of playouts, because, as a kind of Monte Carlo bot, Leela uses the number of playouts as one of its criteria to decide which play to choose. Its purpose is to pick plays, not just to evaluate positions and plays.
The second comparison, for the delta, has comparable numbers of playouts for the two choices in this case, but the two searches start at different levels of the game tree. As things stand, Leela is run at Black 15 in the game tree under whatever conditions are set (time, number of playouts, or whatever). Is it not possible to make a separate variation with Leela's first choice, the keima, and run Leela on it under the same conditions as for the actual play in the game? That would give comparisons made under the same conditions at the same level in the game tree.
One way to do that might be, after Leela has evaluated White 14 in the actual game and before it evaluates Black 15, to have it evaluate Leela's first choice for Black 15. (You wouldn't even have to check whether it differs from the actual play. Double comparisons of the same play would give you an idea of the error rate of the winrate estimates, something that we do not currently have.) Another possibility would be to go over the game a second time, this time evaluating only the variations with Leela's first choices from the initial run.