Bill Spight wrote: Unclear what that 2% difference means. It seems to be based upon the win rate of simulations, not upon the judgement of AlphaGo. But simulations of what?

Bill, I'm pretty sure these are the judgments of AlphaGo itself. For AlphaGo Master, each value would presumably be a weighted average of its value net evaluations and possibly Monte Carlo rollouts. Exactly how it does the weighting is unknown, since DeepMind never released many technical details about the versions of AlphaGo between AlphaGo Fan and AlphaGo Zero, but regardless, these should be precisely the values AlphaGo uses to decide on its move**.
To clarify the terminology: in MCTS, a "simulation" doesn't have to simulate anything. It simply means one pass from the root down to a leaf, together with the accompanying call to the value net and the Monte Carlo rollout at that leaf, plus the updating of all the statistics along the way. Likewise, the "winrate" or "winning probability" is not directly a probability of anything; it is simply the weighted average of the rollout win/loss statistics and the value net evaluations across all the nodes in that subtree.
Yes the terminology sucks.
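To make the jargon concrete, here is a minimal sketch of what one MCTS "simulation" means in this sense. Everything here is illustrative: the `Node` class, the 50/50 mixing weight, and the UCT-style selection rule are generic MCTS assumptions, not AlphaGo's actual code.

```python
import math

class Node:
    def __init__(self):
        self.children = {}     # move -> Node
        self.visits = 0        # number of simulations through this node
        self.value_sum = 0.0   # accumulated mixed evaluations

    def winrate(self):
        # The reported "winrate" is just the running average of the mixed
        # rollout / value-net results backed up through this node.
        return self.value_sum / self.visits if self.visits else 0.0

def simulate(root, value_net, rollout, mix=0.5):
    """One 'simulation': walk root -> leaf, evaluate, back the result up."""
    path = [root]
    node = root
    while node.children:  # selection: descend by a UCT-style score
        node = max(node.children.values(),
                   key=lambda c: c.winrate() +
                       math.sqrt(2 * math.log(node.visits + 1) / (c.visits + 1)))
        path.append(node)
    # Leaf evaluation: weighted average of value net and rollout result,
    # both assumed to return a number in [0, 1] from the root player's view.
    leaf_eval = mix * value_net(node) + (1 - mix) * rollout(node)
    for n in path:  # backup: increment statistics along the whole path
        n.visits += 1
        n.value_sum += leaf_eval
```

Note that nothing in `simulate` plays out a full game unless the `rollout` function happens to do so; the "simulation" is just this select-evaluate-backup pass.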
**With the caveat that if AlphaGo is anything like other MCTS bots, it does not always make the move it evaluates most highly: in MCTS you normally play the move with the largest number of simulations, not the one with the highest value. On average this produces stronger play, because a move with a higher average value but many fewer simulations may hold that value only because it hasn't been searched deeply enough to find the appropriate refutations, or because the policy net is confident enough that the move is bad that it's safer to discount slightly higher values coming from the rollouts or the value net. You can see this behavior all the time in Leela and Zen. It's also, of course, good to randomize a little to prevent being exploited.
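A toy illustration of that selection rule (the move names and numbers are made up):

```python
def choose_move(stats):
    # stats maps each candidate move to (simulation count, winrate).
    # Play the most-visited move; break ties toward the higher winrate.
    return max(stats, key=lambda m: (stats[m][0], stats[m][1]))

stats = {
    "A": (10000, 0.52),  # heavily searched, slightly lower value
    "B": (300, 0.55),    # higher value, but barely explored
}
# choose_move(stats) returns "A": its value is backed by far more search,
# whereas B's 0.55 may just mean its refutation hasn't been found yet.
```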