Bill Spight wrote: Unclear what that 2% difference means. It seems to be based upon the win rate of simulations, not upon the judgement of AlphaGo. But simulations of what?

Bill, I'm pretty sure these are the judgments of AlphaGo itself. For AlphaGo Master, each value would presumably be a weighted average of its value net evaluations and possibly Monte Carlo rollouts. Exactly how it does the weighting is unknown, since DeepMind never released many technical details about the versions of AlphaGo between AlphaGo Fan and AlphaGo Zero, but regardless, these should be precisely the values AlphaGo uses to decide on its move**.
To clarify the terminology: in MCTS, a "simulation" doesn't have to simulate anything. It simply means one pass from the root down to a leaf, together with the accompanying call to the value net and the Monte Carlo rollout at that leaf, plus the updating of all the statistics along the way. Likewise, the "winrate" or "winning probability" is not directly a probability of anything; it is simply the weighted average of the rollout win/loss statistics and the value net evaluations across all the nodes in that subtree.
Yes the terminology sucks.
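To make the jargon concrete, here is a minimal sketch of what one MCTS "simulation" means in this sense. Everything here is illustrative: the `Node` class, the 50/50 mixing weight, and the UCT-style selection rule are generic MCTS assumptions, not AlphaGo's actual code.

```python
import math

class Node:
    def __init__(self):
        self.children = {}     # move -> Node
        self.visits = 0        # number of simulations through this node
        self.value_sum = 0.0   # accumulated mixed evaluations

    def winrate(self):
        # The reported "winrate" is just the running average of the mixed
        # rollout / value-net results backed up through this node.
        return self.value_sum / self.visits if self.visits else 0.0

def simulate(root, value_net, rollout, mix=0.5):
    """One 'simulation': walk root -> leaf, evaluate, back the result up."""
    path = [root]
    node = root
    while node.children:  # selection: descend by a UCT-style score
        node = max(node.children.values(),
                   key=lambda c: c.winrate() +
                       math.sqrt(2 * math.log(node.visits + 1) / (c.visits + 1)))
        path.append(node)
    # Leaf evaluation: weighted average of value net and rollout result,
    # both assumed to return a number in [0, 1] from the root player's view.
    leaf_eval = mix * value_net(node) + (1 - mix) * rollout(node)
    for n in path:  # backup: increment statistics along the whole path
        n.visits += 1
        n.value_sum += leaf_eval
```

Note that nothing in `simulate` plays out a full game unless the `rollout` function happens to do so; the "simulation" is just this select-evaluate-backup pass.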
**With the caveat that if AlphaGo is anything like other MCTS bots, it does not always make the move it evaluates most highly: in MCTS you normally play the move with the largest number of simulations, not the one with the highest value. On average this produces stronger play, because a move with a higher average value but many fewer simulations may hold that value only because it hasn't been searched deeply enough to find the appropriate refutations, or because the policy net is confident enough that the move is bad that it's safer to discount slightly higher values coming from the rollouts or the value net. You can see this behavior all the time in Leela and Zen. It's also, of course, good to randomize a little to prevent being exploited.
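A toy illustration of that selection rule (the move names and numbers are made up):

```python
def choose_move(stats):
    # stats maps each candidate move to (simulation count, winrate).
    # Play the most-visited move; break ties toward the higher winrate.
    return max(stats, key=lambda m: (stats[m][0], stats[m][1]))

stats = {
    "A": (10000, 0.52),  # heavily searched, slightly lower value
    "B": (300, 0.55),    # higher value, but barely explored
}
# choose_move(stats) returns "A": its value is backed by far more search,
# whereas B's 0.55 may just mean its refutation hasn't been found yet.
```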