Re: Why humans fail against AI
Posted: Fri Aug 24, 2018 4:20 am
John Fairbairn wrote:
LZ, via Lizzie, actually expresses a move's evaluation in a two-dimensional way. There is winrate and a figure that seems to mean something like number of rollouts. I have no idea how important each of these factors is relative to each other but LZ seems to think rollouts is very important because it will sometimes choose a move with a low winrate but high rollouts over one with a higher winrate and low rollouts.

In MCTS (and in the variants used by AlphaGo Zero etc.) the move to play is generally chosen by highest number of visits (technically, in AlphaGo Zero etc. these are not rollouts, since positions aren't played out to the end of the game). Since the search is always visiting the most promising move, visit count usually matches the winrate pretty well, and it avoids situations where some other move suddenly looks good at the last second but there isn't time to verify it to a necessary degree of confidence. On the other hand, you can run into the converse issue: the move you've been spending all your time on suddenly gets refuted by something you see further down, and there isn't time for an alternative move to bubble to the top. People on the Leela Zero team have been thinking about this.
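To make the distinction concrete, here is a minimal sketch of why selection by visits and selection by winrate can disagree. The moves and numbers are invented for illustration, not taken from Leela Zero:

```python
# Hypothetical candidate moves at the root of an MCTS search.
# Invented numbers: Q16 shows a higher winrate but has barely been explored.
candidates = {
    "D4":  {"visits": 12000, "winrate": 0.52},  # heavily explored, well verified
    "Q16": {"visits": 300,   "winrate": 0.55},  # looks better, barely checked
}

# Standard MCTS final move selection: most visits wins, because a high
# winrate backed by few visits is not yet trustworthy.
by_visits  = max(candidates, key=lambda m: candidates[m]["visits"])
by_winrate = max(candidates, key=lambda m: candidates[m]["winrate"])

print(by_visits)   # D4
print(by_winrate)  # Q16
```

This is the behaviour described in the quote: the engine preferring the well-visited 0.52 move over the lightly-visited 0.55 one.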
John Fairbairn wrote:
There is a related point you make: that margins of error multiply as you go deeper into a search. I can "see" that but I can't relate that to the other thing I "see", which is that chess programs generally perform better the deeper they search.

For both go engines and chess engines, 1) they perform better if you let them search deeper, and 2) the "principal variation" (PV) returned by the engine makes less and less sense as it goes on (because the engine is spending less and less time there). Chess players know not to pay much attention to the later moves of the PV. I think one issue in go is that because the variation is displayed graphically instead of in notation, it's harder to avoid paying attention to its long tail. I think Lizzie would benefit from showing the variation only as far as it is being visited some reasonable number of times.
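The suggested display rule can be sketched in a few lines. This is only an illustration of the idea, with invented visit counts and a made-up threshold, not Lizzie's actual code:

```python
# Sketch: show a principal variation only while each move in it
# still has a reasonable number of visits behind it.
def truncate_pv(pv, min_visits=50):
    """Keep PV moves until the visit count drops below min_visits."""
    shown = []
    for move, visits in pv:
        if visits < min_visits:
            break
        shown.append(move)
    return shown

# Invented PV: visit counts fall off rapidly toward the tail.
pv = [("D4", 9000), ("Q16", 4200), ("C3", 800),
      ("R4", 60), ("K10", 7), ("E5", 2)]

print(truncate_pv(pv))  # ['D4', 'Q16', 'C3', 'R4']
```

The last two moves, with single-digit visits, are exactly the long tail a reader should not be tempted to trust.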
John Fairbairn wrote:
I have suspected from the very beginning, and still believe, that the bots out-perform humans in go mainly because they search better and so make fewer crunching mistakes (and for that reason - i.e. it's the last mistake that decides the game - all the "research" into josekis and fusekis is somewhat misguided). AlphaGo Zero seems to have upset the apple cart, in chess as well as go, but until shown otherwise I will still believe that it is ultimately just searching better (not just deeper but perhaps because it makes a better selection of candidate moves and prunes better).

What is new about AlphaGo Zero is that its instinct (one visit, i.e. one pass through the neural net) is constantly being trained to match its calculation (the result of a more extended tree search). So it is learning to absorb its search into its intuition, which in turn enhances its search.
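The "instinct trained to match calculation" loop boils down to a simple target: the raw policy is pushed toward the visit distribution that the full search produced. A rough sketch, with invented visit counts:

```python
# After a search from some position, suppose the root's children
# received these visit counts (invented numbers):
visit_counts = {"D4": 700, "Q16": 250, "C3": 50}
total = sum(visit_counts.values())

# The search-improved policy target is simply the normalized visit
# distribution; the network's one-pass policy is trained toward it.
policy_target = {move: n / total for move, n in visit_counts.items()}

print(policy_target)  # {'D4': 0.7, 'Q16': 0.25, 'C3': 0.05}
```

Once the network has absorbed this, its one-pass prior already resembles a searched position, so the next search starts from a better candidate ordering.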
John Fairbairn wrote:
So if a new version of AlphaGo came along with the same policy network but had the hardware to do an even deeper search, I'd initially expect that to be even stronger - notwithstanding the multiplication of the margin of errors that it would surely still be making.

It certainly would be, but see below about "multiplication of the margin of errors".
John Fairbairn wrote:
Is there some way the margin of error in talking about the margin of error cancels out the original margin of error? In a vague way that seems to be why Monte Carlo search works.

I think it is dangerous to mix up 1) the error in the winrate at the root node of the search, which certainly goes down as more search is done, and 2) the error in the PV, which goes up the farther into the sequence we go, because less and less energy is being expended there. It is true that very accurate moves can be made at the root of the tree despite not spending lots of time at the leaves, but this has been true since the very first chess programs. Human players do the same.
It's not that the leaf errors get bigger as you search more (one network evaluation is one network evaluation, whether it's one ply into the future or twenty); it's that the root error gets smaller.
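A toy statistics sketch of that last point: if every leaf evaluation is equally noisy, averaging more of them shrinks the error of the root estimate roughly as 1 over the square root of their number. This is pure statistics for illustration, not an actual engine:

```python
import random
import statistics

random.seed(0)

TRUE_WINRATE = 0.55   # the (unknowable) true value at the root
LEAF_NOISE = 0.2      # assumed noise of a single leaf evaluation

def mean_abs_error(n_evals, trials=200):
    """Average root-estimate error when the root averages n_evals noisy leaves."""
    errors = []
    for _ in range(trials):
        estimate = statistics.fmean(
            TRUE_WINRATE + random.gauss(0, LEAF_NOISE) for _ in range(n_evals)
        )
        errors.append(abs(estimate - TRUE_WINRATE))
    return statistics.fmean(errors)

shallow = mean_abs_error(25)     # little search
deep = mean_abs_error(400)       # 16x more search

print(shallow > deep)  # more search, smaller root error
```

Each individual leaf is just as noisy in both cases; only the aggregate at the root improves, which is the distinction drawn above between leaf error and root error.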