Life In 19x19

Posted: **Tue Apr 25, 2017 12:01 am**

I'm confused - is everyone just talking past one another? I think most of what people have said on both sides is right, if not literally then at least when interpreted charitably or in good faith.

* Bill is obviously right that strong Go programs are not computing or maximizing the "probability of winning the game". For MCTS playouts, the closest human thing to compare might be something like "probability of winning if both players play weird drunk mid-kyu-level blitz", but better at good shape and worse at tactics. For the value net, it might be "probability of winning if both players play weird drunk mid-dan-level blitz", but better at both good shape and counting and worse at tactics. Neither comparison is perfect, but each is close enough to be useful. We can also neglect sampling error: with millions of playouts, the "drunk blitzness" bias by far dominates any sampling error.

* Bill is again right that computer programs are almost certainly making mistakes even from a practical-probability-of-winning perspective, because they are at least some of the time giving up points for no gain where they have not read out the rest of the game to prove they won't need those points. That's how Zen and other bots lost some of their games in the past when they were around 5d to 7d - a blind spot causing overoptimism about some part of the board, which wouldn't matter since the bot was winning anyways. But then they would needlessly give up enough points on the rest of the board until the blind spot actually swung the game against them.

And from the other side...

* Multiple people are obviously right that playing the expected-point-difference-maximizing move is not the thing that maximizes real chances of winning the game. And experience on the part of computer Go programmers has taught that, given the choice between maximizing playout win/loss probabilities and pretty much any so-far-conceived notion of "expected point difference", it's better by far to maximize the win/loss probabilities.

* Moreover, in practice the giving-up of points doesn't lose the game on its own, because it's always coupled with a misevaluation by the bot that it doesn't need those points, else it would have tried to keep them. In each case fixing the misevaluation will prevent those lost games just as surely, and in practice doing so is far easier. As long as that remains true, programmers trying to improve their bot's strength in theory need *never* work on the silly endgame moves, since pushing the misevaluations closer to zero in frequency and severity and improving the bot's overall judgment about whether it might need the points will also suffice to push game-losses-due-to-silly-endgame arbitrarily close to zero. Since right now fixing silly endgame is very much not the best way to improve the bot's strength per unit of effort spent, programmers have quite sensibly not spent much effort on it.
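
To make the last two points concrete, here is a minimal sketch in Python. All numbers are invented for illustration, and the "safe"/"greedy" moves and their outcome distributions are hypothetical, not taken from any real program or game. It shows that the move with the best expected margin need not be the move with the best winning chances, and that a point-giving move only loses games when the bot's belief about its winning chances is wrong.

```python
# Each candidate move maps to a list of (final_margin_for_us, probability) pairs.
# All values below are made-up illustrative numbers.

def win_probability(outcomes):
    """Total probability of the outcomes in which we finish ahead."""
    return sum(p for margin, p in outcomes if margin > 0)

def expected_margin(outcomes):
    """Probability-weighted average of the final point difference."""
    return sum(margin * p for margin, p in outcomes)

# What the bot believes: the safe move gives up points but "always" wins.
believed = {
    "safe": [(0.5, 1.0)],
    "greedy": [(10.5, 0.8), (-5.5, 0.2)],
}

best_by_win = max(believed, key=lambda m: win_probability(believed[m]))
best_by_margin = max(believed, key=lambda m: expected_margin(believed[m]))

# Hypothetical reality with a blind spot: the "safe" half-point win was
# never as safe as the bot thought.
actual = dict(believed)
actual["safe"] = [(0.5, 0.7), (-9.5, 0.3)]
truly_best_by_win = max(actual, key=lambda m: win_probability(actual[m]))

print(best_by_win, best_by_margin, truly_best_by_win)
```

Under the believed numbers the two criteria disagree, and maximizing win probability is the right call. Only when the belief is miscalibrated does the point-giving move become the losing one, which is why fixing the misevaluation also fixes the lost games.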

Posted: **Tue Apr 25, 2017 5:07 am**

I mostly agree with lightvector, and it's nice to see someone trying to find common ground instead of saying the same things over and over, but a few quibbles:

On your 1st point, I think you mixed up the policy and value networks: the policy network is the one which chooses a move given a board position (similar to a human's shape intuition and pattern recognition), and the value network is the one which says who is winning in a board position (similar to human whole-board positional judgement/counting). The value network is AlphaGo's innovation (since reproduced, but not as well, by DeepZen, FineArt and others); others had made policy networks before, at around mid-dan amateur level (and DeepMind published a paper about theirs too before AlphaGo, plus hired some of the authors of previous ones). In the v13 AlphaGo paper there were charts showing how strong AlphaGo was with the various combinations of the 3 modules (policy network, value network, and MCTS), and they will have become quite a lot stronger since then. So MCTS plus policy network playouts could well be quite a lot better than mid-dan blitz now.

Also, in response to Robert's point about AlphaGo not reading (which we've had before): whilst I agree MCTS is not much like human reading, requiring reading to be perfect in order to count as reading is a strange use of the word. Most words do not have an implied "perfect" adjective in front of them (or else I don't read). But with some tree search and a value network and no Monte Carlo rollouts, you could actually have a program that reads quite like a human: exploring a tree and judging who is winning those positions, without doing loads of semi-random playouts to the end of the game (which could use a policy network or not).
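
As a rough sketch of that last idea (tree search plus a value function, no rollouts), here is a depth-limited negamax in Python on a toy game rather than Go. Everything here is an assumption for illustration: the game is a simple subtraction game (take 1 to 3 stones, whoever takes the last stone wins), and `value_fn` is a hand-written stand-in for a value network, not anything from AlphaGo.

```python
def legal_moves(pile):
    """Toy game: a player may take 1, 2 or 3 stones from the pile."""
    return [take for take in (1, 2, 3) if take <= pile]

def value_fn(pile):
    """Stand-in for a value network: score in [-1, 1] for the player to move.

    In this toy game the player to move is losing exactly when the pile
    is a multiple of 4 (including 0, where the opponent took the last stone).
    """
    return -1.0 if pile % 4 == 0 else 1.0

def search(pile, depth):
    """Depth-limited negamax: explore the tree, judge leaves with value_fn.

    Returns (score for the player to move, best move or None at a leaf).
    No playouts to the end of the game are performed.
    """
    if pile == 0 or depth == 0:
        return value_fn(pile), None
    best_score, best_move = float("-inf"), None
    for take in legal_moves(pile):
        child_score, _ = search(pile - take, depth - 1)
        score = -child_score  # the opponent's gain is our loss
        if score > best_score:
            best_score, best_move = score, take
    return best_score, best_move

score, move = search(10, depth=2)
print(score, move)
```

The search "reads" a couple of moves ahead and then trusts its judgement of the resulting positions, which is closer to how a human reads than playing thousands of semi-random games to the end.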

lightvector wrote:That's how Zen and other bots lost some of their games in the past when they were around 5d to 7d - a blind spot causing overoptimism about some part of the board, which wouldn't matter since the bot was winning anyways. But then they would needlessly give up enough points on the rest of the board until the blind spot actually swung the game against them.

Not just when Zen was 5-7d, but now when it is top pro level. It was winning versus Park Junghwan in the World Go Championship (by about komi according to Kim Jiseok) but lost some of its lead (probably the Monte Carlo problem of giving up points as long as you still win) and then near the end lost even more (a combination of misevaluating some dead stones at the top as being in seki, as in lightvector's last point, and problems with the komi and Chinese vs Japanese rules).

Posted: **Tue Apr 25, 2017 8:45 am**

Thanks, guys. Good points.

I talked about beating a dead horse because I think people now pretty much agree that top computer programs can make mistakes by believing that one play is superior to another, assessing the difference in terms of the probability of winning the game, and that humans can catch some of those mistakes by assessing the difference in terms of points.

But there still are those who claim that humans just don't understand how the top programs think, and while humans may think that they recognize some computer mistakes, the programs are better than they are, so they just don't really know.

I'd like to make two points. First, people do think in terms of the probability of winning. They just don't do it very well. Second, I'd like to make the case that assessing the chances of winning by point evaluation works better and better as the end of the game nears (at least at the strong amateur level and above), and that it is likely, at the moment, that human evaluation is better than computer evaluation at some point during the endgame.

uPWarrior wrote:It is known that {% of winning} produces stronger AIs, as it is maximizing the correct objective after all.

When uPWarrior says that the correct objective is the percentage of winning he is thinking probabilistically. We humans do that all the time. We consider an even game between two players of the same level to be a 50-50 proposition, while an even game between, say, a 1 kyu and a shodan means that the shodan will win about 2/3 of the time. If a move is a small gote, but there are much larger moves on the board and we cannot read the game out, we consider that Black will play the gote half the time and White will play it half the time. Or if there is a small sente for Black, and we cannot read the game out, we consider that Black will get to play it almost 100% of the time. But 30 moves into an even game, if you ask us what the probability is that Black will win, we are hard pressed to make an estimate.
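
The gote and sente reasoning above is a small expected-value calculation, sketched here in Python. The point values and the 0.95 figure for "almost 100%" are hypothetical numbers chosen for illustration.

```python
def expected_local_score(black_first, white_first, p_black_first):
    """Expected local result (from Black's side) when we cannot read the game out.

    black_first / white_first: local score if Black / White plays there first.
    p_black_first: our estimate of the chance that Black gets to play first.
    """
    return p_black_first * black_first + (1.0 - p_black_first) * white_first

# Small gote: either side is equally likely to get it.
gote_estimate = expected_local_score(black_first=4.0, white_first=0.0, p_black_first=0.5)

# Small sente for Black: Black gets to play it almost always (0.95 is an assumed figure).
sente_estimate = expected_local_score(black_first=3.0, white_first=0.0, p_black_first=0.95)

print(gote_estimate, sente_estimate)
```

This kind of local estimate is easy for humans. The hard part, as noted, is turning a whole-board position 30 moves into the game into a single probability that Black wins.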

I suspect that strong human players can be trained to make good probability estimates of winning the game. The reason is that gamblers made fair bets even before the invention of probability theory. The training would consist of having players make modest bets on the outcomes of top level games while the games were in progress. Over time, I expect that the players would learn to make fair bets.

BTW, it would be an interesting research project to see how well top programs assess the probability of winning the game, using pro game records. Have the programs assess the position after move 100, for instance, and compare the percentage of wins vs. the assessed probabilities. My impression is that the programs are more accurate after 100 moves than after 200 moves. That is, at the endgame stage they underestimate the chances of the winners. I think that by betting on the projected winner at the odds assessed by the program, you would clean up.
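
A minimal sketch of how such a check could be set up, in Python. The record format, function names, and toy data are all assumptions for illustration, and no real game records or program outputs are used. Each record pairs the program's assessed probability for its projected winner with whether that player actually won.

```python
from collections import defaultdict

def calibration_table(records, n_bins=5):
    """Group records by assessed probability and compare with actual win rates.

    records: iterable of (assessed_prob, projected_winner_won) pairs.
    Returns a list of (mean assessed prob, observed win rate, count) per bin.
    """
    bins = defaultdict(list)
    for prob, won in records:
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, won))
    table = []
    for index in sorted(bins):
        items = bins[index]
        mean_prob = sum(prob for prob, _ in items) / len(items)
        win_rate = sum(1 for _, won in items if won) / len(items)
        table.append((mean_prob, win_rate, len(items)))
    return table

def betting_profit(records):
    """Stake 1 unit per game on the projected winner at the program's own fair odds.

    A win at assessed probability p pays (1 - p) / p; a loss costs the stake.
    Assumes every assessed probability is greater than 0.
    """
    return sum((1.0 - prob) / prob if won else -1.0 for prob, won in records)

# Toy data only: the program says 60%, but the projected winner wins 4 times out of 5.
toy_records = [(0.6, True), (0.6, True), (0.6, True), (0.6, True), (0.6, False)]

print(calibration_table(toy_records))
print(betting_profit(toy_records))
```

If the program systematically underestimates the leader's chances, the observed win rate sits above the assessed probability and the betting profit comes out positive, which is the "clean up" scenario described above. A well-calibrated program would leave the expected profit at zero.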

More later. Gotta run.

possible to improve AlphaGo in endgame