Re: possible to improve AlphaGo in endgame
Posted: Tue Apr 25, 2017 12:01 am
I'm confused - is everyone just talking past one another? I think most of what people have said on both sides is right, if not literally then at least when interpreted charitably or in good faith.
* Bill is obviously right that strong Go programs are not computing or maximizing the "probability of winning the game". For MCTS playouts, the closest human thing to compare might be something like "probability of winning if both players play weird drunk mid-kyu-level blitz", but better at good shape and worse at tactics. For the value net, it might be "probability of winning if both players play weird drunk mid-dan-level blitz" but better at both good shape and counting and worse at tactics. Neither comparison is perfect, but each is close enough to be useful. We can also neglect sampling error - with millions of playouts, the "drunk blitzness" bias by far dominates any sampling error.
* Bill is again right that computer programs are almost certainly making mistakes even from a practical-probability-of-winning perspective, because at least some of the time they give up points for no gain when they have not read out the rest of the game to prove they won't need those points. That's how Zen and other bots lost some of their games in the past when they were around 5d to 7d - a blind spot would cause overoptimism about some part of the board, which on its own wouldn't have mattered since the bot was winning anyways. But then the bot would needlessly give up points on the rest of the board until the blind spot actually swung the game against it.
And from the other side...
* Multiple people are obviously right that playing the expected-point-difference-maximizing move is not the thing that maximizes real chances of winning the game. And experience on the part of computer Go programmers has taught that, given the choice between maximizing playout win/loss probabilities and pretty much any so-far-conceived notion of "expected point difference", it's better by far to maximize the win/loss probabilities.
* Moreover, in practice the giving-up of points doesn't lose the game on its own, because it's always coupled with a misevaluation by the bot that it doesn't need those points, else it would have tried to keep them. In each case fixing the misevaluation will prevent those lost games just as surely, and in practice doing so is far easier. As long as that remains true, programmers trying to improve their bot's strength in theory need *never* work on the silly endgame moves, since pushing the misevaluations closer to zero in frequency and severity and improving the bot's overall judgment about whether it might need the points will also suffice to push game-losses-due-to-silly-endgame arbitrarily close to zero. Since right now fixing silly endgame is very much not the best way to improve the bot's strength per unit of effort spent, programmers have quite sensibly not spent much effort on it.
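
To make the two sides concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the two move names, the margin distributions, and the `hidden_loss` parameter (standing in for a blind spot the bot's evaluation doesn't see) are assumptions, not anything taken from a real engine. It only shows that the two objectives can pick different moves, and that the point-giving move is harmless until the evaluation is wrong.

```python
from typing import Callable, Dict, List, Tuple

# Each hypothetical move maps to (final margin in points, probability) pairs,
# as the bot *believes* them. All numbers are made up for illustration.
Outcomes = List[Tuple[float, float]]

MOVES: Dict[str, Outcomes] = {
    "keep_points": [(10.0, 0.70), (-5.0, 0.30)],    # bigger lead, riskier
    "give_up_points": [(1.0, 0.95), (-1.0, 0.05)],  # tiny lead, very safe
}


def expected_margin(outcomes: Outcomes) -> float:
    """Expected point difference under the bot's believed distribution."""
    return sum(margin * prob for margin, prob in outcomes)


def win_probability(outcomes: Outcomes, hidden_loss: float = 0.0) -> float:
    """Probability that the margin is positive.

    hidden_loss is a hypothetical number of points the bot will really lose
    elsewhere on the board because of a blind spot (0 means no blind spot).
    """
    return sum(prob for margin, prob in outcomes if margin - hidden_loss > 0)


def best_move(objective: Callable[[Outcomes], float]) -> str:
    """Pick the move that maximizes the given objective."""
    return max(MOVES, key=lambda name: objective(MOVES[name]))


if __name__ == "__main__":
    print("max expected margin :", best_move(expected_margin))
    print("max win probability :", best_move(win_probability))
    # With a 3-point blind spot, the "safe" move no longer wins at all,
    # while the point-keeping move is unaffected.
    for name, outcomes in MOVES.items():
        print(name, "true win prob:", win_probability(outcomes, hidden_loss=3.0))
```

In this toy setup the win-probability maximizer gives up points, which is correct as long as its beliefs are accurate. The game is only lost when the believed margins are off by more than the points it kept in reserve, which is the misevaluation the last bullet says is the thing to fix.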