Deep learning and ladder blindness

For discussing go computing, software announcements, etc.
Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: Deep learning and ladder blindness

Post by Tryss »

Yes, you're right, I was mistaken:
Features for policy/value network. Each position s was pre-processed into a set of 19×19 feature planes. The features that we use come directly from the raw representation of the game rules, indicating the status of each intersection of the Go board: stone colour, liberties (adjacent empty points of stone’s chain), captures, legality, turns since stone was played, and (for the value network only) the current colour to play. In addition, we use one simple tactical feature that computes the outcome of a ladder search [7].
Source: https://storage.googleapis.com/deepmind ... ePaper.pdf (page 8)

Note that this is the original AlphaGo.
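For illustration, the stone-colour planes from the quoted passage could be encoded roughly like this (a minimal sketch of my own, not the paper's full feature set; the liberty, capture, turns-since-played, and ladder planes are omitted):

```python
import numpy as np

def encode_position(board, to_play):
    """Encode a 19x19 board into stone-colour feature planes.

    board: 19x19 int array, 1 = black stone, -1 = white stone, 0 = empty.
    to_play: 1 or -1, the colour to move.
    Returns 3 planes: own stones, opponent stones, empty intersections.
    """
    planes = np.zeros((3, 19, 19), dtype=np.float32)
    planes[0] = (board == to_play)    # own stones
    planes[1] = (board == -to_play)   # opponent stones
    planes[2] = (board == 0)          # empty intersections
    return planes
```

The real networks stack many more planes (e.g. liberty counts binned into several planes each), but the pattern is the same: every feature becomes one binary 19×19 grid.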
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: Deep learning and ladder blindness

Post by ez4u »

People interested in this topic should have a look at lightvector's work on github. Personally I may understand about 15% of it. But he has done a lot on ladders and other distant relationships. Quite interesting!
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Deep learning and ladder blindness

Post by Bill Spight »

dfan wrote:
Bill Spight wrote:
John Fairbairn wrote:It seems practicable to replicate that quite efficiently in a computer, and since it would only ever be triggered in an atari situation, it would be triggered relatively rarely and so would not significantly slow the machine down.
So one would think. :) Yet even when the programmer is a strong player, or the programming team includes one, such modules have not been implemented in the top-level bots. {shrug}
AlphaGo had two input features that checked for the presence of ladders in this way.
I stand corrected. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
chut
Dies in gote
Posts: 23
Joined: Sun May 20, 2018 5:47 am
GD Posts: 0
Has thanked: 7 times
Been thanked: 3 times

Re: Deep learning and ladder blindness

Post by chut »

Life/death of a group and ladders are related problems. We still see mightily strong bots fall flat in situations that are obvious to humans. These throw a monkey wrench into the tree search, because the minimax evaluation of a whole branch can become invalid if we push the search a bit deeper. That means there is an inherent uncertainty in the evaluated win rate of a branch.

I am wondering, what guides the tree search now? What decides which branch to go deep into first? Humans are guided by a meta-level knowledge of such situations, and human professionals do read out ladders to great depth and detail. There is a famous game of Lee Sedol where he played out a failed ladder to his advantage (https://senseis.xmp.net/?LeeSedolHongChangSikLadderGame).

It does seem to me that we can't escape having a meta-level tree-search guidance system. That is probably worthy of a deep learning project.
Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: Deep learning and ladder blindness

Post by Tryss »

I am wondering, what guides the tree search now? What decides which branch to go deep first?
A policy network. A bot like AlphaZero or LeelaZero has a neural network that proposes candidate moves; the choice of which one to explore depends on how much the policy network likes it, the previously estimated winrate of the branch, and the number of previous explorations (to encourage exploration, there is a bonus for positions explored less often).
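In pseudocode, that selection rule looks roughly like this (a sketch of PUCT-style selection as in the AlphaZero paper; the data layout and the value of c_puct here are illustrative, not LeelaZero's actual code):

```python
import math

def puct_score(child, parent_visits, c_puct=1.5):
    # Q: mean winrate from this branch's previous evaluations
    q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
    # U: exploration bonus -- large when the policy prior is high
    # and the child has few visits relative to its parent
    u = c_puct * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
    return q + u

def select_child(children):
    # Descend into the child maximizing Q + U
    parent_visits = sum(c["visits"] for c in children)
    return max(children, key=lambda c: puct_score(c, parent_visits))
```

So a move the policy network barely considers gets almost no visits, which is exactly why a ladder the network misjudges can stay unexplored deep in the tree.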
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Deep learning and ladder blindness

Post by Bill Spight »

chut wrote:Life/death of a group and ladders are related problems. We still see mightily strong bots fall flat in situations that are obvious to humans. These throw a monkey wrench into the tree search, because the minimax evaluation of a whole branch can become invalid if we push the search a bit deeper. That means there is an inherent uncertainty in the evaluated win rate of a branch.
Well, yes, by design. Win rates other than 100% or 0% depend upon errors in play. But what errors? Has anybody published anything about winrate errors?
It does seem to me that we can't escape having a meta-level tree-search guidance system. That is probably worthy of a deep learning project.
Can't what we already have be described as a meta level tree search guidance system? Tsumego, ladder reading, semeai, etc., involve local searches (although they may cover the whole board). Humans can integrate the results of local searches. Monte Carlo Tree Search is global. Before MCTS came along, I know that Martin Mueller experimented with having computer integration of local searches. MCTS was wildly successful, though, and dominates current thinking.

Humans typically evaluate games by estimating territory, and territory estimates adapt easily to different komis. I think that people are still experimenting with them instead of winrates, but before the advent of AlphaGo winrates worked better with MCTS. After all, the aim is to win the game, not to win it by a larger margin. After AlphaGo you get the bullshit about how humans can't think in terms of probabilities. Well, yes, you either want the probability of winning the game, or, using fuzzy logic, the degree to which the current position, plus having the move, belongs to the set of won games. Fuzziness and probability are different kinds of uncertainty.

In theory, a territory estimate is not in general enough to estimate whether a game is won or not. You also need a parameter called temperature. The temperature at the start of the game is around 14 pts., and it diminishes, with ups and downs, over the course of the game. It is possible to estimate it. And the global temperature is the same as the maximum of the local temperatures. (Although we may want to use a different definition which is insensitive to temporary increases.) Territory and temperature are parameters that may enable us to combine local searches. Furthermore, it is in theory possible to utilize the estimates of temperature and territory in combination to come up with a fuzzy estimate of winning or losing. In fact, currently we can say that if Black is ahead by an estimated 12 pts. on the board, with a 7.5 komi, and both players play perfectly, Black has a nearly won game, even if White has the move. (My estimate is better than 80% won, which is not an 80% winrate, BTW.) OC, nobody plays perfectly, so we still need to develop error measures. (Note that, unlike probability, fuzziness produces uncertainty even if we assume perfect play.)
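As a toy illustration, here is one possible shape for such a fuzzy estimate, combining a territory lead with the temperature (purely illustrative; the linear ramp and its constants are assumptions, not a derived formula):

```python
def fuzzy_win_estimate(board_lead, komi, temperature):
    """Degree to which the position belongs to the set of Black wins.

    board_lead: Black's estimated lead on the board, in points.
    komi: compensation given to White.
    temperature: current value of having the move, in points.
    """
    margin = board_lead - komi
    # Linear ramp: an even position is 0.5; a net lead of half the
    # temperature or more counts as fully won.  Clamp to [0, 1].
    mu = 0.5 + margin / temperature
    return max(0.0, min(1.0, mu))
```

With a 12-point board lead, 7.5 komi, and temperature 14, this ramp gives a degree a bit above 0.8, in the same ballpark as the estimate above; the point is only that territory and temperature together suffice for such an estimate, not that this particular formula is right.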

However, the theory behind MCTS is already well developed. Fuzzy logic has proven itself in other applications, and we may see a theoretical or practical breakthrough in the future which will allow us to apply fuzzy logic to build better go bots.

Edited for correctness and clarity. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
Mike Novack
Lives in sente
Posts: 1045
Joined: Mon Aug 09, 2010 9:36 am
GD Posts: 0
Been thanked: 182 times

Re: Deep learning and ladder blindness

Post by Mike Novack »

There is more to this than "ladder blindness". Go is difficult. The problem of apparent blindness to the consequences of EXISTING ladders (that seem oh so obvious to much weaker human players) ignores that there is a more general problem, the consequences of POTENTIAL ladders, and those are not at all obvious to weaker human players.

Ladders are "in play" in games between strong human players even though we do not see those ladders manifest in the games. The point I am making is that the possibility that some local situation is good or bad (depending on who would win a ladder beginning there) makes any number of moves along the route of that potential ladder sente, even moves that would be minor local losses in that other local area.

A human learning go is doing it "step by step". The beginner learns about ladders, how to determine the outcome based on stones already on the board. Only much later comes learning about how the threat of a ladder makes remote moves sente. The neural net learning "from zero" has to learn to solve the entire problem all at once.

In "problem solving" a useful skill is being able to figure out where a complex problem can be usefully broken into component parts. Humans who are good at problem solving are good at this.