Some Questions about Lizzie/LZ

Pippen · Post by **Pippen** » Wed Mar 27, 2019 9:47 am

dfan wrote:As Bill says, once it realizes it couldn't possibly be the best move, it doesn't have any incentive to go back and calculate exactly how bad it is.

Ok, that makes sense, but isn't it weird for us users? If I see a move suggestion with an 80%+ win probability, even in red, i.e. not preferred by LZ, I think it must still be very much playable. There are plenty of examples where such moves are indeed playable. But it's good to keep that in mind: if it's 'red' it's 'at your own risk'.

moha · Post by **moha** » Wed Mar 27, 2019 9:56 am

As the third most visited move, it was at least suggested by the policy net. I doubt an ELF net would consider it, for example. But LZ has a specific defect in its training wrt some ataris and move legality, which may be at work here.

Uberdude · Post by **Uberdude** » Wed Mar 27, 2019 10:05 am

To get a more accurate assessment of a move, play it and let LZ think for a while (e.g. until 4k visits like the #1 move had before). You will likely see that 80% drop pretty fast. But yes, that LZ can have super-human instinct at 10 playouts in some positions, yet in others show atari blindness at >100 playouts that a 25k (often) wouldn't, is one of the wonders of neural networks. See https://www.lifein19x19.com/viewtopic.p ... 13#p242513 for another example of atari blindness.

dfan · Post by **dfan** » Wed Mar 27, 2019 10:17 am

For better or worse, Leela Zero is designed to play as well as possible, not to teach people how to think about positions. Certainly one could argue, and I would agree, that it has accomplished the first goal well enough that it makes sense to transfer a lot of our energy (as programmers and researchers) to the second.

Bill Spight · Post by **Bill Spight** » Wed Mar 27, 2019 10:50 am

Pippen wrote:
dfan wrote:As Bill says, once it realizes it couldn't possibly be the best move, it doesn't have any incentive to go back and calculate exactly how bad it is.
Ok, that makes sense, but isn't it weird for us users? If I see a move suggestion with an 80%+ win probability, even in red, i.e. not preferred by LZ, I think it must still be very much playable. There are plenty of examples where such moves are indeed playable. But it's good to keep that in mind: if it's 'red' it's 'at your own risk'.

You have to remember that the win rate estimates of moves with low playouts can be way, way off. The bots do not use win rates the way humans want to.

lightvector · Post by **lightvector** » Wed Mar 27, 2019 4:57 pm

Counting liberties on large groups turns out to be a thing that is pretty difficult for a neural net with current known best architectures.

Leela Zero sticks to a spirit of minimizing the use of high-level human features and heuristics (that's the "zero"), and so does not provide liberty counts to the neural net as input. If you do provide them as input, although a bot may still have blind spots for tesuji or status of groups, I can confirm that blind spots for liberty shortages go away entirely, at least as far as I can tell.
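As a rough illustration of what "liberty counts as input" could mean, here is a minimal sketch that computes, for every stone, the number of liberties of the group it belongs to. The board representation (a list of strings using `X`, `O` and `.`) and the function name `liberty_counts` are assumptions made for this example. It is not code from Leela Zero or any other bot, and how such counts would be encoded into input planes is left out.

```python
def liberty_counts(board):
    """Return a grid giving, for each stone, the liberty count of its group.

    board: list of equal-length strings, 'X' and 'O' for stones, '.' for empty.
    Empty points get 0 in the returned grid.
    """
    rows, cols = len(board), len(board[0])
    counts = [[0] * cols for _ in range(rows)]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == '.' or (r, c) in seen:
                continue
            color = board[r][c]
            group, liberties = [], set()
            stack = [(r, c)]
            seen.add((r, c))
            # Flood fill over orthogonally connected stones of the same color.
            while stack:
                y, x = stack.pop()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if board[ny][nx] == '.':
                        liberties.add((ny, nx))  # a set, so shared liberties count once
                    elif board[ny][nx] == color and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            for y, x in group:
                counts[y][x] = len(liberties)
    return counts


# A white stone in atari, surrounded on three sides.
example = [
    ".X.",
    "XO.",
    ".X.",
]
print(liberty_counts(example))
```

The point of feeding something like this to the net is that the count is exact no matter how large or sprawling the group is, which is exactly the case described above as hard for the net to work out on its own.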

Bill Spight · Post by **Bill Spight** » Thu Mar 28, 2019 3:42 pm

lightvector wrote:Counting liberties on large groups turns out to be a thing that is pretty difficult for a neural net with current known best architectures.

I am reminded of the animals that can instantly distinguish between 5 and 6 objects, but cannot "count" any higher.

Leela Zero sticks to a spirit of minimizing the use of high-level human features and heuristics (that's the "zero"), and so does not provide liberty counts to the neural net as input. If you do provide them as input, although a bot may still have blind spots for tesuji or status of groups, I can confirm that blind spots for liberty shortages go away entirely, at least as far as I can tell.

Since dame count is important for playing ladders, does providing them to the net improve the ability to play ladders?

Also, if you can provide dame count as input, what about komi?

Bill Spight · Post by **Bill Spight** » Thu Mar 28, 2019 3:52 pm

Uberdude wrote:To get a more accurate assessment of a move, play it and let LZ think for a while (e.g. until 4k visits like the #1 move had before).

Having successfully downloaded the Elf commented GoGoD files and looked at a couple of games, I suspect that we should not give any credence at all to a winrate generated by Elf with fewer than 4k visits or playouts. For instance, suppose that a play seems to be an error that costs 4 percentage points in the winrate, but that estimate is based on a visit count of 900. In the main line sequence, where Elf is playing both sides, a few moves later, where we anticipate that the actual winrate has not changed by more than a fraction of a point, the winrate is 2.5% better, with a visit count of 32k. I would go with the loss of 1.5% instead of 4%.

(And 1.5% may well be within Elf's margin of error, anyway.)

When I get time, I'll generate a few statistics on this question.

Edit: Actual example from Segoe 7 dan vs. Go Seigen 4 dan, GoGoD 1932-01-01a. Black 113 has a winrate estimate of 41% with 76k playouts. Segoe's next move was surely a mistake. Elf recommends a play with a Black winrate estimate of 41% with 79k playouts. Segoe's move has a winrate estimate for Black of 46% with only 610 playouts. It would show up in the winrate graph as a 5% error. However, Elf's recommended play for Black 115 has a winrate estimate of 54% with 18k playouts, and Go Seigen's actual play has a winrate estimate of 55% with 12k playouts. So I would take the winrate as 54-55%, and Segoe's error as costing 13-14%, a rather larger error than 5%. (You see why I only take the winrate estimates to two digits.)
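The adjustment described above can be written down as a small calculation. This is only a sketch of the reasoning in this post: the `Estimate` class, the `trusted_estimate` helper and the 4k default threshold are names and values introduced for illustration, and the approach rests on the same assumption as the post, namely that the true winrate barely changes over the next move or two of the main line.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    black_winrate: float  # Black's winrate in percent
    playouts: int


def trusted_estimate(played, follow_ups, min_playouts=4000):
    """Pick which winrate estimate to trust for the position after a move.

    If the played move's own estimate has too few playouts, fall back to the
    first later main-line estimate that has enough. If none qualifies, return
    whichever estimate has the most playouts.
    """
    candidates = [played, *follow_ups]
    for est in candidates:
        if est.playouts >= min_playouts:
            return est
    return max(candidates, key=lambda est: est.playouts)


# Numbers from the Segoe vs. Go Seigen example in this post.
before = Estimate(41, 76_000)   # after Black 113
played = Estimate(46, 610)      # after Segoe's reply, very few playouts
follow_ups = [Estimate(54, 18_000), Estimate(55, 12_000)]  # estimates at Black 115

# White's loss is the rise in Black's winrate.
naive_loss = played.black_winrate - before.black_winrate
revised_loss = trusted_estimate(played, follow_ups).black_winrate - before.black_winrate
print(naive_loss, revised_loss)
```

With these inputs the naive figure is the 5% that the graph would show, and the revised figure uses the 54% estimate backed by 18k playouts instead.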

lightvector · Post by **lightvector** » Thu Mar 28, 2019 4:37 pm

Bill Spight wrote:
lightvector wrote:Counting liberties on large groups turns out to be a thing that is pretty difficult for a neural net with current known best architectures.
I am reminded of the animals that can instantly distinguish between 5 and 6 objects, but cannot "count" any higher.

Leela Zero sticks to a spirit of minimizing the use of high-level human features and heuristics (that's the "zero"), and so does not provide liberty counts to the neural net as input. If you do provide them as input, although a bot may still have blind spots for tesuji or status of groups, I can confirm that blind spots for liberty shortages go away entirely, at least as far as I can tell.
Since dame count is important for playing ladders, does providing them to the net improve the ability to play ladders?

Also, if you can provide dame count as input, what about komi?

For ladders, no, not significantly. Neural nets don't have a problem with counting liberties on small groups, just the ones that are large and often sea-urchin-like. And the ability to evaluate ladders is still critically poor on the positions where they haven't been played out yet, or haven't been played out much, where all groups involved are small.

The hard part of a ladder for the neural net is not anything liberty-related, but rather "understanding" that the stones diagonally a long distance away across empty space could possibly be relevant. Also, unlike humans, the bot is perfectly happy to read the ladder in one variation, solve it (because given enough playouts, the search does still solve it), and then fail to understand the same ladder in another variation, and another, and another. For every one of the dozens or hundreds of variations in any position, the search has to solve the ladder yet again from scratch. Humans would simply read it once and understand when a move could change the ladder's result and require a reread, but there's no currently known good way to make a bot do that in current architectures that "fits in" to the current neural-net driven search. That's for future research.

I've also experimented with adding ladderability of stones as an input feature.

The result is a bot that never messes up common ladders (as far as I can tell) and has good evaluations for tactics that depend on them. The drawback is that it makes the bot weaker at solving rare positions like the Lee Sedol ladder game or that other Fine Art game, where it's actually correct to chase a broken ladder across the board because the forcing moves gained will kill another group on the other side of the board, and where it requires actually chasing the ladder rather than simply playing a ladder breaker on that other side directly. That is because the bot, knowing that the ladder doesn't work, is less willing to spend reading effort on chasing the ladder out than a bot that has no idea whether it works or not.

Since normal ladder situations are easily 100x or more common than situations where chasing broken ladders across the board is good (driving tesuji don't count, it's only the cases where you need to chase across a long distance that are tricky), for now I'm happy with this tradeoff, although I have some intuitions for future research on how one might try to get the best of both worlds.

Neural nets can easily handle a wide range of komi if komi is provided as an input and the training data contains a few percent of games with a wide range of komi. I do that in KataGo, and it works very well.
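To make the idea concrete, here is a hypothetical sketch of the two pieces mentioned: picking the komi for a training game so that a few percent of games use an unusual value, and turning komi into a scalar input for the net. The 5% fraction, the komi range and the scaling constant are placeholders chosen for this example, not KataGo's actual settings, and the function names are made up.

```python
import random

STANDARD_KOMI = 7.5


def sample_komi(rng, unusual_fraction=0.05, low=-10, high=20):
    """Return the komi for one training game.

    Most games use the standard komi; a small fraction draw a half-integer
    komi from a wide range so the net sees varied values during training.
    """
    if rng.random() < unusual_fraction:
        return rng.randint(low, high) + 0.5
    return STANDARD_KOMI


def komi_input(komi, scale=15.0):
    """Scale komi to a small number suitable as a scalar network input."""
    return komi / scale


rng = random.Random(0)
komis = [sample_komi(rng) for _ in range(10_000)]
unusual = sum(1 for k in komis if k != STANDARD_KOMI)
print(unusual / len(komis), komi_input(STANDARD_KOMI))
```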

Bill Spight · Post by **Bill Spight** » Thu Mar 28, 2019 5:46 pm

Here is another example of a major problem with winrate estimates, even of Elf. It also comes from GoGoD 1932-01-01a, with comments by Elf.

Go Seigen's move, Black 121, has a winrate estimate of 51% with 25k playouts. Then Segoe's move, White 122, has a Black winrate estimate of 50% with 54k playouts. Neither play has any variations, so presumably those plays were also Elf's first choices. So far, so good.

Black 123 has four variations by Elf, each with a winrate estimate of 50% and playouts varying from 2k to 22k. The variation with 22k playouts does not have the comment, "Good", so it may not be Elf's first choice. I guess that Elf's first choice is Go Seigen's play, which has a winrate estimate of 58% with 30k playouts, and Elf's recommendation for White 124 has a winrate estimate of 58% with 49k playouts. Hello! That's a jump of 8% for no apparent reason. (Something like the horizon effect, I suppose.) But what does that tell us about Elf's winrate estimates? Even with several thousand playouts?

If humans are going to make use of bots' winrate estimates, we need to know something about their margins of error.

Mike Novack · Post by **Mike Novack** » Fri Mar 29, 2019 8:54 am

lightvector wrote:
The hard part of a ladder for the neural net is not anything liberty-related, but rather "understanding" that the stones diagonally a long distance away across empty space could possibly be relevant.......

I've also experimented with adding ladderability of stones as an input feature too. The result is a bot that never messes up common ladders (as far as I can tell) and has good evaluations for tactics that depend on them. The drawback is that it makes the bot weaker at solving rare positions like the Lee Sedol ladder game or that other Fine Art game where actually it's correct to chase a broken ladder across the board because the forcing moves gained will kill another group on that other side of the board.......

THAT is the real problem. The issue isn't just the outcome of a ladder, but that the ladder (its potential) makes a large number of remote moves sente. Or, as in the sort of example given, playing out the ladder IS sente against some remote group (so either the ladder or the group).

Pippen · Post by **Pippen** » Sat Apr 13, 2019 4:40 pm

A question about the diagram on the left side of Lizzie's interface. There is the x-axis, which can basically be interpreted as a completely even game, 50/50. But what about the lines just below and above the x-axis? Why are they there? Do they mean that within that boundary the game is still open, and outside it basically lost? I ask because I notice that in lost games the graph usually exceeds this boundary by a lot.

dfan · Post by **dfan** » Sat Apr 13, 2019 7:29 pm

I don't think they have any particular significance; they're just points of reference.

Tryss · Post by **Tryss** » Sun Apr 14, 2019 7:59 am

I think it's the 75/25% line. It's arbitrary, but indeed, it's close to the "serious advantage" limit.

Pippen · Post by **Pippen** » Sun Apr 14, 2019 8:00 am

Since LZ gives Black only a 46% chance at the start, I wonder at what value it thinks the game is still on equal terms. E.g. if you have a value of 35%, do you still have chances in the eyes of LZ, or is it a lost game?

Life In 19x19
