Here's a trace of 10,000 playouts (SGF summary only; the CSV files for this many playouts are huge!). I'll give you three versions:
- LZ-258 from the starting position: it spends most of its time exploring B5, and only gives 43 playouts to G3. (Playing with different random number seeds, I can persuade it to give over 60 visits to G3, but the evaluations don't change much.)
- LZ-258 analysing the position after G3 has been played.
- LZ with the ELF network from the starting position. It still looks at B5 first, but starts paying serious attention to G3 after about 2,000 playouts, and looks at G3 exclusively from playout 4,319 onwards.
Remember that the difference between LZ-258 and LZ-ELF here is only the network. It's still the same software running the same search algorithm. So how do the different networks affect the choice of B5 or G3?
The policy values tell us that both networks will look at B5 first. But that's not the full story. LZ-258 gets to G3 at playout 93, and on a first look thinks that it's a 60% winrate for white, not that much different from the initial 59% for B5. But after a few playouts, G3 starts to look worse, so LZ-258 gives up on it. LZ-ELF starts off similarly, looking at G3 from playout 21 and scoring G3 at 46% for white compared with 49% for B5, again in the same ballpark. But with more playouts, LZ-ELF rates B5 as getting a lot worse for black, while its rating for G3 doesn't change much. So the difference doesn't come from how the two networks rate those two positions directly, but from positions further down the tree.
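The "further down the tree" point follows from how Monte Carlo tree search scores a move: a node's value is the average of every evaluation backed up through it, not just the network's first look at that position. Here is a minimal sketch of that averaging, assuming a bare-bones node with hypothetical names (`Node`, `backup`) and made-up winrates; it is not Leela Zero's actual code. As a simplification, every evaluation is taken from white's point of view, so no sign flipping between plies is needed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0  # sum of all evaluations backed up through this node
    children: dict = field(default_factory=dict)

    @property
    def q(self) -> float:
        """Mean evaluation over every playout that passed through this node."""
        return self.value_sum / self.visits if self.visits else 0.0


def backup(path, evaluation):
    """Add one leaf evaluation (white's winrate, by assumption) to every node on the path."""
    for node in path:
        node.visits += 1
        node.value_sum += evaluation


# Hypothetical numbers: the first look at G3 says 60% for white...
g3 = Node()
backup([g3], 0.60)

# ...but evaluations coming back from deeper in the tree are worse for black.
reply = Node()
g3.children["a"] = reply
for leaf_eval in (0.70, 0.72, 0.71):
    backup([g3, reply], leaf_eval)

print(f"G3 visits={g3.visits}, Q={g3.q:.4f}")
```

With these made-up numbers, G3's value drifts from its first-look 0.60 to about 0.68 once three deeper evaluations come in. A blind spot a few moves down can therefore change how a move looks at the root, even if the network rates the move itself sensibly.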
Looking at the variations after B5, I haven't yet found a massive difference between 258 and ELF. But here's something interesting for the G3 variations:
$$Bc
$$ | . . . . . . . . . .
$$ | . . . O O O X . . .
$$ | a O 6 O X X X X . .
$$ | . 5 X X O O . . . .
$$ | b . X O . O . O X .
$$ | . . X O O . 1 . X .
$$ | . 3 2 4 . . . . . .
$$ | . c . . . . . . . .
$$ +--------------------
Both networks have "a" as their first instinct. But it's not actually a good move. ELF spends 15 playouts on it, and then gives 994 playouts to "c" (and has another look at "a" later on). Meanwhile it has been spending a lot of time on B5 at the start, so it's not until around playout number 3,000 that "c" is seen to be clearly better than "a" in this diagram.
LZ-258 gives 18 playouts to "a", but then its second choice is "b", which also doesn't work well (and is discarded after four playouts). Starting from the initial position, LZ-258 never even looks at "c" (its policy value is around 1%). By playout number 5,000 it has given up on G3 (at least for the time being; it will come back to it by playout number 100,000 if you let it run that long). Maybe this is a little blind spot in LZ-258's network that skews the evaluations slightly. It's not the full story, though: there are still a lot of other variations to look at.
Finally, G3 in the initial position has a policy value of around 7% for LZ-258, or around 10% for ELF. Does that 3-percentage-point difference have a big influence on the search? I suspect not (we've seen examples where 1% moves can get a lot of playouts if the evaluations are promising), but it's hard to say for sure. I've tried scribbling down some equations based on the UCT formulae (strictly, Leela Zero uses the PUCT variant from AlphaGo Zero), and the maths gets pretty messy. I might come back to that later. Or I might get distracted by something else...
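One way to get a rough feel for the prior question without the algebra is a toy simulation. The sketch below gives each playout to whichever root move has the highest AlphaGo Zero-style PUCT score, Q + c_puct · P · sqrt(N_parent) / (1 + N_child). Everything here is an assumption for illustration: the function names are hypothetical, c_puct = 0.5 is an assumed constant, first-play-urgency handling is left out, and each move's Q is held fixed, whereas in the real search Q drifts as deeper evaluations arrive. The Q values are made up to sit near the roughly 41% (for black) that LZ-258 gives both moves.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=0.5):
    """Q plus the PUCT exploration bonus; c_puct=0.5 is an assumed value."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u


def allocate_visits(qs, priors, playouts, c_puct=0.5):
    """Greedily give each playout to the root child with the highest PUCT score.

    Toy model: each child's Q is fixed, as if its evaluation never changed.
    """
    visits = [0] * len(qs)
    for _ in range(playouts):
        parent_visits = sum(visits)
        scores = [
            puct_score(q, p, parent_visits, v, c_puct)
            for q, p, v in zip(qs, priors, visits)
        ]
        best = max(range(len(qs)), key=scores.__getitem__)
        visits[best] += 1
    return visits


# Hypothetical values: B5 at 41% and G3 at 40% for black, B5 prior fixed at 50%.
for g3_prior in (0.07, 0.10):
    b5, g3 = allocate_visits([0.41, 0.40], [0.50, g3_prior], 10_000)
    print(f"G3 prior {g3_prior:.2f}: B5 {b5} visits, G3 {g3} visits")
```

In this toy, the higher prior can never give G3 fewer visits, but the effect is modest next to a change in Q: even a small, persistent gap in evaluation decides which move ends up with most of the playouts. That is consistent with the suspicion above, though the real search, with its changing Q values, is exactly where the maths gets messy.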