Exploring LZ's search algorithm: worked examples

For discussing go computing, software announcements, etc.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Exploring LZ's search algorithm: worked examples

Post by Bill Spight »

Kirby wrote:if there's one optimal move in the game tree, both approaches, in worst case, find the result after looking through the entire tree. but at least with bfs, you don't have to keep the moves you've already explored in your head.
It is pretty clear that humans prefer depth first search because we have a short stack. With breadth first search we have to keep track of unvisited nodes in short term memory. It also appears that go professionals have sculpted their brains at an early age to increase their short term memory to include a workspace for go.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Thanks for all the interesting comments! I'll reply to some of them later, but first, back to LZ's way of reading, and the tracing file (CSV attachment) from post number 6 above.

I'll walk through the first few playouts in a lot of detail, to recap how Monte Carlo tree search (MCTS) works. (And also to expose the gaps in my own knowledge, so that the real experts can teach me something.) Then I'll back out and look at the "big picture".

Remember that each playout goes through one cycle of explore-expand-score-update (different descriptions of MCTS use different names for the phases).
  • Explore: using the tree of variations that you've built from previous playouts, walk down the tree one node at a time, choosing the "most promising" node at each step, until you reach a leaf.
  • Expand: list all the legal moves from the position you've reached, and add them to the tree as new child nodes.
  • Score: evaluate the new children and decide which one looks like the "best" move. (In old-school MCTS, you'd do this with random playouts, adding moves until you reach the end of the game and can see who's won. Since AlphaGo Zero, we use neural net evaluations instead.)
  • Update: go back up the tree the same way you came down, updating the value at each node in the light of the scores you just calculated. Also keep a count of how many times each node has been visited: this is part of how you calculate "most promising" when exploring.
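The four phases above can be sketched in Python. This is a schematic toy under my own assumptions, not LZ's actual C++: `evaluate` and `legal_moves` are hypothetical stand-ins for the neural net and the rules engine.

```python
import math

class Node:
    def __init__(self, policy):
        self.policy = policy     # prior P(s,a) from the network
        self.visits = 0
        self.value_sum = 0.0     # sum of backed-up evaluations
        self.children = {}       # move -> Node

    def q(self, default):
        # mean backed-up value; an unvisited node falls back to a default
        return self.value_sum / self.visits if self.visits else default

def playout(root, evaluate, legal_moves, c_puct=1.0):
    """One explore-expand-score-update cycle (schematic)."""
    path, node = [root], root
    # Explore: walk down, taking the "most promising" child each step,
    # breaking ties by policy as described in the post
    while node.children:
        total = sum(ch.visits for ch in node.children.values())
        def score(ch):
            u = c_puct * ch.policy * math.sqrt(total) / (1 + ch.visits)
            return ch.q(default=0.5) + u
        node = max(node.children.values(),
                   key=lambda ch: (score(ch), ch.policy))
        path.append(node)
    # Expand: add every legal move as a new child node
    for move, prior in legal_moves(node):
        node.children[move] = Node(prior)
    # Score: one neural-net call replaces the old random rollout
    value = evaluate(node)
    # Update: back up the path, flipping perspective at each ply
    for n in reversed(path):
        n.visits += 1
        n.value_sum += value
        value = 1.0 - value
```

The exact selection and backup formulas LZ uses are discussed below; this just shows how the four phases fit together.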
Here's how I've represented it in the CSV file. I've put the first few lines behind the cut because it might be too wide for some screens:

Code:

playout  depth  operation   move  visits  lcb       value     fpu_eval  winrate    puct      policy  move2  visits2   value2    winrate2  puct2     policy2
1        0      initialise  pass  1                 0.440006
1        1      explore     P18   0                 0.440006  0.440006  0.440006  0          0.449631  P2      0      0.440006  0.440006  0         0.29011
1        1      update      P18   1       -999999   0.301081
1        0      update      pass  2       -4126.29  0.569462
2        1      explore     P18   1                 0.781476  0.27237   0.698919  0.0825576  0.449631  P2      0      0.378905  0.27237   0.106535  0.29011
2        2      explore     O18   0                 0.301081  0.301081  0.301081  0          0.438193  O19     0      0.301081  0.301081  0         0.32953
2        2      update      O18   1       -999999   0.444114
2        1      update      P18   2       -4061.15  0.428483
2        0      update      pass  3       -18.6407  0.52768
Line 1: initialise. The root node (starting position) gets one playout, and a value. Without reading at all, the network thinks black's winrate is 44%.

Next line: first playout, first node. In the file I've shown how LZ chooses between P18 and P2, its top two choices. (In fact, it does the same calculations for all legal moves, but I didn't want to make the file too big.) There's quite a few numbers here. Reading from left to right:
  • value is winrate plus puct
  • fpu_eval is a number based on "first play urgency". I'll say a little more about that below. It's the same number for all nodes, but as the playouts progress, the fpu number will go down. (At the very beginning of the search, the first fpu_eval is exactly equal to the value of the root node. In general it's the network value of the parent node minus an adjustment depending on which other nodes have been visited.)
  • winrate is equal to the fpu_eval the first time LZ looks at a node, and later on it gets updated based on the network evaluation.
  • puct is the "exploration factor". More detail on this later.
  • policy is the network policy value. This is the only number that doesn't change on repeated visits. It feeds into the puct value.
  • And then the same set of numbers for the second choice move
So here, P18 has a value of 0.44 (winrate inherited from the parent node because it's the first visit, and no puct adjustment). And P2 (and all the other moves) also has the same value of 0.44. So we use the policy value as a tie-break: P18 wins with 45% policy, while P2 has 29%. This finishes the first iteration of explore-expand-score.

Next line: first playout, update for P18. The visit count is now 1. The value gets updated, and some mysterious calculations produce 0.30. (I'm a bit hazy on the details here. But as you look at a bunch of playouts, you'll get the general idea). This value is from the perspective of white to play, so 0.30 means that, at first glance, black is ahead after playing P18. There's also a "lower confidence bound" (LCB). On small numbers of playouts, these numbers are pretty meaningless, but later the LCB will be just a little bit smaller than the value.

Next line: first playout, going back up the tree, update for root node. Visit count for the root is 2 (it got one visit for initialising, then a second visit for the playout). Magic calculations give us an updated value of 0.57.
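Judging from the numbers in the file, the "magic calculations" look like a plain running mean: each node's value seems to be the average of every network evaluation backed up through it, with the perspective flipped (1 − v) at each ply. That's an assumption on my part, but it reproduces the trace:

```python
# Assumption (mine, not from any LZ documentation): node value = running
# mean of backed-up network evals, flipping perspective at each ply.
root_init = 0.440006  # network eval of the root, black to play
p18_eval = 0.301081   # network eval after P18, white to play

# Root after playout 1: mean of its own eval and the flipped child eval
root_updated = (root_init + (1 - p18_eval)) / 2
assert abs(root_updated - 0.569462) < 1e-6  # the 0.57 in the trace

# P18 after playout 2: the same rule one level down
o18_eval = 0.444114   # network eval after O18, black to play
p18_updated = (p18_eval + (1 - o18_eval)) / 2
assert abs(p18_updated - 0.428483) < 1e-6   # matches the trace
```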

Next line: second playout, first node. In the initial position, LZ assesses the scores of all legal moves and decides which is most promising. We've already done this for playout number 1, but as the visit counts change, some of the numbers change. The fpu_eval has gone down from 0.44 (first playout) to 0.27. P18 has already been visited, so we ignore the fpu_eval and instead use the number from the first playout. Remember it got updated to 0.30 from white's perspective: that's 0.70 (0.6989, unrounded) from black's perspective. And because the parent node has been visited, there's now an exploration bonus puct=0.08. Winrate plus puct gives a value of 0.78 for P18.

Same line, second choice move: P2 is still unvisited, so its winrate is the fpu_eval of 0.27. But there's also a puct bonus of 0.11: P2 gets a higher bonus than P18, because it's had fewer visits. The total value for P2 is 0.38, lower than P18's, so P18 is still the first choice.

Digression: if you scroll down the CSV file, looking at explore depth 1 for each playout, you'll see the puct number for P2 getting higher and higher the longer it's ignored, until at playout number 14, P2 overtakes P18 as the first choice.

Next line: you'll see similar calculations for replies to P18. Because none of the replies have been visited yet, they all get the fpu_eval number, and we pick the highest policy move.

Next three lines: back up the tree, update the visit counts and values, calculate the LCBs (still not realistic, three visits isn't enough).

And you see how the tree grows, and starts to branch after 14 playouts.

Technical details of fpu and puct calculations behind the cut, for those who are interested.
I'll start with puct because I think I understand that part better. UCT is upper confidence bounds for trees. Google will show you a lot of research papers and blog posts with all the theory. Then PUCT is polynomial UCT, a modification to the basic UCT formula.

The basic UCT formula is u = constant × psa × sqrt(sum of visit counts for the current node and all its siblings) / (1 + visit count for the current node).

What does psa stand for? I spent a while wondering why the LZ source code is full of public service announcements. But no, psa is P(s,a), the probability in game state s of action a.

So if you keep visiting the same node, then the bottom line of the fraction increases faster than the top, so the u value goes down: the exploration bonus for a node gets smaller the more you've already explored that node. But if you keep ignoring a node and visiting its siblings instead, the top line of the fraction increases and the bottom doesn't: the u value goes up (and there's no upper limit), so eventually, given enough visits, you will explore every node.

PUCT adds some extra terms to the top line of the fraction, but it's the same basic idea.
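As a sanity check, plugging the trace numbers back into that formula works out. The constant below (roughly 0.367) is fitted from the CSV, not read out of the LZ source:

```python
import math

def puct_bonus(c, policy, total_visits, own_visits):
    # u = constant * P(s,a) * sqrt(sum of visit counts for the node
    #     and its siblings) / (1 + visit count for the node)
    return c * policy * math.sqrt(total_visits) / (1 + own_visits)

c = 0.36722  # fitted to this trace; treat as an assumption
# Playout 2, depth 1: P18 has 1 visit, everything else 0, so total = 1
assert abs(puct_bonus(c, 0.449631, 1, 1) - 0.0825576) < 1e-4  # P18's puct
assert abs(puct_bonus(c, 0.29011, 1, 0) - 0.106535) < 1e-4    # P2's puct2
```

Note how dividing by (1 + own_visits) halves P18's bonus relative to P2's, exactly the "explored nodes get smaller bonuses" behaviour described above.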

First play urgency (FPU) is a bit more mysterious. It's mentioned on a few web pages, but I've not seen it in the research literature, so I'm not sure if it was a "secret sauce" part of AlphaGo that didn't make it into the Nature paper, or if it was invented as part of LZ, or if it comes from somewhere else.

The formula is simply fpu_eval = network eval (i.e. winrate from neural net) - constant times sqrt (sum of policies for all children that have been visited). This is calculated for the parent node and then applied to all child nodes. So if no children have been visited yet, the second term is zero, and fpu of child = network eval of parent. But as more different nodes get visited, the fpu_eval goes down. The second part of the formula is referred to as "fpu reduction", because fpu means that you reduce the initial winrate estimate.
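Here's that formula in Python. The constant 0.25 is fitted from this trace (I suspect it's LZ's default fpu reduction, but treat that as my guess):

```python
import math

def fpu_eval(parent_net_eval, visited_child_policies, c_fpu=0.25):
    # fpu reduction: the penalty grows as more distinct children of the
    # parent get their first visit
    return parent_net_eval - c_fpu * math.sqrt(sum(visited_child_policies))

# Depth 1 in the trace; the parent is the root, net eval 0.440006
assert fpu_eval(0.440006, []) == 0.440006                     # nothing visited
assert abs(fpu_eval(0.440006, [0.449631]) - 0.27237) < 1e-5   # P18 visited
assert abs(fpu_eval(0.440006, [0.449631, 0.29011]) - 0.22499) < 1e-5  # + P2
```

The three asserts correspond to the 0.44 → 0.27 → 0.22 progression described in the next paragraph.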

In the CSV file, if you look at the fpu_eval at depth 1, you'll see how it starts at 0.44 (equal to initial value of root node), goes down to 0.27 on the first playout (as soon as P18 has been visited), stays at 0.27 for the first 14 playouts (because P18 is being revisited every time), and goes down to 0.22 on playout 15 (once P2 gets its first visit).

As far as I can tell, the point of FPU reduction is to make the search tree narrower: using the fpu_eval instead of the raw winrate means that the second choice move has less chance of being selected. If a move has been ignored for a long time, and your winrate is looking OK, then you probably don't need to waste time exploring that move. You'll only get to it if the value of your first choice node ends up being lower than the fpu_eval number. In a future post I plan to show an example of where that has a dramatic impact.
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Now for the "big picture" view: what's the story of playouts 1-100? Here you might want to refer to the SGF file from the previous post. Let's see how LZ bounces around between three main variations, reassessing each of them.
[go]$$Bc Variation 1: playouts 1-5 and 51
$$ ---------------------------------------
$$ | . O . . . . . . . . . . 4 3 5 . . . . |
$$ | . X O . . O . b X O O O . 2 1 O O O . |
$$ | . O X O O . . . X X X X O O a X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
The first five playouts explore this position: here is LZ's first instinct. :w2: at a doesn't get examined at all: the network can see "at a glance" that it's hopeless and gives it a low policy value. After five playouts, LZ has worked out that this :w2: isn't working, so it turns to :w2: at :b3: instead. Later, after finding out that that doesn't work either, it returns to this variation (playout 51) to see if white b next will do anything interesting. (It doesn't.) Overall, the position in this diagram gets seven playouts. Notice that LZ never tries :w6: at a, or any other local move. I think it can "see" that the white stones are cut off and unable to live independently, and doesn't need to read out "elementary" sequences.
[go]$$Bc Variation 2: playouts 6-13 and many more
$$ ---------------------------------------
$$ | . O . . . . . . . . . . . 2 c . . . . |
$$ | . X O . . O . . X O O O . a 1 O O O . |
$$ | . O X O O . . . X X X X O O b X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
This position gets most of LZ's attention, 79 visits in total. It looks at six choices for :b3:, in order of preference:
  • Tenuki at P2: policy value 24%, playout numbers 7, 27, 28, 33, 34, 35
  • a: 23%, 20 visits between playout 8 and 40
  • Tenuki at B16: 11%, one visit only at playout 22
  • b: 8%, playout numbers 29, 31, 32
  • Tenuki at N4: 6%, 1 visit only at playout 39
  • c: 6%, 47 visits between playout 41 and 100
So first it tries tenuki, gets a winrate of 41% for black P2 (the value in the file is 0.5884, but remember that's for white to play), and decides that it can do better. Then for :b3: at a we see this variation:
[go]$$Bc Variation 2a, black P2 next, winrate 44%
$$ ---------------------------------------
$$ | . O . . . . . . . . . . . 2 6 . . . . |
$$ | . X O . . O . . X O O O 4 3 1 O O O . |
$$ | . O X O O . . . X X X X O O 5 X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
Playing the most instinctive moves, LZ can reduce white's territory in sente and then turn to P2, and this improves the winrate slightly, but still isn't satisfactory. It takes a bit of trial and error before LZ gets to the right moves:
[go]$$Bc Variation 2b
$$ ---------------------------------------
$$ | . O . . . . . . d b . . 5 2 3 . . . . |
$$ | . X O . . O . . X O O O c 4 1 O O O . |
$$ | . O X O O . . . X X X X O O a X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
This position gets 34 visits, starting from playout number 52. It's obvious from the very first visit that this is bad for white (winrate of 5.6%), but it takes a number of visits for this information to filter back up the tree: LZ has to overcome the prejudice of both :b3: and :b5: being low-policy moves.

Before trying this :b5:, LZ looks briefly at a (playouts 43-49), until it realises that white can live. As alternatives to :w4:, it quickly checks out (two playouts each) white a and white b. In response to :b5:, LZ reads out :w6: at b, c or d. It turns out that white can actually live, but at the cost of black damaging the area to the left, so this is still a winning proposition for black.
[go]$$Bc Variation 3, late entry, needs further investigation
$$ ---------------------------------------
$$ | . O . . . . . . . . . . . b 2 . . . . |
$$ | . X O . . O . . X O O O . a 1 O O O . |
$$ | . O X O O . . . X X X X O O . X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
LZ gives this a quick glance at playouts 79, 80 and 87-90, looking at black's replies at a and b. It seems to prefer black, but the results aren't conclusive. I expect we'd see more of this variation on a longer run.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Exploring LZ's search algorithm: worked examples

Post by Bill Spight »

Hmmm. I wonder about this position.
[go]$$Bc Black kills
$$ ---------------------------------------
$$ | . O . . . . . . . . . . . . . . . . . |
$$ | . X O . . O W B X O O O . . . O O O . |
$$ | . O X O O . . . X X X X O O . X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
In this position Black can kill. I know that the margin of victory does not matter, but surely the possibility of a clean kill will affect the winrate estimate, no? Does that then affect the search?
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Thanks Bill for the suggestion! I'll get to that soon, but first I want to show a couple of other scenarios.

Analysing the same position using network number 258 instead of 157, LZ explores fewer variations and goes into more depth. It's quicker to find the right answer, and spends less time exploring moves that don't work.
ex1-davies_tesuji-trace258_100po.csv
(140.47 KiB) Downloaded 648 times


Referring to variation numbers in my previous post:
  • It gets to variation 2 first, and only decides to check variation 1 on playout number 99. Variation 3 doesn't appear at all.
  • It still goes down the wrong path of variation 2a, but now discards it by playout number 7 (compared with playout number 40 for the previous attempt).
  • The position of variation 2b gets 69 visits (compared with 34 visits on the previous attempt). With the stronger network, LZ spends literally twice as much time evaluating the "correct answer" diagram rather than exploring alternatives.
Attachments
ex1-davies_tesuji-trace258_100po.sgf
(43.24 KiB) Downloaded 1018 times
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Now we'll analyse the position two moves earlier (this was an accident on my part, but produced interesting results!)
[go]$$Bc Black can capture the O17 stones
$$ ---------------------------------------
$$ | . O . . . . . . b c . . . . . . . . . |
$$ | . X O . . . . a X O O O . . 1 O O O . |
$$ | . O X O O . . d . X X X O O . X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
Here, white can live by exchanging a-d (not immediately, but a few moves into the variation). But this does some damage to the top left, so it's an unclear position: perhaps white does better to tenuki immediately after :b1: (actually it's not clear-cut even after a few thousand playouts).

How does LZ assess this? I'm using network 258 again.

You know how sometimes Lizzie comes out in a rash, all covered in red spots?
rash.jpg
rash.jpg (78.3 KiB) Viewed 9836 times
This is the result of the "first-play urgency" (fpu) term that I mentioned earlier. Here's the fine print for this position:
trace_two_moves_earlier.csv
(51.71 KiB) Downloaded 431 times


So what exactly is happening here? The network's first assessment of P18 is that the position is good for white, 76% winrate. So this is the fpu value when looking for white's reply: white O19 gets a first-guess value of 76%. But once O19 is actually put on the board, its winrate estimate drops to 58% (you'll see 42% in the CSV file, this is from black's perspective). Meanwhile, having explored one node, the fpu_eval drops to 58%, but this is still a pretty big number for fpu.

On playout number 3, white O19's value is recalculated to 68%, but H18, which was previously the second choice, now gets 58% for FPU plus 12% for the exploration bonus (puct), putting it in the lead.

Now that we've explored two different nodes, FPU drops to 52%, and both H18 and O19 are looking better than that, so the next 20 playouts stay with those two white moves. But by playout 22, LZ has found good black replies to those moves, so their evaluations have dropped below the fpu threshold, and it's time to try a new move. On playout 22, J17 scores 57% with FPU plus puct bonus and becomes the top choice. This looks promising for four playouts, but then the value drops, and again nothing is looking better than the FPU value, so LZ tries yet another new move at playout 27. And so on.

What's happening is that the FPU sets a bar for what a reasonable move should look like. In most normal positions, the high-policy moves (the first ones to be explored) will easily clear that bar, so LZ will build a narrow and deep tree. But if the high-policy moves don't live up to their initial promise, then LZ will widen the search until it either finds a better alternative or has tried everything on the board.
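Putting the pieces together, the selection rule I've been describing amounts to comparing "winrate (or fpu_eval, if unvisited) plus puct" across siblings. A toy illustration with made-up numbers (the puct constant is the one I fitted from the trace earlier):

```python
import math

def selection_score(visits, q, policy, fpu, total_visits, c_puct=0.36722):
    # visited children use their running-mean value; unvisited children
    # stand in the parent's fpu_eval as their winrate
    winrate = q if visits else fpu
    return winrate + c_puct * policy * math.sqrt(total_visits) / (1 + visits)

# While the favourite's value stays above the fpu bar, the unvisited
# move can't catch up...
assert selection_score(3, 0.70, 0.45, 0.52, 4) > \
       selection_score(0, None, 0.29, 0.52, 4)
# ...but once it sinks below the bar, the unvisited move takes the
# lead and the search widens.
assert selection_score(3, 0.60, 0.45, 0.52, 4) < \
       selection_score(0, None, 0.29, 0.52, 4)
```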
Attachments
trace_two_moves_earlier.sgf
(21.08 KiB) Downloaded 763 times
iopq
Dies with sente
Posts: 113
Joined: Wed Feb 27, 2019 11:19 am
Rank: 1d
GD Posts: 0
Universal go server handle: iopq
Has thanked: 11 times
Been thanked: 27 times

Re: Exploring LZ's search algorithm: worked examples

Post by iopq »

Bill Spight wrote:
jlt wrote:
Bill Spight wrote:The only bridge book I gave my sister, a social player, was one by the great bridge author, Victor Mollo, in which, for amateurs, he advocates seeing, not the methodical calculation of variations. As readers here may be aware, I advocate seeing for go. I believe that Kotov's method is great for training, but in actual play I think you have to allow your unconscious to work, as well.
If I understand correctly, during actual play you let sequences pop up to your mind without trying to read "all" variations. But in order to get an efficient neural net (=brain), you need to train it by studying go problems, pro games, etc. During that training process, you think that the methodical calculation of variations is "great", but do you consider it as necessary?
It is important not to dither. It is important to supplement the human tendency to depth first search and shore up its weaknesses. Kotov's method does that. As for the best way to train the human brain, quien sabe? I don't think that we know the best way to train neural networks.

I have touched on these questions over the years. See the SL page about the fudge factor, for instance. https://senseis.xmp.net/?GoProblemsTheFudgeFactor

I look forward to seeing what xela is finding out about LZ's searches. :)
In that page I saw the bulky five + hane shape, so I immediately made the bulky five placement, then the extension after the block, then bent four in the corner.

I didn't have to read the internal sequences other than the exact one you read because they are of no interest. Black doesn't have room to live even if he's solidly connected on the outside, no?

Maybe we need to train people by giving them 1 second to solve each problem until they can't intuit the solution anymore. Then show the solution and move on to the next problem.
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

On the topic of not reading all the variations, but relying on intuition (or "seeing"):
jlt wrote:
Bill Spight wrote:...I advocate seeing for go. I believe that Kotov's method is great for training, but in actual play I think you have to allow your unconscious to work, as well.
If I understand correctly, during actual play you let sequences pop up to your mind without trying to read "all" variations.
RobertJasiek wrote:Such might be ok for strategic planning but for tactical reading it is a total failure.
In mathematics, I've seen "magic" emerge from pro-level intuition. It's not unusual for a mathematician to publish a theorem, and then for someone to notice a gap in the proof. This is analogous to saying you've seen the key move that solves the tsumego for black, but you forgot to analyse one of white's responses. There are two ways it can pan out. The gap turns out to be an error: the theorem actually is false and needs to be withdrawn. (All your stones get captured.) Or the gap turns out to be a "mere oversight": after some work (in some cases, several years and many pages of publications later), the theorem is verified. (There's the moment of panic when your opponent plays the weird move you didn't think of, but then you're able to find the right reply.)

When graduate students leave these sorts of gaps, both scenarios are likely, and it's happened that people have seen their thesis crumble just before the finish line. But in the case of world-class mathematicians, it's pretty much always the case that their initial intuition was right, and the theorem holds up to scrutiny. This happened dramatically with Wiles's proof of Fermat's last theorem, and also a few times during the classification of finite simple groups. And those are just recent examples. It's been working this way for a long time, as documented in Poincaré's famous book of more than a century ago.

In general, mathematical research progresses by intuitive leaps, which are filled in with calculations after the fact. And then the general public gets the false impression that mathematics is logical and systematic. The leaders in the field aren't the ones who are better at calculation (although they are certainly very very good at calculation), they're the ones who can "see" further than others.

So I'm quite prepared to believe that good go players, in a game situation, will frequently "just see" the right move without reading out all variations. (And if they're asked to explain their choice afterwards, their method of explaining will be to put down variations that weren't part of their conscious thought process during the game.)
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Kirby wrote:breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing....Thinking in terms of breadth is important, because you don't want to miss moves. but it's pretty darn hard to keep the entire game tree in your head before you can start eliminating options from search.
In chess there's something called the killer heuristic. Many chess players won't know it by name if they're not interested in computer chess, but will subconsciously apply it anyway. Roughly speaking, it says that a good move in one variation is likely to be a good move in other variations. So it allows you to prune the tree very effectively. Even for breadth first search, you can throw out many of the candidate moves instantly. I suspect this heuristic is much more effective in chess than in go, making breadth-first search relatively easier in chess.

(Caveat: I'm an awful chess player, so don't trust me on this one.)
xela
Lives in gote
Posts: 652
Joined: Sun Feb 09, 2014 4:46 am
Rank: Australian 3 dan
GD Posts: 200
Location: Adelaide, South Australia
Has thanked: 219 times
Been thanked: 281 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Slight digression: I suspect that Bill (and maybe others) would be interested in the long-standing debate between "symbolic" and "connectionist" approaches to AI. I'm thinking systematic reading=symbolic and "seeing"=connectionist. See https://neurovenge.antonomase.fr/
RobertJasiek
Judan
Posts: 6273
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times
Contact:

Re: Exploring LZ's search algorithm: worked examples

Post by RobertJasiek »

xela wrote:good go players, in a game situation, will frequently "just see" the right move without reading out all variations.
If first we guess, then we also verify or refute.

(If mathematicians first state a conjecture, then they verify or refute it.)
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Exploring LZ's search algorithm: worked examples

Post by Bill Spight »

xela wrote:
Kirby wrote:breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing....Thinking in terms of breadth is important, because you don't want to miss moves. but it's pretty darn hard to keep the entire game tree in your head before you can start eliminating options from search.
In chess there's something called the killer heuristic. Many chess players won't know it by name if they're not interested in computer chess, but will subconsciously apply it anyway. Roughly speaking, it says that a good move in one variation is likely to be a good move in other variations. So it allows you to prune the tree very effectively. Even for breadth first search, you can throw out many of the candidate moves instantly. I suspect this heuristic is much more effective in chess than in go, making breadth-first search relatively easier in chess.

(Caveat: I'm an awful chess player, so don't trust me on this one.)
Since go is a long game with a number of battlefields, it is unusual for a single play to be a killer. Maybe that is so a few times in a game. However, it is true that a good play in one variation will often be a good play in other variations, as well. That is behind the All Moves as First heuristic for go programs in pre-MCTS days. And one reason for persistent winrate swings in go is that both players are ignoring a big point or region and playing somewhere else. For instance, in the opening there may be winrate swings when each player fails to play in the last open corner.

Edit: This is one reason that I recommend local deep reading in actual play, as you may find a good play that was not at first obvious. :)
Last edited by Bill Spight on Fri Jan 10, 2020 9:22 am, edited 1 time in total.
jlt
Gosei
Posts: 1786
Joined: Wed Dec 14, 2016 3:59 am
GD Posts: 0
Has thanked: 185 times
Been thanked: 495 times

Re: Exploring LZ's search algorithm: worked examples

Post by jlt »

I looked at the order in which LZ258 explores the tree. I will ignore tenuki moves for simplicity and translate into human-understandable terms and diagrams.

The initial position is the following:
[go]$$Bc Initial position
$$ ----------------------------+
$$ . . . . . . . . . . . . . . |
$$ O . . X O O O . . 1 O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
LZ examines a first variation
[go]$$Bc Variation 1.1: not much damage
$$ ----------------------------+
$$ . . . . . . . . 2 6 . . . . |
$$ O . . X O O O 4 3 1 O O O . |
$$ . . . X X X X O O 5 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Let's try :b3: at :b5:
[go]$$Bc Variation 1.2: not much damage either
$$ ----------------------------+
$$ . . . . . . . . 2 4 . . . . |
$$ O . . X O O O . . 1 O O O . |
$$ . . . X X X X O O 3 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Let's try :b3: at :w4:
[go]$$Bc Variation 1.3.1. White is split but alive on both sides
$$ ----------------------------+
$$ . . . . . . . 6 2 3 . . . . |
$$ O . . X O O O . 4 1 O O O . |
$$ . . . X X X X O O 5 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
That's better than variations 1.1 and 1.2, so we can discard them and explore the subtree starting at :w4: further, to see if we can do even better. The only other reasonable response is :b5: at N19, so we have to check whether N19 is better than P17. Let's read a few moves deep:

[go]$$Bcm5 Variation 1.3.2.1.1. White is split but alive on both sides
$$ ----------------------------+
$$ . . . . 2 . 6 1 O X . . . . |
$$ O 5 . X O O O 4 O X O O O . |
$$ . . . X X X X O O 3 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
That's even better.
Let's try :b7: at :w10:.
[go]$$Bcm5 Variation 1.3.2.1.2.1. White is dead, game over.
$$ ----------------------------+
$$ . . . 4 2 . 3 1 O X . . . . |
$$ O . . X O O O 6 O X O O O . |
$$ . . . X X X X O O 5 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
(Of course White won't play :w10: there; LZ prefers to tenuki, as saving five stones in gote is not interesting right now.)

But perhaps we could try :b9: at H18?
[go]$$Bcm5 Variation 1.3.2.1.2.2.1 Very good for Black
$$ ----------------------------+
$$ . . 6 4 2 . 3 1 O X . . . . |
$$ O . 5 X O O O . O X O O O . |
$$ . . . X X X X O O 7 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
This doesn't prevent escaping but is very good for Black.
[go]$$Bcm5 Variation 1.3.2.1.2.2.2 Roughly the same
$$ ----------------------------+
$$ . 7 6 4 2 . 3 1 O X . . . . |
$$ O 8 5 X O O O . O X O O O . |
$$ . . . X X X X O O 9 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Well, it seems that with :b7: we can split White and in addition capture a few stones. But we didn't check other possibilities for :w6:.
[go]$$Bcm5 Variation 1.3.2.2
$$ ----------------------------+
$$ . . . . . . . 1 O X . . . . |
$$ O . . X O O O 2 O X O O O . |
$$ . . . X X X X O O 3 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
This looks too bad for White, so let's go back to variation 1.3.2.1. We said that :b7: at M19 is good, but can we do better, for instance with :b7: at G18?
[go]$$Bcm7 Variation 1.3.2.1.3.
$$ ----------------------------+
$$ . . . . O . . X O X . . . . |
$$ O 1 . X O O O . O X O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
This looks better for White, so let's go back to variation 1.3.2 with :w6: at J19.
[go]$$Wcm6 Variation 1.3.2.3.
$$ ----------------------------+
$$ . . . 1 . . . X O X . . . . |
$$ O . . X O O O . O X O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Looks good for Black, but we'll figure that out later. Let's first continue variation 1.3.2.1.3.

[go]$$Bcm7 Variation 1.3.2.1.3., continued
$$ ----------------------------+
$$ . . . . O . 2 X O X . . . . |
$$ O 1 . X O O O . O X O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Still looks good but not optimal for Black. Let's first finish variation 1.3.2.2.
[go]$$Bcm5 Variation 1.3.2.2, continued:
$$ ----------------------------+
$$ . . . . 4 . . 1 O X . . . . |
$$ O . . X O O O 2 O X O O O . |
$$ . . . X X X X O O 3 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
White is dead, so variation 1.3.2.2 can be discarded; White won't play :w6: there.

Let's come back to variation 1.3.2.1.3 and read it until the end.
[go]$$Bcm7 Variation 1.3.2.1.3., continued
$$ ----------------------------+
$$ . . . . O . 2 X O X . . . . |
$$ O 1 . X O O O 4 O X O O O . |
$$ . . . X X X X O O 3 X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
OK, we've seen that position already: it's good for Black but not the best variation, so we can forget that :b7:. Now let's go back to variation 1.3.2.3.
[go]$$Wcm6 Variation 1.3.2.3., continued
$$ ----------------------------+
$$ . . . 1 . . . X O X . . . . |
$$ O . 2 X O O O . O X O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Very good for Black, White should tenuki and give up his stones for the moment.

OK, so now that we have done 98 playouts, why not try some silly move for :w2:, just to check if we overlooked something?
[go]$$Wcm2 Other choice for :w2:
$$ ----------------------------+
$$ . . . . . . . . 2 . . . . . |
$$ O . . X O O O . 1 X O O O . |
$$ . . . X X X X O O . X O . O |
$$ . O . . , . O X X X X X O . |
$$ O . . X O . X O . . . X . . |[/go]
Oh no, that's really silly, White is dead, forget it.

Conclusion: it seems that the exploration is mostly depth-first, from left to right, but LZ sometimes comes back to an earlier branch and reads a few moves further from there.
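For what it's worth, that "mostly depth first, with occasional returns" pattern is what PUCT-style selection tends to produce. Here is a hedged sketch with made-up constants and a simplified formula, not LZ's actual code: the child maximizing Q + U keeps being chosen while its value holds up, and the search only "comes back" to a sibling once bad results lower its Q.

```python
# Simplified PUCT-style child selection (illustrative constants and
# formula, not LZ's actual implementation).
import math

def puct_child(children, c_puct=1.0):
    """children: dicts with 'name', 'prior', 'visits', 'value_sum'."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total + 1) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)

a = {"name": "A", "prior": 0.6, "visits": 10, "value_sum": 6.0}  # Q = 0.60
b = {"name": "B", "prior": 0.3, "visits": 1,  "value_sum": 0.2}  # Q = 0.20
pick_before = puct_child([a, b])["name"]  # high-Q child keeps winning
a["value_sum"] = 2.0                      # new results drop A's Q to 0.20
pick_after = puct_child([a, b])["name"]   # search returns to sibling B
```

As long as one branch keeps returning good values, selection descends it again and again, which looks like depth-first reading from the outside.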

Here is the tree. Black edges represent a sequence starting with Black, green edges a sequence starting with White.

A blue vertex is a very good position for Black, a black vertex is slightly good for Black, and a red vertex is bad for Black.
tree.png
tree.png (21.45 KiB) Viewed 9756 times

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

Thanks jlt, nice pictures!
Bill Spight wrote:Hmmm. I wonder about this position.
[go]$$Bc Black kills
$$ ---------------------------------------
$$ | . O . . . . . . . . . . . . . . . . . |
$$ | . X O . . O W B X O O O . . . O O O . |
$$ | . O X O O . . . X X X X O O . X O . O |
$$ | . . X X O . O . . , . O X X X X X O . |
$$ | . . . X X O . . X O . X O . . . X . . |
$$ | . . . X O . O . X X X . X . . . X . . |[/go]
In this position Black can kill. I know that the margin of victory does not matter, but surely the possibility of a clean kill will affect the winrate estimate, no? Does that then affect the search?
Yes, it does have an effect. I'll post the raw data and you can form your own impressions :-)

First, analysing with network number 157.
davies_tesuji_bill_variation-trace157.csv
(92.91 KiB) Downloaded 445 times
Attachments
davies_tesuji_bill_variation-trace157.sgf
(31.29 KiB) Downloaded 705 times
Last edited by xela on Fri Jan 10, 2020 2:35 pm, edited 1 time in total.

Re: Exploring LZ's search algorithm: worked examples

Post by xela »

And network 258 (separate post because of the "three attachments per post" limit; am I the only person who finds this really annoying?)

Attachments
davies_tesuji_bill_variation-trace258.sgf
(23.69 KiB) Downloaded 700 times
davies_tesuji_bill_variation-trace258.csv
(62.45 KiB) Downloaded 455 times