How LZ reads out ladders
Posted: Sun Mar 01, 2020 12:59 am
Carrying on from the other thread, now I'm using my modified LZ version to explore how LZ "understands" ladders. It's really interesting to look at older and newer LZ nets and see how they treat the same position.
In theory, there should be three things going on:
- Policy network: does LZ think the next move in the ladder is an obvious move to explore? Can it "see at a glance" whether a ladder capture is good or bad?
- Playouts: it can take about 60 moves to read a ladder that goes all the way across the board. Does that mean LZ needs a minimum of 60 playouts to read out a ladder? Or more playouts if it's reading out other variations along the way?
- Net evaluation: Once a few moves of a ladder appear on the board, can LZ recognise the position as good for white or good for black? Can it accurately measure the cost of playing out a bad ladder?
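To put a rough number on the playouts question, here is a back-of-the-envelope sketch in Python. It is a simplification, not anything taken from LZ's code: the function name `ladder_move_count` is made up for illustration, and it assumes the fleeing stone zigzags one step right, then one step up, with each extension answered by exactly one atari, until it reaches the edge of the board.

```python
def ladder_move_count(start_col, start_row, board_size=19):
    """Estimate how many moves a ladder takes to run to the far edge.

    Assumptions (illustrative only): coordinates are 1-based, the runner
    alternates one step right and one step up, and every extension is
    answered by a single atari, so each step costs two moves.
    """
    col, row = start_col, start_row
    moves = 0
    step_right = True
    # Keep running until the fleeing stone reaches the last line on either axis.
    while col < board_size and row < board_size:
        if step_right:
            col += 1
        else:
            row += 1
        step_right = not step_right
        moves += 2  # one extension by the runner plus one atari by the chaser
    return moves


if __name__ == "__main__":
    for start in [(3, 3), (4, 4)]:
        print(start, ladder_move_count(*start))
```

Starting from the 4-4 point this model gives 58 moves, and from 3-3 it gives 62, which is in line with the "about 60" figure. If each playout extends a line by at most one move (an assumption about the search, not something measured here), that count would be a lower bound on the playouts needed to read the ladder to the end.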
I'd expect that smaller networks (5 or 6 blocks) will need to play out pretty much the whole ladder, because they can't "see all the way across the board", while a 20 or 40 block network should be able to "take in the position at a glance" and understand the ladder status without playing out the moves.
So, on to some tests. Below are some taisha positions where both sides have made mistakes, and now white has the chance to start a ladder. I want to look at four scenarios:
- Test position 1A: good ladder, attacker's perspective. White's best move is to atari the black stone and start the ladder.
- Test position 1B: good ladder, defender's perspective. Black shouldn't pull out of atari, but should play elsewhere.
- Test position 2A: bad ladder, attacker's perspective. Here, for white to give atari is a mistake.
- Test position 2B: bad ladder, defender's perspective. White has made a mistake and started the ladder: now black should pull out of atari.
Test position 1B (above): after white a, black should tenuki -- there are several possible moves, for example c, d, e, but b would be a bad mistake.
Test position 2A: white a is a mistake. Any of the points marked c are not too bad, although d is probably white's best option.
Test position 2B: after white a, black must play b.
I tested with seven different networks (28 permutations of test position + network), with 2,000 playouts each time. The networks were number 45 (5 blocks), 57 (also 5 blocks), 91 (6 blocks), 116 (10 blocks), 157 (15 blocks), 173 (20 blocks) and 258 (40 blocks).
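For anyone who wants to repeat the runs, the test matrix can be written down as a small Python sketch. The network numbers, block counts and playout count are the ones listed above. The names `NETWORKS`, `POSITIONS` and `build_test_matrix` are hypothetical and only there for illustration, and how each test is actually fed to the engine is left out.

```python
from itertools import product

# Network number -> number of residual blocks, as listed in the post.
NETWORKS = {45: 5, 57: 5, 91: 6, 116: 10, 157: 15, 173: 20, 258: 40}
POSITIONS = ["1A", "1B", "2A", "2B"]
PLAYOUTS = 2000


def build_test_matrix():
    """Return one entry per (test position, network) combination."""
    return [
        {"position": pos, "network": net, "blocks": blocks, "playouts": PLAYOUTS}
        for pos, (net, blocks) in product(POSITIONS, NETWORKS.items())
    ]


if __name__ == "__main__":
    tests = build_test_matrix()
    print(len(tests))  # 4 positions x 7 networks
    print(tests[0])
```

Four positions times seven networks gives the 28 combinations, each run with 2,000 playouts.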
Summary of results:
- For test positions 1A and 1B, LZ-45 fails: it tries to read out the ladder, but doesn't really "get" how ladders work -- many of the variations are ataris from the wrong side. For 1A, it wants to play C3 instead of G4. For 1B, it wants to pull out of atari. Remember that this network is already based on a million and a half self-play games and can challenge dan-level amateurs. Apparently, that's how well you can play based on good judgement and local shape, without being able to read well!
- All other networks get test positions 1A and 1B correct, with various amounts of reading. In the next few posts I'll give more details of how the different networks analysed the positions.
- For test positions 2A and 2B, even LZ-45 got it right, but LZ-116, 157 and 173 have an interesting blind spot here! They seem to fall into a local minimum where the policy network is sharp enough to make reading a lot more efficient, but not sharp enough to actually get the right answer. They read out a few steps of the ladder, not the whole ladder, then think they've got it and stop reading. On the other hand, LZ-258 gets the right answer without reading at all.
A few things stand out. First is the interplay between network (policy and eval) and playouts for the medium-sized nets. They need to read a few steps to evaluate the ladder correctly, but they don't need to read right to the end of the ladder. Of course, "LZ-157 can understand a ladder in 20 playouts" doesn't mean that it never makes ladder mistakes. If the ladder position is a few moves deep in a variation, then that specific position may not get enough playouts, so the ladder can still be "over the horizon", leading to a mistake.
Second is the fact that a 20-block network still isn't quite big enough to make an accurate assessment of the full board. I guess the first five or ten blocks are about understanding basic shapes, then the later blocks start to take in bigger chunks.
Third, it looks as though the 40-block network really can see the ladder status without having to read it out at all, at least for this position. We'd need to test on a bunch more positions to be sure. But I recall many of the "LZ can't do ladders" complaints happening around the time of moving from 15 to 20 blocks. Is it possible that the problem is solved simply by moving to a bigger network?
Finally, the attached GTP log includes all 28 tests, for people who like going through lots of data, showing the number of playouts, policy value, winrate and principal variations for each move. In many cases, reading out the ladder isn't the PV; it's buried among the other variations. Over the next few posts I'll show you a few examples.