Kirby wrote:
breadth first seems harder to do for humans. I like the idea of identifying some candidate plays, especially at the root of the tree. but after that, it's confusing.... Thinking in terms of breadth is important, because you don't want to miss moves. but it's pretty darn hard to keep the entire game tree in your head before you can start eliminating options from search.

xela wrote:
In chess there's something called the killer heuristic. Many chess players won't know it by name if they're not interested in computer chess, but will subconsciously apply it anyway. Roughly speaking, it says that a good move in one variation is likely to be a good move in other variations. So it allows you to prune the tree very effectively. Even for breadth first search, you can throw out many of the candidate moves instantly. I suspect this heuristic is much more effective in chess than in go, making breadth-first search relatively easier in chess.
(Caveat: I'm an awful chess player, so don't trust me on this one.)

Bill Spight wrote:
Since go is a long game with a number of battlefields, it is unusual for a single play to be a killer.

Maybe in your games. When reviewing my games with LZ I see that a single play is a "killer move" every move for me AND my opponent, where it literally swings the game and we don't take it for many moves. Like a cut that we both didn't see or something. It's good in every variation. We could stop playing locally at any point and just do it to start a bigger fight that has a higher temperature, so whatever we're doing is not important.
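(For readers who haven't met it: the killer heuristic xela describes can be sketched in a few lines. Below is a minimal, self-contained negamax alpha-beta search on a toy game tree; all names here, and the tree encoding, are illustrative, not any real engine's API.)

```python
def alpha_beta(tree, depth, alpha, beta, killers, stats):
    """Negamax alpha-beta on a toy game tree: a leaf is a number (the
    score from the side to move's view), an internal node is a dict
    {move: subtree}. killers[depth] remembers moves that caused
    cutoffs at this depth in sibling variations."""
    if not isinstance(tree, dict):
        return tree
    # Killer heuristic: try moves that refuted sibling variations first;
    # a move that was good in one variation is often good in its siblings.
    moves = sorted(tree, key=lambda m: m not in killers.get(depth, set()))
    for m in moves:
        stats["nodes"] += 1
        score = -alpha_beta(tree[m], depth + 1, -beta, -alpha, killers, stats)
        if score >= beta:
            killers.setdefault(depth, set()).add(m)  # remember the killer
            return beta                              # fail-hard cutoff
        alpha = max(alpha, score)
    return alpha
```

The ordering step is the whole trick: the search itself is unchanged, but cutoffs arrive sooner when the refuting move is tried first.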
Exploring LZ's search algorithm: worked examples
-
iopq
- Dies with sente
- Posts: 113
- Joined: Wed Feb 27, 2019 11:19 am
- Rank: 1d
- GD Posts: 0
- Universal go server handle: iopq
- Has thanked: 11 times
- Been thanked: 27 times
Re: Exploring LZ's search algorithm: worked examples
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Exploring LZ's search algorithm: worked examples
iopq wrote:
Maybe in your games. When reviewing my games with LZ I see that a single play is a "killer move" every move for me AND my opponent, where it literally swings the game and we don't take it for many moves. Like a cut that we both didn't see or something. It's good in every variation. We could stop playing locally at any point and just do it to start a bigger fight that has a higher temperature, so whatever we're doing is not important.

If you had quoted my whole, short paragraph, you would have seen that I reported that phenomenon and the reason for it.
Bill Spight wrote:
Since go is a long game with a number of battlefields, it is unusual for a single play to be a killer. Maybe that is so a few times in a game. However, it is true that a good play in one variation will often be a good play in other variations, as well. That is behind the All Moves as First heuristic for go programs in pre-MCTS days. And one reason for persistent winrate swings in go is that both players are ignoring a big point or region and playing somewhere else. For instance, in the opening there may be winrate swings when each player fails to play in the last open corner.

What you are talking about is what I meant by the italicized sentence.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Exploring LZ's search algorithm: worked examples
I have some difficulty following what this whole thread is saying. The problem is that this (and most of the other recent discussions using bot results) seems to assume that the bots produce "a" result that we can use as an assessment of the position under discussion. I do not find that to be the case if we run the analysis more than once.

In post #21 above xela wrote,

xela wrote:
Now we'll analyse the position two moves earlier (this was an accident on my part, but produced interesting results!)
Here, white can live by exchanging a-d (not immediately, but a few moves in to the variation). But this does some damage to the top left, so it's an unclear position: perhaps white does better to tenuki immediately after (actually it's not clear-cut even after a few thousand playouts).
How does LZ assess this? I'm using network 258 again.
...

What I did here as an illustration was to start Lizzie with the LZ258 net, open xela's 258 analysis sgf file, go to the position indicated above with pondering off, make the move at P18, and then quickly turn pondering on and off to get 340-400 playouts as indicated at the top of the screen in Lizzie. I did it manually since I did not know how to do it more precisely. Why 340-400? That seemed comparable to the screenshot that xela has in post #21 above.

Below (3 attachments) and in the following post (3 more) are screenshots of the results of repeating the procedure 6 times in a row, shutting down Lizzie between each run. The allocation of the search and the resulting evaluation is significantly different between the runs.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
- ez4u
- Oza
- Posts: 2414
- Joined: Wed Feb 23, 2011 10:15 pm
- Rank: Jp 6 dan
- GD Posts: 0
- KGS: ez4u
- Location: Tokyo, Japan
- Has thanked: 2351 times
- Been thanked: 1332 times
Re: Exploring LZ's search algorithm: worked examples
Here are my additional pictures.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
-
xela
- Lives in gote
- Posts: 652
- Joined: Sun Feb 09, 2014 4:46 am
- Rank: Australian 3 dan
- GD Posts: 200
- Location: Adelaide, South Australia
- Has thanked: 219 times
- Been thanked: 281 times
Re: Exploring LZ's search algorithm: worked examples
ez4u wrote:
I have some difficulty following what this whole thread is saying. The problem is that this (and most of the other recent discussions using bot results) seems to assume that the bots produce "a" result that we can use as an assessment of the position under discussion. I do not find that to be the case if we run the analysis more than once.

Thanks, this is an important point that's got swamped in the discussion. Two points really.
First, I started this thread because I'm interested in the process, not the results. Can we peer inside Leela Zero's brain and learn more about what's happening? How exactly does LZ choose which moves to explore? Why does it sometimes give up on one move, switch to another for a while, then switch back? How is it deciding when to switch and when to stay with its first choice? Why does it sometimes ignore things that humans find promising? And so on.
Everything is represented inside the software as numbers. Can a close look at the numbers help us learn what's happening? (Not to improve our go playing, just to understand more about this fascinating new technology.) So I've been posting CSV files and SGF files. The CSVs contain all the numbers: these are the fundamental things I want to study. (Well, to be really fundamental, what I'm studying is the LZ source code. The CSVs are a way to test and refine my understanding.) And then the SGFs are an aid to navigation, a way to quickly flip through the variations and decide which lines of the CSV to look at.
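For what it's worth, the selection rule at the heart of "which moves to explore" is compact: LZ-style MCTS descends the tree by picking, at each node, the child maximizing a PUCT score combining the observed winrate Q with an exploration bonus driven by the policy prior P. A rough sketch, assuming a simple dict layout for child statistics (the data layout and the c_puct value are illustrative, not LZ's actual code):

```python
import math

def select_child(children, parent_visits, c_puct=0.8):
    """Pick the child maximizing Q + U, where
    U = c_puct * P * sqrt(N_parent) / (1 + N_child).
    Each child is a dict with 'policy' (prior P), 'visits' (N), and
    'value_sum' (sum of evaluations from this node's perspective)."""
    def puct(ch):
        # Unvisited children fall back to a first-play value of 0 here;
        # real implementations use a tuned "first play urgency".
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["policy"] * math.sqrt(parent_visits) / (1 + ch["visits"])
        return q + u
    return max(children, key=puct)
```

The "switching" behaviour falls out of this formula: as a child accumulates visits its U term shrinks, so a sibling with a decent prior or winrate eventually overtakes it, gets explored for a while, and may in turn be overtaken again.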
But other people around here seem to be more results-oriented than me. When I checked a couple of days ago, no one had opened up any of the CSVs. Literally not one download. (Recently, two of the files have been downloaded twice each.) That's OK, if people find the SGFs interesting in their own right then I'm happy to go with the flow. But you're right, it's a change of focus.
Secondly, why are you getting different results on each run? And should you be worried?
There are a couple of factors:
- LZ's neural nets are not symmetric. If you flip or rotate the board, you'll get slightly different evaluations. (There are eight possible symmetries.) Each time LZ evaluates the neural net, it will randomly choose one of the symmetries. A good way to see this is to use the "show policy" button in Lizzie. When you restart, you might get different policy values. For clear-cut positions, they should always be in the same ballpark. (For example, if one move has policy over 50%, and all other moves are below 5%, the exact numbers might move around, but the high policy move should always be the same move.) The network evaluations of winrates will also change, but Lizzie won't show you those numbers.
- If you run with two or more threads, then the threads race each other. If thread 1 happens to go a bit faster than thread 2 today, and then tomorrow thread 2 is the faster one, they'll explore and evaluate the nodes in a different order. Which one is fastest will vary according to other things happening within your computer's operating system.
For any Monte Carlo method, there's some mathematical theory around the idea of convergence. That is, the first few iterations will give varying results depending on the seed, but after enough (a few thousand?) playouts, you should end up with the same answer regardless of where you started. For "pure" MCTS in go (i.e. not using a neural net, just using random rollouts to the end of the game), I think the theory is fairly solid in this respect. When you factor in the network being asymmetric, I don't think anyone's done the maths rigorously; we're just relying on intuition and experience to say the results will be sensible.
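(The eight symmetries mentioned above are just the dihedral symmetries of the square: the identity, three rotations, and the mirror image of each. A pure-Python sketch, with the board represented as a list of rows:)

```python
def rotate(board):
    """Rotate the board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def mirror(board):
    """Flip the board left-right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """Return all 8 dihedral transforms of the board:
    four rotations, each with and without a mirror flip."""
    out = []
    b = board
    for _ in range(4):
        out.append(b)
        out.append(mirror(b))
        b = rotate(b)
    return out
```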
-
xela
- Lives in gote
- Posts: 652
- Joined: Sun Feb 09, 2014 4:46 am
- Rank: Australian 3 dan
- GD Posts: 200
- Location: Adelaide, South Australia
- Has thanked: 219 times
- Been thanked: 281 times
Re: Exploring LZ's search algorithm: worked examples
To give you an idea of how much difference the asymmetry makes, here are the policy values and network winrates for this position:
Only moves a-f get policy values above 1% in any symmetry. (Sorry about the big vertical space, I don't know why this forum does that with tables.) You can see that regardless of the random number seeds, LZ will be looking at b, c and f in some order. The winrates vary a little bit but it's reasonable to expect they'll end up in the same place after a thousand playouts' worth of adjustment.
I would guess that your run number 3 is symmetry 2 in my table, and run numbers 4 and 6 are symmetry 7. The others are a bit harder to pick. (And why does P19 sometimes appear in your screen shots despite its policy value being lower than the moves I've shown? Because for some random number seeds, J17 gets a winrate estimate in the 20s so it gets put aside quickly, whereas P19 gets a winrate estimate in the high 40s so it's a more promising candidate for investigation.)
| move | symmetry 1 | symmetry 2 | symmetry 3 | symmetry 4 | symmetry 5 | symmetry 6 | symmetry 7 | symmetry 8 |
| N19 | 0.5% | 0.3% | 0.6% | 0.5% | 2.1% | 0.5% | 0.4% | 0.3% |
| O19 | 20.1% | 53.2% | 31.9% | 20.4% | 19.3% | 24.3% | 21.3% | 23.1% |
| H18 | 56.0% | 32.7% | 46.7% | 51.5% | 41.8% | 48.5% | 64.6% | 55.0% |
| N18 | 1.8% | 0.5% | 0.7% | 2.7% | 2.1% | 2.2% | 0.3% | 1.3% |
| O18 | 2.3% | 1.3% | 0.9% | 0.8% | 3.0% | 1.8% | 2.0% | 1.8% |
| J17 | 10.1% | 2.4% | 11.5% | 15.2% | 23.4% | 14.8% | 1.7% | 9.8% |
| winrate | 70.1% | 75.9% | 79.1% | 68.6% | 80.3% | 78.1% | 68.3% | 62.4% |
-
xela
- Lives in gote
- Posts: 652
- Joined: Sun Feb 09, 2014 4:46 am
- Rank: Australian 3 dan
- GD Posts: 200
- Location: Adelaide, South Australia
- Has thanked: 219 times
- Been thanked: 281 times
Re: Exploring LZ's search algorithm: worked examples
Oh, and the "show policy" button in Lizzie seems to show different policy values from the ones that LZ will display on the command line using the "heatmap all" GTP command. I haven't got to the bottom of that one yet. The numbers are close, but not identical.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Exploring LZ's search algorithm: worked examples
I suppose that the first symmetry is a reflection in a line. But which line? The 10th file, the 10th rank, the diagonal from A-01 to T-19, or the diagonal from A-19 to T-01? 
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Exploring LZ's search algorithm: worked examples
xela wrote:
LZ's neural nets are not symmetric.

Why not? I can guess, but I'd rather not.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: Exploring LZ's search algorithm: worked examples
xela wrote:
LZ's neural nets are not symmetric.

Bill Spight wrote:
Why not? I can guess, but I'd rather not.

How could they be? Can you define the mathematical constraint you'd like to put on the convolutional kernels?
One thing you could theoretically do to handle symmetries is to canonicalize every input position to the network. Create a total ordering on board positions (e.g., sort positions by the state of the A1 point, then the A2 point if the two positions are the same at A1, then the A3 point, etc.). Then when you want to evaluate a position, take the eight transformations of it (including the identity), select the one that results in the "smallest" position, apply that transformation, send it through the network, and then apply the inverse transformation to the policy. That sounds like kind of a waste of energy, though.
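This canonicalization idea can be sketched in a few lines (board as a list of rows; Python's list ordering supplies the "total ordering", and the inverse transform of the policy, which the full scheme needs, is omitted):

```python
def rotate(b):
    """Rotate 90 degrees clockwise."""
    return [list(r) for r in zip(*b[::-1])]

def transforms(b):
    """All 8 dihedral transforms of the board."""
    out = []
    for _ in range(4):
        out.append(b)
        out.append([r[::-1] for r in b])  # mirror
        b = rotate(b)
    return out

def canonical(board):
    """Pick the lexicographically smallest transform. Every symmetric
    variant of a position maps to the same canonical form, so the net
    would only ever see one orientation of each position."""
    return min(transforms(board))
```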
Similarly, you could define the output of the policy network to be the average of all eight policies on the transformed positions, but now you're effectively doing eight times as much work to evaluate each position.
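The averaging variant is exactly symmetric by construction, at the cost of eight evaluations per position. A sketch for a scalar output, with `net` as a stand-in for the value head (for the policy you would also have to map each output back through the inverse transform before averaging):

```python
def rotate(b):
    return [list(r) for r in zip(*b[::-1])]

def transforms(b):
    """All 8 dihedral transforms of a board (list of rows)."""
    out = []
    for _ in range(4):
        out.append(b)
        out.append([r[::-1] for r in b])  # mirror
        b = rotate(b)
    return out

def symmetric_value(net, board):
    """Average a scalar evaluation over all 8 symmetries. The result is
    invariant under rotation/reflection even if net itself is not, but
    it costs 8 net calls per position."""
    vals = [net(t) for t in transforms(board)]
    return sum(vals) / len(vals)
```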
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Exploring LZ's search algorithm: worked examples
dfan wrote:
How could they be? Can you define the mathematical constraint you'd like to put on the convolutional kernels?

Well, considering that the square go board, unlike the chess board, has no essentially identifiable sides, or quadrants, or semi-quadrants, and thus has eightfold symmetry, why shouldn't the neural nets have eightfold symmetry?

dfan wrote:
One thing you could theoretically do to handle symmetries is to canonicalize every input position to the network. Create a total ordering on board positions (e.g., sort positions by the state of the A1 point, then the A2 point if the two positions are the same at A1, then the A3 point, etc.). Then when you want to evaluate a position, take the eight transformations of it (including the identity), select the one that results in the "smallest" position, apply that transformation, send it through the network, and then apply the inverse transformation to the policy. That sounds like kind of a waste of energy, though.
Similarly, you could define the output of the policy network to be the average of all eight policies on the transformed positions, but now you're effectively doing eight times as much work to evaluate each position.

I don't know what assumptions you are making, to make things so difficult.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
lightvector
- Lives in sente
- Posts: 759
- Joined: Sat Jun 19, 2010 10:11 pm
- Rank: maybe 2d
- GD Posts: 0
- Has thanked: 114 times
- Been thanked: 916 times
Re: Exploring LZ's search algorithm: worked examples
There are known ways of enforcing symmetry (search for "equivariant neural nets"). They are quite bad compared to simply randomly applying one of the 8 symmetries to your training data on every data sample as you feed the data to the neural net. With the latter method, the neural net of course won't be fully symmetric in practice, as it will start off not symmetric, and will randomly be learning the different symmetries for each position in a different order, etc. The "true" ideal target you're training it to converge to in the limit of infinite training is symmetric though.
A fun consequence of this is that usually the more thoroughly the neural net "understands" a position, the more closely all 8 symmetries will agree on that position, so you can use the divergence between the 8 symmetries as a crude indicator of the neural net's certainty on a position.
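That certainty indicator is easy to sketch: run the same position through the net under all eight symmetries and measure how much the evaluations disagree (here `net` is a stand-in for the value head, not a real LZ call):

```python
import statistics

def rotate(b):
    return [list(r) for r in zip(*b[::-1])]

def transforms(b):
    """All 8 dihedral transforms of a board (list of rows)."""
    out = []
    for _ in range(4):
        out.append(b)
        out.append([r[::-1] for r in b])  # mirror
        b = rotate(b)
    return out

def symmetry_spread(net, board):
    """Population std. dev. of the 8 evaluations: near zero when the
    net 'agrees with itself' across symmetries, large when it doesn't."""
    return statistics.pstdev(net(t) for t in transforms(board))
```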
Why is enforcing symmetry bad?
Intuition: Perhaps it's better to allow the neural net to learn potentially asymmetric things on its way to converging to the ideal symmetric function; it's less likely to get stuck on a local hill, as the asymmetric states can act as more bridges for the neural net to better transition between different nearly-symmetric states.
Also a practical issue: The current specific ways of enforcing symmetry via equivariant nets enforce a certain property of "equivariance" in the weights of the neural net at each layer. This construction has the neural net apply symmetry-rotated/reflected versions of every set of weights internally at every layer. If any isolated bit of the weights that the neural net would have ended up learning would have been roughly symmetric anyways (e.g. maybe in an early layer the neural net has a "pseudoeye" detector that looks for an empty point surrounded by four adjacent stones of the same color - a fully symmetric local shape), then holding computational cost of the net fixed this results in a 2-, 4-, or 8-fold waste of the capacity of that bit of the neural net to hold symmetrized copies of those weights that are already naturally symmetric.
Maybe you think: instead of trying some complicated equivariance thing involving mirrored copies of asymmetric weights, why not just directly enforce that the convolutional filters at every layer are symmetric? This is even worse. Many symmetric functions are most naturally expressed as combinations of asymmetric functions - e.g. count liberties northward, count liberties westward, count liberties eastward, count liberties southward, then add them all together. All four operations individually are asymmetric, since each one only counts in a specific direction, but their combination is symmetric. If you enforce literal symmetry at every internal layer, you prevent the neural net from doing things like this. "Equivariance" is what we call it when you still allow the net to do this: north, west, east, and south can all individually compute asymmetric things temporarily, even over many layers, but are symmetric to *each other*.
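The liberties example can be made concrete: each directional count is asymmetric on its own (rotating the board changes its value at the corresponding stone), but the sum over all four directions is invariant. A toy sketch, with the board as a list of rows and "." marking empty points:

```python
def rotate(board):
    """Rotate the board 90 degrees clockwise."""
    return [list(r) for r in zip(*board[::-1])]

def libs_in_direction(board, r, c, dr, dc):
    """1 if the neighbour of (r, c) in direction (dr, dc) is an empty
    on-board point. Any single direction is NOT rotation-invariant."""
    rr, cc = r + dr, c + dc
    return int(0 <= rr < len(board) and 0 <= cc < len(board[0])
               and board[rr][cc] == ".")

def liberties(board, r, c):
    """Sum over all four directions: a rotation-invariant quantity
    built entirely from asymmetric pieces."""
    return sum(libs_in_direction(board, r, c, dr, dc)
               for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)])
```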
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: Exploring LZ's search algorithm: worked examples
Bill Spight wrote:
Well, considering that the square go board, unlike the chess board, has no essentially identifiable sides, or quadrants, or semi-quadrants, and thus has eightfold symmetry, why shouldn't the neural nets have eightfold symmetry?

I was speaking from an implementation point of view (thus my question about the form of the constraint on the convolutional kernels), rather than the mathematics of the desired output (which I agree "should be" symmetrical). Anyway, lightvector answered this (at least) as well as I can.

Bill Spight wrote:
I don't know what assumptions you are making, to make things so difficult.

I'm not sure what this sentence means, but I'm also not sure it's worth pursuing! If you want to spell it out more I will try to respond, but I also don't mind just dropping it.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Exploring LZ's search algorithm: worked examples
Thanks to you, lightvector, and dfan for your replies. 
lightvector wrote:
There are known ways of enforcing symmetry (search for "equivariant neural nets"). They are quite bad compared to simply randomly applying one of the 8 symmetries to your training data on every data sample as you feed the data to the neural net. With the latter method, the neural net of course won't be fully symmetric in practice, as it will start off not symmetric, and will randomly be learning the different symmetries for each position in a different order, etc. The "true" ideal target you're training it to converge to in the limit of infinite training is symmetric though.
A fun consequence of this is that usually the more thoroughly the neural net "understands" a position, the more closely all 8 symmetries will agree on that position, so you can use the divergence between the 8 symmetries as a crude indicator of the neural net's certainty on a position.

Very interesting.

I said I'd rather not guess, but here is my guess, anyway. Enforcing symmetry is quite likely to slow learning. For instance, suppose that altering the weight for Q-17 on the full board gives the most improvement for a training game or set of training games. But if you applied the same alteration to all 8 3-4 points, it might even make the player play worse. OTOH, slower learning might not be, in the end, a bad thing. For instance, enforcing symmetry would reduce path dependency by joining many paths into one.

lightvector wrote:
Why is enforcing symmetry bad?
Intuition: Perhaps it's better to allow the neural net to learn potentially asymmetric things on its way to converging to the ideal symmetric function; it's less likely to get stuck on a local hill, as the asymmetric states can act as more bridges for the neural net to better transition between different nearly-symmetric states.

Interesting. On its face that is quite different from my intuition that enforcing symmetry would be likely to smooth out the fitness landscape.

lightvector wrote:
Also a practical issue: The current specific ways of enforcing symmetry via equivariant nets enforce a certain property of "equivariance" in the weights of the neural net at each layer. This construction has the neural net apply symmetry-rotated/reflected versions of every set of weights internally at every layer. If any isolated bit of the weights that the neural net would have ended up learning would have been roughly symmetric anyways (e.g. maybe in an early layer the neural net has a "pseudoeye" detector that looks for an empty point surrounded by four adjacent stones of the same color - a fully symmetric local shape), then holding computational cost of the net fixed this results in a 2-, 4-, or 8-fold waste of the capacity of that bit of the neural net to hold symmetrized copies of those weights that are already naturally symmetric.
Maybe you think: instead of trying some complicated equivariance thing involving mirrored copies of asymmetric weights, why not just directly enforce that the convolutional filters at every layer are symmetric? This is even worse. Many symmetric functions are most naturally expressed as combinations of asymmetric functions - e.g. count liberties northward, count liberties westward, count liberties eastward, count liberties southward, then add them all together. All four operations individually are asymmetric, since each one only counts in a specific direction, but their combination is symmetric. If you enforce literal symmetry at every internal layer, you prevent the neural net from doing things like this. "Equivariance" is what we call it when you still allow the net to do this: north, west, east, and south can all individually compute asymmetric things temporarily, even over many layers, but are symmetric to *each other*.

Now that you bring up counting liberties, I have wondered how the networks learn to do so. I kind of doubt whether they learn a counting operation. And even if they did, how do they learn how to order the counting? After all, there are 120 ways to count 5 dame. Ordering the counting is necessary for efficiency. I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Mike Novack
- Lives in sente
- Posts: 1045
- Joined: Mon Aug 09, 2010 9:36 am
- GD Posts: 0
- Been thanked: 182 times
Re: Exploring LZ's search algorithm: worked examples
Bill Spight wrote:
I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.

Not just parrots. We humans can count small numbers of things if we want to, because we have learned counting as a method for when the number of things is too great for "number recognition". But we do not ordinarily count small numbers of objects. We recognize how many (just like the parrots). I believe that there have been recent papers about AI neural nets being able to learn this ability. Keep in mind, OUR brains ARE neural nets.