Thanks to you, lightvector, and dfan for your replies.
I said I'd rather not guess, but here is my guess anyway. Enforcing symmetry is quite likely to slow learning. For instance, suppose that altering the weights for Q-17 on the full board gives the most improvement for a training game or set of training games. But if you applied the same alteration to all eight symmetric 3-4 points, it might even make the player play worse. OTOH, slower learning might not, in the end, be a bad thing. For instance, enforcing symmetry would reduce path dependency by joining many paths into one.
lightvector wrote:
There are known ways of enforcing symmetry (search for "equivariant neural nets"). They are quite bad compared to simply randomly applying one of the 8 symmetries to your training data on every data sample as you feed the data to the neural net. With the latter method, the neural net of course won't be fully symmetric in practice, as it will start off not symmetric, and will randomly be learning the different symmetries for each position in a different order, etc. The "true" ideal target you're training it to converge to in the limit of infinite training is symmetric though.
A fun consequence of this is that usually the more thoroughly the neural net "understands" a position, the more closely all 8 symmetries will agree on that position, so you can use the divergence between the 8 symmetries as a crude indicator of the neural net's certainty on a position.
Very interesting.
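To make that concrete for myself, here is a rough numpy sketch (my own illustration, not anything from an actual engine) of the two ideas above: applying a random one of the 8 board symmetries to each training sample, and using disagreement among the 8 symmetries as a crude certainty signal. `net` here is a hypothetical stand-in for any function that maps a board to a per-point output in the same orientation:

```python
import numpy as np

def apply_symmetry(board, k):
    """Apply the k-th of the 8 dihedral symmetries (k in 0..7):
    one of 4 rotations, optionally followed by a reflection."""
    b = np.rot90(board, k % 4)
    if k >= 4:
        b = np.fliplr(b)
    return b

def invert_symmetry(board, k):
    """Undo apply_symmetry(_, k), so per-point outputs can be
    mapped back to the original orientation."""
    b = np.fliplr(board) if k >= 4 else board
    return np.rot90(b, -(k % 4))

def augment(sample):
    """Training-time augmentation: feed each sample to the net
    under a uniformly random symmetry."""
    return apply_symmetry(sample, np.random.randint(8))

def symmetry_divergence(net, board):
    """Crude certainty signal: evaluate all 8 symmetries, map each
    output back to the original orientation, and measure how much
    the 8 answers disagree (0 means perfect agreement)."""
    outs = np.stack([invert_symmetry(net(apply_symmetry(board, k)), k)
                     for k in range(8)])
    return outs.std(axis=0).mean()
```

A perfectly symmetric `net` would give a divergence of exactly zero; a real trained net won't, and (per the quote) the gap tends to shrink on positions it "understands" well.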
Quote:
Why is enforcing symmetry bad?
Intuition: Perhaps it's better to allow the neural net to learn potentially asymmetric things on its way to converging to the ideal symmetric function; it's less likely to get stuck on a local hill, as the asymmetric states can act as extra bridges letting the neural net transition between different nearly-symmetric states.
Interesting. On its face that is quite different from my intuition that enforcing symmetry would be likely to smooth out the fitness landscape.
Quote:
Also a practical issue: The current specific ways of enforcing symmetry via equivariant nets enforce a certain property of "equivariance" in the weights of the neural net at each layer. This construction has the neural net apply symmetry-rotated/reflected versions of every set of weights internally at every layer. If any isolated bit of the weights that the neural net would have ended up learning would have been roughly symmetric anyway (e.g. maybe in an early layer the neural net has a "pseudoeye" detector that looks for an empty point surrounded by four adjacent stones of the same color - a fully symmetric local shape), then holding the computational cost of the net fixed, this results in a 2-, 4-, or 8-fold waste of that bit of the neural net's capacity, spent holding symmetrized copies of weights that are already naturally symmetric.
Maybe you think: instead of trying some complicated equivariance thing involving mirrored copies of asymmetric weights, why not simply enforce that the convolutional filters at every layer are symmetric directly? This is even worse. Many symmetric functions are most naturally expressed as combinations of asymmetric functions - e.g. things like count liberties northward - count liberties westward - count liberties eastward - count liberties southward - add all together. All four operations individually are asymmetric, since each one only counts in a specific direction, but their combination is symmetric. If you enforce literal symmetry at every internal layer, you prevent the neural net from doing things like this - "equivariance" is what we call it when you still allow the net to do this, where north, west, east, south can all individually compute asymmetric things temporarily, even over many layers, but are symmetric to *each other*.
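To see what "asymmetric pieces, symmetric combination" means concretely, here is a toy numpy sketch of my own (not from any engine, and a simpler directional count than liberties): a strictly northward run count that is asymmetric on its own, but whose four rotated copies sum to an equivariant map, meaning rotating or reflecting the board just rotates or reflects the output:

```python
import numpy as np

def runs_north(board):
    """For each point, count consecutive empty points strictly
    northward (up the array) before hitting a stone or the edge.
    Asymmetric on its own: it singles out one direction."""
    n, m = board.shape
    out = np.zeros((n, m), dtype=int)
    for j in range(m):
        run = 0
        for i in range(n):
            out[i, j] = run
            run = run + 1 if board[i, j] == 0 else 0
    return out

def runs_all_directions(board):
    """Combine four asymmetric directional counts into one
    equivariant feature map: rotate so each direction in turn
    becomes 'north', count, rotate the answer back, and sum."""
    total = np.zeros_like(board, dtype=int)
    for k in range(4):
        total += np.rot90(runs_north(np.rot90(board, k)), -k)
    return total
```

Each `runs_north` call is blatantly direction-dependent, yet the sum commutes with rotations and reflections of the board, which is exactly the property a literal symmetric-filters constraint would forbid the intermediate layers from exploiting.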
Now that you bring up counting liberties, I have wondered how the networks learn to do so. I kind of doubt that they learn a counting operation. And even if they did, how do they learn how to order the counting? After all, there are 120 (5!) orderings in which to count 5 dame.
Ordering the counting is necessary for efficiency. I wondered if they count somewhat like parrots do. Parrots can count to 6 in the sense that they can distinguish between there being 5 objects in a small enough space and there being 6 objects there, but the difference between 6 objects and 7 objects gives them trouble. I have imagined that the bots gradually learn to distinguish between N and N-1 dame as N increases.
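For contrast, here is what an explicit, procedural liberty count looks like when written as ordinary code (my own sketch): a flood fill over the chain, collecting adjacent empty points into a set, so the result is independent of the order in which the dame are visited. Nothing in a conv net's fixed stack of local operations resembles this open-ended loop, which is partly why a parrot-like magnitude discrimination seems plausible to me:

```python
import numpy as np

def count_liberties(board, i, j):
    """Count the liberties of the chain containing the stone at
    (i, j). board: 2D array, 0 = empty, +1/-1 = stones. Uses an
    explicit flood fill; the liberty *set* makes the answer
    independent of traversal order."""
    color = board[i, j]
    assert color != 0, "no stone at that point"
    n, m = board.shape
    seen, stack = {(i, j)}, [(i, j)]
    liberties = set()
    while stack:
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < m:
                if board[nx, ny] == 0:
                    liberties.add((nx, ny))
                elif board[nx, ny] == color and (nx, ny) not in seen:
                    seen.add((nx, ny))
                    stack.append((nx, ny))
    return len(liberties)
```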