In the past year or two there's been some exciting research into understanding exactly what neural networks are doing. For example, there's this ongoing project to reverse engineer InceptionV1.
Essentially they're taking a trained model and thoroughly analyzing it, neuron by neuron, using techniques like feature visualization, and finally designing replacement neurons by hand.
I'm thinking about starting a similar project to analyze a Go-playing neural network. I'm not sure how much progress is realistic to expect, since it seems like an enormous task. However, this also seems like something that could give us a new understanding of Go. Rather than just trying to mimic AI, we would be able to fully describe the theory it's using to think about the game.
One early decision for this project would be which network to study. I think it makes sense to start small, e.g. with a 6-block network. I really like KataGo for its support of various rulesets, but its extra input features such as ladder status seem like they'd make feature visualization harder. It might make sense to go with some older model, e.g. from the LeelaZero project, or to train a custom KataGo model without these extra input features. Also, I vaguely recall someone working on creating compressed or sparse versions of KataGo networks. Working with a smaller or sparser model in this way may also reduce the amount of work needed to understand it.
Another question to consider is how to create the feature visualizations. For example, we might start by optimizing positions in a continuous space where each intersection is a weighted mix of black, white, and empty, with non-negative weights summing to 1. This could be visualized using stones colored in continuously varying shades of gray, with continuously varying transparency as well. If these visualizations seem too nonsensical, we could also try a version using discrete optimization.
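To make the continuous version a bit more concrete, here's a rough sketch of what I have in mind. It is only an illustration: `toy_activation` is a made-up linear readout standing in for a real neuron (with an actual network you'd get the gradient from the framework's autodiff instead), and the channel order, step count, and learning rate are arbitrary choices of mine. Each intersection holds three logits, a softmax turns them into black/white/empty weights that sum to 1, and gradient ascent on the logits pushes the activation up.

```python
import numpy as np

BLACK, WHITE, EMPTY = 0, 1, 2  # channel order is an arbitrary choice for this sketch


def softmax(logits):
    """Map per-intersection logits to (black, white, empty) weights that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def toy_activation(board, weights):
    """Hypothetical stand-in for a neuron: a linear readout of the soft board."""
    return float((board * weights).sum())


def optimize_position(weights, steps=300, lr=0.1, seed=0):
    """Gradient ascent on the logits so the soft board increases the toy activation."""
    rng = np.random.default_rng(seed)
    logits = 0.01 * rng.normal(size=weights.shape)
    for _ in range(steps):
        board = softmax(logits)
        # Chain rule through the softmax: dA/dz = p * (w - sum_c p_c * w_c)
        grad = board * (weights - (board * weights).sum(axis=-1, keepdims=True))
        logits += lr * grad
    return softmax(logits)


def to_shade_and_alpha(board):
    """Gray shade (0 = black, 1 = white) and opacity (1 - empty weight) per intersection."""
    stone = board[..., BLACK] + board[..., WHITE]
    shade = board[..., WHITE] / np.maximum(stone, 1e-12)
    return shade, stone
```

With, say, `weights = np.random.default_rng(1).normal(size=(19, 19, 3))`, calling `optimize_position(weights)` gives a soft 19x19 position, `to_shade_and_alpha` gives the gray level and transparency for drawing each stone, and `board.argmax(axis=-1)` snaps it to a discrete position if the soft picture looks too nonsensical. The softmax parameterization is just one way to keep the weights on the simplex; projecting after each step would be another.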
I'm curious if people have any thoughts on the feasibility of such a project or any useful tips on how to get started.