John Fairbairn wrote:
You can arguably derive a set of generic styles, such as thickness oriented, territory based, but how do you tell the program what to do when the other side is being bloody minded and pre-empting your chosen style?
My simplistic view of this is as follows.
Let's say AlphaGo has an evaluation function F that, given a board position, assigns each candidate move an estimated probability of winning the game. Because F was trained in a way we can't really break down, it's hard to interpret how AlphaGo arrives at this evaluation.
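To make that concrete, here's a minimal sketch of the *interface* F would have: position in, per-move win estimate out. Everything in it (the names, the random scores) is my own illustration of the shape of the function, not AlphaGo's actual code:

```python
from typing import Dict, List, Tuple
import random

Move = Tuple[int, int]  # (row, col) on a 19x19 board


def evaluate_F(board: List[List[int]]) -> Dict[Move, float]:
    """Return an estimated probability of winning for each legal move.

    In AlphaGo this comes from trained networks plus search; here we just
    return random numbers to show the shape of the interface.
    """
    legal_moves = [(r, c) for r in range(19) for c in range(19)
                   if board[r][c] == 0]
    return {m: random.random() for m in legal_moves}


# The engine then (roughly) picks the move with the highest win estimate.
board = [[0] * 19 for _ in range(19)]
scores = evaluate_F(board)
best_move = max(scores, key=scores.get)
print(best_move, scores[best_move])
```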
But let's say, thinking of things in human terms, that given a board position, concepts humans use, such as thickness, number of liberties, and the strength/weakness of groups, play a role in shaping F. Theoretically, if we understood the black magic behind constructing F, the function could be tweaked so that, say, the strength and weakness of groups is weighted more heavily. In doing so, we'd produce an evaluation function that is no longer optimized purely toward winning, as the original was, but instead biased toward the variables that capture the strength and weakness of groups.
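To show what "weighting a concept more heavily" would even mean, here's a toy, entirely hypothetical linear version of such a decomposition. The point is that nothing like these named features or weights is exposed inside a real network, which is exactly the problem:

```python
# Purely hypothetical: IF F decomposed into human-interpretable features,
# style-biasing would amount to re-weighting one term. Real networks expose
# no such decomposition.
def evaluate_F_tweaked(features: dict, group_weight: float = 3.0) -> float:
    """Toy linear 'evaluation' over named concepts.

    `features` maps concept names to scores in [0, 1]; both the feature
    names and the weights are made up for illustration.
    """
    weights = {
        "thickness": 1.0,
        "territory": 1.0,
        "group_strength": group_weight,  # inflated to bias the style
    }
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


print(evaluate_F_tweaked({"thickness": 0.4, "territory": 0.6,
                          "group_strength": 0.8}))
```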
The problem is, F is constructed primarily through self-play. So we can't break down F in a meaningful way in order to tweak these parameters. The machine learning algorithm came up with the function, and we kind of just have to accept it for what it is.
However, as I was discussing with Bill, we could produce a different function G that doesn't learn through self-play. Instead, it'd learn by pattern recognition to predict a human player's next move, after being trained on thousands of games (like the first version of AlphaGo's policy network). Even though we still can't break down the internals of G, we can train it directly from input features of the board and sample games.
Because training G, in contrast to F, is purely a supervised learning problem, we can control how G is constructed by controlling the data it's trained on; there's no self-play component to worry about. So theoretically, if we have thousands of games that capture the particular style of a player Mr. X, we can produce a program that predicts the next move Mr. X would likely play in a new board position. A sketch of that pipeline is below.
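Here's roughly what that pipeline would look like. The loader and the lookup-table "model" are hypothetical stand-ins: a real system would parse SGF records and fit a convolutional policy network on the (position, move) pairs instead.

```python
# A minimal sketch of the supervised pipeline for G, assuming we already
# have game records for a player "Mr. X".
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, ...]   # flattened 19x19 board, values in {-1, 0, 1}
Move = int                   # index 0..360 of the point played


def load_games_by_player(player: str) -> List[List[Tuple[Position, Move]]]:
    # Stand-in for loading and filtering real game records by player.
    random.seed(0)
    fake_game = [(tuple(random.choice((-1, 0, 1)) for _ in range(361)),
                  random.randrange(361)) for _ in range(50)]
    return [fake_game]


# 1) Build (position, move) training pairs only from Mr. X's games, so the
#    learned predictor is biased toward his style by construction.
pairs = [(pos, mv)
         for game in load_games_by_player("Mr. X")
         for pos, mv in game]

# 2) "Train" G from the labelled pairs. Here it's just a frequency table
#    over exact positions; a real policy network generalizes to unseen ones.
counts: Dict[Position, Dict[Move, int]] = defaultdict(lambda: defaultdict(int))
for pos, mv in pairs:
    counts[pos][mv] += 1


def G(pos: Position) -> Move:
    # Predict the move Mr. X would most likely play; random fallback for
    # positions we've never seen.
    if pos in counts:
        return max(counts[pos], key=counts[pos].get)
    return random.randrange(361)


print(G(pairs[0][0]))
```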
There are issues with this, though:
1.) Like you say, players change style and play according to fashion. It'd be difficult to get a large set of games that accurately encompasses what we'd like to mimic in terms of "style".
2.) Even with AlphaGo, the supervised component alone wasn't that strong (the SL policy network predicted expert moves only about 57% of the time). So you might get a program that guesses how a player would play with some accuracy, but it'd still be wrong a lot of the time, and it wouldn't be a strong player in its own right.