 Post subject: Question on Training
Post #1 Posted: Thu Oct 26, 2017 7:06 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
I was reading on reddit about somebody who created a variation of AQ which he was trying to train to solve life and death problems.
Is it possible to train bots to ends other than being the best? It's a childish thought, I know, but could I train a bot to simply emulate a specific player?

I thought this might be vaguely useful. If there is a certain area of Go I suck at more than others, I could use such a bot to rectify, or at least try to rectify, that weakness.

_________________
North Lecale

 Post subject: Re: Question on Training
Post #2 Posted: Thu Oct 26, 2017 8:03 am 
Dies with sente

Posts: 75
Liked others: 10
Was liked: 8
Rank: SDK
That's basically what DeepMind was doing initially with supervised learning. In the pre-Master era of AlphaGo you'd often hear Demis Hassabis talk about how likely AlphaGo thought it was that a human would play a certain move (recall the 1 in 10,000 it gave to Lee Sedol's move 78).

The problem is that there almost certainly would not be enough training data from a single player. For example, in their initial paper, DeepMind stated that they trained AlphaGo on 30 million KGS positions. If we assume 250 moves per game, that comes out to roughly 120,000 games. Compare this to someone like Cho Chikun, who has played "merely" 2000+ games. Even after training on that much data, AlphaGo was only able to predict the human move 55.7% of the time (which is still relatively good).

So theoretically you could train a bot to emulate a player, but practically, you can't. That's the reason DeepMind chose to train on KGS amateur games: there aren't enough professional games to train from. Even if you did train it to emulate a player, you'd still want to give it the goal of winning, I would think, to make sure it isn't learning "bad" moves.
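
In code, that supervised setup looks roughly like the following. This is only a minimal sketch, assuming PyTorch; `positions` is a hypothetical iterable of (boards, moves) batches extracted from SGF records, not DeepMind's actual pipeline.

Code:
# Minimal sketch of supervised move prediction, assuming PyTorch.
# `positions` is a hypothetical iterable of (boards, moves) batches, where
# boards is a (N, 8, 19, 19) tensor of input planes and moves is a (N,)
# tensor of move indices in [0, 361).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, planes=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(planes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),   # one logit per board point
            nn.Flatten(),          # 19*19 = 361 move logits
        )

    def forward(self, x):
        return self.body(x)

def train(positions, epochs=10):
    net = PolicyNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for boards, moves in positions:
            opt.zero_grad()
            loss = loss_fn(net(boards), moves)  # push the net toward the human's move
            loss.backward()
            opt.step()
    return net

The same loop applies whether the games come from thousands of KGS amateurs or from one specific player; what changes is only how much data you can feed it.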

 Post subject: Re: Question on Training
Post #3 Posted: Thu Oct 26, 2017 9:47 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Javaness2 wrote:
I was reading on reddit about somebody who created a variation of AQ which he was trying to train to solve life and death problems.
Is it possible to train bots to ends other than being the best?


Well they have to be good, if not the best, at something. The main problem I see with a bot learning life and death problems is that there are not very many of them. You could train a bot to solve every life and death problem in existence, and it might still mess up a new life and death problem. There are ways around that, for instance, by using some of the existing problems for training and some for testing, and aiming to minimize the errors on the test material.

Another idea is to train two bots, one to solve life and death problems and one to create them; the two bots would compete with each other. OC, you would have to be able to check both the solution and the problem. If the creator bot created an impossible problem, then it should lose. Fortunately, checking is the easier task. But you would still have to build a checker bot. :cry:
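
As a very rough sketch of that hold-out idea, the split itself is trivial; `problems`, train_solver() and evaluate() are hypothetical stand-ins for whatever problem collection and learner you actually have.

Code:
import random

# problems: hypothetical list of (position, solution) life-and-death problems
random.shuffle(problems)
split = int(0.8 * len(problems))
train_set, test_set = problems[:split], problems[split:]

solver = train_solver(train_set)          # fit only on the training problems
test_error = evaluate(solver, test_set)   # count mistakes on problems it has never seen
print(f"error on unseen problems: {test_error:.1%}")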

Quote:
It's a childish thought, I know, but could I train a bot to simply emulate a specific player?


Sure. Computers emulated the styles of specific music composers back in the 1970s. It is best to emulate a dead player, so that the player cannot say that they would never make such a play. ;)

Quote:
I thought this might be vaguely useful. If there is a certain area of Go I suck at more than others, I could use such a bot to rectify, or at least try to rectify, that weakness.


Having a bot learn to beat you is much, much easier than having it learn to beat Lee Sedol. :D I don't know how easy it would be to train it on a laptop, but even if it took 40 days, that would not be so bad, would it? It would be useful for your own training, as it would target your specific weaknesses. You might train more than one bot, because different bots might pose different challenges. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #4 Posted: Thu Oct 26, 2017 4:51 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
You can train a computer to emulate a particular style, but there are two factors that are pretty important:

1.) Model should be sufficient to describe the pattern.
2.) You need enough data to find a pattern.

Without the right balance here, you’ll either overfit or underfit the pattern, and your algorithm will make bad predictions.
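
A toy illustration of that balance, fitting polynomials to a noisy curve (nothing to do with Go, just to show the two failure modes):

Code:
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)   # the "pattern" plus noise

x_new = np.linspace(-1, 1, 200)                      # fresh inputs the model never saw
truth = np.sin(3 * x_new)

# (degree, samples): degree 1 underfits, degree 7 on 8 points overfits,
# degree 3 with all 50 points is about right.
for degree, n in [(1, 50), (7, 8), (3, 50)]:
    coeffs = np.polyfit(x[:n], y[:n], degree)
    err = np.mean((np.polyval(coeffs, x_new) - truth) ** 2)
    print(f"degree {degree}, {n:2d} samples -> error on new data {err:.3f}")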

_________________
be immersed

 Post subject: Re: Question on Training
Post #5 Posted: Thu Oct 26, 2017 5:17 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Kirby wrote:
You can train a computer to emulate a particular style, but there are two factors that are pretty important:

1.) Model should be sufficient to describe the pattern.
2.) You need enough data to find a pattern.

Without the right balance here, you’ll either overfit or underfit the pattern, and your algorithm will make bad predictions.


A long time ago I wrote a program that simulated the "Happy Birthday" song. You could definitely recognize "Happy Birthday" in its tunes, but, as a friend pointed out, its modulations were definitely weird. :lol: A bot could certainly learn to emulate Cho Chikun, but it might be 20 kyu. ;)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #6 Posted: Thu Oct 26, 2017 10:33 pm 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
I hope that data set size wouldn't be a limiting factor there. It is, after all, already possible to classify the general style of a player. So if I have a whopping great database of SGFs, I should be able to group together players with similar styles.
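
Grouping players by style could be as simple as clustering per-player feature vectors. A rough sketch, assuming scikit-learn; games_by_player and style_features() are hypothetical (the latter summarising a player's games, e.g. third- vs fourth-line frequency, invasion rate, and so on):

Code:
import numpy as np
from sklearn.cluster import KMeans

# games_by_player: hypothetical dict mapping player name -> list of SGF records
features = np.array([style_features(games) for games in games_by_player.values()])
labels = KMeans(n_clusters=5, n_init=10).fit_predict(features)  # 5 style groups, arbitrary choice

# Players with the same label can then have their games pooled into one training set.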

Commercially, it seems an interesting pathway to follow. Bruce Wilcox had a program that claimed to play with different styles.

_________________
North Lecale

 Post subject: Re: Question on Training
Post #7 Posted: Fri Oct 27, 2017 7:26 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
I like to think AlphaGo's habit of wasting ko threats and going on tilt with stupid sentes before rage quitting when losing is the KGS style from initial training shining through ;-)

 Post subject: Re: Question on Training
Post #8 Posted: Fri Oct 27, 2017 7:58 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Javaness2 wrote:
I hope that data set size wouldn't be a limiting factor there. It is, after all, already possible to classify the general style of a player. So if I have a whopping great database of SGFs, I should be able to group together players with similar styles.

Commercially, it seems an interesting pathway to follow. Bruce Wilcox had a program that claimed to play with different styles.


Defining a style is not all that obvious, but if you can make a style database then you can do the initial training on that database, and trust that when the bot trains by self play, the best aspects of the style will remain.

You can also reward plays that fit a certain style. That effectively alters the scoring, but if doing so makes a difference of only a few points per game, or is followed up by self play without the style rewards, you can still get a strong player.
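
A minimal sketch of what "altering the scoring" could look like, assuming a hypothetical style_match() score in [0, 1] per move and the usual +1/-1 game result:

Code:
STYLE_WEIGHT = 0.05   # keep the style bonus small relative to winning and losing

def shaped_reward(moves, result):
    """Game outcome plus a small bonus for on-style moves.

    style_match(move) is a hypothetical function scoring how well a move
    fits the target style; result is +1 for a win, -1 for a loss.
    """
    style_bonus = STYLE_WEIGHT * sum(style_match(m) for m in moves) / len(moves)
    return result + style_bonus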

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #9 Posted: Fri Oct 27, 2017 11:18 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
A bot that plays like a particular person wouldn't necessarily even need self-play or reinforcement learning, given enough games. You could just train a neural network on a bunch of that player's games to predict the next move from a given board position. That's pretty much how the policy network started out, except that the dataset wasn't limited to a single person. Didn't they end up getting something like 60% accuracy in guessing the next move in a given strong player's game?

Someone like 'TheCaptain' on KGS comes to mind, who has a distinct playing style. Maybe if you had thousands of his games, you could train a neural network to predict where TheCaptain will play from a given board position.

That alone probably won't be a very skilled bot - didn't DeepMind say that the neural network used for the initial policy network ended up being like a low dan player or something?
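
Measuring that "guess the next move" accuracy is straightforward once such a network exists. A sketch assuming PyTorch, a trained net like the one sketched earlier in the thread, and a hypothetical set of held-out (boards, moves) batches from TheCaptain's games:

Code:
import torch

def top1_accuracy(net, test_positions):
    """Fraction of held-out positions where the net's top choice matches the actual move."""
    correct, total = 0, 0
    with torch.no_grad():
        for boards, moves in test_positions:       # hypothetical (boards, moves) batches
            predicted = net(boards).argmax(dim=1)  # most likely move for each position
            correct += (predicted == moves).sum().item()
            total += moves.numel()
    return correct / total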

Anyway, good luck.

_________________
be immersed

 Post subject: Re: Question on Training
Post #10 Posted: Fri Oct 27, 2017 12:00 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Hmmm. I thought that training a bot to guess the next move, whether of a single player or in general, involved "rewarding" correct guesses. (OC, it's inanimate, so no actual reward is given. ;))

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #11 Posted: Fri Oct 27, 2017 12:03 pm 
Oza

Posts: 3657
Liked others: 20
Was liked: 4631
Most players change their style at least once during a career. It's part of how they try to improve. Some follow fashion for similar reasons - to see if it works - and some try to play like their most successful opponents; to see what makes them tick.

And many players try to play differently according to whether it's a slow game or blitz.

What is classed as a player's style (a misnomer, surely) is his choice of moves in a very, very limited set of positions. The other 99% of the game he plays in more or less standard fashion.

You can arguably derive a set of generic styles, such as thickness oriented, territory based, but how do you tell the program what to do when the other side is being bloody minded and pre-empting your chosen style?

The so-called styles offered in chess and early go programs, such as "attacking" or "cautious" or "adventurous", were just mildly entertaining alternative ways of saying "stupid" or "haven't a clue".

And think about this: why does a pro player try to follow a style? He's not entertaining himself - it's because he thinks that's the best way to win. So if it has been demonstrated that there is a better way to win, why would anyone want to follow an inferior way?

 Post subject: Re: Question on Training
Post #12 Posted: Fri Oct 27, 2017 12:24 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Bill Spight wrote:
Hmmm. I thought that training a bot to guess the next move, whether of a single player or in general, involved "rewarding" correct guesses. (OC, it's inanimate, so no actual reward is given. ;))




There were a couple of different techniques in play for the earlier versions of AlphaGo: supervised learning and reinforcement learning, both using neural networks. As I understand it, the *first* version of the policy network was just a regular neural network, trained on a bunch of games from KGS. After that, I think they refined the policy network to become stronger by having it play against earlier versions of itself, setting up the reward/reinforcement-learning component. They were able to increase the strength of the policy network through self-play. This is where AlphaGo really benefits from self-play - it learns from experience, and not from supervision (here are a bunch of games - find the pattern that gives you a function producing the next move).

So in a sense, the initial neural network gets "reward" through training when the output of the network produces a move that matches what the human played, but I think most of the time, the talk about rewards is in the context of reinforcement learning - the part where AlphaGo is learning from the experience of playing against itself.
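
As a rough sketch of that refinement step (REINFORCE-style, assuming PyTorch, a policy network like the one sketched earlier, and a hypothetical play_one_game() that plays the current net against an older copy and returns the log-probabilities of its moves plus the result):

Code:
import copy
import random
import torch

def self_play_update(net, opponent_pool, opt):
    opponent = copy.deepcopy(random.choice(opponent_pool))  # an earlier version of itself
    log_probs, result = play_one_game(net, opponent)        # result: +1 win, -1 loss for net
    loss = -result * torch.stack(log_probs).sum()           # reinforce moves from won games
    opt.zero_grad()
    loss.backward()
    opt.step()
    opponent_pool.append(copy.deepcopy(net))                # keep a snapshot for future opponents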

Anyway, AlphaGo Zero eliminated the need for human games, so it bypassed this "kickstart" phase entirely and just learned from the experience of playing against itself.

I'm still wrapping my head around that impressive result.

_________________
be immersed

 Post subject: Re: Question on Training
Post #13 Posted: Fri Oct 27, 2017 12:39 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
John Fairbairn wrote:
You can arguably derive a set of generic styles, such as thickness oriented, territory based, but how do you tell the program what to do when the other side is being bloody minded and pre-empting your chosen style?


My simplistic view of this is as follows.

Let's say AlphaGo has evaluation function F, which gives a probability distribution of winning the game for various moves played at a selected board position. Because F was trained in a way that we can't really break down, it's hard to really interpret how AlphaGo produces this evaluation function.

But let's say, thinking of things in human terms, that given a board position, concepts that humans have such as thickness, number of liberties, strength/weakness of groups, etc., play a role in determining the function F. Theoretically, if we understood the black magic behind constructing F, the function could be tweaked such that, say, strength and weakness of groups is weighted more heavily in F. In doing so, we'd produce an evaluation function that is not optimized toward winning, as the original function was, but rather biased toward the variables that capture the strength and weakness of groups.

The problem is, F is constructed primarily through self-play. So we can't break down F in a meaningful way in order to tweak these parameters. The machine learning algorithm came up with the function, and we kind of just have to accept it for what it is.

However, as I was discussing with Bill, we could produce a different evaluation function G that doesn't learn through self-play. Rather, it would learn by pattern recognition to predict a human player's move, after being trained on thousands of games (like the first version of the policy network). Even though we can't break down the process of constructing G, it is possible to train G from input features of the board and sample games.

Because G, in contrast to F, is purely a supervised learning problem, we can control how G is constructed by controlling the input data that it's trained on. This is because we can omit the self-play aspect. So theoretically, if we have thousands of games that match a particular style of a player Mr. X, we can produce a program that can predict the next move that Mr. X would likely play given a new board position.
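
In other words, the "control" is just a filter on the training data, something like the following (records and sgf_to_positions() being hypothetical stand-ins for whatever game database is used):

Code:
# Build G's training set only from games by the player we want to mimic.
mr_x_positions = [
    pos
    for player, sgf in records          # hypothetical (player_name, game_record) pairs
    if player == "Mr. X"
    for pos in sgf_to_positions(sgf)    # hypothetical SGF -> (board, move) extractor
]
# mr_x_positions can then be fed to the same kind of supervised training
# loop sketched earlier in the thread.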

There are issues with this, though:
1.) Like you say, players change style and play according to fashion. It'd be difficult to get a large set of games that accurately encompass what we'd like to mimic in terms of "style".
2.) Even with AlphaGo, the supervised component alone wasn't that strong. So you might be able to get a program that can kind of guess how a player would play with some amount of accuracy, but it'd still be wrong a lot of the time, and wouldn't be that strong.

_________________
be immersed

 Post subject: Re: Question on Training
Post #14 Posted: Fri Oct 27, 2017 1:59 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
John Fairbairn wrote:
And think about this: why does a pro player try to follow a style? He's not entertaining himself - it's because he thinks that's the best way to win. So if it has been demonstrated that there is a better way to win, why would anyone want to follow an inferior way?


Emphasis mine.

That's a big if! :D

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #15 Posted: Fri Oct 27, 2017 2:09 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Kirby wrote:
John Fairbairn wrote:
You can arguably derive a set of generic styles, such as thickness oriented, territory based, but how do you tell the program what to do when the other side is being bloody minded and pre-empting your chosen style?

Because G, in contrast to F, is purely a supervised learning problem, we can control how G is constructed by controlling the input data that it's trained on. This is because we can omit the self-play aspect. So theoretically, if we have thousands of games that match a particular style of a player Mr. X, we can produce a program that can predict the next move that Mr. X would likely play given a new board position.


And we can add the self play aspect later. The program will improve, but will probably still retain stylistic features that it learned before. The only real problem I see is the possibility of a local optimum related to the original style. For instance, if the original style is one of incrementally adding secure territory, the network might get stuck in that neighborhood for a long time. You can see this kind of thing with humans. A lot of weak players stay weak because they actually prefer bad plays. And if they do happen to play a good play, they may end up in unfamiliar territory where they do even worse than they usually do.

For one's own training, I think it might be interesting to have two bots: one which has learned to play like you do, and one which has learned to play against the first one but is not much stronger than you are. ;)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question on Training
Post #16 Posted: Fri Oct 27, 2017 3:25 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Bill Spight wrote:
A lot of weak players stay weak because they actually prefer bad plays. And if they do happen to play a good play, they may end up in unfamiliar territory where they do even worse than they usually do.



For sure - I do this a lot! To be sure, I misread and get surprised, but often my evaluation is imbalanced.

_________________
be immersed
