Hello everyone,
I’m the developer of
KaTrain, a program that uses KataGo and Lizzie-like features to allow you to play against AIs and get feedback on the moves you make, either immediately, or after playing.
In addition to being part of the program, the AI strategies run as bots on OGS, having played around 6000 games over the last 3 weeks.
I’ve not been an active member here, but I know this place for its in-depth discussions. So rather than just drive by, drop an advertisement, and leave, I thought I’d take the opportunity to start a discussion on how to make modern AIs play at kyu level.
Ideas on how to make them nicer to play against are much appreciated -- I’m sure some of the things I’m doing were invented a long time ago.
KataGo Engine based AIs

The default engine is just insanely strong, even with a weaker network and limited playouts, but its output can be used in interesting ways.
'Balance' AI

This AI has a set goal of winning by ~2 points, and makes moves that lose at most some fixed number of points, as long as its lead stays above that goal. It tends to win by more against kyu players like me, since it cannot find moves that are bad enough, quickly enough, to balance.
The adaptiveness combined with the set outcome makes this less than ideal to play against, and I've not deployed it as a bot.
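For concreteness, here is a minimal sketch of how such a balance rule could look. The move representation and the names candidate_moves, target_score, and max_points_lost are illustrative, not KaTrain's actual internals:

```python
def balance_move(candidate_moves, target_score=2.0, max_points_lost=5.0):
    """Pick a deliberately suboptimal move while staying ahead of a target lead.

    candidate_moves: list of dicts with 'move', 'scoreLead' (expected lead for
    the bot after the move) and 'pointsLost' (points lost vs. the best move),
    as could be derived from a KataGo analysis; assumed sorted best-first.
    """
    acceptable = [m for m in candidate_moves
                  if m["pointsLost"] <= max_points_lost
                  and m["scoreLead"] >= target_score]
    if not acceptable:
        return candidate_moves[0]["move"]  # no safe way to weaken: play the top move
    # Among acceptable moves, prefer the one losing the most points,
    # nudging the lead down towards the target.
    return max(acceptable, key=lambda m: m["pointsLost"])["move"]
```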
'ScoreLoss' AI

The most recent one, created to address limitations of the ‘Pick’ strategy described below, and currently running on OGS as this player:
It makes moves with probability ~e^(-pointsLost * strength).
This seems to give a nicely varied style with some strange joseki choices and occasional larger mistakes, while not missing really urgent moves and staying extremely good at life & death.
Its main weakness is the endgame, where losing 1 point is a much larger relative mistake. I’m considering adding a setting to increase strength in the endgame to prevent this.
Example game (strength=0.5): https://online-go.com/game/23904595
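In code, this selection rule is a one-line weighted sample. A sketch, assuming each candidate move comes with its points lost relative to KataGo's top move:

```python
import math
import random

def scoreloss_move(candidate_moves, strength=0.5):
    """Sample a move with probability proportional to e^(-pointsLost * strength).

    Higher strength concentrates the distribution on near-optimal moves.
    candidate_moves: list of dicts with 'move' and 'pointsLost'.
    """
    weights = [math.exp(-m["pointsLost"] * strength) for m in candidate_moves]
    return random.choices(candidate_moves, weights=weights, k=1)[0]["move"]
```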
Policy network based AIs

A very common bot strategy, also known as ‘1 playout’: the move chosen is simply the one proposed by the policy network. This plays around 4d on OGS and is very cheap to run.
I recently added returning this information to KataGo's JSON analysis engine, which allows the policy to be manipulated in various ways to weaken it.
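The base strategy is then just an argmax over the policy output. A sketch, where policy is assumed to be a mapping from each legal move to its policy probability:

```python
def top_policy_move(policy, legal_moves):
    """'1 playout' play: the move with the highest policy value."""
    return max(legal_moves, key=lambda mv: policy[mv])
```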
Policy-Pick moves AI

This AI was the first one developed with the goal of making a weaker AI. It picks a + b * <number of legal moves> moves at random, and plays the one with the highest policy value among them.
In addition, it has an override value: if the top move's policy value is high enough, it is played regardless. Even so, the best move from the random selection is really bad often enough that its play can appear strange.
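A sketch of the pick strategy under the same assumed policy mapping (the parameter names match the examples below):

```python
import random

def pick_move(policy, legal_moves, a=5, b=0.33, override=0.95):
    """Sample a + b * len(legal_moves) candidates uniformly and play the one
    with the highest policy value, unless the overall top move is so obvious
    that it is played outright."""
    top = max(legal_moves, key=lambda mv: policy[mv])
    if policy[top] >= override:
        return top  # too urgent/obvious to miss
    n_picks = min(int(a + b * len(legal_moves)), len(legal_moves))
    picked = random.sample(legal_moves, n_picks)
    return max(picked, key=lambda mv: policy[mv])
```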
Example game (a=5, b=0.33, override=0.95): https://online-go.com/game/23911871

Variants of this AI: Influence, Territory, Local, Tenuki

Given this framework, it’s easy to bias the moves picked towards some area, giving rise to four bots with influential, territorial, local, and tenuki styles.
Of these, only the local style is stronger. Dropping the style in the endgame is necessary, for tenuki in particular, to avoid being atari'd to death by a player exploiting it, and for influence/territory to close up the boundaries.
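For example, the local variant can be obtained by biasing the random picks towards the last move. A sketch assuming moves are (x, y) coordinates; the Gaussian weighting is my reconstruction of what a stddev parameter could mean, not necessarily KaTrain's exact formula:

```python
import math
import random

def local_pick_move(policy, legal_moves, last_move, a=15, b=0.0,
                    override=0.95, stddev=1.5):
    """Like pick_move, but candidates are drawn with a Gaussian weight
    on their distance to the last move played."""
    top = max(legal_moves, key=lambda mv: policy[mv])
    if policy[top] >= override:
        return top
    weights = [math.exp(-((mv[0] - last_move[0]) ** 2 +
                          (mv[1] - last_move[1]) ** 2) / (2 * stddev ** 2))
               for mv in legal_moves]
    n_picks = int(a + b * len(legal_moves))
    picked = random.choices(legal_moves, weights=weights, k=n_picks)
    return max(picked, key=lambda mv: policy[mv])
```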
Example game (local, a=15, b=0, override=0.95, stddev=1.5, i.e. really close moves): https://online-go.com/game/23910309
Example game (local, a=15, b=0.4, override=0.95, threshold=3.5, weight=10, endgame=0.4, i.e. a hard penalty on moves under the 3.5th line, with endgame mode starting when the board is 40% full): https://online-go.com/game/23892536

'Policy-Weighted' AI

Another fairly simple idea: take the policy values p, and play each move with probability ~p^(weaken_fac) among moves where p > threshold. The result is a nice varied style without too many crazy moves, and it is probably my favourite to play against.
Example game (threshold=0.001, weaken_fac=1.25): https://online-go.com/game/23910134
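A sketch of this one, again under the assumed policy mapping:

```python
import random

def weighted_policy_move(policy, legal_moves, weaken_fac=1.25, threshold=0.001):
    """Play each move with probability proportional to p ** weaken_fac,
    restricted to moves with policy value above the threshold."""
    options = [mv for mv in legal_moves if policy[mv] > threshold]
    if not options:  # nothing above threshold: fall back to the top move
        return max(legal_moves, key=lambda mv: policy[mv])
    weights = [policy[mv] ** weaken_fac for mv in options]
    return random.choices(options, weights=weights, k=1)[0]
```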
General issues

- Human mistakes are difficult to quantify in terms of points: missing life & death in a corner is a very human thing, but ignoring an atari on 20 stones is not, even though the point loss could be identical.
- Likewise, policy value is no guarantee of urgency: when there are two urgent moves to save a corner, the maximum policy value is around 0.5, while a certain joseki move can easily take you above 0.8 even though there are plenty of other reasonable options.
- Knowing when to pass is hard! All the bots currently have several overrides for passing earlier than they otherwise would. In particular, if a human passes 3 times in a row, we trust them (see the sketch after this list).
- Introducing parameters that a user can adjust and understand is another challenge: increasing ‘strength’ is intuitive, but many others are less so.
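To illustrate the pass overrides mentioned in the list above, here is one possible shape such a rule could take. This is purely illustrative, not KaTrain's actual logic; the names and thresholds are made up:

```python
def should_pass(opponent_pass_streak, points_lost_by_passing,
                trust_streak=3, max_loss=0.5):
    """Pass early if the human insists the game is over, or if passing
    costs (almost) nothing according to the score estimate."""
    if opponent_pass_streak >= trust_streak:
        return True  # trust a human who keeps passing
    return points_lost_by_passing <= max_loss
```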
I hope to get some interesting ideas here for further improvements.
