Hello everyone,
I’m the developer of
KaTrain, a program that uses KataGo and Lizzie-like features to allow you to play against AIs and get feedback on the moves you make, either immediately, or after playing.
In addition to being part of the program, the AI strategies run as bots on OGS, having played around 6000 games over the last 3 weeks.
I’ve not been an active member here, but I know this place for its in-depth discussions. So rather than just drive by, drop an advertisement, and leave, I thought I’d take the opportunity to start a discussion on how to make modern AIs play at kyu level.
Ideas on how to make them nicer to play against are much appreciated -- I’m sure some of the things I’m doing were invented a long time ago.
KataGo Engine based AIs

The default engine is just insanely strong, even with a weaker network and limited playouts, but its output can be used in interesting ways.
'Balance' AI

This AI has a set goal of winning by ~2 points, and makes moves that lose at most some fixed number of points, as long as its lead stays above that goal. It tends to win by more against kyu players like me, since it cannot find moves that are bad enough, quickly enough, to balance.
The adaptiveness combined with the set outcome makes this less than ideal to play against, and I've not deployed it as a bot.
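For concreteness, here is a minimal sketch of how such a balance rule could look. The move representation and the names candidate_moves, target_score, and max_points_lost are illustrative, not KaTrain's actual internals:

```python
def balance_move(candidate_moves, target_score=2.0, max_points_lost=5.0):
    """Pick a deliberately suboptimal move while staying ahead of a target lead.

    candidate_moves: list of dicts with 'move', 'scoreLead' (expected lead for
    the bot after the move) and 'pointsLost' (points lost vs. the best move),
    as could be derived from a KataGo analysis; assumed sorted best-first.
    """
    acceptable = [m for m in candidate_moves
                  if m["pointsLost"] <= max_points_lost
                  and m["scoreLead"] >= target_score]
    if not acceptable:
        return candidate_moves[0]["move"]  # no safe way to weaken: play the top move
    # Among acceptable moves, prefer the one losing the most points,
    # nudging the lead down towards the target.
    return max(acceptable, key=lambda m: m["pointsLost"])["move"]
```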
'ScoreLoss' AI

The most recent one, created to address limitations of the ‘Pick’ strategy described below, and currently running on OGS as this player:
It makes moves with probability ~e^(-pointsLost * strength).
This seems to give a nicely varied style with some strange joseki choices and occasional larger mistakes, while not missing really urgent moves and staying extremely good at life & death.
Its main weakness is the endgame, where losing 1 point is a much larger relative mistake. I’m considering adding a setting to increase strength in the endgame to prevent this.
Example game (strength=0.5): https://online-go.com/game/23904595
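In code, this selection rule is a one-line weighted sample. A sketch, assuming each candidate move comes with its points lost relative to KataGo's top move:

```python
import math
import random

def scoreloss_move(candidate_moves, strength=0.5):
    """Sample a move with probability proportional to e^(-pointsLost * strength).

    Higher strength concentrates the distribution on near-optimal moves.
    candidate_moves: list of dicts with 'move' and 'pointsLost'.
    """
    weights = [math.exp(-m["pointsLost"] * strength) for m in candidate_moves]
    return random.choices(candidate_moves, weights=weights, k=1)[0]["move"]
```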
Policy network based AIs

A very common bot strategy, also known as ‘1 playout’: the move chosen is simply the one proposed by the policy network. This plays around 4d on OGS and is very cheap to run.
I recently added returning this information to KataGo's JSON analysis engine, which allows the policy to be manipulated in various ways to weaken it.
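The base strategy is then just an argmax over the policy output. A sketch, where policy is assumed to be a mapping from each legal move to its policy probability:

```python
def top_policy_move(policy, legal_moves):
    """'1 playout' play: the move with the highest policy value."""
    return max(legal_moves, key=lambda mv: policy[mv])
```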
Policy-Pick moves AI

This AI was the first one developed with the goal of making a weaker AI. It picks a + b * <number of legal moves> moves at random, and plays the one with the highest policy value among them.
In addition, it has an override value: if the top move's policy value is high enough, it is played regardless. Even so, the best move from the random selection is really bad often enough that its play can appear strange.
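A sketch of the pick strategy under the same assumed policy mapping (the parameter names match the examples below):

```python
import random

def pick_move(policy, legal_moves, a=5, b=0.33, override=0.95):
    """Sample a + b * len(legal_moves) candidates uniformly and play the one
    with the highest policy value, unless the overall top move is so obvious
    that it is played outright."""
    top = max(legal_moves, key=lambda mv: policy[mv])
    if policy[top] >= override:
        return top  # too urgent/obvious to miss
    n_picks = min(int(a + b * len(legal_moves)), len(legal_moves))
    picked = random.sample(legal_moves, n_picks)
    return max(picked, key=lambda mv: policy[mv])
```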
Example game (a=5, b=0.33, override=0.95): https://online-go.com/game/23911871

Variants of this AI: Influence, Territory, Local, Tenuki

Given this framework, it’s easy to bias the moves picked towards some area, giving rise to four bots with influential, territorial, local, and tenuki styles.
Of these, only the local style is stronger. Dropping the style in the endgame is necessary, for tenuki in particular, to avoid being atari'd to death by a player exploiting it, and for influence/territory to close up the boundaries.
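For example, the local variant can be obtained by biasing the random picks towards the last move. A sketch assuming moves are (x, y) coordinates; the Gaussian weighting is my reconstruction of what a stddev parameter could mean, not necessarily KaTrain's exact formula:

```python
import math
import random

def local_pick_move(policy, legal_moves, last_move, a=15, b=0.0,
                    override=0.95, stddev=1.5):
    """Like pick_move, but candidates are drawn with a Gaussian weight
    on their distance to the last move played."""
    top = max(legal_moves, key=lambda mv: policy[mv])
    if policy[top] >= override:
        return top
    weights = [math.exp(-((mv[0] - last_move[0]) ** 2 +
                          (mv[1] - last_move[1]) ** 2) / (2 * stddev ** 2))
               for mv in legal_moves]
    n_picks = int(a + b * len(legal_moves))
    picked = random.choices(legal_moves, weights=weights, k=n_picks)
    return max(picked, key=lambda mv: policy[mv])
```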
Example game (local, a=15, b=0, override=0.95, stddev=1.5, i.e. really close moves): https://online-go.com/game/23910309
Example game (local, a=15, b=0.4, override=0.95, threshold=3.5, weight=10, endgame=0.4, i.e. a hard penalty on moves under the 3.5th line, with endgame mode starting when the board is 40% full): https://online-go.com/game/23892536

'Policy-Weighted' AI

Another fairly simple idea: take the policy values p, and play each move with probability ~p^(weaken_fac) among moves where p > threshold. The result is a nice varied style without too many crazy moves, and it is probably my favourite to play against.
Example game (threshold=0.001, weaken_fac=1.25): https://online-go.com/game/23910134
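A sketch of this one, again under the assumed policy mapping:

```python
import random

def weighted_policy_move(policy, legal_moves, weaken_fac=1.25, threshold=0.001):
    """Play each move with probability proportional to p ** weaken_fac,
    restricted to moves with policy value above the threshold."""
    options = [mv for mv in legal_moves if policy[mv] > threshold]
    if not options:  # nothing above threshold: fall back to the top move
        return max(legal_moves, key=lambda mv: policy[mv])
    weights = [policy[mv] ** weaken_fac for mv in options]
    return random.choices(options, weights=weights, k=1)[0]
```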
General issues

- Human mistakes are difficult to quantify in terms of points: missing life & death in a corner is a very human thing, but ignoring an atari on 20 stones is not, even though the point loss could be identical.
- Likewise, policy value is no guarantee of urgency: when there are two urgent moves to save a corner, the maximum policy value is around 0.5, while a certain joseki move can easily take you above 0.8 even though there are plenty of other reasonable options.
- Knowing when to pass is hard! All the bots currently have several overrides for passing earlier than they otherwise would. In particular, if a human passes 3 times in a row, we trust them (see the sketch after this list).
- Introducing parameters that a user can adjust and understand is another challenge: increasing ‘strength’ is intuitive, but many others are less so.
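To illustrate the pass overrides mentioned in the list above, here is one possible shape such a rule could take. This is purely illustrative, not KaTrain's actual logic; the names and thresholds are made up:

```python
def should_pass(opponent_pass_streak, points_lost_by_passing,
                trust_streak=3, max_loss=0.5):
    """Pass early if the human insists the game is over, or if passing
    costs (almost) nothing according to the score estimate."""
    if opponent_pass_streak >= trust_streak:
        return True  # trust a human who keeps passing
    return points_lost_by_passing <= max_loss
```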
I hope to get some interesting ideas here for further improvements.
