Bots trained for possibility of ties?

Maharani · #1

Are any bots, such as KataGo, trained with ties as a possible outcome for games played with integer komi? I know that komi can be set to 7 in KataGo, but does KataGo actually understand what a tie is? How about other bots?

If the answer is currently no, I think it would be highly interesting to experiment with this.

lightvector · #2

KataGo models a tie as being half of a win and half of a loss (this is actually configurable though!), and behaves accordingly, and the winrate will reflect this.

I've had one user request an explicit modeling of the probability of a tie. I never got around to doing this, unfortunately, since it would be some work and some complexity to code to track this separately from just the winrate, so it's just folded into the winrate. But aside from not being able to explicitly visualize the predicted probability of a tie, it should be handled correctly.
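A minimal sketch of what "half of a win and half of a loss" means for the reported winrate. This is illustrative only and not KataGo's actual code; the function name and the `draw_value` parameter are made up here to stand in for the configurable weighting mentioned above.

```python
def folded_winrate(p_win, p_tie, p_loss, draw_value=0.5):
    """Fold the tie probability into a single winrate number.

    draw_value is a hypothetical knob: 0.5 treats a tie as half a win,
    1.0 treats it as a full win, 0.0 as a full loss.
    """
    if abs(p_win + p_tie + p_loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return p_win + draw_value * p_tie


# With integer komi, some probability mass sits on the tie outcome.
print(folded_winrate(0.40, 0.25, 0.35))                  # tie counted as half a win
print(folded_winrate(0.40, 0.25, 0.35, draw_value=1.0))  # tie counted as a full win
```

Note that the tie probability itself is not recoverable from the folded number, which is why showing it separately would need extra bookkeeping.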

Bill Spight · #3

Maharani wrote:

Are any bots, such as KataGo, trained with ties as a possible outcome for games played with integer komi? I know that komi can be set to 7 in KataGo, but does KataGo actually understand what a tie is? How about other bots?

If the answer is currently no, I think it would be highly interesting to experiment with this.

What do you do with ties during training? If you are trying to decide which program is stronger, then ignoring ties is preferable. But if you train for that, then the bot may not learn to prefer a tie to a loss. So it may be better to count ties, one way or another. Maybe the old program should get to count a tie as a win, I dunno.

Maharani · #4

Fascinating - thank you for the swift reply!

Yeah, would be very interesting to know what the probability of a tie is for an empty board with 7 komi, but I understand that this would be too complex to implement.

Bill Spight wrote:

Maybe the old program should get to count a tie as a win, I dunno.

The points you raised make sense to me, but this seems like a great solution.

jann · #5

Bill Spight wrote:

What do you do with ties during training? If you are trying to decide which program is stronger, then ignoring ties is preferable.

I think ties just pull the value net for the given game towards 0, the correct output. I also doubt ignoring ties is preferable in any case (in test matches they should pull the strength diff towards 0 as well - otherwise if A wins 1 and ties 9 you would think it is much stronger).
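To put numbers on the "A wins 1 and ties 9" example, here is a rough sketch using the standard logistic Elo formula, with a tie counted as half a point. The function names are made up for illustration.

```python
import math


def score_fraction(wins, ties, losses, count_ties=True):
    """Average score per game; a tie is worth half a point when counted."""
    if count_ties:
        games = wins + ties + losses
        points = wins + 0.5 * ties
    else:
        games = wins + losses
        points = float(wins)
    if games == 0:
        raise ValueError("no games to score")
    return points / games


def elo_diff(score):
    """Elo difference implied by an average score (logistic model)."""
    if score <= 0.0 or score >= 1.0:
        return math.copysign(math.inf, score - 0.5)
    return 400.0 * math.log10(score / (1.0 - score))


with_ties = score_fraction(1, 9, 0)                       # 5.5 points out of 10
without_ties = score_fraction(1, 9, 0, count_ties=False)  # 1 point out of 1
print(elo_diff(with_ties), elo_diff(without_ties))
```

Counting the ties gives a score of 0.55, or about 35 Elo. Dropping them leaves a 1-0 record, for which the estimate is unbounded.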

Bill Spight · #6

jann wrote:

Bill Spight wrote:

What do you do with ties during training? If you are trying to decide which program is stronger, then ignoring ties is preferable.

I think ties just pull the value net for the given game towards 0, the correct output. I also doubt ignoring ties is preferable in any case (in test matches they should pull the strength diff towards 0 as well - otherwise if A wins 1 and ties 9 you would think it is much stronger).

If the question is which program is stronger you ignore ties. I learned that in my first research class.

How much stronger a program is is a different question.

As for what is correct for the value net I couldn't say.
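One common way to formalize "ignore ties when asking which is stronger" is the sign test, which drops tied games and runs a binomial test on the decisive ones. That this is the method meant here is an assumption; the sketch below just shows what it does with the records discussed above.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided sign test on decisive games only (ties already dropped).

    Null hypothesis: each decisive game is a 50/50 coin flip.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no decisive games, no evidence either way
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(sign_test_p_value(1, 0))  # the "1 win, 9 ties" record
print(sign_test_p_value(9, 1))  # a record with many decisive games
```

With 1 win and 9 ties the p-value is 1.0, so ignoring ties does not by itself make A look much stronger; it just says there is almost no evidence yet. The size of the difference is the separate question.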

emerus · #7

A new FineArt model might be trained for ties. It plays on FoxGo with 2 handicap and 0 komi.

Bill Spight · #8

As I was out today, I wondered about training chess engines, since draws are a large part of high level chess. I think I would use two game matches for self play, with each player playing Black in one game and White in the other. (Since the question is which program is better, we ignore ties of the two game match, where each player wins a game or both games are draws.) I expect that most decisive matches will be won by 1 point, one win and one tie. That illustrates the value of ties in two game matches. You don't have to win both games, you can tie one of them.

Whether such two game matches pull the value net in each game towards zero, I couldn't say.
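A small sketch of how such a two game match could be scored, assuming 1 point for a win and half a point for a drawn game. The function names and result labels are made up for illustration.

```python
POINTS = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def two_game_match(a_result_as_black, a_result_as_white):
    """Decide a two game match from player A's two results.

    Each argument is "win", "tie" or "loss" from A's point of view.
    Returns "A", "B", or "tied match".
    """
    a_points = POINTS[a_result_as_black] + POINTS[a_result_as_white]
    if a_points > 1.0:
        return "A"
    if a_points < 1.0:
        return "B"
    return "tied match"


print(two_game_match("win", "tie"))   # decisive by 1 point: a win and a tie
print(two_game_match("win", "loss"))  # each player wins one game
print(two_game_match("tie", "tie"))   # both games drawn
```

The first case is the one where the tied game matters: it turns a single win into a won match.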

----

Since emerus brings up Fine Art playing with two stones, I must say I also like the idea of two game matches, switching colors, for training handicap play.

jann · #9

Bill Spight wrote:

I think I would use two game matches for self play, with each player playing Black in one game and White in the other. (Since the question is which program is better

I'm not sure exactly what you mean here. Training and selfplay for NN bots serve two purposes: to create targets for policy training (since search results are of higher quality than raw policy), and to create target data for value training (the eventual game outcome is a bit of info about the prospects of the positions that occurred in that particular game). How would twin matches fit here (other than as two individual matches)?
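To make the two kinds of targets concrete, here is a rough sketch of turning one finished selfplay game into training samples. The data layout and names are invented for illustration and do not come from any particular bot. A tie gives a value target of 0, as discussed earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class TrainingSample:
    position_id: str
    policy_target: dict   # move -> fraction of search visits
    value_target: float   # +1 win, 0 tie, -1 loss, for the side to move


def make_samples(records, outcome_for_black):
    """records: list of (position_id, side_to_move, visit_counts).

    outcome_for_black is +1, 0 or -1 for the finished game.
    """
    samples = []
    for position_id, side_to_move, visits in records:
        total = sum(visits.values())
        # Policy target: the search's visit distribution, not the raw policy.
        policy = {move: n / total for move, n in visits.items()}
        # Value target: the final outcome, seen from the side to move.
        value = outcome_for_black if side_to_move == "B" else -outcome_for_black
        samples.append(TrainingSample(position_id, policy, float(value)))
    return samples


game = [
    ("p0", "B", {"D4": 30, "Q16": 10}),
    ("p1", "W", {"C3": 5, "R4": 15}),
]
for sample in make_samples(game, outcome_for_black=0):  # a tied game
    print(sample)
```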

Bill Spight · #10

jann wrote:

Bill Spight wrote:

I think I would use two game matches for self play, with each player playing Black in one game and White in the other. (Since the question is which program is better

I'm not sure exactly what you mean here. Training and selfplay for NN bots serve two purposes: to create targets for policy training (since search results are of higher quality than raw policy), and to create target data for value training (the eventual game outcome is a bit of info about the prospects of the positions that occurred in that particular game). How would twin matches fit here (other than as two individual matches)?

Well, first, I am asking a different question. Which bot is better? To answer that question, ties do not matter, since they give no information about which one is better. Of course, self play is a slight misnomer, since the bots will differ to some extent. The two game match does make a tie in a single game desirable to some extent, as a bot can win the match with a tie and a win.

Speaking in general, you want to reinforce correct decisions. In fact, you want to reinforce better decisions, even if they are not correct, or not known to be correct. Winning a two game match is evidence of making better decisions, even when those decisions result in a tie in one of the games. Thus, a two game match can reinforce decisions that a single game would not.

Now, it is possible to reinforce decisions that do not lead to a win. For instance, in SOAR subgoals are created and decisions that lead to reaching a subgoal are reinforced. In playing go, a subgoal might be to read a ladder out to resolution. Reading the ladder out may not win a game, but the decisions made to read it out correctly may still be reinforced. Or a goal may be to predict the result of the game. Even if the game is lost, decisions that led to a correct prediction may still be reinforced. Another, related goal may be to predict the result of the two game match. This decision may be made with or without the knowledge of the result of the other game. How these decisions are reinforced is a matter of implementation.

Oh, I meant to mention. In a two game match it is possible to reinforce decisions that win one of the games, as well as the two game match. Whether that's a good idea or not is an empirical question. For instance, if the bots have a long series of two game matches where the first player wins, resulting in tied matches, do we really want to reinforce the decisions by the first player which led to those wins? My guess is that it may be better to reinforce decisions by the second player that lead to a tie in a single game.

Bill Spight · #11

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other. Now, with KataGo we can set the komi, right? In that case say that Black wins a three stone game by 39 points sans komi. For the second game we could set the komi to 39 and reinforce decisions on that basis. Based upon empirical results, we could come up with a komi for the first game, as well.
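A sketch of the arithmetic, using the 39 point example. The function names are made up, and the 45 point result for the second game is an invented number for illustration only.

```python
def margin_after_komi(black_margin_sans_komi, komi):
    """Black's margin once komi is deducted. Zero is a tie (integer komi)."""
    return black_margin_sans_komi - komi


def total_points_winner(a_margin_game1, a_margin_game2):
    """Two game match decided on total points, from player A's side."""
    total = a_margin_game1 + a_margin_game2
    if total > 0:
        return "A"
    if total < 0:
        return "B"
    return "tie"


# Game 1: A takes Black with three stones and wins by 39 sans komi.
# Game 2: B takes Black; suppose B wins by 45 sans komi (invented number).
print(total_points_winner(39, -45))

# The same comparison, done by setting the komi for game 2 to 39:
print(margin_after_komi(45, 39))  # positive means B did better as Black
print(margin_after_komi(39, 39))  # exactly matching game 1 is a tie
```

Setting the second game's komi to the first game's margin makes the second game's own result equal to the match result, so ordinary win/tie/loss reinforcement on that game reflects the total points comparison.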

lightvector · #12

And what makes you think that KataGo is not already doing some or all of these things? :razz:

Maharani · #13

lightvector wrote:

And what makes you think that KataGo is not already doing some or all of these things? :razz:

Please do elaborate... :3

dfan · #14

Bill Spight wrote:

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.

Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
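A minimal sketch of the idea, assuming each game produces a single comparable score (a points margin, say). It is only an illustration of "fraction of the other results it is superior to", not the method from the paper, and it counts equal scores as not superior; how to treat them is left open here.

```python
def virtual_match_values(scores):
    """Value of each game result = fraction of the other results it beats.

    This is an empirical CDF evaluated over the other games' scores.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two games to form a virtual match")
    values = []
    for i, mine in enumerate(scores):
        beaten = sum(
            1 for j, other in enumerate(scores) if j != i and mine > other
        )
        values.append(beaten / (n - 1))
    return values


# Four games give 4 * 3 = 12 ordered virtual matches.
print(virtual_match_values([3, -1, 7, 3]))
```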

Bill Spight · #15

dfan wrote:

Bill Spight wrote:

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.

Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)

However, to satisfy the requirement of switching sides, it should be the odd numbered games vs. the even numbered games, assuming that's how you number them.

Then you get some number of virtual matches less than N*N/4, since you eliminate virtual ties when deciding which player is better.
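A sketch of that counting, assuming player A has Black in the odd numbered games and White in the even numbered ones, and that each game is summarized by A's points margin. The names and the example margins are invented for illustration.

```python
def cross_color_matches(a_margins):
    """Pair every odd numbered game with every even numbered game.

    a_margins[i] is player A's points margin in game i + 1.
    Returns (a_wins, b_wins, ties) over the virtual two game matches,
    each decided on total points.
    """
    odd_games = a_margins[0::2]   # games 1, 3, 5, ...
    even_games = a_margins[1::2]  # games 2, 4, 6, ...
    a_wins = b_wins = ties = 0
    for x in odd_games:
        for y in even_games:
            total = x + y
            if total > 0:
                a_wins += 1
            elif total < 0:
                b_wins += 1
            else:
                ties += 1
    return a_wins, b_wins, ties


# N = 4 games give at most N * N / 4 = 4 virtual matches.
a_wins, b_wins, ties = cross_color_matches([5, -3, 2, -5])
print(a_wins, b_wins, ties)
print("decisive matches:", a_wins + b_wins)
```

Only the decisive matches are used when deciding which player is better, so the count falls below N*N/4 whenever a pairing ties.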

dfan · #16

Bill Spight wrote:

dfan wrote:

Bill Spight wrote:

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.

Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)

However, to satisfy the requirement of switching sides, it should be the odd numbered games vs. the even numbered games, assuming that's how you number them.

Then you get some number of virtual matches less than N*N/4, since you eliminate virtual ties when deciding which player is better.

In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any value of m ≠ n.)

Bill Spight · #17

Bill Spight wrote:

dfan wrote:

Bill Spight wrote:

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.

Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)

However, to satisfy the requirement of switching sides, it should be the odd numbered games vs. the even numbered games, assuming that's how you number them.

Then you get some number of virtual matches less than N*N/4, since you eliminate virtual ties when deciding which player is better.

dfan wrote:

In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any value of m ≠ n.)

I have some questions, which your paper may answer. Depends on your audience, I suppose.

Anyway, rank statistics are nice for a Bayesian approach.

dfan · #18

dfan wrote:

Bill Spight wrote:

One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.

Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)

Here it is: Self-Play Learning Without a Reward Metric. The presentation in the paper starts with CDF-based rewards and then derives the virtual-match approach from it, but in fact the history of the idea is the other way around; we started with two-game matches as you describe (independent evolution; if you mentioned it here I didn't see it) and then realized that CDF-based rewards modeled the same thing in the end and converged much more quickly.
