Bots trained for possibility of ties?

For discussing go computing, software announcements, etc.
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times
Contact:

Re: Bots trained for possibility of ties?

Post by dfan »

Bill Spight wrote:
dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
However, to satisfy the requirement of switching sides, it should be the odd-numbered games vs. the even-numbered games, assuming that's how you number them. :) Then you get somewhat fewer than N*N/4 virtual matches, since you eliminate virtual ties when deciding which player is better.
In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any m ≠ n.)
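The CDF-style reward described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper: each game's reward is the fraction of the *other* games' scores it beats (counting ties as half), and the scores below are made-up point margins from self-play games.

```python
def cdf_rewards(scores):
    """Map each game's point margin to the fraction of the other
    games' results it is superior to (an empirical CDF-style reward).
    Ties count as half a win, as is conventional for rank statistics."""
    n = len(scores)
    rewards = []
    for i, s in enumerate(scores):
        wins = sum(1.0 for j, t in enumerate(scores) if j != i and s > t)
        ties = sum(0.5 for j, t in enumerate(scores) if j != i and s == t)
        rewards.append((wins + ties) / (n - 1))
    return rewards

# Made-up point margins from five self-play games:
print(cdf_rewards([12, -3, 7, -3, 25]))  # → [0.75, 0.125, 0.5, 0.125, 1.0]
```

This is how n games yield O(n^2) virtual pairwise comparisons at O(n^2) comparison cost but only n games of play; with sorting, the rewards can even be computed in O(n log n).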
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Bots trained for possibility of ties?

Post by Bill Spight »

Bill Spight wrote:
dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
However, to satisfy the requirement of switching sides, it should be the odd-numbered games vs. the even-numbered games, assuming that's how you number them. :) Then you get somewhat fewer than N*N/4 virtual matches, since you eliminate virtual ties when deciding which player is better.
dfan wrote:In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any m ≠ n.)
I have some questions, which your paper may answer. That depends on your audience, I suppose. :) Anyway, rank statistics are nice for a Bayesian approach.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times
Contact:

Re: Bots trained for possibility of ties?

Post by dfan »

dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
Here it is: Self-Play Learning Without a Reward Metric. The presentation in the paper starts with CDF-based rewards and then derives the virtual-match approach from them, but the history of the idea is actually the other way around: we started with two-game matches as you describe (an independent invention; if you mentioned it here earlier, I missed it) and then realized that CDF-based rewards modeled the same thing in the end and converged much more quickly.
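Bill's side-switching variant from earlier in the thread (odd-numbered games as one color, even-numbered games as the other, with each two-game match decided on total points and virtual ties discarded) can be sketched like this. The function name and scoring convention are illustrative assumptions, not from the paper.

```python
from itertools import product

def virtual_matches(scores):
    """Pair each odd-numbered game with each even-numbered game
    (1-based numbering, as in the thread) into virtual two-game
    matches decided on total points.

    scores[k] is the point margin for the player of interest in
    game k+1. Returns (wins, losses) over at most N*N/4 virtual
    matches; virtual ties are dropped, as suggested.
    """
    odd = scores[0::2]    # games 1, 3, 5, ... in 1-based numbering
    even = scores[1::2]   # games 2, 4, 6, ...
    wins = losses = 0
    for a, b in product(odd, even):
        total = a + b     # two-game match decided on total points
        if total > 0:
            wins += 1
        elif total < 0:
            losses += 1
        # total == 0 is a virtual tie and is discarded
    return wins, losses

print(virtual_matches([5, -2, -5, 3]))  # → (2, 2)
```

One could then use wins / (wins + losses) as the reinforcement signal; that last detail is my guess at a reasonable choice, not something stated in the thread.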