Bots trained for possibility of ties?

For discussing go computing, software announcements, etc.
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times
Contact:

Re: Bots trained for possibility of ties?

Post by dfan »

Bill Spight wrote:
dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
However, to satisfy the requirement of switching sides, it should be the odd-numbered games vs. the even-numbered games, assuming that's how you number them. :) Then you get somewhat fewer than N*N/4 virtual matches, since you eliminate virtual ties when deciding which player is better.
In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any m ≠ n.)
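The CDF-style reward described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper: each game's reward is the fraction of the *other* games' scores it beats (counting ties as half), and the scores below are made-up point margins from self-play games.

```python
def cdf_rewards(scores):
    """Map each game's point margin to the fraction of the other
    games' results it is superior to (an empirical CDF-style reward).
    Ties count as half a win, as is conventional for rank statistics."""
    n = len(scores)
    rewards = []
    for i, s in enumerate(scores):
        wins = sum(1.0 for j, t in enumerate(scores) if j != i and s > t)
        ties = sum(0.5 for j, t in enumerate(scores) if j != i and s == t)
        rewards.append((wins + ties) / (n - 1))
    return rewards

# Made-up point margins from five self-play games:
print(cdf_rewards([12, -3, 7, -3, 25]))  # → [0.75, 0.125, 0.5, 0.125, 1.0]
```

This is how n games yield O(n^2) virtual pairwise comparisons at O(n^2) comparison cost but only n games of play; with sorting, the rewards can even be computed in O(n log n).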
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: Bots trained for possibility of ties?

Post by Bill Spight »

Bill Spight wrote:
dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
However, to satisfy the requirement of switching sides, it should be the odd-numbered games vs. the even-numbered games, assuming that's how you number them. :) Then you get somewhat fewer than N*N/4 virtual matches, since you eliminate virtual ties when deciding which player is better.
dfan wrote:In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any m ≠ n.)
I have some questions, which your paper may answer. That depends on your audience, I suppose. :) Anyway, rank statistics are nice for a Bayesian approach.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times
Contact:

Re: Bots trained for possibility of ties?

Post by dfan »

dfan wrote:
Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.
Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)
Here it is: Self-Play Learning Without a Reward Metric. The presentation in the paper starts with CDF-based rewards and then derives the virtual-match approach from them, but the history of the idea is actually the other way around: we started with two-game matches as you describe (an independent invention; if you mentioned it here earlier, I missed it) and then realized that CDF-based rewards modeled the same thing in the end and converged much more quickly.
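Bill's side-switching variant from earlier in the thread (odd-numbered games as one color, even-numbered games as the other, with each two-game match decided on total points and virtual ties discarded) can be sketched like this. The function name and scoring convention are illustrative assumptions, not from the paper.

```python
from itertools import product

def virtual_matches(scores):
    """Pair each odd-numbered game with each even-numbered game
    (1-based numbering, as in the thread) into virtual two-game
    matches decided on total points.

    scores[k] is the point margin for the player of interest in
    game k+1. Returns (wins, losses) over at most N*N/4 virtual
    matches; virtual ties are dropped, as suggested.
    """
    odd = scores[0::2]    # games 1, 3, 5, ... in 1-based numbering
    even = scores[1::2]   # games 2, 4, 6, ...
    wins = losses = 0
    for a, b in product(odd, even):
        total = a + b     # two-game match decided on total points
        if total > 0:
            wins += 1
        elif total < 0:
            losses += 1
        # total == 0 is a virtual tie and is discarded
    return wins, losses

print(virtual_matches([5, -2, -5, 3]))  # → (2, 2)
```

One could then use wins / (wins + losses) as the reinforcement signal; that last detail is my guess at a reasonable choice, not something stated in the thread.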