In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any value of m /= n.)Bill Spight wrote:However, to satisfy the requirement of switching sides, it should be the odd numbered games vs. the even numbered games, assuming that's how you number them.dfan wrote:Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.Then you get some number of virtual matches less than N*N/4, since you eliminate virtual ties when deciding which player is better.
Bots trained for possibility of ties?
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: Bots trained for possibility of ties?
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Bots trained for possibility of ties?
Bill Spight wrote:However, to satisfy the requirement of switching sides, it should be the odd numbered games vs. the even numbered games, assuming that's how you number them.dfan wrote:Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.Then you get some number of virtual matches less than N*N/4, since you eliminate virtual ties when deciding which player is better.
I have some questions, which your paper may answer. Depends on your audience, I suppose.dfan wrote:In my setup, the games are all training games, where both players are the same bot, so the games really are all comparable. (For example, I can pretend I was White in game n and Black in game m, which really means that I "win" if my performance as White in game n is better than a clone's performance as White in game m, and this can be done for any value of m /= n.)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: Bots trained for possibility of ties?
Here it is: Self-Play Learning Without a Reward Metric. The presentation in the paper starts with CDF-based rewards and then derives the virtual-match approach from it, but in fact the history of the idea is the other way around; we started with two-game matches as you describe (independent evolution; if you mentioned it here I didn't see it) and then realized that CDF-based rewards modeled the same thing in the end and converged much more quickly.dfan wrote:Yes. In fact, you can play O(n^2) "virtual two-game matches" with only n games if the games are played without knowledge of the other game's result; pretend games 1 and 2 are a match, games 1 and 3, games 1 and 4, etc. The "value" of a game result ends up being what fraction of the other game results it is superior to, which for those with a probability background is known as a cumulative distribution function. I have a paper about this that should go up on arXiv.org this coming week. (It learned well, but I only tried it on a simple game.)Bill Spight wrote:One thing I have suggested for go is a two game match decided on total points. If the games are played without knowledge of the result of the other game, then we can reinforce the decisions that led to the greater win in one game, and the lesser loss in the other.