Can you explain/elaborate on this?
Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
Whole History Rating open source implementation.
-
hyperpape
- Tengen
- Posts: 4382
- Joined: Thu May 06, 2010 3:24 pm
- Rank: AGA 3k
- GD Posts: 65
- OGS: Hyperpape 4k
- Location: Caldas da Rainha, Portugal
- Has thanked: 499 times
- Been thanked: 727 times
Re: Whole History Rating open source implementation.
-
Rémi
- Lives with ko
- Posts: 170
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 119 times
Re: Whole History Rating open source implementation.
hyperpape wrote: Can you explain/elaborate on this?
Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
With the KGS rating system (not WHR), if you stop playing, your rating keeps moving with your past opponents' ratings: as they improve, so does yours. So if you want your rating to progress fast, you can select opponents who are progressing fast, and then stop playing.
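A toy illustration of that effect, under a simplified model assumed here (not KGS's actual code): the idle player's rating is re-estimated by maximum likelihood from the same old results, but against the opponents' *current* ratings. If every former opponent gains 200 points, the idle player gains 200 too, without playing a single game. `expected_score` and `ml_rating` are hypothetical helpers.

```ruby
# Toy model (an assumption, not KGS's implementation): an idle player's rating is the
# maximum-likelihood fit of their old results against the opponents' *current* ratings.
ELO_SCALE = Math.log(10) / 400.0

# Standard Elo expected score of a player rated r_a against one rated r_b.
def expected_score(r_a, r_b)
  1.0 / (1.0 + 10.0**((r_b - r_a) / 400.0))
end

# results: array of [opponent_rating, score], score 1 for a win and 0 for a loss.
# Needs at least one win and one loss, otherwise no finite maximum exists.
def ml_rating(results, iterations: 30)
  r = results.sum { |opp, _| opp } / results.size.to_f # start at mean opponent rating
  iterations.times do
    residual = results.sum { |opp, score| score - expected_score(r, opp) }
    curvature = results.sum { |opp, _| e = expected_score(r, opp); e * (1.0 - e) }
    step = residual / (ELO_SCALE * curvature) # Newton step on the log-likelihood
    r += step.clamp(-200.0, 200.0)            # crude damping against overshoot
  end
  r
end

old_games = [[1500, 1], [1500, 0], [1600, 1], [1400, 0]]
before = ml_rating(old_games)
# The player stops playing; every former opponent gains 200 points.
after = ml_rating(old_games.map { |opp, score| [opp + 200, score] })
puts before.round, after.round # prints 1500, then 1700
```

The fit is shift-invariant: moving all opponents by the same amount moves the estimate by exactly that amount, which is the "rating improves like your past opponents" effect.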
Rémi
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
I found what appear to be some numerical stability problems. I had similar problems, with Newton's method failing or oscillating, when I implemented this myself. For example:
player vs anchor, 20 even games, 50% wins
180 days later:
player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
This seems to break the code. I ended up using a slower but more stable minorize-maximize (MM) algorithm.
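require 'whole_history_rating'

@whr = WholeHistoryRating::Base.new

# Day 1: 20 even games (handicap 0) between "anchor" (black) and "player" (white), 10 wins each.
10.times do
  @whr.create_game("anchor", "player", "B", 1, 0)
  @whr.create_game("anchor", "player", "W", 1, 0)
end

# Day 180: 20 more games, still 10 wins each, now with a 600 Elo handicap in favour of "anchor".
10.times do
  @whr.create_game("anchor", "player", "B", 180, 600)
  @whr.create_game("anchor", "player", "W", 180, 600)
end

# Run 10 rounds of 10 iterations, printing both players' rating histories after each round.
10.times do
  @whr.iterate(10)
  print @whr.ratings_for_player("anchor"), " "
  print @whr.ratings_for_player("player"), "\n"
end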
/var/lib/gems/1.9.1/gems/whole_history_rating-0.1.2/lib/whole_history_rating/player.rb:149:in `block in update_by_ndim_newton': uninitialized constant WholeHistoryRating::Player::WHR (NameError)
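yoyoma doesn't say which minorize-maximize (MM) scheme he switched to. As an illustration only, here is the classic MM update for a static Bradley-Terry model (no time dimension, so not WHR itself): each player's strength γ is replaced by their number of wins divided by the sum of 1/(γ_i + γ_j) over their games. An update never decreases the likelihood, which is what makes MM stable, but convergence is only linear, hence slower than Newton. `bradley_terry_mm` is a hypothetical helper.

```ruby
# Classic MM (minorize-maximize) iteration for a static Bradley-Terry model,
# shown for illustration; WHR's time-varying ratings are not modelled here.
# games: array of [winner, loser]; the anchor's strength stays fixed at gamma = 1.
# Every non-anchor player needs at least one win and one loss.
def bradley_terry_mm(games, anchor, iterations: 200)
  players = games.flatten.uniq
  gamma = players.map { |name| [name, 1.0] }.to_h
  wins = Hash.new(0)
  games.each { |winner, _loser| wins[winner] += 1 }

  iterations.times do
    players.each do |name|
      next if name == anchor
      # Sum of 1 / (gamma_winner + gamma_loser) over every game this player took part in.
      denom = games.sum { |a, b| (name == a || name == b) ? 1.0 / (gamma[a] + gamma[b]) : 0.0 }
      gamma[name] = wins[name] / denom # MM update: never decreases the likelihood
    end
  end
  # Report strengths as Elo differences from the anchor.
  gamma.transform_values { |g| 400.0 * Math.log10(g) }
end

games = [["A", "anchor"]] * 3 + [["anchor", "A"]] # A wins 3 of 4 against the anchor
ratings = bradley_terry_mm(games, "anchor")
puts ratings["A"].round(1) # 190.8, i.e. 400 * log10(3)
```

In this two-player case the update reduces to γ ← 0.75·γ + 0.75, so the error shrinks by a factor of 0.75 per iteration: safe, but many more iterations than Newton would need.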
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
Re: Whole History Rating open source implementation.
yoyoma wrote:player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.
On GoShrine, handicap values for a single stone range from roughly 30 to 60 Elo, depending on the strength of the players. So 600 Elo is quite a bit. I've yet to see it go unstable with real data.
I'm guessing that we're running into floating-point precision issues with certain parameters. If stability does turn out to be a problem with real data, I'd definitely take a deeper look at it, though I might need some help from Remi.
-Pete
Creator of GoShrine
-
hyperpape
- Tengen
- Posts: 4382
- Joined: Thu May 06, 2010 3:24 pm
- Rank: AGA 3k
- GD Posts: 65
- OGS: Hyperpape 4k
- Location: Caldas da Rainha, Portugal
- Has thanked: 499 times
- Been thanked: 727 times
Re: Whole History Rating open source implementation.
So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?
Rémi wrote:
hyperpape wrote: Can you explain/elaborate on this?
Rémi wrote: This would be unlike KGS, where the best way to increase one's rating is to stop playing.
Rémi
With the KGS rating system (not WHR), if you stop playing, your rating improves like your past opponents. So if you want your rating to make fast progress you can select opponents who are making fast progress, and then stop playing.
Rémi
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
pete wrote:
yoyoma wrote: player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins
Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.
On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.
I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Remi.
-Pete
60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
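Reading the "log" in that conversion as base 10, the per-rank Elo figures follow from matching the two exponents for a one-rank gap (a worked check of the numbers above):

$$
e^{k} = 10^{D/400} \quad\Longrightarrow\quad D = 400\,\log_{10}\!\left(e^{k}\right) = \frac{400\,k}{\ln 10}
$$

$$
k = 0.85 \;\Rightarrow\; D \approx 147.7 \approx 148, \qquad k = 1.3 \;\Rightarrow\; D \approx 225.8 \approx 226
$$

At 60 Elo per stone, the 600 Elo handicap in the test case is 600/60 = 10 ranks; at 148 Elo per rank it is only about 4 ranks.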
-
quantumf
- Lives in sente
- Posts: 844
- Joined: Tue Apr 20, 2010 11:36 pm
- Rank: 3d
- GD Posts: 422
- KGS: komi
- Has thanked: 180 times
- Been thanked: 151 times
Re: Whole History Rating open source implementation.
So after 5 games (3 wins, 2 losses) I still don't have a rank. This is somewhat frustrating and doesn't encourage me to carry on. In general I prefer servers that let one self-select a starting rank, and I find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.
-
Rémi
- Lives with ko
- Posts: 170
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 119 times
Re: Whole History Rating open source implementation.
yoyoma wrote: I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating.
Newton's method is very efficient but tricky. In order to guarantee it works, it is necessary to check that the Newton iteration brings an improvement in the log-likelihood. If it does not, a fallback method should be used (such as a line search in the gradient direction).
IIRC, in my implementation I add a small negative constant to the diagonal of the Hessian before inversion. This prevents instability very well, at almost no cost in efficiency. Maybe a good fallback method would be to increase this additional diagonal term until the Newton step increases the log-likelihood.
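A one-dimensional sketch of that safeguard, assuming a single rating r in natural units (win probability 1/(1 + e^(opp - r))) against opponents with fixed ratings. The real WHR update is multi-dimensional, with a tridiagonal Hessian over the player's rating days, so this only shows the mechanism: add a negative shift to the Hessian and grow it tenfold until the step increases the log-likelihood. As the shift grows the step shrinks toward a short step along the gradient, which doubles as the fallback. The method names and the starting shift are hypothetical.

```ruby
# Numerically safe log(1 + e^x).
def softplus(x)
  x > 30.0 ? x : Math.log(1.0 + Math.exp(x))
end

# Log-likelihood of rating r given [opponent_rating, score] pairs (natural units).
def log_likelihood(r, results)
  results.sum do |opp, score|
    # log P(win) = -softplus(opp - r); log P(loss) = -softplus(r - opp)
    score == 1 ? -softplus(opp - r) : -softplus(r - opp)
  end
end

# First and second derivatives of the log-likelihood with respect to r.
def gradient_and_hessian(r, results)
  results.reduce([0.0, 0.0]) do |(g, h), (opp, score)|
    p_win = 1.0 / (1.0 + Math.exp(opp - r))
    [g + score - p_win, h - p_win * (1.0 - p_win)]
  end
end

# Newton step with a negative shift added to the (1x1) Hessian, growing the
# shift until the step does not decrease the log-likelihood.
def safeguarded_newton_step(r, results, shift: -1e-3)
  g, h = gradient_and_hessian(r, results)
  base = log_likelihood(r, results)
  while shift.abs < 1e12
    candidate = r - g / (h + shift)
    return candidate if log_likelihood(candidate, results) >= base
    shift *= 10.0 # a larger |shift| gives a shorter step along the gradient
  end
  r # no improving step found; stay put
end

# One win and one loss against an opponent rated 0; start far from the optimum r = 0.
results = [[0.0, 1], [0.0, 0]]
r = 10.0
20.times { r = safeguarded_newton_step(r, results) }
puts r.round(6)
```

From r = 10 a plain Newton step jumps to about r = -11000 (the curvature there is almost zero); the shifted step is rejected twice, then accepted with a shift of -0.1, landing near 0.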
Rémi
-
Rémi
- Lives with ko
- Posts: 170
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 119 times
Re: Whole History Rating open source implementation.
hyperpape wrote: So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?
If you don't play on KGS, your rating will go up (or down) along with your past opponents' ratings.
Rémi
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
Re: Whole History Rating open source implementation.
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
KGS and WHR use different Elo scales, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 Elo which, if spread evenly, is 50 Elo per rank/stone.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but it is one I would certainly not see on GoShrine (600 Elo is a 15-stone handicap at 25 kyu).
-Pete
Creator of GoShrine
-
Rémi
- Lives with ko
- Posts: 170
- Joined: Sat Jan 14, 2012 4:11 pm
- Rank: KGS 4 kyu
- GD Posts: 0
- Has thanked: 32 times
- Been thanked: 119 times
Re: Whole History Rating open source implementation.
pete wrote:
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu),
-Pete
How did you select the volatility meta-parameter of WHR? And the handicap values?
In my experiments, it was very clear that the handicap value changes a lot with player strength, and so does the volatility. When I chose the volatility to optimize prediction quality over the KGS database, the resulting value (14 Elo^2/day) was too low for beginners, so it produced very "compressed" ratings.
For a rating system to properly capture the variations of strength in a pool that mixes beginners and experts, it really has to take into account that beginners' strength changes faster than experts' strength.
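In WHR a player's rating drifts as a Wiener process, so over Δt days the prior variance of the rating change is w²·Δt (w² in Elo²/day). A quick sketch of what a few volatility values imply over the 180-day gap in yoyoma's test case; `change_sd` is a hypothetical helper:

```ruby
# WHR treats each player's rating as a Wiener process: over dt days the prior
# variance of the rating change is w2 * dt, with w2 in Elo^2/day.
def change_sd(w2, days)
  Math.sqrt(w2 * days)
end

gap = 600.0 # size of the change in yoyoma's test case, over 180 days
[14, 100, 300].each do |w2|
  sd = change_sd(w2, 180)
  printf("w2 = %3d Elo^2/day: prior sd over 180 days = %5.1f Elo, so 600 Elo is %4.1f sd\n",
         w2, sd, gap / sd)
end
```

With w² = 14 the prior puts a 600 Elo change about 12 standard deviations out; with 300 it is under 3. This is only the prior; the fitted ratings also weigh the game results.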
Rémi
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
Re: Whole History Rating open source implementation.
quantumf wrote: So after 5 games (3 wins 2 losses) I still don't have a rank. This is somewhat frustrating and not encouraging me to carry on trying. In general I prefer servers that allow one to self-select a starting rank, and find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.
Thanks for the feedback, quantum. I'm leaning towards implementing what Remi suggested about using the lower confidence bound as the rating, which would give you a rank much sooner (though probably lower than your actual rank).
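A sketch of the lower-confidence-bound idea, assuming each player has a rating estimate and an uncertainty (one standard deviation, in Elo), and an assumed margin of k = 2 standard deviations. `displayed_rating` and the numbers are hypothetical, not GoShrine's code:

```ruby
# Displayed rating = estimate minus k standard deviations (a conservative rank).
# k = 2.0 is an assumed margin, not a value taken from GoShrine or WHR.
def displayed_rating(estimate, uncertainty, k: 2.0)
  estimate - k * uncertainty
end

# Made-up snapshots of one player: same estimate, shrinking uncertainty.
[[1500.0, 300.0], [1500.0, 150.0], [1500.0, 50.0]].each do |estimate, sd|
  puts "estimate #{estimate.round}, sd #{sd.round} -> shown as #{displayed_rating(estimate, sd).round}"
end
```

With these numbers the shown rating climbs 900, 1200, 1400 as the uncertainty falls: a newcomer gets a rank right away, just a cautious one.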
Creator of GoShrine
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
Re: Whole History Rating open source implementation.
Rémi wrote: How did you select the volatility meta-parameter of WHR? handicap values?
I did some optimization runs, and came up with 300 Elo^2/day, somehow. You can configure the library like this:
@whr = WholeHistoryRating::Base.new(:w2 => 17)
I know 300 seems like a lot. But it does still seem to produce sensible results, and allows beginners to make more rapid progress.
BTW, yoyoma, if you bump :w2 down below 100, your example remains stable.
Rémi wrote: In my experiments, it was very clear that the handicap value changes a lot with player strength, and also volatility. When choosing the volatility in order to optimize prediction quality over the KGS database, it was too low (14 Elo^2/Day) for beginners, so it produced very "compressed" ratings.
For a rating system to properly understand the variations of strength in a pool of players that mixes beginners and experts, it is really necessary to consider that the strengths of beginners changes faster than the strengths of experts.
Rémi
Are you working on a new version of WHR that takes this into consideration?
Creator of GoShrine
-
pete
- Beginner
- Posts: 18
- Joined: Sun Apr 25, 2010 6:17 pm
- GD Posts: 0
- Location: Northfield, MN
- Been thanked: 5 times
Re: Whole History Rating open source implementation.
As an aside, I'm glad to finally have some questions and feedback on this code that I struggled to write.
I'm certainly open to the possibility that there may be mistakes in the code, and would love to have someone other than me look it over. That's one of the reasons I open-sourced it. If you spot anything or have questions, send a pull request on GitHub, or just email me.
-Pete
Creator of GoShrine
-
yoyoma
- Lives in gote
- Posts: 653
- Joined: Mon Apr 19, 2010 8:45 pm
- GD Posts: 0
- Location: Austin, Texas, USA
- Has thanked: 54 times
- Been thanked: 213 times
Re: Whole History Rating open source implementation.
pete wrote:
yoyoma wrote: 60 elo for a handicap stone? That is far too low! KGS uses 148 per rank for 30k-5k, and 226 per rank for 2d+ (The constants are given in a different form here http://senseis.xmp.net/?KGSRatingMath log(e^0.85)*400=148 to convert to Elo form). EGF uses similar numbers. Besides, even using your too low value of 60, this is just a 10 rank improvement over 6 months. It's very easy to go from 25kyu to 15kyu in 6 months.
KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.
Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu),
-Pete
I like playing around with rating math; sorry for the long (tl;dr) text.
I did convert from the KGS scale to the standard Elo scale, and it looks like the handicap parameter in your WHR code takes a value on the standard Elo scale.
KGS: P = 1 / (1 + e^(k*(RankB-RankA))) [k=0.85 for 30k-5k, k=1.3 for 2d+]
Elo: P = 1 / (1 + 10^((RatingB-RatingA)/400))
In both, P is the probability that A beats B. So for kyu players and a 1-rank difference, RankB-RankA = 1 and k = 0.85, and you can solve for the equivalent Elo difference. EGF has some statistics on even games here: http://gemma.ujf.cas.cz/~cieply/GO/statev.html
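A quick numerical check of that step, assuming one rank corresponds to D = 400·k / ln 10 Elo (obtained by setting e^k = 10^(D/400)). It prints the Elo per rank and the weaker player's expected win rate under both formulas; the helper names are hypothetical:

```ruby
# KGS: P(A beats B) = 1 / (1 + e^(k * (RankB - RankA)))
def kgs_win_prob(rank_gap, k)
  1.0 / (1.0 + Math.exp(k * rank_gap))
end

# Elo: P(A beats B) = 1 / (1 + 10^((RatingB - RatingA) / 400))
def elo_win_prob(elo_gap)
  1.0 / (1.0 + 10.0**(elo_gap / 400.0))
end

# Elo gap equivalent to one KGS rank: e^k = 10^(D/400)  =>  D = 400 * k / ln(10)
def elo_per_rank(k)
  400.0 * k / Math.log(10)
end

bands = { "30k-5k" => 0.85, "2d+" => 1.3 }
bands.each do |band, k|
  d = elo_per_rank(k)
  printf("%-6s k = %.2f: %.1f Elo per rank; weaker player wins %.1f%% (KGS) vs %.1f%% (Elo)\n",
         band, k, d, 100 * kgs_win_prob(1, k), 100 * elo_win_prob(d))
end
```

By construction the two formulas agree exactly for a one-rank gap; only the scale changes.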
Generally, for weaker kyu players the chance of an upset is around 45%; for stronger players it goes down. Here are the expected win rates under the KGS and EGF formulas, along with the observed win rates in EGF tournaments:
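| Even game | KGS exp. win % | EGF exp. win % | EGF obs. win % | KGS exp. Elo gap | EGF exp. Elo gap | EGF obs. Elo gap |
|-----------|----------------|----------------|----------------|------------------|------------------|------------------|
| 10k vs 9k | 30.0           | 33.9           | 44.8           | 148              | 116              | 36               |
| 5d vs 6d  | 21.4           | 20.1           | 27.8           | 226              | 232              | 166              |
(Win % is for the weaker, first-listed player; the Elo gap is the rating difference that win rate corresponds to.)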
You can see quite a discrepancy between the win rates predicted by the EGF formula and those observed. Since ratings are only estimates of the players' true strengths, the stronger player's observed win % will usually be lower than expected (errors in the rating estimates tend to create more upsets than predicted). Also, these statistics come mostly from McMahon tournaments, which tend to pair underrated 10-kyus with overrated 9-kyus.
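The Elo gaps in the table follow from inverting the Elo formula: a win rate p for the weaker player corresponds to a gap of D = 400·log10((1 − p)/p). A quick check against some of the table's cells (`elo_gap` is a hypothetical helper name):

```ruby
# Invert P = 1 / (1 + 10^(D/400)) to get the Elo gap D implied by the
# weaker player's win rate p (requires 0 < p < 1).
def elo_gap(p)
  400.0 * Math.log10((1.0 - p) / p)
end

# Weaker player's win rates taken from the table, as fractions.
rows = {
  "10k vs 9k, EGF exp." => 0.339,
  "10k vs 9k, EGF obs." => 0.448,
  "5d vs 6d, KGS exp." => 0.214,
  "5d vs 6d, EGF obs." => 0.278
}
rows.each do |label, rate|
  printf("%-20s %4.1f%% -> %3.0f Elo\n", label, 100 * rate, elo_gap(rate))
end
```

These reproduce the table's 116, 36, 226 and 166 Elo.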
Remi, do you have similar numbers from observed KGS games, so we could derive Elo per rank from them?