Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 11:32 am
by hyperpape
Rémi wrote:This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi
Can you explain/elaborate on this?

Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 11:43 am
by Rémi
hyperpape wrote:
Rémi wrote:This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi
Can you explain/elaborate on this?


With the KGS rating system (not WHR), if you stop playing, your rating moves along with the ratings of your past opponents. So if you want your rating to rise quickly, you can select opponents who are making fast progress, and then stop playing.

Rémi

Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 4:07 pm
by yoyoma
I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating. For example:

player vs anchor, 20 even games, 50% wins
180 days later:
player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins

This seems to break the code. I ended up using a slower but more stable minorization-maximization (MM) algorithm.

Code:

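require 'whole_history_rating'

@whr = WholeHistoryRating::Base.new

# create_game arguments, as used here: black, white, winner ("B" or "W"),
# day number, handicap in Elo.

# Day 1: 20 even games, 50% wins for each side.
10.times do
  @whr.create_game("anchor", "player", "B", 1, 0)
  @whr.create_game("anchor", "player", "W", 1, 0)
end

# Day 180: anchor gets a 600 Elo advantage, again 50% wins for each side.
10.times do
  @whr.create_game("anchor", "player", "B", 180, 600)
  @whr.create_game("anchor", "player", "W", 180, 600)
end

# Run 10 rounds of 10 iterations, printing both rating histories each round.
10.times do
  @whr.iterate(10)
  print @whr.ratings_for_player("anchor"), "   "
  print @whr.ratings_for_player("player"), "\n"
end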

/var/lib/gems/1.9.1/gems/whole_history_rating-0.1.2/lib/whole_history_rating/player.rb:149:in `block in update_by_ndim_newton': uninitialized constant WholeHistoryRating::Player::WHR (NameError)

Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 4:30 pm
by pete
yoyoma wrote:player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins


Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.

On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.

I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Remi. :)

-Pete

Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 5:24 pm
by hyperpape
Rémi wrote:
hyperpape wrote:
Rémi wrote:This would be unlike KGS, where the best way to increase one's rating is to stop playing.

Rémi
Can you explain/elaborate on this?


With the KGS rating system (not WHR), if you stop playing, your rating moves along with the ratings of your past opponents. So if you want your rating to rise quickly, you can select opponents who are making fast progress, and then stop playing.

Rémi
So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?

Re: Whole History Rating open source implementation.

Posted: Tue May 29, 2012 8:26 pm
by yoyoma
pete wrote:
yoyoma wrote:player gives anchor 600 Elo advantage (maybe 3 stones, depends on your model), 50% wins


Yes, giving it crazy input can produce crazy output. I should have documented the handicap parameter better, but I'm not sure what the exact range is.

On GoShrine, handicap values for a single stone range roughly between 30-60 elo, depending on the strength of the players. So 600 elo is quite a bit. I've yet to see it go unstable with real data.

I'm guessing what is happening is that we're running into floating point precision issues with certain params. If stability does end up being a problem for real data, I'd definitely take a deeper look at it, though I might need some help from Remi. :)

-Pete


60 Elo for a handicap stone? That is far too low! KGS uses 148 Elo per rank for 30k-5k, and 226 per rank for 2d+. (The constants are given in a different form at http://senseis.xmp.net/?KGSRatingMath; to convert to Elo form, log10(e^0.85)*400 = 148.) EGF uses similar numbers. Besides, even using your too-low value of 60, this is just a 10-rank improvement over 6 months. It's very easy to go from 25 kyu to 15 kyu in 6 months.
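
The conversion quoted above can be checked with a short Ruby sketch. It assumes the KGS model P = 1 / (1 + e^(k*d)) for a rank difference d and the standard base-10, 400-point Elo curve; the helper name `elo_per_rank` is made up for illustration.

```ruby
# Hypothetical helper: Elo points equivalent to one KGS rank.
# Matching e^(k * 1) with 10^(elo / 400) gives elo = 400 * log10(e^k).
def elo_per_rank(k)
  400 * Math.log10(Math.exp(k))
end

puts elo_per_rank(0.85).round # k for 30k-5k
puts elo_per_rank(1.3).round  # k for 2d+
```

These round to 148 and 226, the per-rank figures given in the post.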

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 12:09 am
by quantumf
So after 5 games (3 wins 2 losses) I still don't have a rank. This is somewhat frustrating and not encouraging me to carry on trying. In general I prefer servers that allow one to self-select a starting rank, and find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 2:11 am
by Rémi
yoyoma wrote:I found what look like some numerical stability problems. I had similar problems when I implemented this as well with Newton's method failing or oscillating.


Newton's method is very efficient but tricky. In order to guarantee it works, it is necessary to check that the Newton iteration brings an improvement in the log-likelihood. If it does not, a fallback method should be used (such as a line search in the gradient direction).

IIRC, in my implementation I add a small negative constant to the diagonal of the Hessian before inversion. This prevents instability very well, at almost no cost in terms of efficiency. A good fallback method might be to increase this additional diagonal term until the Newton step increases the log-likelihood.

Rémi
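
A minimal sketch of the safeguard described in this post, not the code of any WHR implementation. It assumes a much simpler setting: a single player's rating `r` on the natural-log scale, fitted against opponents with fixed ratings, with no prior. All function names and the damping schedule (start at 1e-3, multiply by 10) are assumptions made for illustration.

```ruby
# Numerically stable log of the logistic function.
def log_sigmoid(x)
  x >= 0 ? -Math.log(1 + Math.exp(-x)) : x - Math.log(1 + Math.exp(x))
end

# games is a list of [opponent_rating, won] pairs.
def log_likelihood(r, games)
  games.sum { |opp, won| log_sigmoid(won ? r - opp : opp - r) }
end

# First and second derivative of the log-likelihood with respect to r.
def gradient_and_hessian(r, games)
  g = 0.0
  h = 0.0
  games.each do |opp, won|
    prob = 1.0 / (1.0 + Math.exp(opp - r)) # predicted win probability
    g += (won ? 1.0 : 0.0) - prob
    h -= prob * (1.0 - prob)
  end
  [g, h]
end

def damped_newton(r, games, iterations: 50, base_damping: 1e-3)
  iterations.times do
    g, h = gradient_and_hessian(r, games)
    break if g.abs < 1e-12
    damping = base_damping
    loop do
      # Subtracting the damping makes the (negative) Hessian more negative,
      # which shortens the step.
      candidate = r - g / (h - damping)
      if log_likelihood(candidate, games) >= log_likelihood(r, games)
        r = candidate
        break
      end
      damping *= 10 # step made things worse: damp harder and retry
    end
  end
  r
end

games = [[0.0, true], [0.0, false]] # one win, one loss against an equal anchor
puts damped_newton(5.0, games)      # starts far from the optimum at 0
```

From the starting point 5.0, an undamped Newton step would overshoot to roughly -69 and lower the log-likelihood; the check rejects such steps and increases the damping until the step helps.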

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 2:16 am
by Rémi
hyperpape wrote:So you're not asserting this is true of ordinary players who have (previously) played opponents selected more or less at random?


If you don't play on KGS, your rating will improve as your past opponents' ratings improve.

Rémi

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 8:26 am
by pete
yoyoma wrote:60 Elo for a handicap stone? That is far too low! KGS uses 148 Elo per rank for 30k-5k, and 226 per rank for 2d+. (The constants are given in a different form at http://senseis.xmp.net/?KGSRatingMath; to convert to Elo form, log10(e^0.85)*400 = 148.) EGF uses similar numbers. Besides, even using your too-low value of 60, this is just a 10-rank improvement over 6 months. It's very easy to go from 25 kyu to 15 kyu in 6 months.


KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.

Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu).

-Pete

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 8:37 am
by Rémi
pete wrote:
yoyoma wrote:60 Elo for a handicap stone? That is far too low! KGS uses 148 Elo per rank for 30k-5k, and 226 per rank for 2d+. (The constants are given in a different form at http://senseis.xmp.net/?KGSRatingMath; to convert to Elo form, log10(e^0.85)*400 = 148.) EGF uses similar numbers. Besides, even using your too-low value of 60, this is just a 10-rank improvement over 6 months. It's very easy to go from 25 kyu to 15 kyu in 6 months.


KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.

Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu).

-Pete


How did you select the volatility meta-parameter of WHR? And the handicap values?

In my experiments, it was very clear that the handicap value changes a lot with player strength, and so does volatility. When I chose the volatility to optimize prediction quality over the KGS database, the result (14 Elo^2/day) was too low for beginners, so it produced very "compressed" ratings.

For a rating system to properly understand the variations of strength in a pool of players that mixes beginners and experts, it is really necessary to consider that the strengths of beginners change faster than the strengths of experts.

Rémi

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 8:41 am
by pete
quantumf wrote:So after 5 games (3 wins 2 losses) I still don't have a rank. This is somewhat frustrating and not encouraging me to carry on trying. In general I prefer servers that allow one to self-select a starting rank, and find KGS quite annoying, but even KGS gives me a rank after 2 games. Kind of off-topic, but relevant in the sense that there are usability considerations that override perfection/accuracy in ranking systems.


Thanks for the feedback, quantum. I'm leaning towards implementing what Remi suggested about using the lower confidence bound as the rating, which would give you a rank much sooner (though probably lower than your actual rank).
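
A rough sketch of the lower-confidence-bound idea, under stated assumptions: the rating estimate and its standard deviation are both in Elo, and the displayed rating is the estimate minus `z` standard deviations. The function name, the choice `z = 2.0`, and the sample numbers are all hypothetical and not taken from GoShrine or the library.

```ruby
# Hypothetical helper: conservative rating shown to the player.
def lower_confidence_bound(elo, std_dev, z: 2.0)
  elo - z * std_dev
end

puts lower_confidence_bound(1500.0, 300.0) # few games: wide uncertainty
puts lower_confidence_bound(1500.0, 80.0)  # many games: narrow uncertainty
```

With the same estimate, the displayed value starts low and rises as more games shrink the uncertainty, which is why a rank could be shown after very few games.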

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 9:06 am
by pete
Rémi wrote:How did you select the volatility meta-parameter of WHR? handicap values?


I did some optimization runs, and came up with 300 Elo^2/day, somehow. You can configure the library like this:

Code:

@whr = WholeHistoryRating::Base.new(:w2 => 17)


I know 300 seems like a lot. But it does still seem to produce sensible results, and allows beginners to make more rapid progress.

BTW, yoyoma, if you bump :w2 down below 100, your example remains stable.

Rémi wrote:In my experiments, it was very clear that the handicap value changes a lot with player strength, and so does volatility. When I chose the volatility to optimize prediction quality over the KGS database, the result (14 Elo^2/day) was too low for beginners, so it produced very "compressed" ratings.

For a rating system to properly understand the variations of strength in a pool of players that mixes beginners and experts, it is really necessary to consider that the strengths of beginners change faster than the strengths of experts.

Rémi


Are you working on a new version of WHR that takes this into consideration? :)
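
To put the two volatility figures on a common scale: WHR models each rating as a Wiener process, so with a volatility of w2 (in Elo^2/day, the unit used in this thread) the prior standard deviation of a rating change over d days is sqrt(w2 * d). A small sketch; the helper name is hypothetical.

```ruby
# Hypothetical helper: prior standard deviation (in Elo) of a rating change
# over `days` days, for a Wiener process with variance w2 Elo^2 per day.
def prior_drift_std_dev(w2, days)
  Math.sqrt(w2 * days)
end

puts prior_drift_std_dev(14, 180).round(1)  # 14 Elo^2/day, fitted on the KGS database
puts prior_drift_std_dev(300, 180).round(1) # 300 Elo^2/day, from the optimization runs
```

Over 180 days this gives roughly 50 Elo and 232 Elo respectively, which shows how much more freedom the larger value gives beginners to improve.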

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 9:15 am
by pete
As an aside, I'm glad to finally have some questions and feedback on this code that I struggled to write.

I'm certainly open to the possibility that there may be mistakes in the code, and would love to have someone other than me look it over. That's one of the reasons I open sourced it. If you see anything, or have questions, send a pull request on GitHub, or just send me an email.

-Pete

Re: Whole History Rating open source implementation.

Posted: Wed May 30, 2012 11:37 am
by yoyoma
pete wrote:
yoyoma wrote:60 Elo for a handicap stone? That is far too low! KGS uses 148 Elo per rank for 30k-5k, and 226 per rank for 2d+. (The constants are given in a different form at http://senseis.xmp.net/?KGSRatingMath; to convert to Elo form, log10(e^0.85)*400 = 148.) EGF uses similar numbers. Besides, even using your too-low value of 60, this is just a 10-rank improvement over 6 months. It's very easy to go from 25 kyu to 15 kyu in 6 months.


KGS and WHR have a different elo scale, I believe. The total spread of ranks from my 40k games on GoShrine is ~2000 elo which, if spread evenly, is 50 elo per rank/stone.

Again, I'd be interested to know if you see this with real data. The example you propose is not completely impossible in general usage, but is one I would certainly not see on GoShrine (600 elo is a 15 stone handicap at 25 kyu).

-Pete


I like playing around with rating math, sorry for the tldr text. :)

I did convert from the KGS scale to the standard Elo scale, and it looks like your WHR code handicap parameter takes a standard Elo scale number.
KGS: P = 1 / ( 1 + e^(k*(RankB-RankA)) ) [k=0.85 for 30k-5k, k=1.3 for 2d+]
Elo: P = 1 / ( 1 + 10^((RankB-RankA)/400) )

So for kyu players and 1 rank difference: RankB-RankA=1 and k=0.85. Then you can solve for what the Elo difference is. EGF has some statistics on even games here: http://gemma.ujf.cas.cz/~cieply/GO/statev.html
Generally for weaker kyu players the chance of upset is around 45%, for stronger players it goes down. I put the expected win rates for KGS and EGF formulas, along with the observed win rates for EGF tournaments here:

Code:

|           | KGS   | EGF   | EGF   | KGS   | EGF   | EGF   |
|           | exp.  | exp.  | obs.  | exp.  | exp.  | obs.  |
| even game | win % | win % | win % | elo   | elo   | elo   |
|-----------|-------|-------|-------|-------|-------|-------|
| 10k vs 9k | 30.0  | 33.9  | 44.8  | 148   | 116   | 36    |
| 5d vs 6d  | 21.4  | 20.1  | 27.8  | 226   | 232   | 166   |


You can see quite a discrepancy between the win rates predicted by the EGF formula and those observed. Since ratings are estimated values of random variables, the stronger player's observed win% will usually be lower than expected, so the weaker player's win% (the figure shown in the table) will be higher: errors in the rating estimation tend to create more upsets than expected. Also, these statistics are mostly from McMahon tournaments, which tend to match underrated 10 kyus with overrated 9 kyus.
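
The Elo columns of the table can be reproduced from the win percentages by inverting the Elo formula given earlier in this post. A minimal sketch, assuming the win percentage is the weaker player's; both function names are hypothetical.

```ruby
# Weaker player's win probability under the KGS formula, for a given rank gap.
def kgs_upset_probability(k, rank_gap)
  1.0 / (1.0 + Math.exp(k * rank_gap))
end

# Elo gap implied by the weaker player's win probability.
# Inverts P = 1 / (1 + 10^(gap / 400)).
def elo_gap(upset_prob)
  400 * Math.log10((1.0 - upset_prob) / upset_prob)
end

puts elo_gap(0.448).round                          # EGF observed, 10k vs 9k
puts elo_gap(0.278).round                          # EGF observed, 5d vs 6d
puts elo_gap(kgs_upset_probability(0.85, 1)).round # KGS expected, 10k vs 9k
```

These come out as 36, 166 and 148, matching the corresponding table entries.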

Rémi, do you have numbers like this for observed KGS games, so that Elo per rank could be derived from them?