Whole History Rating open source implementation.

Rémi
Lives with ko
Posts: 170
Joined: Sat Jan 14, 2012 4:11 pm
Rank: KGS 4 kyu
GD Posts: 0
Has thanked: 32 times
Been thanked: 119 times

Re: Whole History Rating open source implementation.

Post by Rémi »

yoyoma wrote: Remi, do you have any numbers like this for observed KGS games, to get Elo-per-rank numbers from them?


I did most of my experiments without handicap. If I find time in the days to come, I'll try to take a closer look. But I have been saying this to myself since the WHR paper in 2008, so I am not sure I'll do it soon.

Rémi
pete
Beginner
Posts: 18
Joined: Sun Apr 25, 2010 6:17 pm
GD Posts: 0
Location: Northfield, MN
Been thanked: 5 times

Re: Whole History Rating open source implementation.

Post by pete »

Yoyoma,

I'm wondering if we have different models in our heads at this point. When you present a probability statement like P = 1 / ( 1 + 10^((RankB-RankA)/400) ) and then go on to say that RankB and RankA are actual kyu/dan ranks, I don't follow.

The model that WHR uses (and Remi, correct me if I misspeak) is P(A wins) = NaturalA/(NaturalA+NaturalB). To convert from Natural ratings to ELO, use the formula (NaturalX * 400.0)/ln(10). WHR primarily works on Natural scaled ratings internally. In my library, I convert the user's input into Natural ratings, and convert output back into ELO.

This produces a "linear" strength scale: linear in the sense that the probability of a 1000 ELO player beating a 900 ELO player is the same as that of a 200 ELO player beating a 100 ELO player (see the test_winrates_are_equal_for_same_elo_delta test in the library).
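As a minimal sketch of that property (this is not the library's own test; the function name elo_win_probability is made up for illustration), the classic Elo expectation depends only on the rating difference:

```ruby
# Expected probability that player A beats player B on the classic Elo scale.
# Only the difference (elo_b - elo_a) enters the formula.
def elo_win_probability(elo_a, elo_b)
  1.0 / (1.0 + 10.0**((elo_b - elo_a) / 400.0))
end

high_pair = elo_win_probability(1000, 900)
low_pair  = elo_win_probability(200, 100)

puts format("1000 vs 900: %.4f", high_pair)
puts format(" 200 vs 100: %.4f", low_pair)
```

Both lines print the same value, because both pairs are 100 Elo apart.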

Historically, Go ranks are tied to handicap stones, and stronger players can use stones more effectively, thus ranks are not an equal distance apart in terms of strength. So it is in the conversion from ELO to ranks (which happens outside of the library, and in GoShrine code), that the strength scale takes on a curve.

Since a handicap stone is a varying amount of ELO, based on the players' strengths, the library supports the use of a callback, which allows the calling code to implement a curve for handicap values as well.
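To illustrate the idea only: the sketch below is not the library's callback signature, which is not shown in this thread. The method names, the argument list, the 0..2000 Elo range, and the assumption that a stone is worth 30 Elo at the weak end and 60 Elo at the strong end are all placeholders.

```ruby
# Hypothetical curve: Elo value of one handicap stone as a function of the
# stronger player's Elo. All four constants are illustrative placeholders.
ELO_PER_STONE_WEAK   = 30.0
ELO_PER_STONE_STRONG = 60.0
CURVE_MIN_ELO        = 0.0
CURVE_MAX_ELO        = 2000.0

# Linear interpolation between the weak-end and strong-end stone values,
# clamped to the ends of the curve.
def elo_per_stone(player_elo)
  clamped  = [[player_elo, CURVE_MIN_ELO].max, CURVE_MAX_ELO].min
  fraction = (clamped - CURVE_MIN_ELO) / (CURVE_MAX_ELO - CURVE_MIN_ELO)
  ELO_PER_STONE_WEAK + fraction * (ELO_PER_STONE_STRONG - ELO_PER_STONE_WEAK)
end

# Hypothetical callback body: Black's advantage in Elo for a given number of
# handicap stones, valued at the stronger player's level.
def black_advantage_elo(stones, white_elo, black_elo)
  stones * elo_per_stone([white_elo, black_elo].max)
end

handicap_callback = method(:black_advantage_elo)
puts handicap_callback.call(2, 500, 300)    # two stones between weaker players
puts handicap_callback.call(2, 2000, 1800)  # the same two stones between stronger players
```

The point of routing this through a callback is that the rating code stays on the flat Elo scale, while the caller decides how many Elo a stone is worth at each level.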

Does this clear matters up? Essentially WHR knows nothing about the curved scale of go ranks and go handicaps, but just does what it's good at, computing estimates of relative strengths on a flat scale.

-Pete
Creator of GoShrine
yoyoma
Lives in gote
Posts: 653
Joined: Mon Apr 19, 2010 8:45 pm
GD Posts: 0
Location: Austin, Texas, USA
Has thanked: 54 times
Been thanked: 213 times

Re: Whole History Rating open source implementation.

Post by yoyoma »

Yes we need to be clear what scales we're talking about. What you call Natural I thought was called Gamma.
Natural = ln(Gamma).
Elo = Natural*400/ln(10)
I think these are the same as the definitions given in 2.1 of http://remi.coulom.free.fr/WHR/WHR.pdf (Greek letter gamma = Gamma, lowercase r = Natural, uppercase R = Elo).

Code: Select all

|Elo    |Natural|Gamma  | win%|
|0.00   |0      |1.00   |0.50 |
|30.00  |0.173  |1.19   |0.46 |
|60.00  |0.345  |1.41   |0.41 |
|400.00 |2.303  |10.00  |0.09 |
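A small sketch of how rows like these can be computed from the three definitions above (Natural = ln(Gamma), Elo = Natural*400/ln(10), and win probability as a ratio of Gammas). The function names are made up for illustration. Note that under these definitions Elo/400 is the base-10 logarithm of Gamma, which is not the same number as Natural.

```ruby
LN10 = Math.log(10)

# Elo (uppercase R in the paper) -> Natural (lowercase r).
def elo_to_natural(elo)
  elo * LN10 / 400.0
end

# Natural -> Gamma, since Natural = ln(Gamma).
def natural_to_gamma(natural)
  Math.exp(natural)
end

# Probability that a player with gamma_self beats an opponent with
# gamma_opponent. The default of 1.0 corresponds to a 0 Elo player.
def win_probability_against(gamma_opponent, gamma_self = 1.0)
  gamma_self / (gamma_self + gamma_opponent)
end

[0, 30, 60, 400].each do |elo|
  natural = elo_to_natural(elo)
  gamma   = natural_to_gamma(natural)
  puts format("|%-7.2f|%-7.3f|%-7.2f|%-5.2f|",
              elo, natural, gamma, win_probability_against(gamma))
end
```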


Am I right that the handicap argument for Game::initialize is on the classic Elo scale? I see these bits of code that make me think so:

opponent_elo = bpd.elo + black_advantage # Addition used here, as I expected
rval = 10**(opponent_elo/400.0) # Here is the conversion from Elo to Gamma (what you call Natural)

When I wrote: "So for kyu players and 1 rank difference: RankB-RankA=1 and k=0.85.", that was for the KGS formula, which uses a Natural scale: P = 1 / ( 1 + e^(k*(RankB-RankA)) ). So for that formula ranks are fixed to always be 1 rank = 1.0 on the Natural scale. And the "k" parameter is used to change expected win rates for dans vs kyus.

So to compare apples to apples I converted from that formula to the classic Elo formula which uses log10 and has the 400 constant in there. I did a similar conversion from EGF GoR's parameter they call "a" (http://www.europeangodatabase.eu/EGD/EG ... system.php).
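A sketch of the two conversions used for the comparison, with function names invented here: a KGS-style k (Natural units per rank) becomes an Elo difference by multiplying by 400/ln(10), and an even-game win percentage for the weaker player becomes an Elo difference by inverting the classic Elo formula.

```ruby
# KGS-style k (Natural units per rank) -> Elo difference per rank.
def k_to_elo(k)
  k * 400.0 / Math.log(10)
end

# Even-game win probability of the weaker player -> implied Elo difference,
# from p = 1 / (1 + 10^(elo/400)).
def win_probability_to_elo(p_weaker)
  400.0 * Math.log10((1.0 - p_weaker) / p_weaker)
end

puts k_to_elo(0.85).round                 # k = 0.85 for kyu players
puts win_probability_to_elo(0.448).round  # 44.8% observed, EGF 10k vs 9k
puts win_probability_to_elo(0.278).round  # 27.8% observed, EGF 5d vs 6d
```

These print 148, 36 and 166, matching the corresponding Elo entries in the table in this post.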

When you wrote your system used 30-60 Elo per rank, I assumed you meant the classic Elo scale using log10 and the 400 constant, is that right? I added a table for those values:

Code: Select all

|           | KGS   | EGF   | EGF   | KGS   | EGF   | EGF   |
|           | exp.  | exp.  | obs.  | exp.  | exp.  | obs.  |
| even game | win % | win % | win % | elo   | elo   | elo   |
|-----------|-------|-------|-------|-------|-------|-------|
| 10k vs 9k | 30.0  | 33.9  | 44.8  | 148   | 116   | 36    |
| 5d vs 6d  | 21.4  | 20.1  | 27.8  | 226   | 232   | 166   |

30 Elo difference | 45.7% |  (go shrine lower end 1 rank difference)
60 Elo difference | 41.5% |  (go shrine upper end 1 rank difference)
pete
Beginner
Posts: 18
Joined: Sun Apr 25, 2010 6:17 pm
GD Posts: 0
Location: Northfield, MN
Been thanked: 5 times

Re: Whole History Rating open source implementation.

Post by pete »

yoyoma wrote:

Code: Select all

|           | KGS   | EGF   | EGF   | KGS   | EGF   | EGF   |
|           | exp.  | exp.  | obs.  | exp.  | exp.  | obs.  |
| even game | win % | win % | win % | elo   | elo   | elo   |
|-----------|-------|-------|-------|-------|-------|-------|
| 10k vs 9k | 30.0  | 33.9  | 44.8  | 148   | 116   | 36    |
| 5d vs 6d  | 21.4  | 20.1  | 27.8  | 226   | 232   | 166   |

30 Elo difference | 45.7% |  (go shrine lower end 1 rank difference)
60 Elo difference | 41.5% |  (go shrine upper end 1 rank difference)


OK, I understand the table now; thanks for being patient. Your assumptions are correct about handicap being in ELO, and that the ELO in my WHR implementation is the same ELO you are talking about. The 30 & 60 ELO deltas do indeed give the winrates that you list in the table above.

I'm wondering if you would indulge my curiosity and expand upon your explanation for why the observed values in the table above are at such odds with the expected winrates. "errors in the rating estimation" should create errors in both directions, overestimating, and underestimating, no? And why do McMahon tournaments match underrated 10kyus with overrated 9kyus? Wouldn't they also match overrated 9kyus with underrated 10kyus?

I'm willing to accept that my ELO values might be low, but perhaps existing rating systems are also erring on the high side, as the above tables might suggest.

-Pete
Creator of GoShrine
yoyoma
Lives in gote
Posts: 653
Joined: Mon Apr 19, 2010 8:45 pm
GD Posts: 0
Location: Austin, Texas, USA
Has thanked: 54 times
Been thanked: 213 times

Re: Whole History Rating open source implementation.

Post by yoyoma »

pete wrote:
yoyoma wrote:

Code: Select all

|           | KGS   | EGF   | EGF   | KGS   | EGF   | EGF   |
|           | exp.  | exp.  | obs.  | exp.  | exp.  | obs.  |
| even game | win % | win % | win % | elo   | elo   | elo   |
|-----------|-------|-------|-------|-------|-------|-------|
| 10k vs 9k | 30.0  | 33.9  | 44.8  | 148   | 116   | 36    |
| 5d vs 6d  | 21.4  | 20.1  | 27.8  | 226   | 232   | 166   |

30 Elo difference | 45.7% |  (go shrine lower end 1 rank difference)
60 Elo difference | 41.5% |  (go shrine upper end 1 rank difference)


OK, I understand the table now; thanks for being patient. Your assumptions are correct about handicap being in ELO, and that the ELO in my WHR implementation is the same ELO you are talking about. The 30 & 60 ELO deltas do indeed give the winrates that you list in the table above.

I'm wondering if you would indulge my curiosity and expand upon your explanation for why the observed values in the table above are at such odds with the expected winrates. "errors in the rating estimation" should create errors in both directions, overestimating, and underestimating, no? And why do McMahon tournaments match underrated 10kyus with overrated 9kyus? Wouldn't they also match overrated 9kyus with underrated 10kyus?

I'm willing to accept that my ELO values might be low, but perhaps existing rating systems are also erring on the high side, as the above tables might suggest.

-Pete


I probably shouldn't have thrown in the errors in rating estimation part, because I don't know much about it. I read that somewhere but I can't find it. Basically what I understood is that when you have two players who are estimated to be 1500 and 1600, with some normal distribution of what their ratings *really* are... Blah blah lots of math I can't do on my own (hehe), turns out just using the 1500 and 1600 numbers by themselves gives a lower probability of upsets than using the full distributions? Honestly I don't know how that works so maybe someone can explain better, or maybe I'll find where I read it.
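A numerical sketch of that effect, under stated assumptions: each player's true rating is taken to be normally distributed around the estimate with a standard deviation of 100 Elo (an arbitrary illustrative value), so the true difference has mean 100 and standard deviation 100*sqrt(2). The function names are made up. Averaging the underdog's win probability over that distribution gives a higher upset probability than plugging in 1500 and 1600 directly.

```ruby
# Classic Elo expectation that A beats B.
def elo_win_probability(elo_a, elo_b)
  1.0 / (1.0 + 10.0**((elo_b - elo_a) / 400.0))
end

# Average the underdog's win probability over a normal distribution of the
# true rating difference, using a midpoint sum over +/- `width` sigmas.
def upset_probability_with_uncertainty(mean_diff, sigma, steps = 2001, width = 6.0)
  low  = mean_diff - width * sigma
  step = 2.0 * width * sigma / steps
  total = 0.0
  weight_sum = 0.0
  steps.times do |i|
    diff   = low + (i + 0.5) * step
    weight = Math.exp(-0.5 * ((diff - mean_diff) / sigma)**2)
    total += weight * elo_win_probability(0.0, diff) # underdog is `diff` Elo weaker
    weight_sum += weight
  end
  total / weight_sum
end

point   = elo_win_probability(1500, 1600)
blurred = upset_probability_with_uncertainty(100.0, 100.0 * Math.sqrt(2))
puts format("point estimate:   %.3f", point)
puts format("with uncertainty: %.3f", blurred)
```

The averaged value sits closer to 50% because the win-probability curve flattens as the gap grows: an overestimated gap costs the underdog less than an underestimated gap gains them.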

The McMahon one is easier to understand. Take a tournament with two 9ks and two 10ks, and many 30k-11k and 8k+ players. In round 1, the 9ks play each other and the 10ks play each other. In round 2, the 9k loser plays the 10k winner, since they now have the same McMahon score. Typically this will be whichever 9k was most overrated and whichever 10k was most underrated. So in general in McMahon tournaments, underrated players go up and overrated players go down, meeting each other and creating more upsets than expected. How big this effect is, I don't know.
daniel_the_smith
Gosei
Posts: 2116
Joined: Wed Apr 21, 2010 8:51 am
Rank: 2d AGA
GD Posts: 1193
KGS: lavalamp
Tygem: imapenguin
IGS: lavalamp
OGS: daniel_the_smith
Location: Silicon Valley
Has thanked: 152 times
Been thanked: 330 times

Re: Whole History Rating open source implementation.

Post by daniel_the_smith »

I don't have anything to contribute but I'm very much enjoying the thread!
That which can be destroyed by the truth should be.
--
My (sadly neglected, but not forgotten) project: http://dailyjoseki.com
Kaya.gs
Lives with ko
Posts: 294
Joined: Fri Aug 12, 2011 10:52 am
Rank: 6d
GD Posts: 0
KGS: Dexmorgan
Wbaduk: c0nanbatt
Has thanked: 25 times
Been thanked: 78 times

Re: Whole History Rating open source implementation.

Post by Kaya.gs »

It's a nice discussion :).

I think it could be a valuable effort to set up an environment for testing different rating systems. I had planned on doing this on OpenKaya, but I never compiled a set of games to make estimates with.

It can be very fruitful to agree on some systematic testing, so every time we try out new rating systems, and more specifically tweaks to those systems, we can easily compare them.

Just imagine running tests against different compilations (with handicap, without handicap, with bots, etc.) and getting figures directly, like:

Accuracy
Glicko -> 40%
WHR(GoShrine's) -> 47%
WHR(yoyoma's) -> 49%
Tygem's -> ?


Performance
Glicko -> X operations
WHR(GoShrine's) -> Y operations
WHR(yoyoma's) -> Z operations
Tygem's -> ?

and so on.
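A rough sketch of what such a harness could look like. Everything here is an assumption for illustration: the Game record, the tiny sample set, and the reading of "Accuracy" as the share of games won by the predicted favourite. A real comparison would use a shared game collection and ratings computed only from games played before each predicted game.

```ruby
# One recorded game: both players' pre-game ratings (Elo) and the winner.
Game = Struct.new(:black_elo, :white_elo, :winner)

# Hypothetical sample data, standing in for an agreed-on game collection.
SAMPLE_GAMES = [
  Game.new(1500, 1600, :white),
  Game.new(1500, 1600, :black),
  Game.new(1800, 1700, :black),
  Game.new(1200, 1250, :white)
].freeze

# A "system" is anything that maps a game to P(black wins).
ELO_PREDICTOR = lambda do |game|
  1.0 / (1.0 + 10.0**((game.white_elo - game.black_elo) / 400.0))
end

# Fraction of games in which the predicted favourite actually won.
def accuracy(games, predictor)
  correct = games.count do |game|
    favourite = predictor.call(game) >= 0.5 ? :black : :white
    favourite == game.winner
  end
  correct.to_f / games.size
end

# Mean log-likelihood of the observed results (closer to 0 is better).
def mean_log_likelihood(games, predictor)
  total = games.sum do |game|
    p_black = predictor.call(game)
    Math.log(game.winner == :black ? p_black : 1.0 - p_black)
  end
  total / games.size
end

puts format("accuracy: %.2f", accuracy(SAMPLE_GAMES, ELO_PREDICTOR))
puts format("mean log-likelihood: %.3f", mean_log_likelihood(SAMPLE_GAMES, ELO_PREDICTOR))
```

Any other rating system can be dropped in as a different predictor lambda and scored on the same games, which is what makes the figures comparable.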

I'd like to get this rodeo going at some point, although it's not top priority for us now.

Making it an open standard could end up serving other communities, like chess, or just enable novel uses, like comparing EGF ratings against what different systems produce from the same game results.
Founder of Kaya.gs
hyperpape
Tengen
Posts: 4382
Joined: Thu May 06, 2010 3:24 pm
Rank: AGA 3k
GD Posts: 65
OGS: Hyperpape 4k
Location: Caldas da Rainha, Portugal
Has thanked: 499 times
Been thanked: 727 times

Re: Whole History Rating open source implementation.

Post by hyperpape »

I wonder: while having real life games is nice, there is the problem that game pairings are influenced by the rating system. Perhaps that's not an issue for reasonable systems, but since some systems (Tygem) are very slow to fix large errors, that could introduce a real distortion.
bakekoq
Dies with sente
Posts: 122
Joined: Fri May 25, 2012 10:22 pm
Rank: KGS 12kyu
GD Posts: 0
KGS: hojoin
OGS: sojiro
bakekoq
Online playing schedule: Wah, almost every day on OGS right now. I still can't play on KGS because of the limits of my network.

Re: Whole History Rating open source implementation.

Post by bakekoq »

Hello.
May I know how to install it?
It could be good for me and my clubs in the future.
pete
Beginner
Posts: 18
Joined: Sun Apr 25, 2010 6:17 pm
GD Posts: 0
Location: Northfield, MN
Been thanked: 5 times

Re: Whole History Rating open source implementation.

Post by pete »

bakekoq wrote: Hello.
May I know how to install it?
It could be good for me and my clubs in the future.


There are instructions on the linked page. It's a Ruby gem, so you need to be familiar with Ruby and RubyGems first.
Creator of GoShrine