
Revised European go ratings

Posted: Fri Sep 22, 2017 2:30 am
by gennan
Hi all,

I'm Dave de Vos, a Dutch go player.

I wanted to investigate the EGD rating system, so I attempted to make a revised system that fixes some issues I noticed in it.

The EGD rating system was created by Aleš Cieply and it is explained here. The EGD manager Aldo Podavini kindly provided the game history from the EGD for me to play with. He also suggested that I reverse engineer the EGD rating system, reproduce the EGD rating history, and go from there to tweak it. That is what I did: http://goratings.eu. On the About page I explain a bit more. I used part of the introduction from that page here.

My main concern is the a function. It is used to compute an expected game result, so it should predict winrates reasonably well. But the expected winrates from the a function used by the system don't match the observed winrates all that well (see 1/a predicted EGD vs 1/a observed EGD). The expected odds are about twice the observed odds, so the expectations of the EGD are clearly too high. Only around rating 100 and rating 2700 do its predictions come closer to the observations.
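To make the role of the a function concrete, here is a rough Python sketch of an EGD-style winning expectancy. The formula shape follows the Elo-like form S_E = 1 / (exp(D / a) + 1); the value a = 110 and the omission of the epsilon correction are my own illustrative simplifications, not the actual EGD tables.

```python
import math

def expected_win(rating, opp_rating, a):
    """Elo-like winning expectancy of the form
    S_E = 1 / (exp(D / a) + 1), where D is the rating gap to the
    opponent. The EGD's epsilon correction is omitted here."""
    d = opp_rating - rating
    return 1.0 / (math.exp(d / a) + 1.0)

# Illustrative: with a = 110, a player rated 100 points below the
# opponent gets roughly a 29% winning expectancy.
p = expected_win(2300, 2400, a=110)
```

A smaller a makes the curve steeper (a given rating gap predicts a more lopsided winrate), which is why getting a right matters for matching observed results.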

This means that all players lose more than the system expects against a lower rated player and win more than the system expects against a higher rated player. Over time, this will contract the rating range: 1 grade difference will correspond to less than 100 points rating difference. The most frequent opponent has a rating of about 1700, and the frequency tapers off below and above. Because of this, I expect that the rating range will contract towards 1700. But this trend may be obscured by other deflation or inflation effects.

For some time, I've had the feeling that there is gradual deflation in the mid-dan region of a few rating points per year. I suspect that to some degree this deflation may be attributed to the above cause. And even if it isn't, I see no reason to use a model that doesn't match observations. So I implemented a revised rating system that uses an a function that matches the observed winrates better.

On the Player Rating History page you can compare the rating histories computed with this revised rating system. It is still anonymous, so you'll have to look up the player ID (PIN) on the EGD search page.

This site is still under construction. As it is now, it's just a quick and dirty contraption to share my thoughts and results. I'm still tweaking the system, so the charts may also evolve over time.

I welcome your questions, remarks, suggestions and other feedback.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 3:27 am
by Javaness2
I hope your work goes well, and that any positive changes you identify can be implemented.

I wondered for some time if it was possible to make a second iteration of the ratings algorithm to correct really out of date ranks. If exit_rating - entry_rating exceeds 100 (or perhaps 150) for 1 or more players in a tournament, then compensate this probability defying event by 1 re-iteration of the algorithm.
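The second-pass idea could be sketched roughly like this. Everything here is my own illustration (a toy Elo pass with a flat con of 24 and a flat a of 100), not the EGD or old FFG implementation; the point is only the control flow of resubmitting the tournament once when a player's rating jumped past the threshold.

```python
import math

def expected(r_a, r_b, a=100.0):
    # Elo-like winning expectancy of player A against player B
    return 1.0 / (math.exp((r_b - r_a) / a) + 1.0)

def rate_once(ratings, games, con=24.0):
    """One toy Elo pass over a tournament. `games` is a list of
    (winner, loser) pairs; expectancies use the entry ratings."""
    new = dict(ratings)
    for w, l in games:
        e = expected(ratings[w], ratings[l])
        new[w] += con * (1.0 - e)
        new[l] -= con * (1.0 - e)
    return new

def rate_with_reiteration(ratings, games, threshold=100.0):
    """If any player's exit rating differs from their entry rating by
    more than `threshold`, resubmit the result with entry ratings set
    to the first pass's exit ratings and run the pass once more."""
    exit_r = rate_once(ratings, games)
    if any(abs(exit_r[p] - ratings[p]) > threshold for p in ratings):
        exit_r = rate_once(exit_r, games)
    return exit_r
```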

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 3:44 am
by Uberdude
Nice project davos (and I remember many fun games together back in old OGS days). I too have the feeling at mid-dan that losses to weaker players happen more than the system expects: in many smaller UK tournaments I go to (not the London Open or British Championship) I'm the highest rated by a stone or two, so I'm a 4d (2395) playing 1-2ds. For these sorts of typical games I need to win something like seven games for every one I lose to maintain the same rating. I am a bit of a byo-yomi blunderer, but I think even for a typical player that ratio is too high for such a strength difference. From a rating standpoint these tournaments end up rather "nothing to gain and everything to lose", but I don't care so much about that anymore. Also, I think the EGD doesn't award enough rating points for beating much stronger players: if I beat someone at my rating I get 8 points; if I beat some 2700+ super-strong player like Ilya, Fan Hui, Hwang Inseong etc. I get just 16 points. I feel that deserves a lot more than 2 wins against a 4d, and it is also likely indicative of an improving player: a normal 4d won't beat an 8d, but an improving 4d who will be 5d or 6d soon might, so boost their rating so as not to hurt/deflate the normal 4ds they are crushing along the way. Also, the 2800 doesn't have to lose the same number of points (and indeed only loses ~10 in the EGD), so such an event is a good opportunity to inject extra points into the system to reflect the growing total strength of the player population.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 7:21 am
by yoyoma
I don't think expected results should match actual results, because some of the player population is improving, and therefore they are underrated. In the simplest case consider a rating system with only two players. One is rated 10kyu and one 1dan. Suppose the 1dan player's true rating remains 1dan, and the 10kyu player's true rating rapidly improves to 1dan. They play even games, but the 10kyu will be winning more than expected.
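This effect is easy to reproduce in a toy simulation. Everything here is an illustrative assumption (the logistic win model with a = 100, the ratings, and the rate of improvement); it just shows that a rapidly improving player beats their stale rating's prediction by a wide margin.

```python
import math
import random

def win_prob(r_a, r_b, a=100.0):
    # Elo-like probability that player A beats player B
    return 1.0 / (math.exp((r_b - r_a) / a) + 1.0)

random.seed(0)
games, wins = 200, 0
rated_kyu, rated_dan = 1600.0, 2100.0   # stale ratings, 5 grades apart
true_kyu = 1600.0                       # true strength, improving fast
for _ in range(games):
    true_kyu += 2.5                     # +500 points over 200 games
    if random.random() < win_prob(true_kyu, rated_dan):
        wins += 1

observed = wins / games                    # winrate actually seen
expected = win_prob(rated_kyu, rated_dan)  # what the stale ratings predict
# the observed winrate far exceeds the well-under-1% the ratings predict
```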

Maybe the 'a' function does need changing, but I don't think you can use the method you described. To find the correct 'a' function by looking at actual results you need to only look at games between players that are already accurately rated. Even in that case there will always be some uncertainty in the ratings, and that uncertainty will cause expected vs actual to be slightly different. I think I read a paper on chess Elo studies about this but I can't find it now.

Maybe another way to look at the problem you see, given what I think the problem is (improving players): if the rating system responded to improving players faster, it would reduce the problem.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 7:38 am
by Javaness2
I thought that the idea was not that they should match, but that the fit should be slightly better than it currently is.

It would be quite nice to see some kind of histogram for the rating distribution at (say) 2006 and 2016 for the 2 models

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 8:34 am
by gennan
Javaness2 wrote: I wondered for some time if it was possible to make a second iteration of the ratings algorithm to correct really out of date ranks.
I'm not sure I understand what you mean by out of date ranks. I just reprocess the full game history of all players in every test run.
Javaness2 wrote:If exit_rating - entry_rating exceeds 100 (or perhaps 150) for 1 or more players in a tournament, then compensate this probability defying event by 1 re-iteration of the algorithm.
I haven't looked into details like that. I'm looking at the full EGD game history (12,000 tournaments, almost 900,000 games). I assume rating defying tournaments like that are rare, so I think they won't affect the statistics very much.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 8:54 am
by gennan
Uberdude wrote:Nice project davos (and I remember many fun games together back in old OGS days).
Thanks and yes, I remember them too :)
Uberdude wrote: I too have the feeling at mid-dan that losses to weaker players happen more than the system expects: in many smaller UK tournaments I go to (not the London Open or British Championship) I'm the highest rated by a stone or two, so I'm a 4d (2395) playing 1-2ds. For these sorts of typical games I need to win something like seven games for every one I lose to maintain the same rating. I am a bit of a byo-yomi blunderer, but I think even for a typical player that ratio is too high for such a strength difference. From a rating standpoint these tournaments end up rather "nothing to gain and everything to lose", but I don't care so much about that anymore. Also, I think the EGD doesn't award enough rating points for beating much stronger players: if I beat someone at my rating I get 8 points; if I beat some 2700+ super-strong player like Ilya, Fan Hui, Hwang Inseong etc. I get just 16 points. I feel that deserves a lot more than 2 wins against a 4d, and it is also likely indicative of an improving player: a normal 4d won't beat an 8d, but an improving 4d who will be 5d or 6d soon might, so boost their rating so as not to hurt/deflate the normal 4ds they are crushing along the way. Also, the 2800 doesn't have to lose the same number of points (and indeed only loses ~10 in the EGD), so such an event is a good opportunity to inject extra points into the system to reflect the growing total strength of the player population.
In a standard Elo rating system, the maximum points gained or lost in a game is determined by the K factor. In chess, it is usually around 24. Some systems use 16 for higher ratings and 32 for lower ratings.
In the EGD system, it is called the con factor, which ranges from 10 to about 100, with a value of 24 at 1d.

In my revised system I use similar values in the dan region, but con does not grow as big at lower ratings, to reduce wild rating oscillations there. You can compare http://goratings.eu/Probabilities/Points_EGD with http://goratings.eu/Probabilities/Points_Revised to see the difference. Those charts also include the epsilon term. The revised system uses a bigger value for epsilon; it seemed necessary to reduce deflation over the years.
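For reference, a single EGD-style update step can be sketched like this. The flat values for con, a, and eps are illustrative placeholders; in the real system all three depend on the player's rating.

```python
import math

def update(rating, opp_rating, won, con=24.0, a=100.0, eps=0.016):
    """One Elo-style rating update of the EGD form: con caps the points
    exchanged per game, and the eps/2 bonus injects a small number of
    points into the system every game to counter deflation."""
    expected = 1.0 / (math.exp((opp_rating - rating) / a) + 1.0)
    actual = 1.0 if won else 0.0
    return rating + con * (actual - expected + eps / 2.0)
```

Because of the epsilon bonus, a win against an equal opponent gains slightly more than a loss against the same opponent costs.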

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 9:14 am
by Javaness2
I think it is not so small, looking at http://www.europeangodatabase.eu/EGD/cr ... dgob=false

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 9:18 am
by gennan
yoyoma wrote:I don't think expected results should match actual results, because some of the player population is improving, and therefore they are underrated. In the simplest case consider a rating system with only two players. One is rated 10kyu and one 1dan. Suppose the 1dan player's true rating remains 1dan, and the 10kyu player's true rating rapidly improves to 1dan. They play even games, but the 10kyu will be winning more than expected.

Maybe the 'a' function does need changing, but I don't think you can use the method you described. To find the correct 'a' function by looking at actual results you need to only look at games between players that are already accurately rated. Even in that case there will always be some uncertainty in the ratings, and that uncertainty will cause expected vs actual to be slightly different. I think I read a paper on chess Elo studies about this but I can't find it now.

Maybe another way to look at the problem you see, given what I think the problem is (improving players): if the rating system responded to improving players faster, it would reduce the problem.
Improving players are a source of deflation, lowering the ratings of the other players.
The EGD system has 2 mechanisms to handle this issue:

1: Go players enter a tournament with a declared rank. If a player improves quickly, he may skip a rank in the next tournament and the EGD will then reset that player's rating to the new rank, so he won't have to earn those points (removing points from the system in the process). This is called a rating reset in the EGD system.

2: The EGD has an epsilon parameter which is intended to handle the issue of improving players. Every player gets some free points for every game to even out the points lost on average to improving players. It is implemented in such a way that lower rated players get more than higher rated players. Determining a good value for epsilon is tricky. You need to collect statistics to estimate the average points lost to improving players. My feeling is that the EGD uses a value that is too small, so I used a larger value. I chose it so that on average, it keeps a good match between declared ratings and computed ratings in the kyu range. My value tapers off in the mid-dan region to avoid inflating dan ratings to values greater than declared.
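Mechanism 1 above might be sketched like this, with ranks expressed directly as nominal ratings at 100 points per grade. The two-grade (200-point) condition is my reading of the description, not the exact EGD rule.

```python
def maybe_reset(rating, prev_declared, new_declared):
    """Sketch of an EGD-style rating reset: if the newly declared rank
    skips at least one grade (200+ points) past the previous declaration
    and is ahead of the computed rating, snap the rating to the new
    rank's nominal value instead of letting the player earn the points."""
    if new_declared - prev_declared >= 200 and rating < new_declared:
        return float(new_declared)
    return rating
```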

On average, it seems that declared ranks are a reasonable indication of people's ratings. I assume many kyu players who don't play many tournaments determine their declared ranks from casual handicap games against stronger players in their club. The EGD also contains handicap games, but I haven't gotten around to analyzing their statistics yet.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 9:19 am
by gennan
Javaness2 wrote:I thought that the idea was not that they should match, but that the fit should be slightly better than it currently is.

It would be quite nice to see some kind of histogram for the rating distribution at (say) 2006 and 2016 for the 2 models
A good idea. I will do that.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 9:21 am
by gennan
Javaness2 wrote:I think it is not so small, looking at http://www.europeangodatabase.eu/EGD/cr ... dgob=false
I'm not sure how you can judge the relative size of epsilon from that list.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 10:50 am
by Javaness2
gennan wrote:
Javaness2 wrote:I think it is not so small, looking at http://www.europeangodatabase.eu/EGD/cr ... dgob=false
I'm not sure how you can judge the relative size of epsilon from that list.
Sorry, I was not very clear there. If you sort the last rating change in this page you will find several instances of positive rating change over 100 points in size. I suppose that these come from 30kyu entered as 20kyu, or some guy who hasn't played in a rated event for N months but has improved several stones in strength in the meantime.

So my idea there is essentially to resubmit the tournament result with the starting rank for such players adjusted to the value with which they exited on the first application of the algorithm. I hope that's a bit clearer. It's what the old FFG system used to do.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 1:47 pm
by gennan
Javaness2 wrote:
gennan wrote:
Javaness2 wrote:I think it is not so small, looking at http://www.europeangodatabase.eu/EGD/cr ... dgob=false
I'm not sure how you can judge the relative size of epsilon from that list.
Sorry, I was not very clear there. If you sort the last rating change in this page you will find several instances of positive rating change over 100 points in size. I suppose that these come from 30kyu entered as 20kyu, or some guy who hasn't played in a rated event for N months but has improved several stones in strength in the meantime.

So my idea there is essentially to resubmit the tournament result with the starting rank for such players adjusted to the value with which they exited on the first application of the algorithm. I hope that's a bit clearer. It's what the old FFG system used to do.
Ok. I think your observation applies to the K factor rather than the epsilon parameter. The EGD uses a rather large K factor at low ratings that allows large oscillations like this (one could call it a quirk of the EGD rating system), but for higher ratings it is stable enough, I think.

I did use a smaller K factor and a larger epsilon to have smaller oscillations while still allowing the system to follow quickly improving players as well as the EGD does. For example, this is the history of Mateusz Surma: http://goratings.eu/Home/History?PIN=12837968. But for quickly improving players, I think the EGD rating reset policy is quite effective (and for compensating the potential deflation from quickly improving players, it is more important than the epsilon parameter). I used the same rating reset policy in my revised system.

But this policy is rather crude. The epsilon parameter tries to compensate for slowly improving players (improving less than 2 ranks between tournaments they participate in), of which there are many more than quickly improving players. The deflation effect of that is quite subtle (for the average of all active European tournament players, I estimate it at 2 rating points per year more than the EGD estimate).

A normal Elo system does not do iterations to find equilibrium rating values, and neither did I in my revised system. That kind of system sounds more like the WHR rating system from Rémi Coulom: https://www.goratings.org/en/. Rémi's ratings reflect the relative skill/success of players to make a ranking list for the world's top players, but those ratings don't map to go ranks. For amateurs, a reliable mapping between ratings and go ranks is the most important feature of the EGD rating system IMO. I want to keep that feature and improve it. A normal Elo-like or WHR-like rating system is not anchored. They are designed to maintain a ranking list where only the order and relative distance matters. They have no mechanism to compensate for subtle long-term overall deflation or inflation. A mapping to go ranks that stays reliable over a 20-year time span is a different matter.

BTW, I'm still playing with the system, so some of these parameters will change.

Re: Revised European go ratings

Posted: Fri Sep 22, 2017 2:14 pm
by gennan
Uberdude wrote:Also I think the EGD doesn't award enough rating points for beating much stronger players: if I beat someone my rating I get 8 points, if I beat some 2700+ super-strong like Ilya, Fan Hui, Hwang Inseong etc I just get 16 points. I feel that deserves a lot more points than 2 wins against a 4d, and is also likely indicative of an improving player: a normal 4d won't beat an 8d but an improving 4d who will be 5d or 6d soon might so boost their rating so as not to hurt/deflate the normal 4ds they are crushing along the way. Also the 2800 doesn't have to lose the same number of points (and indeed only loses ~10 in EGD) so such an event is a good opportunity to inject extra points into the system to reflect the growing total strength of the player population.
This behaviour is normal in an Elo-like rating system. Elo (and the EGD) award rating changes by probabilities, and those don't go over 1 (times the player's K factor when converting to a rating points change). For the behaviour that you'd like, the system should award rating point changes by odds, not by probabilities. It would be more like a betting system than an Elo system. Perhaps your preference is not unexpected, because Brits seem to rather like betting ;).

But I think it's an interesting idea. I'll probably give it a try.
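To illustrate the difference, compare a probability-based award with an odds-based one. Both functions are sketches under the same illustrative assumptions as before (logistic win model, a = 100, con = 24), not an actual EGD formula.

```python
import math

def p_win(d, a=100.0):
    # probability that the lower-rated player wins at rating gap d
    return 1.0 / (math.exp(d / a) + 1.0)

def award_by_probability(d, con=24.0):
    # standard Elo/EGD-style: points for an upset are capped at con
    return con * (1.0 - p_win(d))

def award_by_odds(d, con=24.0):
    # hypothetical betting-style payout: points scale with the odds
    # against the win, so big upsets pay out without bound
    p = p_win(d)
    return con * (1.0 - p) / p
```

At a 300-point gap the probability-based award stays just under con, while the odds-based award pays out hundreds of points; the latter would need some compensating drain elsewhere to avoid inflating the system.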

Re: Revised European go ratings

Posted: Sat Sep 23, 2017 3:21 pm
by gennan
gennan wrote:
Javaness2 wrote:I thought that the idea was not that they should match, but that the fit should be slightly better than it currently is.

It would be quite nice to see some kind of histogram for the rating distribution at (say) 2006 and 2016 for the 2 models
A good idea. I will do that.
I added histograms for every year (since 1996) for both the EGD and the revised system: http://goratings.eu/Histograms.
I also fixed a bug and made some changes to the parameters.