Revised European go ratings

gennan · Post by **gennan** » Sun Sep 24, 2017 8:25 am

I improved the histograms. They are now normalized rating distributions.
For example EGD rating distribution 2013 vs Revised rating distribution 2013.

When looking at the EGD distributions, it seems that over the years, many players graded around 1k (between 6k and 5d) have grown weaker than their rank.
In later years, this trend seems to be reversed a bit, but this could be due to players finally complying with the EGD rating system and demoting themselves.
But I suspect this is an artifact of the parameter values of the EGD system.

When choosing different parameter values (based on observations), the picture changes.
In the revised system, over the years, many players around 13k (between 18k and 8k) have grown stronger than their rank.
But I think it's understandable that they are conservative in promoting themselves, because that would mean 'disobeying' the EGD rating system.

Pio2001 · Post by **Pio2001** » Sun Sep 24, 2017 2:59 pm

gennan wrote:On average, it seems that declared ranks are a reasonable indication of people's ratings. I assume many kyu players who don't play many tournaments determine their declared ranks from casual handicap games against stronger players in their club. The EGD also contains handicap games, but I haven't gotten around to analyzing their statistics yet.

Hi,
I'm in charge of the registration of new players in the club of Lyon (France).

The only relationship between the ranks and the real strength that is enforced by the system is that the bottom rank is 30 kyu (I won't talk about grades because we don't use the european system in France).
Besides that, it is we, the people in charge of giving new players their first rank, who may shift the entire scale up or down relative to objective playing strength. If we underestimate the level of the players, the whole scale goes down; if we overestimate their abilities, the whole scale goes up.

I personally usually register beginners at 20 kyu, but some people prefer to register them at 30 kyu. We have no guidelines on this matter.

For players who are not beginners, the most widely used estimate is the KGS rank, but with a correction. When I started, 3 years ago, I was told that there was a difference of about 3 kyu between the KGS scale and the EGF scale.
But it seems that year after year, the difference between the scales is getting bigger in the 10 kyu region. 10 years ago, 8 kyu KGS was around 11 kyu EGF. Now it's rather around 13 kyu EGF.

A problem is that now that we know this, instead of registering new players 3 ranks below their KGS rank, like we used to, we tend to register them 5 kyu below...
Which in turn widens the gap between the two scales, since we are thus deflating the EGF scale. This is an endless loop. In 10 years, maybe we will have to register people 7 kyu below their KGS rank, which will in turn get the two scales 9 ranks apart...

OGS made a survey of its members' ranks across all the ranking systems in the world.
It turned out that the KGS scale was higher than all other scales in the world in the kyu region, and that the Tygem scale was skewed relative to all other scales (the "size" of their ranks is different).

gennan · Post by **gennan** » Mon Sep 25, 2017 12:05 am

Pio2001 wrote:
gennan wrote:On average, it seems that declared ranks are a reasonable indication of people's ratings. I assume many kyu players who don't play many tournaments determine their declared ranks from casual handicap games against stronger players in their club. The EGD also contains handicap games, but I haven't gotten around to analyzing their statistics yet.
Hi,
I'm in charge of the registration of new players in the club of Lyon (France).

The only relationship between the ranks and the real strength that is enforced by the system is that the bottom rank is 30 kyu (I won't talk about grades because we don't use the european system in France).
Besides that, it is we, the people in charge of giving new players their first rank, who may shift the entire scale up or down relative to objective playing strength. If we underestimate the level of the players, the whole scale goes down; if we overestimate their abilities, the whole scale goes up.

I personally usually register beginners at 20 kyu, but some people prefer to register them at 30 kyu. We have no guidelines on this matter.

For players who are not beginners, the most widely used estimate is the KGS rank, but with a correction. When I started, 3 years ago, I was told that there was a difference of about 3 kyu between the KGS scale and the EGF scale.
But it seems that year after year, the difference between the scales is getting bigger in the 10 kyu region. 10 years ago, 8 kyu KGS was around 11 kyu EGF. Now it's rather around 13 kyu EGF.

A problem is that now that we know this, instead of registering new players 3 ranks below their KGS rank, like we used to, we tend to register them 5 kyu below...
Which in turn widens the gap between the two scales, since we are thus deflating the EGF scale. This is an endless loop. In 10 years, maybe we will have to register people 7 kyu below their KGS rank, which will in turn get the two scales 9 ranks apart...

OGS made a survey of its members' ranks across all the ranking systems in the world.
It turned out that the KGS scale was higher than all other scales in the world in the kyu region, and that the Tygem scale was skewed relative to all other scales (the "size" of their ranks is different).

If I understand correctly, your issue is not about deflation of EGD kyu ratings. Your issue is about deflation of KGS kyu ratings relative to real life European kyu ranks.

With any of these internet rating systems, it's not easy to verify that one is more true than the other. Only with lots of data could one verify that rating distances match handicap distances. I think the EGD is better in this respect than KGS (but even the EGD can be improved on IMO, that's what I'm trying to do here), but of course you cannot use the EGD when players started on the internet and improved there before they started playing in real life (in a club or tournament).

When players start playing in real life and have an unknown real life rank, I think the best way to establish their real life rank, is to have them play a dozen or so real life handicap games with players having a "known" real life rank to find the equilibrium rank of the new player. That is the way it was done before internet go servers and rating systems existed and I think it's still the preferred way to do it. Isn't this method the thing that defines real life go ranks?

I don't know how the KGS rating system works, and I don't know if it's possible to download a game results table somewhere to collect statistics and derive its characteristics for conversion to real life European ranks. AFAIK, only anecdotal data exists to guesstimate a conversion and, as you say, it changes over the years. So I don't see a good way to fix this issue. I'll have a look at the OGS survey data (but if it is not recent, it may be obsolete soon as all these systems drift away from this one data point in time). Also, internet games tend to be rather quick and I don't know for sure what effect this has on handicaps, but I suspect that short time limits increase handicaps between players. So perhaps internet handicaps inherently have a poor connection to real life handicaps, which means that internet ranks map poorly to real life ranks, which tend to be based on longer time limits.

A different (but connected) problem from handicap distances is that ideally the ratings of players should not go up or down much over the years if their actual playing strength stays the same. In real life, it is easy enough: That player just keeps the same rank. But in computed rating systems, it's not so easy to prevent slow rating drifts over the years. How to establish if a player's strength stays the same? I can only assume that higher ranked players are more stable than lower ranked players, so overall fitting to minimize drift of dan ratings compared to declared ranks seems like a reasonable strategy. So that is what I'm doing.

BTW, I consider "grade" and "rank" synonyms.

Schachus · Post by **Schachus** » Mon Sep 25, 2017 1:13 am

Pio2001 wrote:
For players who are not beginners, the most widely used estimate is the KGS rank, but with a correction. When I started, 3 years ago, I was told that there was a difference of about 3 kyu between the KGS scale and the EGF scale.
But it seems that year after year, the difference between the scales is getting bigger in the 10 kyu region. 10 years ago, 8 kyu KGS was around 11 kyu EGF. Now it's rather around 13 kyu EGF.

Is that so? 3 years ago, I (in Germany) played my first tournament. I was 7k on KGS, so I registered as 8k and scored 3:1. I believe it is fair to say that 8k was the right rank to register at given the local opposition; registering at 10k or even 12k would have been sandbagging.

The problem is more that, maybe in France or somewhere else, players of the same strength would call themselves 10k or 12k. And the rating system would encourage them to do so, because it gives self-estimated ranks far too much weight. Because the initial rating comes from self-estimation instead of being calculated solely from performance in the first one or two tournaments, ranks in different local communities are skewed or shifted against one another, and then the ratings are shifted to fit the ranks as well, because new players bring ratings that fit the local rank scale. This way, the rating scale needs many more games between players from different regions to become uniform than would be needed if ratings worked intrinsically and did not correct themselves to fit the rank people claim to have.

E.g., tell me why it was useful to initialize this player at 2700 GoR:
http://www.europeangodatabase.eu/EGD/Pl ... y=18437485
Yes, he claimed to be 7d, but he played a lot of games at his first event that clearly show his strength is nowhere near 2700 GoR. A good rating system should calculate some kind of performance from his games and initialize him on that (would be around 2400 GoR maybe). There is no sense in initializing him at a far too high rating and thereby gifting his opponents rating points (if you're 5d and beat a 7d, that's a lot of points; in truth he was maybe a 4d EGD, so it wasn't that remarkable that the 5d beat him).

As long as your revised history gives him a clearly higher current rating than the official rating, I don't think your revision has fixed the major issues.

Uberdude · Post by **Uberdude** » Mon Sep 25, 2017 1:48 am

There are big differences in attitude to, and frequency of, rating resets around Europe. I believe the EGD system allows (and perhaps even encourages) external resets of at least 2 ranks in strength. In Britain we do rating resets, but in other countries they are much less common and even frowned upon (e.g. Czechia, France). I seem to recall they aren't even allowed at dan grades in France (which also has its own internal rating system). I remember at the Cambridge University club we had a Czech student whose official rating was something like 13 kyu but he was 5 kyu in strength. He was adamant he shouldn't reset to 5 kyu: the rating system is not meant to reflect one's strength, it is points you have to earn and resets are cheating that.

Also, as Schachus says, if a new player's declared rating is obviously wrong you should change it when you submit the results after the tournament. We do try to do this in Britain, but it can be hard if you don't realise until after the first tournament (e.g. Stephen Hu, AGA 5d, entered his first British tournament as 3d and won all 4 games (but against only 2d- opposition), and then went 5-2 at London against stronger opposition, so clearly 4d was more appropriate, but we didn't reset him: I think because resets of <2 ranks aren't allowed (though you can fiddle by swapping between English and Chinese names!), but maybe incompetence).

Javaness2 · Post by **Javaness2** » Mon Sep 25, 2017 2:04 am

France has a committee to forcibly apply rating resets. My earlier point about players who win more than 100 points in a tournament - well that is an idea I think would be useful to forcibly address the issue of countries who do not use rating resets. Gaining more than 100 points should be an indication that your original rating was a lie.

gennan · Post by **gennan** » Mon Sep 25, 2017 10:47 am

Schachus wrote: The problem is more that, maybe in France or somewhere else, players of the same strength would call themselves 10k or 12k. And the rating system would encourage them to do so, because it gives self-estimated ranks far too much weight. Because the initial rating comes from self-estimation instead of being calculated solely from performance in the first one or two tournaments, ranks in different local communities are skewed or shifted against one another, and then the ratings are shifted to fit the ranks as well, because new players bring ratings that fit the local rank scale. This way, the rating scale needs many more games between players from different regions to become uniform than would be needed if ratings worked intrinsically and did not correct themselves to fit the rank people claim to have.

E.g., tell me why it was useful to initialize this player at 2700 GoR:
http://www.europeangodatabase.eu/EGD/Pl ... y=18437485
Yes, he claimed to be 7d, but he played a lot of games at his first event that clearly show his strength is nowhere near 2700 GoR. A good rating system should calculate some kind of performance from his games and initialize him on that (would be around 2400 GoR maybe). There is no sense in initializing him at a far too high rating and thereby gifting his opponents rating points (if you're 5d and beat a 7d, that's a lot of points; in truth he was maybe a 4d EGD, so it wasn't that remarkable that the 5d beat him).

As long as your revised history gives him a clearly higher current rating than the official rating, I don't think your revision has fixed the major issues.

I can only say that this is the way the EGD worked for 20 years. I suppose even before the EGD, this was just the way things were done. I'm not involved in the policies used by the EGD and tournament organizers. I'm merely studying the EGD system and studying areas where it might be improved, because it happens to interest me. I'm not an EGF official of any kind.

The issue of reliability of newcomers' ranks was not one of the things I intended to study, but now that you and others mention this being a problem, I can think of a way to deal with it more gracefully: the system could use a grace period for a newcomer (say 10-15 games). During this period, the system would use a reduced K factor for the newcomer's opponents and an increased K factor for the newcomer herself. Then gradually, the system would settle the K factors towards their normal values as the newcomer's rating settles.
Some Elo rating systems have this kind of feature (like the FIDE rating system), but the EGD doesn't. I could add this feature to my revised system.
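
A minimal sketch of such a grace period, assuming a plain logistic Elo expectation rather than the EGD's own curve. The names `grace_weight` and `update`, and the values `base_k`, `boost`, `damp` and `grace_games`, are hypothetical and only chosen for illustration.

```python
def expected_score(rating: float, opponent: float, scale: float = 400.0) -> float:
    """Standard logistic Elo expectation (an assumption, not the EGD formula)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def grace_weight(games_played: int, grace_games: int = 12) -> float:
    """1.0 for a brand-new player, falling linearly to 0.0 after grace_games."""
    return max(0.0, 1.0 - games_played / grace_games)


def update(r_a, r_b, score_a, games_a, games_b,
           base_k=24.0, boost=2.0, damp=0.8, grace_games=12):
    """Return the new ratings of players A and B after one game.

    score_a is 1.0 if A won and 0.0 if A lost.
    All K-related parameters are illustrative assumptions.
    """
    w_a = grace_weight(games_a, grace_games)
    w_b = grace_weight(games_b, grace_games)
    # A player's own K is boosted while she is new,
    # and reduced while her opponent is new.
    k_a = base_k * (1.0 + boost * w_a) * (1.0 - damp * w_b)
    k_b = base_k * (1.0 + boost * w_b) * (1.0 - damp * w_a)
    delta = score_a - expected_score(r_a, r_b)
    return r_a + k_a * delta, r_b - k_b * delta


# A newcomer (0 rated games) beats an established player of equal rating.
new_a, new_b = update(2000.0, 2000.0, 1.0, games_a=0, games_b=100)
print(new_a, new_b)
```

With these assumed values the newcomer moves much further than her established opponent, and both K factors return to `base_k` once `grace_games` games have been played.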

gowan · Post by **gowan** » Mon Sep 25, 2017 11:03 am

I think the method of having initial ratings determined by playing several tournament games is not free from flaws. For example, what do you do with an unrated visitor from Korea who wants to play in a tournament but won't likely play in any other tournaments? Some online servers give people ratings with a ? until several games have been played. The problem with that in face-to-face tournaments is that opponents get no effect from winning (or losing) games. I think that in practice the best method of determining initial ratings is to have organizers or officials judge what the initial rating should be, taking into account self rating, online ratings and ranks from other organizations. Of course there will be some improper ratings but any effects will be smoothed out over time.

gennan · Post by **gennan** » Mon Sep 25, 2017 11:24 am

Javaness2 wrote:France has a committee to forcibly apply rating resets. My earlier point about players who win more than 100 points in a tournament - well that is an idea I think would be useful to forcibly address the issue of countries who do not use rating resets. Gaining more than 100 points should be an indication that your original rating was a lie.

The EGD uses quite a large K factor for lower ratings. A newcomer whose rating is about 20k to the best of her knowledge could win 3 or 4 games in her first 5-game tournament (playing opponents ranging from 20k to 18k), and the EGD rating system could well award her more than 100 points for this result. This is not an exceptional situation IMO and I would definitely not call her a liar. She did nothing wrong. If you want to put blame somewhere, perhaps you should blame the large K factor the EGD uses at lower ratings.
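
To put rough numbers on this, here is a small sketch. It assumes a logistic winning expectation with width a = 200, a K factor of 100, and 100 rating points per rank with 20k at 100 GoR. The K and a values are illustrative assumptions, not taken from the EGD's published tables, and the result list is made up.

```python
import math


def expected_score(rating: float, opponent: float, a: float = 200.0) -> float:
    """Logistic winning expectation; the width a = 200 is an assumed value."""
    return 1.0 / (1.0 + math.exp((opponent - rating) / a))


def tournament_gain(rating, results, k=100.0):
    """Sum of K * (score - expectation) over (opponent_rating, score) pairs."""
    return sum(k * (score - expected_score(rating, opp)) for opp, score in results)


# Hypothetical 20k newcomer (rating 100): 3 wins in 5 games
# against one 20k, two 19k and two 18k opponents.
results = [(100, 1), (200, 1), (200, 1), (300, 0), (300, 0)]
gain = tournament_gain(100.0, results)
print(f"gain: {gain:.0f} points")
```

Under these assumptions the gain already comes out above 100 points with only 3 wins out of 5.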

But if I build a grace period into my revised system (see my previous response to Schachus), it would also involve a large K factor for the newcomer herself while her rating settles. That could mean that she gains many points during the grace period, more than 100 points even: perhaps she estimated her rank at 19k while in hindsight, 17k was closer to the truth, but she just didn't know any better. I think that a perfectly calibrated grace period system should award her 200 points before her grace period ends to conserve her future opponents' ratings. Would you consider such a system morally wrong?

gennan · Post by **gennan** » Mon Sep 25, 2017 12:11 pm

gowan wrote:I think the method of having initial ratings determined by playing several tournament games is not free from flaws. For example, what do you do with an unrated visitor from Korea who wants to play in a tournament but won't likely play in any other tournaments? Some online servers give people ratings with a ? until several games have been played. The problem with that in face-to-face tournaments is that opponents get no effect from winning (or losing) games. I think that in practice the best method of determining initial ratings is to have organizers or officials judge what the initial rating should be, taking into account self rating, online ratings and ranks from other organizations. Of course there will be some improper ratings but any effects will be smoothed out over time.

That is indeed a good argument against a grace period: suppose an up-and-coming 5d plays a "real" Korean 7d newcomer and manages to win a hard-fought battle. Indeed, it would be disappointing for the 5d to not gain any points for this achievement, because his opponent is still in his grace period.

If one actually wants to solve the issue, a system like Rémi Coulom's WHR (Whole-History Rating) system is probably a better solution than an Elo-like system, because WHR would be robust against misranked newcomers while still granting the 5d mentioned above his deserved points for winning against the real 7d newcomer.

If no grace period is used in an Elo-like system, some players will "undeservedly" gain or lose points against misranked newcomers. But perhaps undeservedly gaining or losing points is just a fact of life as a go player: "undeserved" wins and losses happen quite often in regular games anyway. As long as misranked opponents are not too common, the overall effect should even out in a similar way as undeservedly winning or losing against regular opponents.

hyperpape · Post by **hyperpape** » Mon Sep 25, 2017 2:03 pm

In these discussions, doesn't Herman usually come around and mention that for kyu players, self-estimated ratings are more accurate than the ones the system dispenses?

HermanHiddema · Post by **HermanHiddema** » Mon Sep 25, 2017 2:09 pm

@hyperpape: If you insist

Is a link to a previous post sufficient? viewtopic.php?p=74100#p74100

Pio2001 · Post by **Pio2001** » Tue Sep 26, 2017 3:20 pm

Hi,
I don't know how the European system works, but in France, there are several adjustments made to the rating system.
For new players, an iterative procedure is applied to correct the rank. If the variation after the first tournament exceeds 50 points in either direction, the player's registration grade is replaced with their final grade, and the calculations are performed again. The final grade is fed back in place of the initial grade for as long as the variation exceeds 50.
This makes it possible to base a new player's starting rank on his/her performance during the tournament, rather than on guessing.
The system works well except when the first rated event is a single club game. If that result is not meaningful, the same adjustment can no longer occur during the player's second tournament.

However, a second type of iterative adjustment is possible: if, during any tournament, the grade increases by more than 60 points, then the iterative procedure is applied. This only works for positive variations. There is no iteration for negative variations, except during the very first tournament of new players.

These corrections are also useful for their opponents, as the correction occurs before all the other calculations.
For example, if a young player has gone from 5 kyu to 1 dan during the holidays, and they get a +200 correction, their opponents are considered to have lost against a 3 kyu player instead of a 5 kyu.
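
The iterative correction described above could be sketched roughly as follows. The rating formula here is a generic Elo-like stand-in with assumed `scale` and `k` values, not the French federation's actual calculation, and the function names are hypothetical. Only the thresholds (50 in both directions for a first tournament, +60 otherwise) come from the description.

```python
import math


def expected(rating: float, opponent: float, scale: float = 200.0) -> float:
    """Logistic winning expectation (assumed form and scale)."""
    return 1.0 / (1.0 + math.exp((opponent - rating) / scale))


def variation(start, games, k=40.0):
    """Total change over one tournament, computed from a fixed start grade.

    games is a list of (opponent_grade, score) pairs, score 1 = win, 0 = loss.
    """
    return sum(k * (score - expected(start, opp)) for opp, score in games)


def corrected_start(start, games, threshold=50.0, both_signs=True, max_iter=100):
    """Re-run the tournament until the variation no longer exceeds the threshold.

    both_signs=True  models a new player's first tournament (threshold 50).
    both_signs=False models the later rule (threshold 60, gains only).
    """
    grade = start
    for _ in range(max_iter):
        delta = variation(grade, games)
        too_big = abs(delta) > threshold if both_signs else delta > threshold
        if not too_big:
            break
        grade += delta  # the final grade becomes the new registration grade
    return grade


# Hypothetical newcomer registered 5 ranks too low who wins all 5 games.
games = [(-1000.0, 1)] * 5
start = corrected_start(-1500.0, games)
print(round(start))
```

The opponents' own variations would then be computed against the corrected `start` instead of the declared grade, which is what protects them from an underrated newcomer.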

Also, the French federation uses a weighting parameter for handicap games that is biased in favour of the higher grades: the variations of both players in a handicap game are multiplied by (1 - H/10), and if White loses, the variation of White's grade is multiplied by (1 - H/10) once more.
It means that handicap games are weighted at 90 % of their value for a 1 stone handicap and 10 % for a 9 stone game, and that strong players have special protection when they play handicap games against weaker players. For a 9 stone game, their variation is only 10 % of the calculated value if they win, but 1 % if they lose (their opponent still gets 10 % in this case).
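
The weighting can be written down in a few lines. This is only a sketch of the rule as described in this post; the function name `handicap_weights` is made up for illustration.

```python
from typing import Tuple


def handicap_weights(handicap: int, white_won: bool) -> Tuple[float, float]:
    """Return (black_weight, white_weight) applied to each player's variation.

    Both variations are scaled by (1 - H/10); if White loses,
    White's variation is scaled by (1 - H/10) a second time.
    """
    base = 1.0 - handicap / 10.0
    black = base
    white = base if white_won else base * base
    return black, white


for h in (1, 5, 9):
    print(h, handicap_weights(h, white_won=True), handicap_weights(h, white_won=False))
```

For a 9 stone game this gives weights of about 0.1 for both players when White wins, and about 0.1 for Black and 0.01 for White when White loses.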

gennan · Post by **gennan** » Tue Sep 26, 2017 11:43 pm

It does seem quite complicated to invent a rating system that robustly accommodates quickly improving players who play few tournaments. I suspect it's just about impossible to get it right. You may add many rules and variables, but basically the system just does not have enough information from tournament game results alone.

For kyu players especially, just believing the declared rank seems like a better strategy (like Herman says). A system that liberally resets kyu ratings tends to be closer to the truth than a system that attaches much more weight to the previous rating than to the currently declared rank. If a 9k club player became 6k in the period after his most recent tournament 6 months earlier (based on his personal experience in handicap club games played in between), why not just have him declare himself 6k in the next tournament and have the system put much trust in that information (like the tournament organizers do)?

Do you think that kyu players tend to blatantly underrate or overrate themselves when declaring a rank in real life tournaments? I don't believe this is true. I think most kyu players are quite judicious when declaring a rank in a real life tournament. I think that a rating system that claims to map well to go ranks should follow players' declared ranks rather than the other way around, especially for lower ranked players who play few tournaments.

In the (higher) dan region the system should be insensitive to changes in declared ranks. It is unlikely that a 4d becomes 7d within a few months. Also, (higher) dan players play more tournament games, so the system tends to have enough information about them. In that case, attaching more weight to the previous rating than to the declared rank seems like a reasonable policy to me. So for a 4d to become 7d in the rating system, he'll just have to earn those points in tournaments. The system won't automatically apply a reset when he changes his declared rank from 4d to 7d.
If he happened to study in China for a year and really became 7d without playing tournaments in Europe, the system administrator could reset him by special request, but I think this happens very rarely (has it ever happened?).

I'm now experimenting with attaching more weight to declared ranks, particularly lower ranks. Not like the current system with a hard reset to the new rank when skipping a rank (which is a bit too crude IMO), but a bit more sophisticated.

Krama · Post by **Krama** » Wed Sep 27, 2017 8:49 am

Is it possible to do a comparison of goratings.org and goratings.eu?

Perhaps to somehow place goratings.org into the EU version and see how European players compare to Asian pros?
