Revised European go ratings

Pio2001 · Post by **Pio2001** » Wed Jul 11, 2018 1:32 pm

gennan wrote:It's possible to extract statistics for that, but the question would have to be specified more precisely.

The question arose during discussions about the rules for regional tournaments. We were discussing the handicaps for the games. Some people feel that making players play with full handicap is too much, because the real difference between players separated by 5 ranks, for example, is less than 5 stones.

That's the question we are wondering about: what's the win/loss ratio when two players of different rank in the French federation play with full handicap?
In fact, most handicap games in France are played with handicap minus one, so the stats about those are perhaps the most relevant.

Also, these local tournaments are meant to let people from various clubs play together, so we prefer to pair players with a small difference in rating. A good question for us, for example, is "is it fair to give three stones if the players are three ranks apart?"
6 stones for players 7 ranks apart is the highest gap I've seen this year in my club.

gennan wrote:But I don't think the EGD ratings should even be used to determine ranks. Ranks are in principle determined by handicap games and I guess that most handicap games are informal games that never enter the EGD. The EGD collects mostly even game results from tournaments and then it guesstimates ranks from even game statistics with a formula that doesn't even match with the EGD statistics.

I don't know the workings of the EGD.

In France, rank, rating and grade are the same thing. The rank is given by the rating, and there are no grades.
Players are not allowed to register for a tournament with a rating different from their official one. If the gap is too large, a special request must be sent to the managers of the rating list at least two weeks before the tournament in order to correct the rating.
After the tournament, the new French rating is calculated, then the results are sent to the EGF. When the results reach the EGF, the initial ratings (before the tournament) overwrite the European ratings.

For example, take a player with a rating of 1500 in both the French and European lists. After a first tournament, both the French federation and the European federation make their own calculations, and the player becomes, say, 1580 in the European list, and 1650 in the French list.
At the next tournament, the same player must register at 1650 (that's the rule). When the results of this second tournament are sent to the European rating list, the player is first moved from 1580 to 1650 in the European list (*), then his new rating is calculated separately in both lists.
So the EGF rating of a player is always the same as his French rating, except for the calculations for his last tournament.

(*) In fact, the French rating is converted to a rank, then this rank is sent to the EGF, which converts it back to a rounded rating.
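As a rough illustration of the overwrite rule described above, here is a minimal Python sketch. The `Player` class, the `play_tournament` function and the rating gains are hypothetical: each federation uses its own formula, which is not modelled here, and the rank rounding from the footnote is ignored.

```python
from dataclasses import dataclass


@dataclass
class Player:
    french: float    # rating in the French list
    european: float  # rating in the European (EGF) list


def play_tournament(p: Player, french_gain: float, european_gain: float) -> Player:
    """Apply one tournament under the overwrite rule described in the post.

    The two gains are made-up inputs standing in for each federation's
    own rating calculation.
    """
    # The player registers at the official French rating, and that value
    # replaces the European rating before the new results are applied.
    entry = p.french
    return Player(french=entry + french_gain, european=entry + european_gain)


p = Player(french=1500, european=1500)
p = play_tournament(p, french_gain=150, european_gain=80)
print(p)  # Player(french=1650, european=1580)
```

A second call would start both lists from 1650, so the two ratings can only differ by the effect of the most recent tournament.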

Pio2001 · Post by **Pio2001** » Wed Jul 11, 2018 1:44 pm

gennan wrote:I think a rating system is useful to track tournament results of stronger players, but the mapping between rating and rank should be more fluid. A rating system can compute ratings for players that play in tournaments regularly, but it should just register the associated declared ranks. By continually updating the correlations between ratings and declared ranks, it could generate mappings that change over time (and the mapping could even be different for different countries).

I'm a bit lost in the differences between ranks and ratings. In France, these are just two measurement units for the same thing: the official strength of the player.

The first goal of the rating list is to be able to pair players of the same strength.
The second goal of the rating list is to give the right handicap when players of different strength must be paired.

And a requirement for a rating list is not to drift too much from reality. At the very least, the rating of a player whose strength is improving should not go down!

gennan · Post by **gennan** » Wed Jul 11, 2018 2:02 pm

Pio2001 wrote:
gennan wrote:It's possible to extract statistics for that, but the question would have to be specified more precisely.
The question arose during discussions about the rules for regional tournaments. We were discussing the handicaps for the games. Some people feel that making players play with full handicap is too much, because the real difference between players separated by 5 ranks, for example, is less than 5 stones.

That's the question we are wondering about: what's the win/loss ratio when two players of different rank in the French federation play with full handicap?
In fact, most handicap games in France are played with handicap minus one, so the stats about those are perhaps the most relevant.

Also, these local tournaments are meant to let people from various clubs play together, so we prefer to pair players with a small difference in rating. A good question for us, for example, is "is it fair to give three stones if the players are three ranks apart?"
6 stones for players 7 ranks apart is the highest gap I've seen this year in my club.

gennan wrote:But I don't think the EGD ratings should even be used to determine ranks. Ranks are in principle determined by handicap games and I guess that most handicap games are informal games that never enter the EGD. The EGD collects mostly even game results from tournaments and then it guesstimates ranks from even game statistics with a formula that doesn't even match with the EGD statistics.

I don't understand. In fact, 5 stones should not even be enough for the weaker player with a rank difference of 5, because it's a bit less than full handicap. Actual full handicap would be 6 stones and white getting komi. (1 stone with komi is an even game, 2 stones with komi is 1 full move advantage for black, corresponding to 1 rank difference, etc). So white already gets half a stone advantage when black only gets 5 stones. But you increase white's advantage even more by only giving black 4 stones, increasing white's advantage to 1.5 stones.
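The stone accounting in the paragraph above can be written out as a small Python sketch. The function names are made up for illustration, and the assumption that dropping komi is worth exactly half a stone is the one used in the post, not a measured value.

```python
def black_advantage(stones: int, komi: bool) -> float:
    """Black's head start in full moves, using the accounting from the post."""
    # One stone with komi is an even game; each further stone is one
    # extra full move for Black.
    advantage = float(stones - 1)
    # Playing without komi is counted as another half move for Black.
    if not komi:
        advantage += 0.5
    return advantage


def white_edge(rank_difference: int, stones: int, komi: bool = False) -> float:
    """How many stones the handicap falls short of the rank difference."""
    return rank_difference - black_advantage(stones, komi)


print(white_edge(5, stones=6, komi=True))  # 0.0, actual full handicap
print(white_edge(5, stones=5))             # 0.5, usual "full" handicap
print(white_edge(5, stones=4))             # 1.5, handicap minus one
```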

So if full handicap is generally too much for the stronger player, I think the proper action is to promote the weaker player until he scores about 50% winrate with a smaller handicap. Isn't this the foundation of go ranks? If rank differences are not related to full handicap differences, then what do ranks even mean? One might just as well drop it altogether.

It seems you are trying to invent a new handicap system that reduces the handicap just to deny weaker players promotion. That strikes me as the opposite of the proper procedure.

HermanHiddema · Post by **HermanHiddema** » Wed Jul 11, 2018 2:06 pm

Pio2001 wrote:I'm a bit lost in the differences between ranks and ratings. In France, these are just two measurement units for the same thing: the official strength of the player.

Ranks are a measure of how many handicap stones player A needs against player B.
Ratings are a measure of how likely player A is to win a game against player B.

These are related, but the exact relation is unknown, and may not be linear on either playing strength or handicap size.

E.g.:

1. Given that player A wins 80% of games against player B, can we say for certain how much handicap B would need to give them even chances? Is that constant, or would it be different if these players are DDK, SDK or Dan level players?

2. Given that player A gives 3 handicap to player B and with that they score about 50-50 in their games, can we say what percentage of even games player A would win? Again, is this constant, or would it be different if these players are DDK, SDK or Dan level players?

Perhaps both "handicap stones needed" and "win percentage" can be represented by a single number, perhaps not. I've never seen any conclusive research on the matter. Until we know for sure, we need to make a distinction between ranks and ratings when talking about research on the subject, as we are doing here.

EdLee · Post by **EdLee** » Wed Jul 11, 2018 2:26 pm

Is that constant, or would it be different if these players are DDK, SDK or Dan level players?

Hypothesis: the exact distance (*) between any two individuals is unique between them.
(Thus independent of other individuals.)

Common anecdotal observation: A beats B 60% of the time, B beats C 60% of the time, C beats A 60% of the time (all else equal). (**)

I've never seen any conclusive research on the matter.

Yes, more research can only help.

_____
(*) Distance in terms of Go levels; physical striking distance between living organisms, etc.
(**) Not just in Go, but in other domains and organisms as well.

gennan · Post by **gennan** » Wed Jul 11, 2018 2:37 pm

I'm beginning to understand the question: In France, ranks aren't really used. Only ratings are used and a French rank is only a shorthand for the rating. Let's call it a "rating rank".

But ratings aren't based on handicaps, like Herman said. You are discovering that the "rating ranks" derived from even game winrates drifted away from "handicap ranks" since the rating system started in 1996. The handicaps don't match anymore (which is another way of discovering the artifacts of the rating system that I'm complaining about).

So you found this issue and now you need a formula to derive handicaps from rating differences (which comes down to correlating ratings with "handicap ranks"). It can be done, but I feel that "handicap ranks" have always been the "proper" rank. Why have "rating ranks" that are different from "handicap ranks"? It would be a step in the wrong direction IMO.

Pio2001 · Post by **Pio2001** » Wed Jul 11, 2018 3:03 pm

gennan wrote:I don't understand. In fact, 5 stones should not even be enough for the weaker player with a rank difference of 5,

That's right... in theory. In practice, some players feel the opposite.

That's why I wanted to have a look at the real thing, from the actual game results, to see if they are right or wrong.

gennan wrote:So if full handicap is generally too much for the stronger player, I think the proper action is to promote the weaker player until he scores about 50% winrate with a smaller handicap. Isn't this the foundation of go ranks?

Yes, but how? We don't even know if the current handicap is actually too much, let alone what the proper one would be.

HermanHiddema wrote:Ranks are a measure of how many handicap stones player A needs against player B.
Ratings are a measure of how likely player A is to win a game against player B.

Thanks, that makes sense.

HermanHiddema wrote:These are related, but the exact relation is unknown, and may not be linear on either playing strength or handicap size.

Perhaps both "handicap stones needed" and "win percentage" can be represented by a single number, perhaps not. I've never seen any conclusive research on the matter.

Well, for the European ratings, Gennan's work has already given some answers. From what I've read in this thread, yes, it can be represented by a single number, but the parameter "a", which translates ranks into grades, is quite wrong.
However, the rank / grade correspondence is not bad. It seems that the "H" parameter in the rating calculations, which directly represents the number of handicap stones, has more weight in the end than the "a" parameter.

Pio2001 · Post by **Pio2001** » Wed Jul 11, 2018 3:33 pm

gennan wrote:I'm beginning to understand the question: In France, ranks aren't really used. Only ratings are used and a French rank is only a shorthand for the rating. Let's call it a "rating rank".

Yes, that's it.

gennan wrote:So you found this issue and now you need a formula to derive handicaps from rating differences

Not quite. Some players claim that there may be an issue, but I think they may be wrong, because your stats show that there are no issues in the European ratings, and French ratings should be very close.

gennan · Post by **gennan** » Wed Jul 11, 2018 4:07 pm

So the question is how to "fix" this handicap issue.

When stronger players systematically cannot give the handicap prescribed by the rating system, I would say it means the weaker players are systematically underrated. And when they play few tournaments or when they play tournament games mostly against similarly underrated players, their rating hardly increases and they remain underrated. Over time, this can become a large group of players that keep each other underrated.

They may beat stronger players all the time in informal handicap games (proving they are underrated), but few of those games enter the rating system (and handicap parameters hardly matter when there is no data). And when some handicap games manage to enter the system, they often have a lower weight. The rating system just doesn't have the tools to fix it without rating resets. Rating resets are needed to inject points into the system, otherwise improving players will drag everybody down. The epsilon parameter on its own is not enough to prevent that. But when self-promotion or club-promotion is forbidden, such rating resets are probably rare. So I wouldn't pin my hopes on "fixing" the issue while keeping these policies intact.

It's a catch-22. But "fixing" it by adjusting the handicap system so that the underrated players get less handicap without being granted promotion seems unfair to me and I think it will only make things worse in the long run. The only way to reduce this issue while keeping French policies intact is for French players to collect rating points by beating foreigners in international tournaments. This injects badly needed rating points into the French ratings, basically relying on foreign rating resets to counter deflation of French ratings a bit.

gennan · Post by **gennan** » Wed Jul 11, 2018 5:48 pm

Pio2001 wrote:
gennan wrote:So you found this issue and now you need a formula to derive handicaps from rating differences
Not quite. Some players claim that there may be an issue, but I think they may be wrong, because your stats show that there are no issues in the European ratings, and French ratings should be very close.

The premises of the EGD ratings + ranks are that 100 rating points correspond to a full handicap stone and that 1k corresponds to a rating of 2000. Handicap without komi should then be h = 0.5 + (rating difference) / 100 (and vice versa rating difference = (h - 0.5) x 100).
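For anyone who wants to play with the numbers, the two formulas above can be written as a short Python sketch. The function names and the `points_per_stone` parameter are made up for illustration; the default of 100 is the EGD premise stated above.

```python
def handicap_from_rating_difference(diff: float, points_per_stone: float = 100.0) -> float:
    """Handicap stones without komi for a given rating difference."""
    # The 0.5 accounts for White giving up komi (half a stone).
    return 0.5 + diff / points_per_stone


def rating_difference_from_handicap(h: float, points_per_stone: float = 100.0) -> float:
    """Inverse of the formula above."""
    return (h - 0.5) * points_per_stone


print(handicap_from_rating_difference(450))  # 5.0
print(rating_difference_from_handicap(5))    # 450.0
```

The result is generally not a whole number of stones, so a tournament would still need its own rounding rule, which is not specified here.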

I found that the premise of 100 rating points per handicap stone holds reasonably well over 20 years. I don't know for sure how accurate that is, but the deviations seem within about 10 points in the middle kyu range where there is the most handicap game data. But it will probably deviate more with smaller samples (like only the higher ranks, over only 5 years, for only 1 country).

The premise of 1k corresponding to 2000 rating also seems to hold reasonably well. The drift seems not much more than about 50 points average deflation in the mid dan region over 20 years. But overall drift doesn't really matter for this topic of handicaps.

These findings are for the total of the European ratings. Individual countries probably have larger deviations, but the data will be more noisy and the sample sizes are smaller, making it harder to detect specific biases in the data.

France does have relatively conservative promotion policies and it has been like that for many years. By systematically delaying promotions, it's possible that France has grown to 115 points per full handicap stone in the strong kyu region. A handicap of 5 stones should correspond to 4.5 x 100 = 450 points, but with 115 points per handicap stone, 5 stones would correspond to 4.5 x 115 = 517.5 points. It would mean that the rating difference for any handicap is more than expected in France, making it tough for lower dans to give the "correct" handicap to stronger kyu players.
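To see how the gap grows with the handicap, here is a small Python sketch that tabulates the rating difference implied by each handicap at 100 and at 115 points per stone. The function name is made up, and 115 is only the guessed value from the paragraph above, not a measured one.

```python
def rating_gap(stones: float, points_per_stone: float) -> float:
    """Rating difference matching a handicap without komi."""
    return (stones - 0.5) * points_per_stone


# Compare the EGD premise (100) with the guessed French value (115).
for stones in range(2, 10):
    egd = rating_gap(stones, 100)
    guess = rating_gap(stones, 115)
    print(f"{stones} stones: {egd:6.1f} vs {guess:6.1f} (extra {guess - egd:5.1f})")
```

Under these assumptions the extra gap grows by 15 points per stone, reaching 67.5 points at 5 stones.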

But this 115 points is just a guess on my part. I don't think the EGD contains enough French handicap game data to accurately detect whether France really has 115 rather than 100 points per full handicap stone around 1k in recent years.

But do you need an accurate value? If you suspect it's more than 100 points per handicap stone in France and if you want to "fix" it by reducing handicaps in tournament games (trying to protect stronger players' ratings while keeping promotions to a minimum), you can just pick a formula that seems about right, like h = 0.5 + (rating difference) / 115 (and vice versa rating difference = (h - 0.5) x 115).

Life In 19x19
