Revised European go ratings

gennan · **#41**

I added statistics of handicap games. For example, see 4 handicap probabilities.

The handicap statistics are noisier than the even game statistics, but I suppose the smaller number of games is a factor here. In particular, severely overhandicapped or underhandicapped games are rather sparse.

But overall, the rule that the correct handicap should match the rank difference seems to hold for the EGD system (and the revised system). It means that the observed winrate is close to 50% when the handicap matches the rating difference (handicap = ratingDifference / 100 - 0.5 * sign(ratingDifference)). This seems to hold for all ratings and all handicaps. I cannot claim this with great confidence, because the data is somewhat sparse and noisy, but overall, the data seems to confirm it.
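
As a minimal sketch, the rule of thumb in the parentheses above can be written out as follows. The function name `matching_handicap` is made up for illustration, and the formula is only meaningful for rating differences of at least 50 points.

```python
def matching_handicap(rating_difference: float) -> float:
    """Handicap at which the rule of thumb above predicts a 50% winrate.

    handicap = ratingDifference / 100 - 0.5 * sign(ratingDifference)
    A negative result means the handicap goes the other way around.
    """
    if rating_difference == 0:
        return 0.0
    sign = 1.0 if rating_difference > 0 else -1.0
    return rating_difference / 100 - 0.5 * sign


for diff in (150, 250, 450):
    print(diff, matching_handicap(diff))
```

So a 250 point rating difference corresponds to a 2 stone handicap under this rule.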

There is not much difference between the EGD system and the revised system in this respect, but this can be expected, because the EGD bias usually doesn't exceed half a stone and that is a bit too subtle to detect in the handicap statistics.

So on average, the EGD ratings seem quite suitable for predicting handicap differences, which means that the overall shape of the rating distribution is ok.
So fitting a rating function to the observed winning odds should result in a rating scale that matches rating differences with handicaps / rank differences.

The predicted winning odds (predicted by the EGD system) don't match the observed winning odds well, so what else can explain that handicaps match rating differences rather well in the EGD?

I think the reset policy is the reason: newcomers and quickly improving players keep feeding the EGD system with declared ranks that are basically about right with respect to handicap.
This keeps the system calibrated with respect to handicaps to the 1st order. The mismatched expected winning odds of the EGD cause only a second order bias on top of this 1st order calibration. It only affects players that play many games without resets: active old-timers.

So I would call the EGD reset policy crude but effective. Overall, it makes the system work in spite of the mismatched winning odds.
Basically, believing newcomers' declared ranks and believing self-promotions helps a lot to keep the system calibrated.

So having a reset policy is quite important and I included a reset policy in the revised system (I'm only trying to make it a bit more sophisticated).

gennan · **#42**

I added an addendum at the bottom of the About page. It is a bit long to post here and it repeats things I posted here earlier, but here is a copy:

Quote:

Addendum 2017-10-05
Example:
If the predicted winrates matched observed winrates exactly, on average players would neither gain nor lose points when everybody's skill stays the same.
But the expected winrates don't match observed winrates in the EGD. In the example below I show what happens because of it.

We have a player with rating 2100. He plays games against a player with rating 2000. The EGD expects him to win 71% of these games. In reality he wins about 60% (as observed in the statistics of the EGD).
So his winrate minus the expected winrate is -0.11. His K factor is 24, so on average he will lose 2.6 points per game played against this opponent.

The same player also plays against another player with rating 2200. The EGD expects him to win 26% of these games. In reality he wins about 35% (as observed in the statistics of the EGD).
So his winrate minus the expected winrate is +0.11. Using his K-factor of 24, we find that on average he will win about 2.6 points per game played against this opponent.
So if he plays both opponents with the same frequency, his rating will not change on average.

But the demographics of the EGD data show that since 2003, players rated around 2000 appear more frequently in tournament games than players rated around 2200 (the ratio is about 5:4). Correcting for this we find that his rating will change by about (5 * -2.6 + 4 * 2.6) / 9 = -0.29 points per game in this demographic distribution.
This is not much, but if this player plays 25 games a year, which is typical for tournament players in this rating region, he will lose about 7 points in a year, and over 10 years every player around 2100 rating would lose about 70 points.
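
The per-game drift in this example can be reproduced with a short sketch. It takes the figures exactly as stated above (K = 24, a winrate deviation of 0.11 in each direction, a 5:4 opponent ratio); the function and constant names are made up for illustration.

```python
K_FACTOR = 24        # K factor quoted in the example
DEVIATION = 0.11     # |observed - expected| winrate, as stated for both opponents
WEIGHT_LOWER = 5     # relative frequency of opponents rated around 2000
WEIGHT_HIGHER = 4    # relative frequency of opponents rated around 2200


def mean_change_per_game(k, deviation, weight_lower, weight_higher):
    """Average rating change per game for the 2100 player."""
    change_vs_lower = -k * deviation   # expectation too high: points are lost
    change_vs_higher = k * deviation   # expectation too low: points are gained
    total_weight = weight_lower + weight_higher
    return (weight_lower * change_vs_lower
            + weight_higher * change_vs_higher) / total_weight


drift = mean_change_per_game(K_FACTOR, DEVIATION, WEIGHT_LOWER, WEIGHT_HIGHER)
print(round(drift, 2))  # -0.29
```

With equal weights the two terms cancel, which is the "rating will not change on average" case described earlier.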

But the EGD also uses an epsilon parameter. This will give this player 24 * 0.016 = 0.38 free points for every game he plays. This is more than enough to compensate for the expected winrate errors.
One could argue that this epsilon correction would not be necessary if the expected winrates were closer to reality (my finding is that this is indeed the case and I see no reason to keep these winrate errors).
Nevertheless, it would seem that the expected winrate errors are more than compensated by the epsilon parameter.

Then?
Still, there is a gradually increasing difference between declared ranks and ratings in the EGD rating distributions, with a maximum of about 50 points around the lower dan region in 2012. This trend is reversing a bit in recent years, but my theory is that in recent years, dan players chose to comply with the rating system instead of looking at the ranks they have according to handicap.
What other causes could there be?
Is it that players around the lower dan region were overranking themselves more and more between 1996 and 2012?
I cannot rule this out, but neither can I rule out that this deflation is caused by a defect of the rating system.

Another possible cause for deflation is improving players. There are two mechanisms in the EGD that are supposed to compensate for this.
1: The rating reset policy. This is to prevent quickly improving players from removing many points from the system. But the EGD only resets players who get 2 stones stronger between tournaments. That is rather conservative, because getting stronger is usually a more gradual process. Most players don't get stronger that quickly.
2: The epsilon parameter. This should compensate for slowly improving players (which is a much bigger group I assume). But as we have seen above, 3/4 of the epsilon parameter is used up to compensate for the expected winrate errors. So of the original 0.016, only 0.004 is left to counter deflation from slowly improving players!

So in the end, there isn't much to counter the deflation caused by slowly improving players and they will inevitably take away points from the system, leading to deflation.
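
The epsilon bookkeeping in this addendum can be sketched as below. It uses only the numbers quoted above (K = 24, epsilon = 0.016, a drift of about 0.29 points per game); the function name `epsilon_budget` is an illustrative assumption.

```python
def epsilon_budget(k, epsilon, drift_per_game):
    """Split the epsilon bonus into the part eaten by winrate errors and the rest.

    Returns (free points per game, fraction used up, epsilon left over).
    """
    free_points = k * epsilon                     # bonus points per game
    fraction_used = drift_per_game / free_points  # share cancelling the drift
    epsilon_left = epsilon * (1 - fraction_used)  # what remains against deflation
    return free_points, fraction_used, epsilon_left


free_points, fraction_used, epsilon_left = epsilon_budget(24, 0.016, 0.29)
print(round(fraction_used, 2), round(epsilon_left, 3))
```

The fraction comes out at roughly three quarters, leaving about 0.004 of the original epsilon.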

So what to do?
1: Fix the expected winrate errors.
2: Use a less conservative reset policy.

The revised system does both and I find it has no need for an epsilon parameter.

ez4u · **#43**

The example seems incomplete. Could you describe what happens to the ratings of all three players (2000 player, 2100 player, and 2200 player) if they each play 24 games evenly split between the two opponents, or somehow weighted based on the sizes of the underlying pools? It seems insufficient to view the problem through the results of a single player when two play in each game and the problem is described in terms of three.

Javaness2 · **#44**

It's quite an interesting result to me that you don't need any epsilon parameter if you improve the rating reset policy. I have to read further.

Schachus · **#45**

My take is that this is neither surprising nor an improvement. The "problem" you are trying to fix is that ratings don't match the ranks (nor does the average rating per rank). Of course, if you give kyu players a reset each time they raise their declared rank, then ratings will "fit" the rank better, because the rating is often forced to fit it. But it is questionable whether this is an improvement.
Why is it actually a problem that ratings don't fit ranks? It is a fact that the difference between 1k and 1d is smaller than between other ranks, because people like to call themselves dans, so they often change their rank to 1d prematurely. This effect (and a similar one in neighboring ranks, because people match that behaviour: if the 1d is not much stronger, they go to 1k) is seen in this "problem", and to me it's not a problem that needs to be fixed at all.
Of course ratings have problems, but self-estimations have them as well. In chess there is no such thing as self-estimation, and there are studies showing 80% of players believe they are underrated. Now is that likely, or are most of them just overestimating themselves? Now you could say this is a problem and we need to reset their rating to their estimation, but you would screw up more than you are fixing, because the rating becomes less objective.

In fact, there might be players (me for example) who explicitly don't want a reset as long as they are just slowly and steadily improving, because the rating is the testimony to said improvement (you don't just imagine it, your results really get better). I got one reset from 8k to 5k because it was apparent I had improved and the old rating was not appropriate for me anymore. Since then I have only increased my rank by 1 (more or less following the rating), so I can see my rating evolve from the reset at 1600 to its current level (1802), documenting my improvement. With your system, the rating got reset to 4k and to 3k (and would have been reset to 2k, if not for the fact that the tournament where I registered experimentally as a 2k for some reason didn't make its way into the EGD). So the only thing I see in your version of the rating is that it improved from the 1800 reset to 1840 over the last 2 (or 3?) tournaments; the tournaments before are rendered irrelevant to the rating because of the reset.

On a positive note: one thing that is much better in your system is the handling of weak players (20k level). In the EGD their ratings are largely screwed up, because the EGD has a bottom cutoff of 100 rating (like they couldn't handle negative numbers!?). Now there are handicap tournaments for children, where a 20k beats a 30k with 9 stones, which is just as expected. But a 30k counts as a 20k for the EGD, so the EGD thinks the 20k did the equivalent of beating an 11k, and boosts his rating a lot. This leads to large inflation of ratings in the ranks below 15k, and I think your system is in fact much better there.
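
A rough sketch of the floor effect described above. It assumes the linear kyu scale implied by the figures in this thread (20k = 100, 5k = 1600, 1k = 2000) and the handicap compensation of 100 points per stone with 50 for the first one; the helper names are hypothetical and this is not the EGD's actual code.

```python
RATING_FLOOR = 100  # bottom cutoff mentioned in the post


def kyu_to_rating(kyu: int) -> int:
    """Nominal rating of a kyu rank on a linear scale (20k -> 100, 1k -> 2000)."""
    return 2100 - 100 * kyu


def handicap_points(stones: int) -> float:
    """Rating compensation for a handicap: 100 per stone, 50 for the first."""
    return 100 * (stones - 0.5) if stones > 0 else 0.0


def effective_opponent_rating(opponent_kyu, stones_received, floor=None):
    """Strength the system credits to an opponent who receives a handicap."""
    rating = kyu_to_rating(opponent_kyu)
    if floor is not None:
        rating = max(rating, floor)  # ratings below the floor are clamped
    return rating + handicap_points(stones_received)


print(effective_opponent_rating(30, 9, floor=RATING_FLOOR))  # with the cutoff
print(effective_opponent_rating(30, 9))                      # without the cutoff
```

With the cutoff, the 30k taking 9 stones is credited with a strength of about 950, in the 11k to 12k region; without it the result stays close to the 20k's own rating of 100.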

HermanHiddema · **#46**

Schachus wrote:

Why is it actually a problem that ratings don't fit ranks?

Ranks are based on handicaps, and are used to determine handicaps. They have been used for that for centuries, and have worked very well in that respect.

Ratings are based on win percentages, and cannot be used to determine handicaps from that alone (i.e. if player A defeats player B in 83% of even games, what is the proper handicap?)

If you want to use ratings to determine handicaps, you have two options:

1. Publish rating-handicap tables (e.g. at rating 1400, handicaps 1, 2, 3, 4, 5, 6 are at 1467, 1530, 1590, 1649, 1705, 1758 or something like that)

2. Fiddle with the rating parameters (scale the ratings) so the handicaps line up with multiples of 100 reasonably closely.

The EGF has chosen option 2, which IMO is a much saner option, because it is much more user-friendly. This has always been a feature of the EGF system.
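
To make option 1 concrete, here is a small sketch of a table lookup. It uses the illustrative numbers from the list above, which were explicitly given as "something like that", so both the table and the nearest-entry rule are assumptions.

```python
# Illustrative table for a player rated 1400: index h holds the opponent
# rating at which handicap h would be appropriate (index 0 = even game).
HANDICAP_TABLE_1400 = [1400, 1467, 1530, 1590, 1649, 1705, 1758]


def handicap_from_table(opponent_rating, table):
    """Pick the handicap whose table rating is closest to the opponent's rating."""
    return min(range(len(table)), key=lambda h: abs(table[h] - opponent_rating))


print(handicap_from_table(1600, HANDICAP_TABLE_1400))  # 3
```

Every rating would need its own table like this, which is what makes option 2 the more user-friendly choice.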

Dave's work here is a suggestion to improve the parameters slightly so that the ratings line up more closely with historical rank data.

Unless you can show that the current rating parameters provide more accurate handicap determination, i.e. that the historical data is systematically biased, I don't see any reason not to try to match the historical data as closely as possible.

gennan · **#47**

Schachus wrote:

My take is that this is neither surprising nor an improvement. The "problem" you are trying to fix is that ratings don't match the ranks (nor does the average rating per rank). Of course, if you give kyu players a reset each time they raise their declared rank, then ratings will "fit" the rank better, because the rating is often forced to fit it. But it is questionable whether this is an improvement.
Why is it actually a problem that ratings don't fit ranks? It is a fact that the difference between 1k and 1d is smaller than between other ranks, because people like to call themselves dans, so they often change their rank to 1d prematurely. This effect (and a similar one in neighboring ranks, because people match that behaviour: if the 1d is not much stronger, they go to 1k) is seen in this "problem", and to me it's not a problem that needs to be fixed at all.
Of course ratings have problems, but self-estimations have them as well. In chess there is no such thing as self-estimation, and there are studies showing 80% of players believe they are underrated. Now is that likely, or are most of them just overestimating themselves? Now you could say this is a problem and we need to reset their rating to their estimation, but you would screw up more than you are fixing, because the rating becomes less objective.

If you can't trust declared ranks, then the whole rank system of go means little. Then why not use a pure Elo system that has no relation to ranks? That would be a different kind of system (like goratings.org).
But the EGD claims a relation between their ratings and ranks. It's a great feature, but it adds the burden to make sure it's about right, no? Is there a better way to calibrate it than basically believing declared ranks?

Schachus wrote:

In fact, there might be players (me for example) who explicitly don't want a reset as long as they are just slowly and steadily improving, because the rating is the testimony to said improvement (you don't just imagine it, your results really get better). I got one reset from 8k to 5k because it was apparent I had improved and the old rating was not appropriate for me anymore. Since then I have only increased my rank by 1 (more or less following the rating), so I can see my rating evolve from the reset at 1600 to its current level (1802), documenting my improvement. With your system, the rating got reset to 4k and to 3k (and would have been reset to 2k, if not for the fact that the tournament where I registered experimentally as a 2k for some reason didn't make its way into the EGD). So the only thing I see in your version of the rating is that it improved from the 1800 reset to 1840 over the last 2 (or 3?) tournaments; the tournaments before are rendered irrelevant to the rating because of the reset.

So you'd rather have an epsilon parameter than a liberal reset policy. It's possible, but it is quite difficult to determine the correct value of such an epsilon parameter.
I did make my reset policy a bit conservative for higher ranks: If an 1100 rated player promotes himself to 10k (1000 rating), the reset will grant him a full reset to 1000. But if a 2400 player promotes himself to 5d (2500 rating), the reset will only grant him a reset to 2450 (the lower bound of 5d). This behaviour flips gradually around rating 2100.
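
One way such a gradual flip could look is sketched below. The post does not say how the transition is shaped, so the linear ramp between 2000 and 2200 and all names here are assumptions for illustration, not the revised system's actual rule.

```python
RANK_HALF_WIDTH = 50  # a rank spans its nominal rating +/- 50 (5d: 2450 to 2550)
RAMP_START = 2000     # assumed: at or below this, the reset is a full reset
RAMP_END = 2200       # assumed: at or above this, the reset goes to the lower bound


def reset_target(declared_rank_rating: float) -> float:
    """Rating granted after a self-promotion to the given rank."""
    t = (declared_rank_rating - RAMP_START) / (RAMP_END - RAMP_START)
    t = min(1.0, max(0.0, t))  # 0 = full reset, 1 = lower bound of the rank
    return declared_rank_rating - t * RANK_HALF_WIDTH


print(reset_target(1000), reset_target(2100), reset_target(2500))
```

This reproduces the two cases in the post: a full reset to 1000 at the low end and a reset to 2450 for a promotion to 5d.
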
I intend to experiment with more sophisticated resets: When a player promotes himself or when a newcomer enters the system, his K factor will be double the normal value (temporarily increasing the volatility of his rating). His opponents' K factors halve when they play him. Then over the course of a dozen or so games, the K factors gravitate to their normal values. In that way, the system collects evidence for the new rating with reduced disturbance of his opponents' ratings.
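
A possible shape for this K factor schedule is sketched below. The post only says "double", "halve" and "a dozen or so games"; the linear return to normal and the exact length of 12 games are assumptions.

```python
SETTLING_GAMES = 12  # "a dozen or so" games; the exact number is an assumption


def k_multipliers(games_since_reset: int) -> tuple:
    """Return (multiplier for the reset player, multiplier for his opponent)."""
    progress = min(1.0, max(0.0, games_since_reset / SETTLING_GAMES))
    own = 2.0 - progress               # starts doubled, settles at 1.0
    opponent = 0.5 + 0.5 * progress    # starts halved, settles at 1.0
    return own, opponent


for games in (0, 6, 12):
    print(games, k_multipliers(games))
```

Each player's normal K factor would be multiplied by the corresponding value for games involving the recently reset player.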

Schachus wrote:

On a positive note: one thing that is much better in your system is the handling of weak players (20k level). In the EGD their ratings are largely screwed up, because the EGD has a bottom cutoff of 100 rating (like they couldn't handle negative numbers!?). Now there are handicap tournaments for children, where a 20k beats a 30k with 9 stones, which is just as expected. But a 30k counts as a 20k for the EGD, so the EGD thinks the 20k did the equivalent of beating an 11k, and boosts his rating a lot. This leads to large inflation of ratings in the ranks below 15k, and I think your system is in fact much better there.

Yes, I think 30k is better. Perhaps 35k or 40k would be better still. In my experience, a 7 year old beginner is about 40k (based on handicaps in the kids go club that I run). But the EGD changes declared ranks below 20k to 20k, so ranks lower than 20k are absent in the data that I got. It's a pity.

gennan · **#48**

ez4u wrote:

The example seems incomplete. Could you describe what happens to the ratings of all three players (2000 player, 2100 player, and 2200 player) if they each play 24 games evenly split between the two opponents, or somehow weighted based on the sizes of the underlying pools? It seems insufficient to view the problem through the results of a single player when two play in each game and the problem is described in terms of three.

Yes, it is not enough. Perhaps I can make something to run simulations over many games and many players to see the overall long term behaviour of particular algorithms on a player population.

gennan · **#49**

gennan wrote:

ez4u wrote:

The example seems incomplete. Could you describe what happens to the ratings of all three players (2000 player, 2100 player, and 2200 player) if they each play 24 games evenly split between the two opponents, or somehow weighted based on the sizes of the underlying pools? It seems insufficient to view the problem through the results of a single player when two play in each game and the problem is described in terms of three.

Yes, it is not enough. Perhaps I can make something to run simulations over many games and many players to see the overall long term behaviour of particular algorithms on a player population.

Actually, the EGD and the revised system already do that. Only they use the actual tournament data as input instead of a hypothetical player and pairing distribution. The results of those "simulations" are reflected in their respective rating distributions.

gennan · **#50**

Schachus wrote:

In fact, there might be players (me for example) who explicitly don't want a reset as long as they are just slowly and steadily improving, because the rating is the testimony to said improvement (you don't just imagine it, your results really get better). I got one reset from 8k to 5k because it was apparent I had improved and the old rating was not appropriate for me anymore. Since then I have only increased my rank by 1 (more or less following the rating), so I can see my rating evolve from the reset at 1600 to its current level (1802), documenting my improvement.

So you happen to choose a conservative / pessimistic self-promotion policy. If everybody did that (avoiding the reset policy), it would lead to overall deflation, and the system would need a positive epsilon parameter that increases everybody's rating by a small amount for every tournament game.

Other players happen to choose a more liberal / optimistic self-promotion policy. If everybody did that (exploiting the reset policy), it would lead to overall inflation, and the system would need a negative epsilon parameter that decreases everybody's rating by a small amount for every tournament game.

In practice, some players are conservative and some players are liberal with self-promotions. I suppose this is normal and it has always been like this, even long before the EGD existed. And it's fine, as long as they balance each other out. As far as I can tell, this is mostly the case: The EGD works fairly well with a small positive epsilon value (so perhaps resets should be applied a bit more often).

Instead of using an epsilon parameter to balance long term inflation / deflation, I find that tweaking the reset policy works just as well or even better. And I find that a more liberal reset policy works better than a conservative one (like the EGD reset policy), which means that the average European tournament player is not overly conservative or overly liberal when it comes to self-promotions.

Schachus wrote:

With your system, the rating got reset to 4k and to 3k (and would have been reset to 2k, if not for the fact that the tournament where I registered experimentally as a 2k for some reason didn't make its way into the EGD). So the only thing I see in your version of the rating is that it improved from the 1800 reset to 1840 over the last 2 (or 3?) tournaments; the tournaments before are rendered irrelevant to the rating because of the reset.

How relevant should historical data be? The fact is that Artem Kachanovskyi is currently 1p (I suppose we can agree on that). Does it matter how he got there? It's the system's job to estimate everybody's current level as best as it can. Ideally all the ratings should behave like a random walk around the "real" skill level of each player at each moment. I don't see rating points as something that one earns (like money or XP points in a video game). You try to improve and if you do, the system should reflect what's happened in the real world as quickly and as accurately as possible. To do that, it needs all the help it can get.

The rating system is basically a measurement device, calibrated to a certain scale. For go, that scale is the go rank scale, which is based on handicap. The EGD has insufficient data on handicap games (other than declared ranks, which are implicitly referring to handicap games), so it uses declared ranks from newcomers, expected winrates, resets and epsilon as a fallback.

With your 2k experiment: I think the system should listen to the experimental self-promotion, but if your results don't support it, the system should quickly gravitate back to a rating that matches your results (preferably with minimal effect on your opponents' ratings from your experiment). Note that if you'd later promote to 2k again, the system won't reset your rating, because it does not exceed your highest declared rank anymore (both the EGD and revised policies work that way). So a "failed" experiment would mean that later, you'd have to fight to a 2k rating the hard way.
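
The "highest declared rank" condition can be sketched as follows. Ranks are represented by their nominal ratings (3k = 1800, 2k = 1900), any further thresholds of the actual policies are left out, and the function name is hypothetical.

```python
def reset_decisions(declared_rank_ratings):
    """For each declared rank in order, tell whether it would trigger a reset.

    A reset is only considered when the declared rank exceeds the highest
    rank the player has declared so far (the first entry is the newcomer's
    initial rating).
    """
    highest = float("-inf")
    decisions = []
    for declared in declared_rank_ratings:
        decisions.append(declared > highest)
        highest = max(highest, declared)
    return decisions


# 3k, experimental 2k, back to 3k, then 2k again
print(reset_decisions([1800, 1900, 1800, 1900]))  # [True, True, False, False]
```

The second promotion to 2k does not trigger a reset, which is the "hard way" case described above.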

Schachus · **#51**

Why are you so sure conservative resetting leads to deflation? There are absolutely no resets in chess and still there is no deflation... (in fact, chess players are whining about inflation, but I don't really believe in that either). But if you are worried about deflation, I'm happy to have a slight epsilon in there somewhere. Actually, whether there is a slight drift in ratings over a long period is irrelevant to me, since what counts is how ratings compare to one another (if in 20 years the strongest European rank (by rating) is no longer 8d but 9d or 7d, although the player holding it is exactly as strong as In-Seong is now, that doesn't concern me too much; I can compare his rating to others). That's also why I don't believe it needs to be tied to ranks too strongly. In-Seong doesn't call himself 8d because he somehow intrinsically knows he is 8d, but because that happens to fit his rating (and of course the ranks of players of similar strength). On Tygem he would be 9d, in the AGA system probably too, in some countries maybe only 6d, because amateur ranks only go that far. I agree that the idea that ratings should reflect handicaps is nice, and a rating system should try to have 100 points = 1 stone, but that doesn't need to be tied to ranks; first of all it only concerns how ratings compare to one another.

While we are at it: Herman said ranks are built to fit handicaps, but how? There is a "1k" player at a club where I often play who is clearly weaker than me. There is also a "5k" player who is slightly weaker than me, but probably stronger than the "1k". They seldom play one another, so they don't realize this. What should my rank be, so that a handicap taken from this rank works with both players? Of course, maybe the 1k should really be 5k and the 5k really be 4k and then I would be 3k, or maybe everyone a rank stronger, but am I supposed to go to someone who has played as 1k for years and say "sorry, but I believe you are 4-5k"? I don't think ranks reflect handicap better than ratings, except maybe in the DDK range where ratings are crap, due to the problems already discussed.

The rating system actually reached this conclusion (about the 1k) and he has a rating around 1650, which is right for him, though it doesn't fit his rank. This is all no problem until, hypothetically, some new player comes along (maybe he wasn't in Europe before or he only played online) who calibrates his rank by playing against the 1k, finds out that is his level and enters a tournament as 1k. His rating would then be initialized at 2000, although 1650 would be right.

Historical data is important, because a few tournaments mean nothing at all. The rating only gives a solid and reliable answer if it has enough rated tournaments to draw on, because over a single tournament there is so much noise in the data (form, luck, opponents' play, and so on) that you can only say "he is a 3k plus or minus 2 ranks". I could have told you that without a rating system. The strength of a rating system is, IMO, that it considers a lot of data (newer data being more important, obviously) to give a more exact answer about our current "average" playing strength.

HermanHiddema · **#52**

Schachus wrote:

While we are at it: Herman said ranks are built to fit handicaps, but how? There is a "1k" player at a club where I often play who is clearly weaker than me. There is also a "5k" player who is slightly weaker than me, but probably stronger than the "1k". They seldom play one another, so they don't realize this. What should my rank be, so that a handicap taken from this rank works with both players? Of course, maybe the 1k should really be 5k and the 5k really be 4k and then I would be 3k, or maybe everyone a rank stronger, but am I supposed to go to someone who has played as 1k for years and say "sorry, but I believe you are 4-5k"? I don't think ranks reflect handicap better than ratings, except maybe in the DDK range where ratings are crap, due to the problems already discussed.

The statement "I don't think ranks reflect handicap better than ratings" doesn't really make sense to me.

Let's say I have two hypothetical players in some hypothetical pure Elo rating system. No fiddling like the EGF has done, just basic Elo as implemented in chess. One of them has rating 4400, the other has rating 4650. According to the Elo rating formula, at a 250pt difference, that means the player with the higher rating should win about 81% of even games between them. What would you consider a proper handicap between these players?
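
For reference, the 81% figure follows from the standard logistic Elo expectation with a scale of 400 points, as commonly used in chess. A minimal sketch (the function name is made up):

```python
def elo_expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score under the standard logistic Elo formula (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


print(round(elo_expected_score(4650, 4400), 2))  # 0.81
```

The formula yields a win probability only; nothing in it says how many handicap stones a 250 point gap is worth.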

gennan · **#53**

I'm looking at the overall statistics of the EGD history. Any statistical distribution has variation and outliers, of which you seem to have encountered a case with this 5k and 1k. I'm sure all of us know some examples like this, but overall, ranks do give a good indication of someone's skill (with a mean error of one or two ranks).

Go is different from chess, in that historically it has a rank system based on handicap (you may call them titles, but they are less fixed than titles IMO, especially kyu ranks). These ranks are based on handicap needed against other ranked players to get a 50% winrate and they are not based on even game winrates at all (except when the ranks are equal, in which case 50% winrate is expected). In that sense, go ranks are not compatible with a normal Elo rating system, which is based on even game winrates only.

You could use a pure Elo rating system for go. I would be absolutely fine with that: only rating differences matter and overall rating drift means nothing, as long as you don't compare year 2010 ratings with year 1970 ratings (there is a little thing though, that chess also has titles linked to ratings, like 2500 = Grandmaster and these titles happen to suffer from long term inflation).

But if you use a pure Elo rating system for go, you should not claim a fixed relation to go ranks (handicaps). It would just be a separate system. You might publish annual correlation tables as an indication used to convert year 2016 ratings to go ranks, but these correlations would be free to drift from one year to the other. I would be perfectly ok with such a rating system.

The "problem" is that the EGD does claim a fixed mapping to go ranks (handicaps). I think it is a good feature, but if you make this claim, you should do your best to maintain an accurate mapping, which means fine-tuning the system to detect and counter long term drift and local / global contraction or dilation of the rating range (because that leads to mismatched handicaps which would invalidate the mapping).

Schachus · **#54**

HermanHiddema wrote:

Schachus wrote:

While we are at it: Herman said ranks are built to fit handicaps, but how? There is a "1k" player at a club where I often play who is clearly weaker than me. There is also a "5k" player who is slightly weaker than me, but probably stronger than the "1k". They seldom play one another, so they don't realize this. What should my rank be, so that a handicap taken from this rank works with both players? Of course, maybe the 1k should really be 5k and the 5k really be 4k and then I would be 3k, or maybe everyone a rank stronger, but am I supposed to go to someone who has played as 1k for years and say "sorry, but I believe you are 4-5k"? I don't think ranks reflect handicap better than ratings, except maybe in the DDK range where ratings are crap, due to the problems already discussed.

The statement "I don't think ranks reflect handicap better than ratings" doesn't really make sense to me.

Let's say I have two hypothetical players in some hypothetical pure Elo rating system. No fiddling like the EGF has done, just basic Elo as implemented in chess. One of them has rating 4400, the other has rating 4650. According to the Elo rating formula, at a 250pt difference, that means the player with the higher rating should win about 81% of even games between them. What would you consider a proper handicap between these players?

That is of course absolutely true; in your example you can say nothing at all about handicaps. But the EGD does not have a basic Elo system; it diverges in 2 important points (in order to fix this). Nr. 1: Handicap games are rated. For these purposes, handicap is compensated for with 100 points per stone (50 points for the first one, because it is only half as good). If there are enough handicap games rated, that should help calibrate things.

Nr. 2: Unlike in Elo, the rating difference does not immediately reflect the winning expectation. It also depends on the stronger player's rating (the number a defines the difference at which expectations are e:1, and it depends on the rating). I actually don't know how the correspondence between a and the rating was obtained, but a good way would be: take players who are known to have a certain level and are 1 stone apart (however you determine that; I would say as the rank system did, meaning that chances in a 1-stone game (that is, reverse komi for Black) should be 50/50), so their ratings should be 100 points apart, and let them play even games. Check how many the stronger player wins (70%? 80%?) and choose the a corresponding to that strength such that the expectation for a game with a 100-point difference matches that. I don't know if this was done when the dependence of a on GoR was defined, but it makes sense that a gets lower for stronger players, as we know the chances of a 5d beating a 6d in an even game are much lower than those of an 8k beating a 7k, so at least a was sort of plausibly chosen.
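
The calibration idea in this paragraph can be sketched as follows, assuming only what is stated above: the winning odds are e:1 when the rating difference equals a. The epsilon term is ignored, and the function names are made up for illustration.

```python
import math


def expected_winrate(rating_difference: float, a: float) -> float:
    """Stronger player's win probability when the odds are e^(D/a) : 1."""
    return 1.0 / (1.0 + math.exp(-rating_difference / a))


def calibrate_a(observed_winrate: float, rating_difference: float = 100) -> float:
    """Value of a that reproduces an observed even-game winrate.

    observed_winrate is the stronger player's winrate at the given rating
    difference and must lie strictly between 0.5 and 1.
    """
    odds = observed_winrate / (1.0 - observed_winrate)
    return rating_difference / math.log(odds)


for winrate in (0.7, 0.8):
    print(winrate, round(calibrate_a(winrate), 1))
```

A higher observed winrate for players one stone apart gives a smaller a, which matches the remark that a gets lower for stronger players.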

Of course, setting up this process does have something to do with the rank system. But that way you have a rating system that has a chance of reflecting handicap, without the need for rating resets.

This is also my suggestion: if you want handicaps to work better with ratings, why not take the data from the handicap games you observe and use it to optimize this correlation between GoR and a (and maybe also the K factor, which is for some reason called con in the EGD), in such a way that the revised ratings fit handicap games better?

I would not want to get rid of resets entirely, since there are cases where players improve 20 ranks between tournaments and that skews things heavily (a 1d losing to someone with a 20k rating is strange), but I suggest having as few of them as possible.

gennan · **#55**

HermanHiddema wrote:

Schachus wrote:

While we are at it: Herman said ranks are built to fit handicaps, but how? There is a "1k" player at a club where I often play who is clearly weaker than me. There is also a "5k" player who is slightly weaker than me, but probably stronger than the "1k". They seldom play one another, so they don't realize this. What should my rank be, so that a handicap derived from this rank works with both players? Of course, maybe the 1k should really be 5k and the 5k really be 4k, and then I would be 3k, or maybe everyone a rank stronger, but am I supposed to go to someone who has played as 1k for years and say "sorry, but I believe you are 4-5k"? I don't think ranks reflect handicap better than ratings, except maybe in the DDK range where ratings are crap, due to the problems already discussed.

The statement "I don't think ranks reflect handicap better than ratings" doesn't really make sense to me.

Let's say I have two hypothetical players in some hypothetical pure Elo rating system. No fiddling like the EGF has done, just basic Elo as implemented in chess. One of them has a rating of 4400, the other a rating of 4650. According to the Elo rating formula, at a 250pt difference the player with the higher rating should win about 81% of even games between them. What would you consider a proper handicap between these players?

Yes, the EGD basically claims to predict handicaps (ranks), but its predictions are mostly based on even game results. I think the reason that it works fairly well in the absence of handicap data is that it is constantly fed with declared ratings (based on handicaps).
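As a quick check of the 81% figure in the quoted Elo example, here is a small sketch using the standard chess Elo expectation (base 10, 400 point scale). The function name is made up for illustration.

```python
def elo_expected_score(diff: float) -> float:
    """Standard Elo expected score for the higher-rated player,
    given the rating difference `diff` (base 10, 400 point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# The two hypothetical players from the example: 4650 vs 4400.
print(round(elo_expected_score(4650 - 4400), 3))  # 0.808
```

Nothing in this formula says how many stones a 250 point gap is worth, which is the point of the question.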

Schachus · **#56**

Actually, I'm interested: did you do anything with the "a" in your revised ratings?

gennan · **#57**

Schachus wrote:

Nr.2: Unlike in Elo, the rating difference does not immediately reflect winning expectation. It also depends on the stronger player's rating (the number a defines the difference at which expectations are e:1, and it depends on the rating). I actually don't know how the correspondence between a and the rating was obtained, but a good way would be: take players known to have a certain level who are 1 stone apart (however you determine that; I would say, as the rank system did. That means chances in a 1 stone game (that is, reverse komi for black) should be 50/50, so ratings should be 100 points apart) and let them play even games. Check how many the stronger player wins (70%? 80%) and choose the a corresponding to that strength such that the expectation for a game with a 100 point difference matches that. I don't know if this was done when the dependence of a on GoR was defined, but it makes sense that a gets lower for stronger players, as we know the chances of a 5d beating a 6d even are much lower than those of an 8k beating a 7k, so at least a was sort of plausibly chosen.

Of course, setting this up does have something to do with the rank system. But that way you have a rating system that has a chance of reflecting handicap without the need for rating resets.

This is also my suggestion: if you want handicaps to work better with ratings, why not take the data from the handicap games you observe and use it to optimize this correlation between GoR and a (and maybe also the K factor, which is for some reason called con in the EGD), in such a way that the revised ratings fit handicap games better?

This is basically what I did: match the predicted winrates with the observed winrates from the EGD. I posted my findings on the site. For example http://goratings.eu/Probabilities/P_PredictedEGD vs http://goratings.eu/Probabilities/P_ObservedEGD. I also looked at handicap game winrates specifically, but the amount of handicap game data is a bit lacking in the EGD.

Schachus · **#58**

Oh, good that we talked about it! I didn't realize that was what your graphics show.
Still, I think that if your goal is to reflect handicaps well, you need to use handicap game data to check the quality of your ratings, even if there is not a lot of data. I still think checking against the rank defeats the purpose, since that way the system "assign the rating that corresponds to the declared rank" would be optimal, while it clearly isn't.

gennan · **#59**

Yes. Take http://goratings.eu/Probabilities/P_PredictedEGD. The purple curve intersecting the 2100 grid line at 50% is the 2100 rating curve. The curve intersects the 2000 grid line at 70%. This means the EGD predicts a 70% win probability when a 2100 player plays a 2000 player.

Then take http://goratings.eu/Probabilities/P_ObservedEGD. Here it shows that the 2100 curve intersects the 2000 grid line at 60%. This is the observed winrate of a 2100 player against a 2000 player over the history of the EGD.

So the EGD predicted winrates don't match the observed winrates very well.

Better predictions are one of the improvements I built into the revised system.
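One way to get a feel for the size of this gap is to express the observed winrate in rating points on the predicted curve. The sketch below assumes a logistic curve with scale a (the difference at which the odds are e:1, as Schachus describes it earlier in the thread) and uses the 70% and 60% values read off the two charts. The function names are hypothetical.

```python
import math


def log_odds(winrate: float) -> float:
    """Natural log of the odds for a winrate strictly between 0 and 1."""
    return math.log(winrate / (1.0 - winrate))


def scale_for(winrate: float, diff: float) -> float:
    """Scale `a` of a logistic curve giving `winrate` at difference `diff`."""
    return diff / log_odds(winrate)


def implied_gap(winrate: float, a: float) -> float:
    """Rating difference at which a curve with scale `a` predicts `winrate`."""
    return a * log_odds(winrate)


a_predicted = scale_for(0.70, 100.0)  # predicted: 70% at 100 points
print(round(implied_gap(0.60, a_predicted), 1))  # 47.9
```

Under these assumptions, the predicted curve treats a 60% winrate as a gap of roughly 48 points, while the players were actually rated 100 points apart.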

gennan · **#60**

These predictions matter, because if your winrate is lower than the expected winrate, you lose points.
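To illustrate, here is a rough sketch of an Elo-style update in which each game changes the rating by con times (result minus expected result). The con value of 20 is a made-up number, and the actual EGD formula may contain further terms that are left out here.

```python
def rating_change(con: float, results: list[float], expected: list[float]) -> float:
    """Total Elo-style rating change over a series of games.

    `results` holds 1.0 for a win and 0.0 for a loss; `expected` holds the
    predicted winrate for each game. Simplified illustration only.
    """
    if len(results) != len(expected):
        raise ValueError("results and expected must have the same length")
    return con * sum(r - e for r, e in zip(results, expected))


# Ten games predicted at 70% each, but only six of them won (60%).
results = [1.0] * 6 + [0.0] * 4
expected = [0.7] * 10
print(round(rating_change(20.0, results, expected), 2))  # -20.0
```

So a player who wins 60% of games in which the system expects 70% keeps losing points, even though 60% may be the normal winrate for that rating difference.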
