KGS ranking revisited

jts · Post by **jts** » Fri May 11, 2012 11:55 am

emeraldemon wrote:It seems to me that the ideal rating & handicapping system would strive to handicap every match to a 50% win rate.

This isn't quite right, though: the ratings are continuous, even though the displayed ranks are discrete. So between a 3.9k and a 3.0k we might expect the stronger player to win 2/3 of the games, and the stronger player may feel frustrated that he wins 2/3 of his games yet never seems to rank up.
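To make the 2/3 figure concrete, here is a minimal sketch of a logistic win-probability model, with ratings measured in ranks (stones). The function name and the slope `k = 0.85` are assumptions chosen for illustration; nothing in this thread states the constants KGS actually uses.

```python
import math

def win_probability(rating_a: float, rating_b: float, k: float = 0.85) -> float:
    """Probability that A beats B in an even game under a logistic model.

    Ratings are in ranks, higher meaning stronger. The slope k is an
    assumed, illustrative value, not a confirmed KGS constant.
    """
    return 1.0 / (1.0 + math.exp(-k * (rating_a - rating_b)))

# A 3.0k is 0.9 ranks stronger than a 3.9k (kyu numbers shrink as strength grows),
# so only the difference of 0.9 matters here.
p = win_probability(0.9, 0.0)
print(f"stronger player wins about {p:.0%} of even games")
```

Under these assumptions the stronger player wins a little over two games in three, even though both players display the same "3k" rank.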

hyperpape · Post by **hyperpape** » Fri May 11, 2012 12:28 pm

One adaptation is to use all the variations of komi between 6.5 and 0.5 as appropriate. Of course this doesn't remove the problem entirely.

wms · Post by **wms** » Fri May 11, 2012 1:58 pm

emeraldemon wrote:...There was a competition a while back looking for improvements to ELO that used basically this metric on historical chess data, I believe.

A year or two ago somebody surveyed various rank algorithms applied to go. He used the KGS algorithm (I'd given him what he needed to recreate it exactly), Elo, a couple modern systems (Glicko I think was one?), and his own system. He then used ability to predict game outcomes as his metric of how good a system was. I was happy to hear that the KGS system placed second in his study, behind his own, but ahead of Elo and Glicko. But his system did not consider predictability of rank changes; so KGS' penchant for changing your rank when you don't play, or for its occasional bumps where everybody goes up or down together, did not count against it.
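As a rough illustration of "ability to predict game outcomes" as a metric, the sketch below scores any rating-based predictor on a list of game records. The post does not say which scoring rule the study used, so both measures here (hit rate and mean log loss) are assumptions, as are the names `score_predictions` and `logistic` and the sample data.

```python
import math
from typing import Callable, Iterable, Tuple

# One game record: (rating of A, rating of B, True if A won).
Game = Tuple[float, float, bool]

def score_predictions(
    games: Iterable[Game],
    predict: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Return (hit_rate, mean_log_loss) of `predict` over `games`.

    `predict(rating_a, rating_b)` must return the probability that A wins.
    """
    hits = 0
    loss = 0.0
    n = 0
    for rating_a, rating_b, a_won in games:
        p = predict(rating_a, rating_b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        hits += int((p > 0.5) == a_won)      # did the favourite win?
        loss -= math.log(p if a_won else 1.0 - p)
        n += 1
    if n == 0:
        raise ValueError("no games to score")
    return hits / n, loss / n

def logistic(rating_a: float, rating_b: float, k: float = 1.0) -> float:
    """Hypothetical predictor; k is an arbitrary illustrative slope."""
    return 1.0 / (1.0 + math.exp(-k * (rating_a - rating_b)))

# Made-up sample: two games won by the favourite, one upset.
sample = [(2.0, 1.0, True), (1.0, 2.0, False), (1.5, 1.0, False)]
hit_rate, log_loss = score_predictions(sample, logistic)
print(hit_rate, log_loss)
```

A higher hit rate and a lower log loss both indicate better prediction. Neither measure says anything about how smoothly ranks change over time, which is the gap wms points out.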

I'm terrible with names but probably somebody here on 19x19 will remember who did the study and where the results are.

yoyoma · Post by **yoyoma** » Fri May 11, 2012 2:00 pm

wms wrote:
emeraldemon wrote:...There was a competition a while back looking for improvements to ELO that used basically this metric on historical chess data, I believe.
A year or two ago somebody surveyed various rank algorithms applied to go. He used the KGS algorithm (I'd given him what he needed to recreate it exactly), Elo, a couple modern systems (Glicko I think was one?), and his own system. He then used ability to predict game outcomes as his metric of how good a system was. I was happy to hear that the KGS system placed second in his study, behind his own, but ahead of Elo and Glicko. But his system did not consider predictability of rank changes; so KGS' penchant for changing your rank when you don't play, or for its occasional bumps where everybody goes up or down together, did not count against it.

I'm terrible with names but probably somebody here on 19x19 will remember who did the study and where the results are.

http://remi.coulom.free.fr/WHR/

emeraldemon · Post by **emeraldemon** » Fri May 11, 2012 7:47 pm

Thanks for the link. wms, did the results of that study make you consider trying his algorithm?

RobertJasiek · Post by **RobertJasiek** » Fri May 11, 2012 10:21 pm

jts wrote:So your objection is not that it's erratic per se

Mainly my objection is the system's design errors.

RobertJasiek · Post by **RobertJasiek** » Fri May 11, 2012 10:25 pm

emeraldemon wrote:look at the average win-rate of every player over an appreciable number of games, and find the average distance from 50%.

This is a simplifying theory but not quite true. When the rating system is bad, then some players can play worse than usual because the system expects them to win much more than 50%, winning that much is tiring, and so they win less than they would if they were not forced into becoming tired. E.g., I (and others, from whom I have heard the same) can win ca. 10-12 games in a row, but then one becomes so tired that winning 20-24 games in a row is out of the question. Rather quickly lost games occur, first 1, then 2, then 4, then 8. The more tired one becomes, the greater the percentage of lost games.
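For reference, the metric emeraldemon proposes in the quote at the top of this post could be computed roughly as follows. This is only a sketch: the input format, the function name and the `min_games` cutoff (standing in for "an appreciable number of games") are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Tuple

def mean_distance_from_even(
    results: Iterable[Tuple[str, bool]],
    min_games: int = 20,
) -> float:
    """Average of |win rate - 0.5| over players with at least `min_games` games.

    `results` holds one (player, won) pair per player per game, taken from
    games played at whatever handicap the rating system suggested.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for player, won in results:
        games[player] += 1
        wins[player] += int(won)

    distances = [
        abs(wins[p] / games[p] - 0.5)
        for p in games
        if games[p] >= min_games
    ]
    if not distances:
        raise ValueError("no player has enough games")
    return sum(distances) / len(distances)

# Made-up data: "a" wins 3 of 4, "b" wins 2 of 4.
demo = [("a", True)] * 3 + [("a", False)] + [("b", True)] * 2 + [("b", False)] * 2
print(mean_distance_from_even(demo, min_games=4))
```

A value near 0 means handicaps are producing close to even results. The objection in this post is that the observed win rates feeding the metric may themselves be distorted by fatigue.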

witwit · Post by **witwit** » Fri May 11, 2012 11:37 pm

I certainly agree that sudden shifts in the system are not desired, but as other people have mentioned, that is a separate issue from being "internally consistent". Trying to make KGS consistent with external systems is not a straightforward problem, since there is no way to objectively measure accuracy like you can when judging the internal accuracy of the system, i.e. how well the system can predict the outcome of a game given the players' ratings.

It is entirely possible, however, that the player population on KGS, on average, plays enough games outside of the server to make the system more accurate than it would be without this inflation. Moreover, the system obviously places less confidence in these inactivity-inflated ranks, meaning that if the increase in rank was not warranted, the correction should in theory not take too long. Of course, whether or not this works in practice depends on the playing habits of the player population, but I can say in the case of KGS that it works well enough for me.

snorri · Post by **snorri** » Sat May 12, 2012 8:51 am

hyperpape wrote:One adaptation is to use all the variations of komi between 6.5 and 0.5 as appropriate. Of course this doesn't remove the problem entirely.

Please don't. If game-to-game variance is greater than komi, as I suspect it is for almost all amateur players, or if the average systematic error is greater than that, as it almost certainly is, it doesn't gain anything and isn't worth the confusion it would cause. It's a false precision. When IGS switched to half-ranks and therefore had games with reverse komi for 1-stone differences, it took me some time to adjust. I'm okay with it now, and I don't have to recheck the komi in every game. With a continuous komi system I'd have to, so I'd probably just manually set it to some common value before the game rather than use some microrank-derived setting.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 9:02 am

witwit wrote:there is no way to objectively measure accuracy like you can when judging the internal accuracy of the system

Do you say that an objective external measure of internal accuracy cannot exist or that so far nobody has described such yet?

emeraldemon · Post by **emeraldemon** » Sat May 12, 2012 11:56 am

RobertJasiek wrote:
emeraldemon wrote:look at the average win-rate of every player over an appreciable number of games, and find the average distance from 50%.
This is a simplifying theory but not quite true. When the rating system is bad, then some players can play worse than usual because the system expects them to win much more than 50%, winning that much is tiring, and so they win less than they would if they were not forced into becoming tired. E.g., I (and others, from whom I have heard the same) can win ca. 10-12 games in a row, but then one becomes so tired that winning 20-24 games in a row is out of the question. Rather quickly lost games occur, first 1, then 2, then 4, then 8. The more tired one becomes, the greater the percentage of lost games.

If I understand, you're talking about a situation something like this:

A player is ranked incorrectly by a system, so the player is suggested even games against players he "should" win against say 80% of the time. But because it's tiring to win so much, he wins less: maybe 60% or something.

My instinct in this situation is to say that his true "should win" percentage is 60%, not 80%. Even if it's true that winning is more tiring than losing (which I'm not sure of), that seems to be a part of what's necessary to win. You can't say "Player A would beat Player B 80% of the time if Player A didn't have to win 80% of the time".

It is true that a person's past games can change how they play in the next games; are you trying to suggest that we should model this?

snorri · Post by **snorri** » Sat May 12, 2012 5:14 pm

jts wrote:Well, not necessarily. If your most recent partners decline, you'll decline too. It just assumes that, in the absence of evidence, you can still beat the same people and lose to the same people.

So in a very real way, it's better to beat someone whose rating is going up than someone whose rating is trending down or staying flat, assuming of course that past performance says something about future results.

The worst would be to lose to someone whose rating is going down. Ah, but it's such a nuisance to check players' graphs before you play a game. Maybe there should be new stigma marks:

\ = losing record
/ = winning record
- = relatively flat record

Then one can advertise: "no ?~\" in the game description, which is more succinct than "no ?~ or losers"

But no, wait. Some people might take issue with people who only play opponents with a winning record in order to get that extra rating boost. So just like ~ you'd have to have another mark that says, in effect, that you don't play enough losers. I'm not sure what that mark should be...maybe *

witwit · Post by **witwit** » Sat May 12, 2012 8:57 pm

RobertJasiek wrote:
witwit wrote:there is no way to objectively measure accuracy like you can when judging the internal accuracy of the system
Do you say that an objective external measure of internal accuracy cannot exist or that so far nobody has described such yet?

I meant that an objective measure of internal consistency does exist while an objective measure of consistency with external systems can only be defined by arbitrarily picking another system to compare against.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 10:20 pm

emeraldemon wrote:Even if it's true that winning is more tiring than losing (which I'm not sure of),

I am sure, based on the experience of tens of thousands of games.

emeraldemon wrote:You can't say "Player A would beat Player B 80% of the time if Player A didn't have to win 80% of the time".

Since one cannot say so, a rating system must avoid punishing players for becoming tired from having to win too great a percentage for too long a time.

emeraldemon wrote:are you trying to suggest that we should model this?

Not model - but avoid.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 10:29 pm

witwit wrote:an objective measure of consistency with external systems can only be defined by arbitrarily picking another system to compare against.

Arbitrarily picking another system is not an objective measure of consistency with external systems. Theoretical insight independent of particular external systems possibly can provide an objective measure. It is, however, still unclear which assumptions for theoretical insight can be called objective or arbitrary axioms. Getting a good answer on this is the real difficulty.

Life In 19x19
