KGS ranking revisited

RobertJasiek · Post by **RobertJasiek** » Fri May 11, 2012 10:21 pm

jts wrote:So your objection is not that it's erratic per se

Mainly my objection is the system's design errors.

RobertJasiek · Post by **RobertJasiek** » Fri May 11, 2012 10:25 pm

emeraldemon wrote:look at the average win-rate of every player over an appreciable number of games, and find the average distance from 50%.

This is a simplifying theory but not quite true. When the rating system is bad, some players can play worse than usual: the system expects them to win much more than 50%, winning that much is tiring, and so they win less than they would if they were not forced into becoming tired. E.g., I (and others, from whom I have heard the same) can win ca. 10-12 games in a row, but then one becomes so tired that winning 20-24 games in a row is out of the question. Lost games then occur rather quickly: first 1, then 2, then 4, then 8. The more tired one is, the greater the percentage of lost games becomes.
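
For reference, the metric emeraldemon describes could be computed roughly as follows. This is only a sketch: the `(winner, loser)` game format, the function name, and the `min_games` cutoff standing in for "an appreciable number of games" are all assumptions, not anything KGS exposes.

```python
from collections import defaultdict

def mean_distance_from_even(games, min_games=20):
    """Average |win rate - 50%| over players with at least min_games games.

    games: iterable of (winner, loser) name pairs (hypothetical format).
    Returns None if no player has enough games.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    # Distance of each qualifying player's win rate from an even 50%.
    distances = [
        abs(wins[player] / count - 0.5)
        for player, count in played.items()
        if count >= min_games
    ]
    return sum(distances) / len(distances) if distances else None
```

A value near 0 would mean most players win about half their games. It says nothing about why a player's win rate sits where it does, which is the point under dispute here.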

witwit · Post by **witwit** » Fri May 11, 2012 11:37 pm

I certainly agree that sudden shifts in the system are not desirable, but, as other people have mentioned, that is a separate issue from being "internally consistent". Making KGS consistent with external systems is not a straightforward problem, since there is no way to measure that accuracy objectively, as you can when judging the internal accuracy of the system, i.e. how well the system predicts the outcome of a game given the players' ratings.


It is entirely possible, however, that the player population on KGS, on average, plays enough games outside of the server to make the system more accurate with this inflation than without it. Moreover, the system obviously places less confidence in these inactivity-inflated ranks, meaning that if the increase in rank was not warranted, the correction should in theory not take too long. Of course, whether or not this works in practice depends on the playing habits of the player population, but I can say that in the case of KGS it works well enough for me.
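
One way to put a number on internal accuracy in the sense above (how well ratings predict game outcomes) is the average log loss of the predicted win probabilities. The sketch below assumes a plain logistic curve over the rating difference with a hypothetical steepness `k`; the actual KGS formula is not given in this thread, so treat both function names and the curve itself as illustrative.

```python
import math

def win_probability(rating_a, rating_b, k=1.0):
    """Predicted chance that A beats B (assumed logistic model, hypothetical k)."""
    return 1.0 / (1.0 + math.exp(-k * (rating_a - rating_b)))

def mean_log_loss(results, k=1.0):
    """Average negative log-likelihood of the observed results.

    results: non-empty list of (rating_a, rating_b, a_won) tuples.
    Lower is better; always predicting 50% scores ln(2), about 0.693.
    """
    total = 0.0
    for rating_a, rating_b, a_won in results:
        p = win_probability(rating_a, rating_b, k)
        total -= math.log(p if a_won else 1.0 - p)
    return total / len(results)
```

Two rating systems could then be compared on the same set of games by which one gives the lower loss on results it has not yet seen.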

snorri · Post by **snorri** » Sat May 12, 2012 8:51 am

hyperpape wrote:One adaptation is to use all the variations of komi between 6.5 and 0.5 as appropriate. Of course this doesn't remove the problem entirely.

Please don't. If game-to-game variance is greater than komi, as I suspect it is for almost all amateur players, or if the average systematic error is greater than that, as it almost certainly is, it doesn't gain anything and isn't worth the confusion it would cause. It's a false precision. When IGS switched to half-ranks and therefore had games with reverse komi for 1-stone differences, it took me some time to adjust. I'm okay with it, and now I don't have to recheck the komi in every game, but with a continuous komi system I'd have to, so I'd probably just manually set it to some common value before the game rather than use some microrank-derived setting.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 9:02 am

witwit wrote:there is no way to objectively measure accuracy like you can when judging the internal accuracy of the system

Do you say that an objective external measure of internal accuracy cannot exist or that so far nobody has described such yet?

emeraldemon · Post by **emeraldemon** » Sat May 12, 2012 11:56 am

RobertJasiek wrote:
emeraldemon wrote:look at the average win-rate of every player over an appreciable number of games, and find the average distance from 50%.
This is a simplifying theory but not quite true. When the rating system is bad, some players can play worse than usual: the system expects them to win much more than 50%, winning that much is tiring, and so they win less than they would if they were not forced into becoming tired. E.g., I (and others, from whom I have heard the same) can win ca. 10-12 games in a row, but then one becomes so tired that winning 20-24 games in a row is out of the question. Lost games then occur rather quickly: first 1, then 2, then 4, then 8. The more tired one is, the greater the percentage of lost games becomes.

If I understand, you're talking about a situation something like this:

A player is ranked incorrectly by a system, so the player is offered even games against players he "should" beat, say, 80% of the time. But because it's tiring to win so much, he wins less: maybe 60% or so.

My instinct in this situation is to say that his true "should win" percentage is 60%, not 80%. Even if it's true that winning is more tiring than losing (which I'm not sure of), that seems to be a part of what's necessary to win. You can't say "Player A would beat Player B 80% of the time if Player A didn't have to win 80% of the time".

It is true that a person's past games can change how they play in the next games; are you trying to suggest that we should model this?

snorri · Post by **snorri** » Sat May 12, 2012 5:14 pm

jts wrote:Well, not necessarily. If your most recent partners decline, you'll decline too. It just assumes that, in the absence of evidence, you can still beat the same people and lose to the same people.

So in a very real way, it's better to beat someone whose rating is going up than someone whose rating is trending down or staying flat, assuming of course that past performance says something about future results.

The worst would be to lose to someone whose rating is going down. Ah, but it's such a nuisance to check players' graphs before you play a game. Maybe there should be new stigma marks:

\ = losing record
/ = winning record
- = relatively flat record

Then one can advertise: "no ?~\" in the game description, which is more succinct than "no ?~ or losers"

But no, wait. Some people might take issue with people who only play opponents with a winning record in order to get that extra rating boost. So just like ~ you'd have to have another mark that says, in effect, that you don't play enough losers. I'm not sure what that mark should be...maybe *

witwit · Post by **witwit** » Sat May 12, 2012 8:57 pm

RobertJasiek wrote:
witwit wrote:there is no way to objectively measure accuracy like you can when judging the internal accuracy of the system
Do you say that an objective external measure of internal accuracy cannot exist or that so far nobody has described such yet?

I meant that an objective measure of internal consistency does exist while an objective measure of consistency with external systems can only be defined by arbitrarily picking another system to compare against.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 10:20 pm

emeraldemon wrote:Even if it's true that winning is more tiring than losing (which I'm not sure of),

I am sure, based on the experience of tens of thousands of games.

You can't say "Player A would beat Player B 80% of the time if Player A didn't have to win 80% of the time".

Since one cannot say so, a rating system must avoid punishing players for becoming tired by having to win too great a percentage for too long a time.

are you trying to suggest that we should model this?

Not model - but avoid.

RobertJasiek · Post by **RobertJasiek** » Sat May 12, 2012 10:29 pm

witwit wrote:an objective measure of consistency with external systems can only be defined by arbitrarily picking another system to compare against.

Arbitrarily picking another system is not an objective measure of consistency with external systems. Theoretical insight independent of particular external systems possibly can provide an objective measure. It is, however, still unclear which assumptions for theoretical insight can be called objective or arbitrary axioms. Getting a good answer on this is the real difficulty.

wms · Post by **wms** » Mon May 14, 2012 10:07 am

emeraldemon wrote:Thanks for the link. wms, did the results of that study make you consider trying his algorithm?

Yes and no. Yes in that it made me decide that if I ever revisit the ranking system, Remi's system would be the first place I go for alternatives. No in that his paper reaffirmed my belief that the KGS system is "good enough" and there is no urgent need to replace it.

LexC · Post by **LexC** » Mon May 14, 2012 11:12 am

The conclusion of the paper is interesting, as it says that Remi's algorithm needs some refinements to apply to KGS:

Another research direction would be to improve the model. An efficient application of WHR to Go data would require some refinements of the dynamic Bradley-Terry model, that the KGS rating algorithm [13] already has. In particular, it should be able to
– Handle handicap and komi.
– Deal with outliers.
– Handle the fact that beginners make faster progress than experts.
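
To make the first item concrete, here is a minimal sketch of how handicap and komi might be folded into a Bradley-Terry win probability. It is not how WHR or KGS actually does it: the function names, the idea of measuring ratings in rank units, the fair komi of 6.5, and the 13 points per rank are all illustrative assumptions.

```python
import math

def black_advantage(handicap_ranks=0.0, komi=6.5, fair_komi=6.5, points_per_rank=13.0):
    """Black's head start in rank units (every default is an assumption)."""
    # Komi below the assumed fair value helps Black; komi above it helps White.
    return handicap_ranks + (fair_komi - komi) / points_per_rank

def black_win_probability(rating_black, rating_white, advantage=0.0):
    """Bradley-Terry: P(Black wins) = g_b / (g_b + g_w), with g = exp(rating)."""
    gamma_black = math.exp(rating_black + advantage)
    gamma_white = math.exp(rating_white)
    return gamma_black / (gamma_black + gamma_white)
```

With equal ratings and fair komi the result is 0.5; lowering komi to 0.5 or adding handicap shifts the prediction toward Black. Outliers and the faster progress of beginners would need further terms that this sketch does not attempt.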

Kaya.gs · Post by **Kaya.gs** » Mon May 14, 2012 1:17 pm

Overall I think the KGS rating system has some definite strong points. Especially between 3k and 2d, from experience, I feel the accuracy is just perfect. Systems like Wbaduk or Tygem tend to deviate really badly in that respect, even for players with thousands of games. In Wbaduk, two 7d players can be more than 2 stones apart.

My opinion is that accuracy is just one of the factors in a rating system. The psychology of it is very important.
I think the key element that produces discontent with KGS's rating system is heaviness. It's an educated guess that the #1 reason for multiple accounts is the rating system.

The thing with WHR is that it's not a trivial implementation, and you have to watch out for performance, because it does some heavy operations.

Tami · Post by **Tami** » Mon May 14, 2012 5:59 pm

Kaya.gs wrote:My opinion is that accuracy is just one of the factors in a rating system. The psychology of it is very important. I think the key element that produces discontent with KGS's rating system is heaviness. It's an educated guess that the #1 reason for multiple accounts is the rating system.

I think that hits the nail on the head.

The KGS system may be accurate for most players most of the time, but it seems to be based on the assumption that nobody ever improves. Once you have a stable rank, it becomes extremely hard to change it, no matter how much you win or lose. And, I kind of agree with Robert Jasiek here, it is much easier to play worse than your mark than up to it: on more than one occasion I have been close to a promotion, lost a crucial game and then gone on to lose a string of games out of sheer frustration. I'm sure that experience is not unique. If only it wasn't quite so like climbing a greasy pole, maybe not so many players would go on tilt so often.

The latest adjustment, the downward one that prompted this thread, came as a nasty surprise - I had been nursing my main account, the heavy one, toward 1d by steadfastly resisting tilty emotions whenever I did lose, and the adjustment undid all that. It also brought my 1d account temporarily back to 1k.

And, yes, rank and ratings graph are important to me. I have been putting effort into improving my go, and I was using these things to measure my progress. Maybe I have little talent for the game and I am only improving in small steps, but I still like to see my graph go upwards over the passing months.

For sure, I totally get it that the system is not intended for providing feedback on players' progress, but only for making a roughly 50-50 win/lose balance. Could it not be, though, that the 50-50 balance is merely an illusion of a mirage? If, in fact, there are many, many players of different strengths crammed into a small ratings band because of heaviness, then might not their mutual scores tend to even out over time, thereby giving the false impression of accuracy? (Strong 3k beats weak 3k, but weak 3k wins against weak 1k, who then narrowly beats strong 2k, who beats strong 3k, who goes on tilt and loses to weak 3k).

Still, if it's never going to change, then that's just too bad. At least it's still fun to play free games and watch broadcasts.

hyperpape · Post by **hyperpape** » Mon May 14, 2012 6:24 pm

You're also playing against people who themselves go on tilt and play badly. You may gain more than you lose, or lose more than you gain. Who knows? It's just that you're in your own head, while your opponents are faceless people on the other side of the internet, so their emotions and their sloppy play are invisible to you.

I have no doubt that there are some people who are unusually unstable in their play strength, who can be 1 dan for weeks, then play like a 6 kyu for a night (or whatever the numbers may be).

Never mind what that person's rank should be (for my money, it's not 1 dan), but what rating system could possibly accommodate people like that? You're asking KGS to divine that player's true strength when there's no evidence in their play.

I understand Kaya's point that psychology matters, and maybe it's sometimes ok to accept a less accurate system. But be clear about that: the system won't tell you what you really are. It will just do a better job of telling you what you expect to hear.
