A Curious Case Study in KGS Ranks

Bantari · Post by **Bantari** » Wed Mar 26, 2014 2:10 pm

RobertJasiek wrote: There are also other reasons why I do not play much on other servers, such as extremely disliking having to use another software for every server.
There are other reasons to like KGS, so I want the worst part of KGS (the rating system) to improve so that I can better enjoy the good features of KGS.

I think what you suggest is not really an "improvement", but only a change which would make it more fun for you, personally, to play there.

But you need to understand that this change, which would make it more fun for you, personally, to play there - this change would ruin the fun for some others to play there, me included, personally. I happen to quite like the KGS rating system as it is.

So basically, what you propose is that because of your personal dislike to play on other servers, you wish the one server which you like playing on to cater to your very personal preferences, even when this means that others are unhappy.

What I would suggest is that instead of remaking the world to be your sweet cozy oyster, you try to figure out how to combat your personal little idiosyncrasy which prevents you from enjoying the oysters already out there. It is easier and infinitely more efficient to change one person than the whole server. And if you really feel about it strongly, why not just switch to Tygem entirely - you will have then also just one server with just one software, and this should make you happy. Or no?

As I said - as long as we don't all agree on exactly the same model, places have to exist which cater to various groups. It seems that the place that caters to you is Tygem, so why worry about KGS?

RobertJasiek wrote: I have not said that one system must be used on all servers. You have made this up.

I did not "make it up". I have inferred it from what you said. It never occurred to me that you make all this fuss because you are unwilling to play on Tygem or anywhere else but KGS, and thus KGS has to cater to your personal preferences in spite of the fact that places which cater to your personal preferences already exist and thrive elsewhere.

So it is probably my bad, apologies.

However - I think you should lobby to also switch Tygem rating system to more sensible one, just as you lobby for change in KGS system. After all, balance must be preserved. If you try to take away the place I enjoy, at least try to give me a substitute. Fair is fair, no?

RobertJasiek wrote: There is room for a server with real world ratings. In fact, there is so much room that such a server does not even exist remotely. Don't even try to pretend KGS would be such a server, ridiculous. On KGS, equally KGS-ranked players can easily be 5 real world ranks apart.

As can they in real world. I still remember fondly when I was forced to give a "1d" player 9 handi and trashed him badly.

Anyways - I never said what you suggest, you are making it up. I said "real-world-like", there is a difference, especially when we consider the context, which is rank stability vs. rank instability.

oren · Post by **oren** » Wed Mar 26, 2014 2:50 pm

Mef wrote: Nevertheless even for the corner case KGS has a simple way to solve this problem: Play games handicapped at the rating you think you should be! This will allow you to reach your equilibrium faster and unlike many other rating systems does not penalize the opponents who help you get there.

Or make a new account.

RBerenguel · Post by **RBerenguel** » Wed Mar 26, 2014 3:43 pm

RobertJasiek wrote:
My system (when worked out to have global non-deflationary stability) would have much greater volatility, but I am not at all convinced it would have smaller accuracy. Rather I think that, on average for every particular player, it would have greater accuracy, because it can correct his temporarily wrong ratings much more quickly.

What system? It is clear that it is quite hard to model player's rank real distribution. A higher volatility simulated model is quite wrong, too.

Mef · Post by **Mef** » Wed Mar 26, 2014 3:54 pm

I really enjoy reading the discussion that's been going on (well at least half of the discussion...Robert defending a rating system he came up with in 10 seconds without thinking of its implications isn't as interesting to me). One of the reasons I wanted to present this corner case was to start some discussion.

I think one thing this has helped point out is one of the sources for confusion (and perhaps frustration) related to the KGS rating system. It can be quickly summarized here:

Polama wrote: What we strictly, factually know is that over 242 games this account was at least 3 stones weaker, potentially more depending on the exact nature of the bug.

Polama wrote: I think an advanced statistical model would view this case as a meaningful shift

vs.

RBerenguel wrote: A student in statistics won't look at the data and say, "hey, this player is a sucker now!"

What we have at the core is two questions: "How strong was the bot performing at a given time?" vs. "How do you expect the bot to perform on its next game?"

Many people are worried about the former and this is related to what Polama is calculating. The performance of the bot on that day was clearly well below 11k. This is very easy to show with very high statistical certainty.

The other question is related, however it is not the same. Likewise, when you calculate the expected result it is also not the same. If we were to look for analogies, the closest we will probably find to something like this is a sports injury. If a player is injured, their performance may suffer a sudden drastic drop, but you would not expect this to be representative of how they will be expected to perform if and when they recover.

The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game). This of course always implies there is a bit of regression to the mean ever-present in all of its calculations.

Mef · Post by **Mef** » Wed Mar 26, 2014 4:38 pm

RobertJasiek wrote: The case study does not compare well to human players with frequent games, who need, without significant interruption, to win ca. 70+% for weeks up to a few months in order to improve a rank, after it has been VERY MUCH easier to drop a rank.

I let this slide when it was posted because I didn't think it was worth bringing up...but as Robert has continued harping on about how he feels slighted by the flawed KGS rating system and because this claim is so incredibly easy to check (it took me about 10 minutes to make a spreadsheet), I just wanted to point out that Robert has never had consecutive months with >70% win rate in rated games regardless of sample size, playing rate, rating change, handicap, etc. If you count April/May in 2004 he had one set of 2 months where he had 70% and 72%, but that's almost 10 years ago and the KGS rating system has been adjusted several times since then. I have attached a graph which is quite easy for anyone to independently verify with his archives.

Robert's statements are based on assumptions that are divorced from reality.

Edit: Putting image in hide tag:

mitsun · Post by **mitsun** » Wed Mar 26, 2014 4:41 pm

Mef wrote: The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game).

Hmm, I thought I understood the KGS rating system until I read this. I would have said that the KGS rating system is designed to accurately describe the results of the previous games, with the assumption that this allows it to predict the outcome of the next game.

On the subject of a player whose rank changes drastically and discontinuously, that is an unusual case which violates the assumptions of the rating model, and I don't think it is particularly interesting to see how KGS or any other rating system copes with this anomaly.

RobertJasiek · Post by **RobertJasiek** » Wed Mar 26, 2014 4:46 pm

Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.

uPWarrior · Post by **uPWarrior** » Wed Mar 26, 2014 4:48 pm

RBerenguel wrote: What system? It is clear that it is quite hard to model player's rank real distribution.

In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.

skydyr · Post by **skydyr** » Wed Mar 26, 2014 5:06 pm

uPWarrior wrote:
RBerenguel wrote: What system? It is clear that it is quite hard to model player's rank real distribution.
In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.

I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.
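One way such data could be examined, as a rough sketch only: take games played at the handicap that matches the rank gap and tabulate the stronger player's win rate per handicap. If stones were fully additive, every bucket should hover near 50%. The tuple layout and the sample games below are invented placeholders for illustration, not real KGS records.

```python
from collections import defaultdict


def win_rate_by_handicap(games):
    """Win rate of the stronger player, grouped by handicap.

    `games` is an iterable of (rank_gap, handicap, stronger_won) tuples.
    This layout is hypothetical. Only games where the handicap equals the
    rank gap are counted, since those are the ones the definition of rank
    says should be 50/50.
    """
    tally = defaultdict(lambda: [0, 0])  # handicap -> [stronger wins, games]
    for rank_gap, handicap, stronger_won in games:
        if handicap != rank_gap:
            continue  # under- or over-handicapped game, not a fair test
        tally[handicap][0] += int(stronger_won)
        tally[handicap][1] += 1
    return {h: wins / n for h, (wins, n) in sorted(tally.items())}


# Made-up sample data, purely to show the shape of the calculation.
sample_games = [
    (3, 3, True),
    (3, 3, False),
    (6, 6, True),
    (6, 6, True),
    (6, 6, False),
    (6, 6, True),
    (6, 3, True),  # ignored: handicap does not match the rank gap
]

print(win_rate_by_handicap(sample_games))
```

Running the same tabulation separately for dan and kyu games would address the second half of the question, whether the answer depends on the absolute strength of the players.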

RobertJasiek · Post by **RobertJasiek** » Wed Mar 26, 2014 5:08 pm

Mef wrote: Robert has never had consecutive months with >70% win rate

"Ca. 70%" (IIRC, I have not said ">70%") has been a simplifying, rounded number, because I rely on memory. A couple of years ago, I actually counted numbers of wins and losses for one or two periods (a couple of weeks) when I played seriously in order to (and mainly for the purpose to) improve a KGS rank. IIRC, it was ca. 68.5%, but I am not sure of the exact number. I posted the figures somewhere, maybe you find them. I calculated the percentage from the start of making my serious attempt to the moment of reaching the next higher KGS rank. Therefore, it does not matter whether it was consecutive months. What matters is that it was EXACTLY the period during which I made the serious attempt.

I have not claimed to have had consecutive months with >70% win rate. You enjoy to bring forward this argument, which I have not made. Please understand the difference between consecutive calendar months and period of seriously playing until raising a rank.

(As I reported elsewhere, I also had the other experience of playing very little for IIRC months, then winning literally only a few games in order to suddenly improve a rank, i.e. being shown the next higher rank tag.)

Mef · Post by **Mef** » Wed Mar 26, 2014 5:32 pm

RobertJasiek wrote: Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.

Honestly? Once I have the graph it's virtually 0 effort to check this. Aside from the 1 instance I mentioned previously, you have not had a period of 2 consecutive months with 65%+ win rate on KGS in rated games either.

That aside, KGS assumes that there is a 66% likelihood of a person half a stone stronger winning a rated game (assuming they are 2d or stronger). An infinitely long 65% win streak would not necessarily be enough to promote. It's been a while since I have done the math on them, but I would assume AGA and EGF are similar in how they compute this.
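The arithmetic behind that can be sketched as follows. This assumes, purely for illustration, that win probability follows a logistic curve in the strength difference and that the 65% is scored in even games against opponents at the player's current rating. Neither assumption is a statement of the actual KGS formula; only the 66% figure comes from the post above.

```python
import math

# Assumption for illustration: p(win) = 1 / (1 + exp(-k * d)),
# where d is the strength difference in stones.
P_HALF_STONE = 0.66  # figure quoted above for a half-stone advantage


def logistic_scale(p_at_half_stone: float) -> float:
    """Solve for k so that a 0.5 stone advantage wins with this probability."""
    return math.log(p_at_half_stone / (1.0 - p_at_half_stone)) / 0.5


def implied_advantage(win_rate: float, k: float) -> float:
    """Strength advantage in stones implied by a long-run even-game win rate."""
    return math.log(win_rate / (1.0 - win_rate)) / k


k = logistic_scale(P_HALF_STONE)
advantage_65 = implied_advantage(0.65, k)

print(f"k = {k:.3f} per stone")
print(f"a sustained 65% win rate implies about {advantage_65:.3f} stones")
```

Under these assumptions a sustained 65% comes out slightly under half a stone (about 0.47), so a player sitting in the lower half of a rank band would not be pushed over the next boundary by it.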

edit: putting image in hide tag

Bantari · Post by **Bantari** » Wed Mar 26, 2014 5:36 pm

RobertJasiek wrote: Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% do not require as much super-human effort.

Ok, fair enough.

One point, though:

If it takes less effort to reach a rank, there would be more players with that rank. For example: you are sitting in a pool of 4d players. If you lower the threshold for rank increase to a lower percentage, you might rise to 5d easier, but so would many of your other fellow 4d players. At the same time, many of the 5d players would rise to 6d, since this would be easier now as well. Taking it to extreme, chances are you will sit in the same pool of the same people just with a different number by your name. To me, this would be absolutely meaningless, it's just a label. As long as the system is uniform, I care not that much if people of my strength are called 4d or 5d or whatever.

And a second point, for good measure:

With the situation being as it is, it certainly does not take a "superhuman effort" to reach 5d. There are many players who are 5d on KGS, they reached it fair and square, and I have a hard time believing that they are all X-Men. What you mean, I assume, is that it would take a "superhuman effort" for *you* to reach 5d. But all that this means is that, according to this particular rating system, you are not yet strong enough to reach 5d on KGS, pure and simple. No matter how your ego makes you think of yourself or how much you would love it to be otherwise.

If, for whatever reasons (for example - teaching fees) it is important for you to have a higher number by your name, best to switch to a server on which the system allows somebody of your strength to reach higher ranks. As for KGS... the value of reaching a higher rank is precisely because it is not easy to reach, it means something. Making it easier to reach would make it mean less. Just like Tygem ranks mean squat - certainly I would never consider a Tygem 5d anything near a real-life 5d. While KGS 5d is pretty strong.

This is the best advice I can give you.

uPWarrior · Post by **uPWarrior** » Wed Mar 26, 2014 6:04 pm

skydyr wrote:
uPWarrior wrote:
RBerenguel wrote: What system? It is clear that it is quite hard to model player's rank real distribution.
In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.
I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.

I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway.

ez4u · Post by **ez4u** » Wed Mar 26, 2014 6:05 pm

Below is a graph that may (or may not) help. This is from the KGS Analytics download. It graphs the 100-game moving average win rate (i.e. the moving average of column 'L' in the download file) against the average 'Rank' (column 'D' in the download file) at the time of those games. The moving average will give us a different view than monthly results due to the changing volume of games played per month. Notice in the X-axis labels that Aug-07 through Dec-07 shows each month. Sum was busy in those months. Compare that to Nov-12 through Dec-13. Only four months are shown: Nov, Mar, Aug, Dec. Sum was not busy in those months.

The rank was averaged and divided by 10 just to fit it in the same scale as the winning rate. Hence 5d = '50%', 4d = '40%', etc. on this graph. This allows us to look at the relationship between winning rate and promotion/demotion timing.
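For anyone who wants to reproduce this kind of graph, the calculation can be sketched roughly as below. The input layout is an assumption: one list of 1/0 results and one list of numeric dan ranks, in game order, standing in for columns 'L' and 'D' of the download. The short window and the sample values are made up for the demonstration; the graph described above uses a window of 100.

```python
from collections import deque
from typing import Iterable, List


def trailing_mean(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average; emits one value per item once `window` items exist."""
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the oldest item from the running sum
        if len(buf) == window:
            out.append(total / window)
    return out


def scaled_rank(dan_ranks: Iterable[float], window: int) -> List[float]:
    """Averaged rank divided by 10, so that 5d plots at 0.5 next to the win rate."""
    return [r / 10 for r in trailing_mean(dan_ranks, window)]


# Made-up sample: 1 = win, 0 = loss, with the dan rank shown at each game.
results = [1, 0, 1, 1, 0, 0]
ranks = [4, 4, 4, 5, 5, 5]

win_rate = trailing_mean(results, window=4)
rank_line = scaled_rank(ranks, window=4)
print(win_rate)
print(rank_line)
```

Averaging per game rather than per calendar month is what keeps busy and quiet periods comparable, which is the point made above about the X-axis labels.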

skydyr · Post by **skydyr** » Wed Mar 26, 2014 6:25 pm

uPWarrior wrote:
skydyr wrote:
uPWarrior wrote: In my opinion that's the issue with most ranking systems designs. It should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.
I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.
I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..

If a 7k is winning 50% of 3 stone games against a 4k, and losing 50% of them, why would you assume their rank should be increased? I suspect I've misunderstood your argument.

Life In 19x19
