A Curious Case Study in KGS Ranks

Bantari
Gosei
Posts: 1639
Joined: Sun Dec 06, 2009 6:34 pm
GD Posts: 0
Universal go server handle: Bantari
Location: Ponte Vedra
Has thanked: 642 times
Been thanked: 490 times

Re: A Curious Case Study in KGS Ranks

Post by Bantari »

RobertJasiek wrote:There are also other reasons why I do not play much on other servers, such as extremely disliking having to use another software for every server.
There are other reasons to like KGS, so I want the worst part of KGS (the rating system) to improve so that I can better enjoy the good features of KGS.

I think what you suggest is not really an "improvement", but only a change which would make it more fun for you, personally, to play there.

But you need to understand that this change, which would make it more fun for you, personally, to play there - this change would ruin the fun for some others to play there, me included, personally. I happen to quite like the KGS rating system as it is.

So basically, what you propose is that because of your personal dislike of playing on other servers, you wish the one server which you like playing on to cater to your very personal preferences, even when this means that others are unhappy.

What I would suggest is that instead of remaking the world to be your sweet cozy oyster, you try to figure out how to combat your personal little idiosyncrasy which prevents you from enjoying the oysters already out there. It is easier and infinitely more efficient to change one person than the whole server. And if you really feel about it strongly, why not just switch to Tygem entirely - you will have then also just one server with just one software, and this should make you happy. Or no?

As I said - as long as we don't all agree on exactly the same model, places have to exist which cater to various groups. It seems that the place that caters to you is Tygem, so why worry about KGS?

RobertJasiek wrote:I have not said that one system must be used on all servers. You have made this up.

I did not "make it up". I have inferred it from what you said. It never occurred to me that you make all this fuss because you are unwilling to play on Tygem or anywhere else but KGS, and thus KGS has to cater to your personal preferences in spite of the fact that places which cater to your personal preferences already exist and thrive elsewhere.

So it is probably my bad, apologies.

However - I think you should lobby to also switch the Tygem rating system to a more sensible one, just as you lobby for change in the KGS system. After all, balance must be preserved. If you try to take away the place I enjoy, at least try to give me a substitute. Fair is fair, no?

RobertJasiek wrote:There is room for a server with real world ratings. In fact, there is so much room that such a server does not even exist remotely. Don't even try to pretend KGS would be such a server, ridiculous. On KGS, equally KGS-ranked players can easily be 5 real world ranks apart.

As they can in the real world. I still remember fondly when I was forced to give a "1d" player 9 handi and trashed him badly.

Anyways - I never said what you suggest, you are making it up. I said "real-world-like", there is a difference, especially when we consider the context, which is rank stability vs. rank instability.
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!
oren
Oza
Posts: 2777
Joined: Sun Apr 18, 2010 5:54 pm
GD Posts: 0
KGS: oren
Tygem: oren740, orenl
IGS: oren
Wbaduk: oren
Location: Seattle, WA
Has thanked: 251 times
Been thanked: 549 times

Re: A Curious Case Study in KGS Ranks

Post by oren »

Mef wrote:Nevertheless even for the corner case KGS has a simple way to solve this problem: Play games handicapped at the rating you think you should be! This will allow you to reach your equilibrium faster and unlike many other rating systems does not penalize the opponents who help you get there.


Or make a new account. :)
RBerenguel
Gosei
Posts: 1585
Joined: Fri Nov 18, 2011 11:44 am
Rank: KGS 5k
GD Posts: 0
KGS: RBerenguel
Tygem: rberenguel
Wbaduk: JohnKeats
Kaya handle: RBerenguel
Online playing schedule: KGS on Saturday I use to be online, but I can be if needed from 20-23 GMT+1
Location: Barcelona, Spain (GMT+1)
Has thanked: 576 times
Been thanked: 298 times

Re: A Curious Case Study in KGS Ranks

Post by RBerenguel »

RobertJasiek wrote:
My system (when worked out to have global non-deflationary stability) would have much greater volatility, but I am not at all convinced it would have smaller accuracy. Rather I think that, on average for every particular player, it would have greater accuracy, because it can correct his temporarily wrong ratings much more quickly.


What system? It is clear that it is quite hard to model player's rank real distribution. A higher volatility simulated model is quite wrong, too.
Geek of all trades, master of none: the motto for my blog mostlymaths.net
Mef
Lives in sente
Posts: 852
Joined: Fri Apr 23, 2010 8:34 am
Rank: KGS [-]
GD Posts: 428
Location: Central Coast
Has thanked: 201 times
Been thanked: 333 times

Re: A Curious Case Study in KGS Ranks

Post by Mef »

I really enjoy reading the discussion that's been going on (well at least half of the discussion...Robert defending a rating system he came up with in 10 seconds without thinking of its implications isn't as interesting to me). One of the reasons I wanted to present this corner case was to start some discussion.

I think one thing this has helped point out is one of the sources of confusion (and perhaps frustration) related to the KGS rating system. It can be quickly summarized here:

Polama wrote: What we strictly, factually know is that over 242 games this account was at least 3 stones weaker, potentially more depending on the exact nature of the bug.


Polama wrote:I think an advanced statistical model would view this case as a meaningful shift


vs.

RBerenguel wrote:A student in statistics won't look at the data and say, "hey, this player is a sucker now!"



What we have at the core is two questions: "How strong was the bot performing at a given time?" vs. "How do you expect the bot to perform on its next game?"

Many people are worried about the former and this is related to what Polama is calculating. The performance of the bot on that day was clearly well below 11k. This is very easy to show with very high statistical certainty.

The other question is related, however it is not the same. Likewise, when you calculate the expected result it is also not the same. If we were to look for analogies, the closest we will probably find to something like this is a sports injury. If a player is injured, their performance may suffer a sudden drastic drop, but you would not expect this to be representative of how they will be expected to perform if and when they recover.

The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game). This of course always implies there is a bit of regression to the mean ever-present in all of its calculations.
Mef

Re: A Curious Case Study in KGS Ranks

Post by Mef »

RobertJasiek wrote:The case study does not compare well to human players with frequent games, who need, without significant interruption, to win ca. 70+% for weeks up to a few months in order to improve a rank, after it has been VERY MUCH easier to drop a rank.


I let this slide when it was posted because I didn't think it was worth bringing up...but as Robert has continued harping on about how he feels slighted by the flawed KGS rating system and because this claim is so incredibly easy to check (it took me about 10 minutes to make a spreadsheet), I just wanted to point out that Robert has never had consecutive months with >70% win rate in rated games regardless of sample size, playing rate, rating change, handicap, etc. If you count April/May in 2004 he had one set of 2 months where he had 70% and 72%, but that's almost 10 years ago and the KGS rating system has been adjusted several times since then. I have attached a graph which is quite easy for anyone to independently verify with his archives.

Robert's statements are based on assumptions that are divorced from reality.
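For anyone who wants to repeat this kind of check, here is a minimal sketch; it is not the spreadsheet used above. It assumes the game archive has already been reduced to (date, won) pairs, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_win_rates(games):
    """Map (year, month) to win rate, given an iterable of (date, won) pairs."""
    tally = defaultdict(lambda: [0, 0])  # (year, month) -> [wins, games played]
    for day, won in games:
        key = (day.year, day.month)
        tally[key][0] += 1 if won else 0
        tally[key][1] += 1
    return {key: wins / total for key, (wins, total) in tally.items()}


def consecutive_months_at_or_above(rates, threshold):
    """List pairs of adjacent calendar months whose win rates both reach threshold."""
    pairs = []
    for year, month in sorted(rates):
        following = (year + 1, 1) if month == 12 else (year, month + 1)
        if (
            following in rates
            and rates[(year, month)] >= threshold
            and rates[following] >= threshold
        ):
            pairs.append(((year, month), following))
    return pairs
```

Calling `consecutive_months_at_or_above(monthly_win_rates(games), 0.70)` on a full archive returns every pair of back-to-back months at 70% or better; an empty list means there were none. Months with no rated games are simply absent, so they break a streak.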

Edit: Putting image in hide tag:
[Image: Monthly win rate 2004-2014 (Sum-Monthly-Winrate-small.JPG)]
Last edited by Mef on Wed Mar 26, 2014 6:33 pm, edited 1 time in total.
mitsun
Lives in gote
Posts: 553
Joined: Fri Apr 23, 2010 10:10 pm
Rank: AGA 5 dan
GD Posts: 0
Has thanked: 61 times
Been thanked: 250 times

Re: A Curious Case Study in KGS Ranks

Post by mitsun »

Mef wrote:The KGS rating system aims to answer the latter question (predict the outcome of the next game), at the necessary expense of the former question (describe the result of the previous game).

Hmm, I thought I understood the KGS rating system until I read this. I would have said that the KGS rating system is designed to accurately describe the results of the previous games, with the assumption that this allows it to predict the outcome of the next game.

On the subject of a player whose rank changes drastically and discontinuously, that is an unusual case which violates the assumptions of the rating model, and I don't think it is particularly interesting to see how KGS or any other rating system copes with this anomaly.
RobertJasiek
Judan
Posts: 6273
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: A Curious Case Study in KGS Ranks

Post by RobertJasiek »

Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% does not require as much super-human effort.
uPWarrior
Lives with ko
Posts: 199
Joined: Mon Jan 17, 2011 1:59 pm
Rank: KGS 3 kyu
GD Posts: 0
Has thanked: 6 times
Been thanked: 55 times

Re: A Curious Case Study in KGS Ranks

Post by uPWarrior »

RBerenguel wrote:What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking system designs. They should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.
skydyr
Oza
Posts: 2495
Joined: Wed Aug 01, 2012 8:06 am
GD Posts: 0
Universal go server handle: skydyr
Online playing schedule: When my wife is out.
Location: DC
Has thanked: 156 times
Been thanked: 436 times

Re: A Curious Case Study in KGS Ranks

Post by skydyr »

uPWarrior wrote:
RBerenguel wrote:What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking system designs. They should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.
RobertJasiek

Re: A Curious Case Study in KGS Ranks

Post by RobertJasiek »

Mef wrote:Robert has never had consecutive months with >70% win rate


"Ca. 70%" (IIRC, I have not said ">70%") has been a simplifying, rounded number, because I rely on memory. A couple of years ago, I actually counted numbers of wins and losses for one or two periods (a couple of weeks) when I played seriously in order to (and mainly for the purpose to) improve a KGS rank. IIRC, it was ca. 68.5%, but I am not sure of the exact number. I posted the figures somewhere, maybe you find them. I calculated the percentage from the start of making my serious attempt to the moment of reaching the next higher KGS rank. Therefore, it does not matter whether it was consecutive months. What matters is that it was EXACTLY the period during which I made the serious attempt.

I have not claimed to have had consecutive months with >70% win rate. You enjoy bringing forward this argument, which I have not made. Please understand the difference between consecutive calendar months and a period of seriously playing until raising a rank.

(As I reported elsewhere, I also had the other experience of playing very little for IIRC months, then winning literally only a few games in order to suddenly improve a rank, i.e. being shown the next higher rank tag.)
Mef

Re: A Curious Case Study in KGS Ranks

Post by Mef »

RobertJasiek wrote:Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% does not require as much super-human effort.


Honestly? Once I have the graph it's virtually 0 effort to check this. Aside from the 1 instance I mentioned previously, you have not had a period of 2 consecutive months with a 65%+ win rate on KGS in rated games either.

That aside, KGS assumes that there is a 66% likelihood of a person half a stone stronger winning a rated game (assuming they are 2d or stronger). An infinitely long 65% win streak would not necessarily be enough to promote. It's been a while since I have done the math on them, but I would assume AGA and EGF are similar in how they compute this.
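To make the arithmetic concrete, here is a rough sketch. It assumes a logistic win-probability curve calibrated only to the 66%-at-half-a-stone figure quoted above; the real KGS formula may well differ, and all names are made up for illustration.

```python
import math

# Assumption: P(win) = 1 / (1 + exp(-K * gap)), with gap measured in stones.
# K is chosen so that a half-stone gap gives the 66% figure quoted above.
HALF_STONE_WIN_RATE = 0.66
K = math.log(HALF_STONE_WIN_RATE / (1 - HALF_STONE_WIN_RATE)) / 0.5


def win_probability(gap_in_stones):
    """Chance that the stronger player wins, under the assumed logistic curve."""
    return 1.0 / (1.0 + math.exp(-K * gap_in_stones))


def implied_gap(win_rate):
    """Strength gap in stones that would produce this long-run win rate."""
    return math.log(win_rate / (1 - win_rate)) / K
```

Under these assumptions `implied_gap(0.65)` comes out at roughly 0.47 stones, just short of half a stone, which is why even an arbitrarily long 65% run against equally rated opponents would not by itself guarantee a promotion.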

edit: putting image in hide tag
[Image: Monthly win rate with reference lines (Sum-Monthly-Winrate-small-with-lines.JPG)]
Last edited by Mef on Wed Mar 26, 2014 6:31 pm, edited 1 time in total.
Bantari

Re: A Curious Case Study in KGS Ranks

Post by Bantari »

RobertJasiek wrote:Bantari,

that I have described my preferred kind of rating system does not imply that I would impose it on everybody for the sake of making only myself happy. Nevertheless, you allow me to express my opinion, right?:) - Since different people have different preferences, a rating system can be some compromise. However, currently the KGS rating system is no compromise in its stability aspect. - I think a compromise should be possible so that some stability is there but everybody (incl. the frequent players) can improve if winning a significant (instead of very great) percentage over a reasonable (instead of extraordinarily long) period and without super-human effort (alternatively without playing for a few months, then winning a few games).

To understand your preference for the current system, how many games do you play per day and how many months do you need to improve a rank after having dropped a rank?

Currently, by experience, one effectively needs to win ca. 68.5% for successive weeks or months to improve a rank as a frequently playing player. Assume it would be tweaked to 65%, would you be unhappy then? For me, this might make the difference, because 65% does not require as much super-human effort.

Ok, fair enough.

One point, though:
  • If it takes less effort to reach a rank, there would be more players with that rank. For example: you are sitting in a pool of 4d players. If you lower the threshold for rank increase to a lower percentage, you might rise to 5d easier, but so would many of your other fellow 4d players. At the same time, many of the 5d players would rise to 6d, since this would be easier now as well. Taking it to the extreme, chances are you will sit in the same pool of the same people just with a different number by your name. To me, this would be absolutely meaningless, it's just a label. As long as the system is uniform, I care not that much if people of my strength are called 4d or 5d or whatever.

And a second point, for good measure:
  • With the situation being as it is, it certainly does not take a "superhuman effort" to reach 5d. There are many players who are 5d on KGS, they reached it fair and square, and I have a hard time believing that they are all X-Men. What you mean, I assume, is that it would take a "superhuman effort" for *you* to reach 5d. But all that this means is that, according to this particular rating system, you are not yet strong enough to reach 5d on KGS, pure and simple. No matter how your ego makes you think of yourself or how much you would love it to be otherwise.

If, for whatever reasons (for example - teaching fees) it is important for you to have a higher number by your name, best to switch to a server on which the system allows somebody of your strength to reach higher ranks. As for KGS... the value of reaching a higher rank is precisely because it is not easy to reach, it means something. Making it easier to reach would make it mean less. Just like Tygem ranks mean squat - certainly I would never consider a Tygem 5d anything near a real-life 5d. While KGS 5d is pretty strong.

This is the best advice I can give you.
uPWarrior

Re: A Curious Case Study in KGS Ranks

Post by uPWarrior »

skydyr wrote:
uPWarrior wrote:
RBerenguel wrote:What system? It is clear that it is quite hard to model player's rank real distribution.


In my opinion that's the issue with most ranking system designs. They should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.


I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..
Last edited by uPWarrior on Wed Mar 26, 2014 6:06 pm, edited 1 time in total.
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: A Curious Case Study in KGS Ranks

Post by ez4u »

Below is a graph that may (or may not) help. This is from the KGS Analytics download. It graphs the 100-game moving average win rate (i.e. the moving average of column 'L' in the download file) against the average 'Rank' (column 'D' in the download file) at the time of those games. The moving average will give us a different view than monthly results due to the changing volume of games played per month. Notice in the X-axis labels that Aug-07 through Dec-07 shows each month. Sum was busy in those months. Compare that to Nov-12 through Dec-13. Only four months are shown: Nov, Mar, Aug, Dec. Sum was not busy in those months.

The rank was averaged and divided by 10 just to fit it in the same scale as the winning rate. Hence 5d = '50%', 4d = '40%', etc. on this graph. This allows us to look at the relationship between winning rate and promotion/demotion timing.
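As a rough sketch of how such a graph can be built (not the actual procedure used for the image), the snippet below assumes the win column has been read as 1 for a win and 0 for a loss, and the rank column as a plain dan number such as 4 or 5. The function names are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; yields one value per full window of games."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the game that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


def scaled_rank(ranks, window):
    """Average the rank over the same window, then divide by 10 (5d -> 0.50)."""
    return [average / 10 for average in moving_average(ranks, window)]
```

With `window=100` the two series line up game for game, so `moving_average(wins, 100)` and `scaled_rank(ranks, 100)` can be plotted on the same 0 to 1 axis.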
[Image: Sum Win Rate - Dan 20140327.jpg]
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
skydyr

Re: A Curious Case Study in KGS Ranks

Post by skydyr »

uPWarrior wrote:
skydyr wrote:
uPWarrior wrote:In my opinion that's the issue with most ranking system designs. They should not try to model the rank distribution at all; the definition of rank itself should be the only well defined concept to be modeled.


I suspect part of the problem with this is that rank is defined as a difference of X handicap stones between two players, but the very definition may be flawed if it doesn't scale correctly as the handicap increases, as well as if it doesn't apply equally to higher and lower ranks. That is, if a 1 rank player gives 3 stones to a 4 rank player for a 50% win rate, and a 4 rank player gives the same for the same win percentage to a 7 rank player, does it actually follow in reality that a 1 rank player will give 6 stones to a 7 rank player and come out with a 50% win percentage? If it does, is this true regardless of whether the hypothetical 1 rank player is 7 dan or 7 kyu as we would judge currently? I certainly don't have a good sense of how well players of any strength conform to this expectation of the definition, though I would welcome data one way or the other.


I don't have data but I think most people would agree both things apply: stones are not fully transitive and stronger players play less swingy games.

The second fact wouldn't impact the rating system at all, if a 7k wins 50% of the games against a 4k then he should be 6k and the same would be true for the 7d. (how easy it is to actually win 50% of the games against a player 3 stones stronger wouldn't have to be modeled at all)
Transitivity could be a problem, you could try to model that distribution instead of the player distribution as this one would not be biased (your player distribution model depends on your own ranking system, while the win/loss ratio does not). Or you could just ignore the fact that high handicaps aren't transitive as they are so rare anyway..


If a 7k is winning 50% of 3 stone games against a 4k, and losing 50% of them, why would you assume their rank should be increased? I suspect I've misunderstood your argument.