Re: KGS ranking revisited

Posted: Tue May 15, 2012 2:52 pm
by speedchase
Mef wrote:
Note: Cumulative totals exclude the months of May, June, and July when games were played as a 4d
Rank-Case-Study-3.JPG


This is very good evidence of how overly heavy it gets.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 3:08 pm
by averell

Code:

Statistics of Even Games - all players

 SeMin  SeMax    AdR     Nw     Ng     Pw    ASe   Pw-ASe  SE(AdR)   ARGD     dPw 
 -----  -----  ------  -----  -----  -----  -----  ------  -------  ------  ------
   0.0    2.5   731.9   1357  27306    5.0    1.0     3.9      1.3   -36.3     3.7
   2.5    5.0   408.8   2258  17924   12.6    3.7     8.9      4.7   -56.0     7.9
   5.0    7.5   334.0   3087  17274   17.9    6.2    11.6      6.1   -63.9    11.7
   7.5   10.0   293.9   3966  18319   21.6    8.7    12.9      8.0   -64.9    13.6
  10.0   12.5   260.3   4752  18248   26.0   11.3    14.8      9.7   -66.2    16.3
  12.5   15.0   230.0   5142  18190   28.3   13.7    14.5     10.7   -66.8    17.6
  15.0   17.5   208.4   6042  19263   31.4   16.2    15.1     12.5   -64.2    18.9
  17.5   20.0   189.0   6554  20207   32.4   18.7    13.7     14.8   -61.6    17.6
  20.0   22.5   168.2   7098  20378   34.8   21.3    13.6     16.1   -58.7    18.8
  22.5   25.0   151.7   7708  21138   36.5   23.8    12.7     18.4   -56.0    18.1
  25.0   27.5   138.8   8428  22659   37.2   26.3    10.9     21.6   -52.4    15.5
  27.5   30.0   117.6   8219  20817   39.5   28.7    10.7     22.1   -54.2    17.4
  30.0   32.5   102.4   8795  21870   40.2   31.3     8.9     24.8   -48.5    15.4
  32.5   35.0    89.2   9698  23469   41.3   33.8     7.5     27.8   -44.5    13.6
  35.0   37.5    76.8  10582  24572   43.1   36.3     6.8     30.8   -40.8    12.3
  37.5   40.0    61.8   9978  23050   43.3   38.7     4.6     32.7   -42.8    10.6
  40.0   42.5    46.3   9933  21930   45.3   41.3     4.0     35.1   -40.2    10.2
  42.5   45.0    33.0  10222  22000   46.5   43.8     2.7     37.6   -39.0     8.8
  45.0   47.5    19.7  10566  22237   47.5   46.3     1.3     40.7   -36.0     6.8
  47.5   50.0     6.6  10908  22585   48.3   48.7    -0.5     43.8   -33.5     4.5

The actual data, from that same site (2000-Jan-01 to today)
In this table, AdR is average rating difference, and Pw is percentage of wins.
While the actual percentages don't matter to me either, the fact is that a single rank or 100 Elo points doesn't make all that much difference w.r.t. winning chances. Of course this is EGD tournament data, but I doubt KGS blitz ranks fare any better.
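To put the "100 Elo points" remark next to the table, here is a minimal sketch that compares a few observed Pw values with what the classic Elo logistic curve would predict for the same AdR. The classic curve is an assumption made only for illustration; EGD computes its expected score with its own formula, so the ASe column will not match these numbers. The function and variable names are made up for this example.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the lower-rated player under the classic Elo curve.

    Assumption: the textbook logistic formula with a 400-point scale,
    not the formula EGD actually uses.
    """
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


# (AdR, Pw) pairs copied from a few rows of the table above.
ROWS = [(408.8, 12.6), (208.4, 31.4), (102.4, 40.2), (46.3, 45.3), (6.6, 48.3)]


def compare(rows):
    """Return (AdR, observed Pw in %, classic-Elo prediction in %) per row."""
    result = []
    for gap, observed_pct in rows:
        predicted_pct = 100.0 * elo_expected_score(gap)
        result.append((gap, observed_pct, round(predicted_pct, 1)))
    return result


if __name__ == "__main__":
    for gap, observed, predicted in compare(ROWS):
        print(f"AdR {gap:6.1f}: observed {observed:4.1f}%  classic Elo {predicted:4.1f}%")
```

Running it prints the observed win percentage of the weaker player beside the classic-Elo prediction, which makes it easy to see how much the weaker side outperforms a textbook curve in this data.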

Mef wrote:The point I was trying to make is that if there are the system anomalies that some suspect, there should be some easily measurable side effects we could predict and identify. I guess at the end of the day, when possible I'd much prefer hard facts to sneaking suspicions and speculation...but then again, I am from Missouri...

You act like there are no "system anomalies". Even if a system like KGS is the best available (or reasonably possible), that doesn't mean it's not crap in a lot of ways. For example, fake short-lived accounts are a reality. Also, the assumption of "a constant amount of games played over time" is often not realistic (Christmas, other holidays). And "heaviness" of accounts, even if a lot of the time it is only frustration/bias, admittedly exists, and we're only arguing about how bad it actually is (fast-improving people / general population).

That said, the only thing I would like to see changed with the KGS ranks is to make the data more open (actual accessible numbers), because people might just come up with a better system. Another hypothetical weakness is rank anchors, which have to be kept secret, but I don't see why they would be that hard to find. I might try this as a pet project when I get a few days of free time.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 3:10 pm
by jts
speedchase wrote:
Mef wrote:
Note: Cumulative totals exclude the months of May, June, and July when games were played as a 4d
Rank-Case-Study-3.JPG


This is very good evidence of how overly heavy it gets.

What are you talking about? What percentage of games you won in even games against 4dans doesn't tell us too much about whether you should be promoted to 6dan several months later. Maybe you have something else in mind, but you need to be more specific rather than sniping at Mef, who did an excellent job on his analysis of supposedly "heavy" rankings.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 4:03 pm
by lemmata
Irrelevant musing: Given how long this discussion on ranks is, I wonder if server popularity could permanently be increased by radical rank inflation.

Some on-topic musing: Does it really matter how close the KGS ranking system is to someone's ideal? If you can maintain, say, an 80% winning rate against people of equal rank over a month or two, you will rank up. Sure that might not be ideal, but it just means that the system is conservative in rejecting the null hypothesis that the parameter governing a series of random trials has not changed.
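The "conservative in rejecting the null hypothesis" idea can be made concrete with a one-sided binomial test. The sketch below assumes the null hypothesis is a 50% win chance against equally ranked opponents and that games are independent; it says nothing about how KGS actually decides promotions. The function name and the game counts are made up for illustration.

```python
from math import comb


def tail_probability(wins: int, games: int, p_win: float = 0.5) -> float:
    """P(at least `wins` wins in `games` independent games) when each game
    is won with probability `p_win` (the null hypothesis)."""
    return sum(
        comb(games, k) * p_win**k * (1.0 - p_win) ** (games - k)
        for k in range(wins, games + 1)
    )


if __name__ == "__main__":
    # Three hypothetical runs, all with an 80% win rate.
    for wins, games in [(8, 10), (24, 30), (48, 60)]:
        p_value = tail_probability(wins, games)
        print(f"{wins}/{games} wins: chance under a 50% null = {p_value:.2e}")
```

The same 80% rate is weak evidence over 10 games (it does not clear a 5% significance level) and very strong evidence over 30 or more, which is the sense in which a cautious system needs a month or two of results before moving a rank.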

More on-topic musing: In the grand scheme of things, rank accuracy is not that important in comparison to other issues. Consider an extreme scenario in which the server designer only cares about rank accuracy. The designer may force everyone to play ONLY randomized automatch games without the option of choosing a rank range (the designer will choose the "optimal" range). After all, we can't have pools of people who only play each other and skew the system (national rooms, players who only play others +/- 1 rank). Freedom to choose is quite valuable to go players, but that freedom may adversely affect the accuracy of a server's ranking system. Designing a system that is robust while allowing those freedoms may result in a system that is unbiased (accurate on average) but slower to recognize changes (though it eventually does).

KGS-specific musing: I wonder if wms would get fewer complaints if he opted to display that a player is 9.892 kyu rather than 10 kyu. A player's ranking actually moves all the time. We just don't see it because KGS displays integer-valued ranks. We can look at the rank graph, but that is not quite as visceral as seeing an actual number.

Meta-thread musing: This discussion really seems to be more about what would be an ideal ranking system. The KGS-specific issues seem more like a distraction to the discussion the participants really want to have. Perhaps a new, more focused thread is in order?

Zen musing: A Tygem 4 dan once told me that he was 4 gup/kyu according to tests at the Hanguk Giwon (Korean Baduk Association). I don't think he complained too much about being under-ranked according to the Hanguk Giwon. Unless you are a pro, these numbers are just for fun. Enjoy the game.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 6:37 pm
by Tami
Mef, I resent being used for that kind of analysis. Couldn't you at least have chosen some other player and WITHHELD their name? Doing what you did has a definite flavour of ad hominem about it, as did another poster's allusion to "illusory superiority".

I have not been claiming MY rank was wrong, rather that I would only like to see the system made more fluid - in both directions, up and down. I think 1k is currently about right, although I'd rather be labelled a 2k temporarily or a 1d temporarily than told that I never change.

In any case, the graph Mef made could even support my points: from September to November I only managed a win rate of 40% or worse, yet my graph went UP. I won 57% of my games since March and my line is back where it started, thanks to the recent adjustment. My rank never changes - even in periods when I make the mistake of playing a lot of games while upset or distracted.

I am in favour of a system that demotes you when you are doing badly, and promotes you when you are doing well, instead of using many months' worth of data to keep you the same. I'm also very much in favour of not making large adjustments at seemingly random times. It would be easier for us lay people to understand.

I would honestly rather be demoted if I lost the bulk of my games in, say, a one month period, if it meant that I could be promoted as easily for winning the bulk of my games in a similar period. Using a long history only makes it hard to understand why your graph and rating behave as they do. If you ARE improving quickly, then your history only holds you back.

Stability is not such a great thing, anyway: if a player is off-form, then allowing their rank to drop down would give them the chance to get back on-form by having easier opponents to play against. Again, if they were on hot form, then a promotion would give them tougher partners, and result in their form cooling off. That is, a fluid system would absorb temporary changes better than a rigid one, like a martial artist flowing with a punch. In contrast, with KGS's largely unchanging ranks, you get players winning or losing a lot without promoting or demoting, and even if their win-loss scores even out to 50-50 over a 300-year period, the impression they usually get is that the system is just heavy.

Anyway, just so that it's clear: I have never claimed that my rank, personally speaking, was inaccurate; only that I would like to see a system that was more responsive to form, and assigned ranks in a more fluid way. That does not mean going to the opposite extreme, only that it would be desirable to have a system that did not feel quite so much like swimming through concrete.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 7:05 pm
by speedchase
jts wrote:
speedchase wrote:
Mef wrote:
Note: Cumulative totals exclude the months of May, June, and July when games were played as a 4d
Rank-Case-Study-3.JPG


This is very good evidence of how overly heavy it gets.

What are you talking about? What percentage of games you won in even games against 4dans doesn't tell us too much about whether you should be promoted to 6dan several months later. Maybe you have something else in mind, but you need to be more specific rather than sniping at Mef, who did an excellent job on his analysis of supposedly "heavy" rankings.


First of all, I didn't "snipe" at Mef, just pointed out that he brought data that supports the position opposite to his. Second of all, who said anything about wins against 4dans getting you promoted to 6dan? I was pointing out that in a period of large variance, his graph moves less than 1/4 of a stone in either direction.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 7:27 pm
by jts
Well, during this period, rj's wins as a 5dan varied between 44% and 49% - a strong performance but hardly erratic swings justifying huge rank changes. If anything, it's surprising he swung by even a half stone with such a steady win rate - perhaps we could blame this on heaviness, but that would be a little facile.

It's true that in a few months where he played very few games his win rate was much higher or lower ... 2 games at 0%, 17 at 75%... but surely you agree that given how many hundreds of games he plays, it would be cruel to bust him down to 1k on the basis of two games, or raise him to 7d on the basis of seventeen.
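How little a handful of games says about the underlying win rate can be sketched with a Wilson score interval. This is a generic statistics illustration, not anything KGS computes; the 95% level (z = 1.96), the function name, and the example game counts are assumptions chosen for the example.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate, treating games as independent.

    z = 1.96 corresponds to a roughly 95% confidence level (an assumption
    for this sketch).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1.0 + z * z / games
    centre = (p_hat + z * z / (2.0 * games)) / denom
    half_width = (
        z * sqrt(p_hat * (1.0 - p_hat) / games + z * z / (4.0 * games * games)) / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Illustrative counts: a 2-game month, a 17-game month, a 300-game month.
    for wins, games in [(0, 2), (13, 17), (120, 300)]:
        low, high = wilson_interval(wins, games)
        print(f"{wins}/{games}: plausible win rate between {low:.2f} and {high:.2f}")
```

With two games the interval still includes a 50% win rate, and with seventeen it remains very wide, while a few hundred games give a much narrower range. That is the statistical version of not moving someone several ranks on the basis of a tiny sample.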

Re: KGS ranking revisited

Posted: Tue May 15, 2012 7:45 pm
by speedchase
jts wrote:It's true that in a few months where he played very few games his win rate was much higher or lower ... 2 games at 0%, 17 at 75%... but surely you agree that given how many hundreds of games he plays, it would be cruel to bust him down to 1k on the basis of two games, or raise him to 7d on the basis of seventeen.


You are using the logic of the current system to argue against the logic of a new system. It would only be "cruel" if you assume that he would have to stay there. In a more fluid system his rank could move up and down a lot, and it would be neither cruel nor illogical; rather, your rank would then reflect how well you were playing.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 9:26 pm
by jts
speedchase wrote:
jts wrote:It's true that in a few months where he played very few games his win rate was much higher or lower ... 2 games at 0%, 17 at 75%... but surely you agree that given how many hundreds of games he plays, it would be cruel to bust him down to 1k on the basis of two games, or raise him to 7d on the basis of seventeen.


You are using the logic of the current system to argue against the logic of a new system. It would only be "cruel" if you assume that he would have to stay there. In a more fluid system his rank could move up and down a lot, and it would be neither cruel nor illogical; rather, your rank would then reflect how well you were playing.

Question: do you think there is any connection between "how well you were playing" and "how well you will play in the future"? Because there are many ways to look at "how well you were playing". You could look at yesterday, you could look at the last week, you could look at the last three years, you could look at all the games you played sober, you could only look at Tuesdays because that's your lucky day.

My contention is that the only sensible way to quantify how well you've been playing is to choose the measure that best predicts how well you'll play in the future.

Just out of curiosity, if you flipped a coin 600 times and, during those 600 flips, got a sequence of 12 consecutive flips of which (in any order) 9 were heads and 3 were tails, would you believe that the odds of flipping heads had changed during those 12 flips?
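The coin question can be checked with a small simulation. The sketch below assumes a fair coin and counts a run of 600 flips as a "hit" if any 12 consecutive flips contain at least 9 heads (a slightly looser reading than exactly 9). The function names, the trial count, and the seed are arbitrary choices for this example.

```python
import random


def has_streaky_window(flips, window=12, heads_needed=9):
    """True if some `window` consecutive flips contain >= `heads_needed` heads.

    `flips` is a sequence of 0 (tails) and 1 (heads).
    """
    if len(flips) < window:
        return False
    heads = sum(flips[:window])
    if heads >= heads_needed:
        return True
    # Slide the window one flip at a time, updating the head count.
    for i in range(window, len(flips)):
        heads += flips[i] - flips[i - window]
        if heads >= heads_needed:
            return True
    return False


def estimate_probability(n_flips=600, trials=2000, seed=0):
    """Monte Carlo estimate of the chance that a fair coin shows such a window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.randint(0, 1) for _ in range(n_flips)]
        if has_streaky_window(flips):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Estimated probability: {estimate_probability():.3f}")
```

For a fair coin the estimate comes out well above 90%, so seeing such a stretch somewhere in 600 flips is the expected outcome, not evidence that the odds changed.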

Re: KGS ranking revisited

Posted: Tue May 15, 2012 9:35 pm
by Tami
jts wrote:Just out of curiosity, if you flipped a coin 600 times and, during those 600 flips, got a sequence of 12 consecutive flips of which (in any order) 9 were heads and 3 were tails, would you believe that the odds of flipping heads had changed during those 12 flips?


The fault with this analogy is that people change, coins don't.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 9:58 pm
by jts
Tami wrote:
jts wrote:Just out of curiosity, if you flipped a coin 600 times and, during those 600 flips, got a sequence of 12 consecutive flips of which (in any order) 9 were heads and 3 were tails, would you believe that the odds of flipping heads had changed during those 12 flips?


The fault with this analogy is that people change, coins don't.

Coins are change! (Sorry, it was either that or something about flipping back and forth.)

Well, that just makes one half of the analogy easier to understand than the other half. We know coins don't (usually) change, so it's easy to contemplate with equanimity observing 12 unusually lucky flips (I always prefer heads, myself) and attributing this to chance rather than to a short period of heroic numismatic self-overcoming.

On the other hand, if you watch a basketball player make 600 3pt shots, and he makes 45% of them, it's very tempting to believe that there was a brief period of 12 shots when he had "hot hands" and that then he lost his nerve. Humans change, after all. Or, if a fund manager generally does slightly worse than the market over the course of 600 months, but within that period there were twelve consecutive months when he finally figured out the right system, but then everyone else caught on.

If you think you have evidence that a human has changed, you should see whether that evidence helps you predict the future, no?

Re: KGS ranking revisited

Posted: Tue May 15, 2012 10:41 pm
by Tami
jts wrote:If you think you have evidence that a human has changed, you should see whether that evidence helps you predict the future, no?


I agree.

It's not that KGS ranks never change, it's just that they change very slowly, which is frustrating for myself and many others. I'm sure anybody who maintained a high winning percentage over a year would gain promotion. But surely people can improve on a monthly basis, if not faster? Why can they not be rewarded sooner for their efforts?

Where do you draw the line anyway? How about making your ranking history stretch over 2 years to make it even more stable? Or how about 5 or 20 years? Should a player demonstrate a new level of strength or weakness over 1 week, 1 month, 1 year, or a decade before that player receives their promotion or demotion?

Speaking for myself, the KGS system does not deliver a 50-50 ratio. I lose more than I win, as it happens. Neither does it give me predominantly close games, even though the system was purportedly designed to. Take Himiko's most active recent month, March, for instance, and you will see there was only one game that did not end in resignation or a large margin.

I'm sure I cannot win this argument. I don't have the background in maths or statistics necessary to support my views. But my subjective impression is that the system is too rigid, and I know that this is not a unique impression. Psychology is important, and providing a more easily comprehensible system might not necessarily satisfy some people, but it would probably be more enjoyable for most to know that when they are playing well, they can receive some sort of reward for it instead of feeling that they are permanently condemned to stay at a certain level.

Up to now, KGS has not had a serious rival. Now it does. Nobody can force KGS to change its systems, but it will certainly be interesting to see how people take to Kaya's rating system when Kaya goes public. I wouldn't be surprised if there were far fewer complaints about it.

Re: KGS ranking revisited

Posted: Tue May 15, 2012 11:12 pm
by RobertJasiek
Mef wrote:as I understand it, your rank only gets heavy when no one else is watching.


No, it is almost always heavy, with these exceptions:

1) I have hardly played at all for about 3+ months. (Rare.)

2) On a very few days, a great winning percentage actually results in a significant increment. (The contrary is much more frequent: one or two bad nights almost invariably lead to a significant decrement.)

Re: KGS ranking revisited

Posted: Tue May 15, 2012 11:24 pm
by RobertJasiek
lemmata wrote:these numbers are just for fun


No, because it is no fun having to play too great a percentage of mismatched opponents.

Re: KGS ranking revisited

Posted: Wed May 16, 2012 12:01 am
by RobertJasiek
jts wrote:Mef, who did an excellent job on his analysis of supposedly "heavy" rankings.


April 2011:
47% but dramatic decrement. Was it one of the server adjustments? I do not recall.

May 2011:
61% but the rating dropped.

June 2011:
62% but the rating dropped.

July 2011:
62% and the rating went up dramatically. Not over a period but very suddenly.

Conclusion for April to July 2011:
- Great frustration: three months of rating development contrary to performance during that period.
- More great frustration for the sudden dramatic jump upwards in July. A jump totally inconsistent with the performance May to July. It would have made much more sense if the rating had improved rather steadily during these three months.

August 2011:
51% but the rating goes upwards. Why has it not gone upwards more, earlier during May to July, when my winning percentages were significantly higher than in August?! This is frustrating again; the rating development defies performance.

September to November 2011:
44%, 39%, 45% but the rating does not drop; instead the rating remains constant. This is frustrating. How can one be motivated to win more while even significantly below average results leave the rating constant?

December 2011:
71%, ok only 17 games, so not much data.

January 2012:
40%, 300 games. For the first time(!), the rating develops as it should: it drops. That it drops only ca. 15% of a rank shows just how very heavy rating changes are! This is frustrating again because 1) it promises just how difficult it will be to increase 15% of a rank and 2) watching some other players with rank changes by 2+ ranks due to just a few played games confirms that playing too many games is punished.

February + March 2012:
63%, 56 games. 75%, 12 games. This is one of the rare cases where the rating moves up dramatically. More great frustration! Frustration because I have no idea whatsoever why suddenly, exceptionally, KGS fulfils the player's dream. January with 40%, 300 games would have suggested something very different. The percentages from August to January also would have suggested something very different. Furthermore, the comparison with 62%, average 149 games during May to July 2011 with almost thrice as many games as February 2012 would have suggested that the rating development in February + March 2012 could at most have the increment of May to July 2011; but now the rating increment is greater. Great frustration: three reasons why the new increment did not make sense.

April 2012:
2 games, constant rating. This is only the second month of 13 months with a reasonable rating development. Very sad: When I do not play, my rating development makes much more sense than when I play.

May 2012:
The greatest frustration of all: a manual server shift drops the rating by ca. 35% of a rank.

Overall conclusion:
By far most of the time, the rating developments create frustration instead of making sense.

Note:
Remarks like "the system is good enough", while it is most obviously bad, multiply that frustration.