KGS ranking system

Mef · Post by **Mef** » Wed Dec 12, 2012 12:59 am

ez4u wrote: Download the CSV file for speedchase from the bottom of the KGS Analytics page and look at August 2011. It would be interesting to have some informed commentary on his promotion record throughout the summer and into the fall.

Well, this was an interesting one to look at...I think the short version of this story is an exercise in how you frame the circumstances (as is often true in anomalous cases). (=

For instance, you could say:

"There was an 18 game winning streak in August 2011 where speedchase was not promoted." (which is true)

Which seems like a pretty strong case for "stuck ranks" on KGS. Of course you could also say:

"Speedchase started July 2011 rated as 15k. He only won 62% of the 350 games he played over the next 2 months, yet by the third week in August he already had attained a 10k rating. After a handful more games in September he was 9k." (also true)

Which makes it seem like there's quite a bit of room for mobility on KGS.

Of course the real story is probably somewhere in between.

Some of the explanation can be attributed to just dumb luck. For instance, it is interesting to note that 5 of the players in speedchase's streak stopped playing in August 2011 (or at least went on a 2-month hiatus). A couple of those had only just barely solidified their ratings. 3 of the remaining players had been demoted by the end of August (one player having a particularly rough time went from 8k in July to 14k by September). In the interest of full disclosure, one of the players speedchase beat went on to be promoted before the end of August.

In the end it sounds like it was a perfect storm: you had someone who was rapidly increasing their rank (a known weak point for the system KGS employs), who was playing lots of games (thus partly discounting the significance of the streak), and who was also playing lots of different opponents (including some who didn't play consistently or didn't really have solid ranks). All in all, though, it settled out pretty quickly (in September/October speedchase had settled into 9k and was winning 55% of his games).

At the end of the day, if you have a sudden 3-6 stone shift, it might be reasonable to start a new account, but if you don't, don't be surprised if it takes 2 months to work its way through the system*.

*This assumes you continue playing default handicap games. You can expedite the process by playing games at the handicap where it "should" be.

EdLee · Post by **EdLee** » Wed Dec 12, 2012 5:52 am

Mike Novack wrote: That class really helped, didn't it. Nah, it was a class on baking.

Mike, I agree with your post.
Idle thoughts:

Mike Novack · Post by **Mike Novack** » Wed Dec 12, 2012 7:37 am

speedchase wrote:@Mike any length of wins COULD be explained by probability, but that doesn't mean they should. There are much more reasonable explanations.
edit: removed a mistake.

Yes, of course. But I was trying to explain the subjective problem of grossly underestimating the probability of "just chance" (in which case no action is justified). We should expect the rank to be adjusted when the probability of "just chance" is reasonably low, not when it is still in the range of "as likely as not".

I was trying to explain why the improvement in results (percentage of wins) before rank adjustment took place was so much greater than what many people felt it should be. I was trying to explain that "feeling".

snorri · Post by **snorri** » Wed Dec 12, 2012 1:10 pm

Mef wrote: Some of the explanation can be attributed to just dumb luck. For instance, it is interesting to note that 5 of the players in speedchase's streak stopped playing in August 2011 (or at least went on a 2-month hiatus). A couple of those had only just barely solidified their ratings. 3 of the remaining players had been demoted by the end of August (one player having a particularly rough time went from 8k in July to 14k by September). In the interest of full disclosure, one of the players speedchase beat went on to be promoted before the end of August.

Unfortunately, a true analysis does require looking at the whole system, or at least at the opponents, as you have kindly done. We do this in real-life tournaments, too, at least in the AGA. When I lose a rated tournament game to a player who wins all other games, I don't feel as bad as when I lose to someone who has been losing a lot. In the first case, my rating will barely move.

Still, I wouldn't call it dumb luck. If speedchase was taking all comers, that's not dumb, that's just being a friendly player. If instead speedchase had decided to scrutinize the rank graphs, game histories, and stats of all opponents in order to maximize rating movement, then maybe the results would have been different. But maybe it would also have been harder to get in those 18 games to begin with. "Challenge? Hold on while I check out your games, search your twitter history, run a credit check, verify your Klout score... Okay, maybe I'll play you pending the results of the drug test...oh, wait, you took another game. Maybe next time!"

Mef · Post by **Mef** » Wed Dec 12, 2012 4:08 pm

snorri wrote: Still, I wouldn't call it dumb luck.

Well, in this case I meant "dumb luck" as in no one was intentionally causing this, not that anyone's actions were stupid. I was simply referring to the fact that speedchase's opponents weren't going out of their way to play him and then stop playing... and speedchase wasn't seeking out people like AzzBzz, who was 13k for all of August except the brief period when speedchase played him as a 12k. It just so happens speed was playing a lot of different people at a time when he was rapidly improving... and, y'know, sometimes things just happen...

hibbs · Post by **hibbs** » Wed Dec 12, 2012 4:41 pm

Mike Novack wrote:I suspect the subjective problem is unfamiliarity with statistics and the "scientific method".

First, I want to state that I consider this suspicion a bit offensive and condescending, and you appear to be reading assumptions into my OP that are not written there.
First of all, I have no subjective problem. (Actually I doubt that I ever even had a 5 win streak...). I was merely trying to figure out why people complain about the rating system.

The objective problem (if you would consider it a problem) is the following:
(a) suppose you are correctly ranked at a 50% win rate
(b) You read a spectacular book, or attend a workshop, or there is any other singular event that leads you to believe that you have suddenly improved
(c) You go on a win streak of M games
(d) You make the null hypothesis "there was no improvement in rank"

Now: probabilities are what is called "independent", i.e. the probability of winning a game is not affected by the outcome of previous games. (If it were, this type of calculation would not be applicable anyway.) Because the probabilities are independent, whenever you start a series of games, the probability of winning M games in a row is 0.5^M.
You might now define a threshold for that probability, say if it is less than 0.1%, I will reject the null hypothesis in favor of the alternative hypothesis ("There was a real improvement in rank").
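The arithmetic behind that threshold can be checked in a few lines of Python. This is a sketch only; the function names are illustrative, and it assumes exactly the setup above (a true 50% win rate and independent games).

```python
# Probability of an M-game win streak under the null hypothesis
# (true 50% win rate, independent games), and the shortest streak
# whose probability falls below a chosen significance threshold.

def streak_probability(m: int, p: float = 0.5) -> float:
    """Probability of winning m independent games in a row at win rate p."""
    return p ** m

def min_streak_to_reject(alpha: float, p: float = 0.5) -> int:
    """Smallest m with p**m < alpha, i.e. the streak needed to reject the null."""
    m = 1
    while streak_probability(m, p) >= alpha:
        m += 1
    return m

print(streak_probability(5))        # 0.03125
print(min_streak_to_reject(0.001))  # 10, since 0.5**10 is about 0.00098
```

With a 0.1% threshold, a streak of 10 wins is needed, and the same 10 games are needed no matter how many games were played before the streak started.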

Now to what I consider an objective problem: It will always take the same number of games to come to that conclusion, independent of whether you have played 1000 games in the last month or 10 (because the probabilities are independent).

However, in the KGS ranking system, the number of games it takes to get a promotion does depend on how often you have played. (As was stated earlier: if you play at a constant rate, it will always take the same time to get the promotion.) So to rephrase a statement from my OP: if you play at a higher frequency, you need longer streaks to get promoted than if you play at a lower frequency. That the frequency of the games affects how much each game “weighs” can be considered a flaw in the ranking system. (As I have learned in this thread, it is probably one we have to live with because of the problem with asymmetric weights mentioned earlier.)
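To make the frequency effect concrete, here is a toy model in Python. It is not the actual KGS formula: it assumes, purely for illustration, that a game's weight halves every 30 days (a made-up half-life) and that games are played at a constant rate. Under those assumptions the share of weight carried by a fixed 10-game streak shrinks as the playing rate grows, and it depends only on how much time the streak spans.

```python
# Hypothetical model, NOT the actual KGS formula: each game's weight
# halves every `half_life_days` days, and games are played at a steady
# rate of `games_per_day`. We ask what share of a player's total rating
# weight the most recent `m` games carry.

def recent_weight_share(m: int, games_per_day: float, half_life_days: float) -> float:
    # Consecutive games are 1/games_per_day days apart, so each older game
    # is weighted by a constant factor q relative to the next newer one.
    q = 0.5 ** (1.0 / (games_per_day * half_life_days))
    # With a long history this is a geometric series; the newest m terms
    # make up 1 - q**m of its total.
    return 1.0 - q ** m

for rate in (1, 5, 30):
    share = recent_weight_share(10, rate, half_life_days=30)
    print(f"{rate:>2} games/day: last 10 games carry {share:.1%} of the weight")
```

A player doing 1 game a day sees a 10-game streak carry roughly a fifth of the total weight; at 30 games a day the same streak carries under 1%.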

I would not go as far as to consider this “science”, but I do not see where this reasoning conflicts with or points to an unfamiliarity with either statistics or the scientific method.

Mike Novack wrote: a) Suppose in a prior period you won 50% of N games. You attend a workshop, study a book, etc. and presumably have improved. You now play a sequence of M games winning them all. Should your rank be upped to reflect that? (based upon M)

b) Suppose in a prior period you won 50% of N games. .................................................. You now play a sequence of M games winning them all. Should your rank be upped to reflect that? (based upon M)
"b" is the so called "null hypothesis" that the outcome was purely by random chance. Notice that if your rank is upped in case "b" that was the wrong thing to do.
The point I am trying to make here is that the lay person tends to grossly underestimate the size of M required to have it be unlikely that the observed outcome was pure chance. For example, suppose this class is attended by 32 people. More likely than not one of them would come home and win their next five games. That class really helped, didn't it. Nah, it was a class on baking.
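The "more likely than not" claim is easy to verify. A minimal Python sketch, assuming every one of the 32 attendees is unchanged at a 50% win rate and that games are independent (the function name is illustrative):

```python
# Chance that at least one of n people, all still playing at a true 50%
# win rate (no real improvement), wins their next k games.

def p_someone_streaks(n: int, k: int, p: float = 0.5) -> float:
    p_one = p ** k                   # one person wins k in a row
    return 1.0 - (1.0 - p_one) ** n  # complement of "nobody does"

print(f"{p_someone_streaks(32, 5):.3f}")  # 0.638
```

Each attendee has a 1-in-32 chance of a 5-game streak, so among 32 attendees the chance that at least one of them gets one is about 64%.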

DISCLAIMER: The following discussion has nothing to do with ranking systems, specifically because a ranking system would not be able to consider whether someone has attended a class, read a book, or had a sudden improvement for some other reason. Probably the only reason I am going into this is that I feel accused of not understanding statistics…

The flaw in the previous reasoning with the null hypothesis is, to some extent, that it doesn’t take into account that there was a class or a book. In fact, any streak of M won games in a row is treated exactly the same. I assume that is what Mike wants to say with (a) and (b).

If we want to account for the fact that there was a singular event that may have led to someone becoming stronger, we have to introduce an additional assumption. For the sake of this discussion let’s assume:
(a) Previous to the workshop, a person A is correctly rated and plays at an average win rate of 50%
(b) Attending the workshop means a 15% chance to improve to an average win rate of 75% (I guess this is one rank). In other words: Out of 100 attendees 15 will improve to a 75% win rate and 85 attendees will stay at the 50% win rate.
(c) after that workshop, person A plays a consecutive streak of 5 wins in a row.

The interesting question is now: Based on the observation of 5 won games in a row, did person A actually improve one rank or not? The null hypothesis approach does not help here, but using Bayesian statistics, we can still come to a conclusion. There are 4 possible outcomes after (a) and (b).
1: Person A has improved and goes on a 5-game win streak
2: Person A has improved and does not win 5 games in a row
3: Person A has not improved and goes on a 5-game win streak
4: Person A has not improved and does not win 5 games in a row

Since all the probabilities are known, the probability of each outcome can be calculated. For example, the probability of outcome 1 is 0.15 (chance to improve in the workshop) * 0.75^5 (probability of winning five games in a row at a 75% average win rate) = 3.5%. The other probabilities can be calculated in the same way: the probability that the person did not improve in the workshop and got a 5-game win streak is 2.7% (and, of course, there is an 82% probability of not improving and not getting a 5-game win streak).

What is important now: we have observed the outcome “five wins in a row”, which means that under the given assumptions it is actually more likely that the person has really improved than not. So even though winning 5 games in a row is a rather common event that happens by chance in 3% of all cases, and even though it is unlikely to improve by attending the workshop, the person may still correctly feel that he should get a promotion. (Everyone: please do not start a discussion about whether this should be reflected in a ranking system… read the disclaimer above first.)
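The four-outcome calculation is a direct application of Bayes' theorem: only outcomes 1 and 3 are consistent with the observed streak, so the posterior is outcome 1 divided by the sum of the two. A short Python sketch, with the post's own assumptions as default parameters (the function name is illustrative):

```python
# Bayesian check of the workshop example. Assumptions from the post:
# 15% of attendees improve to a 75% win rate, the rest stay at 50%,
# and games are independent.

def streak_posterior(k: int, prior_improve: float = 0.15,
                     p_improved: float = 0.75, p_same: float = 0.5) -> float:
    """P(improved | won k games in a row)."""
    joint_improved = prior_improve * p_improved ** k   # outcome 1
    joint_same = (1 - prior_improve) * p_same ** k     # outcome 3
    return joint_improved / (joint_improved + joint_same)

k = 5
print(f"outcome 1: {0.15 * 0.75 ** k:.4f}")  # improved and streak
print(f"outcome 3: {0.85 * 0.5 ** k:.4f}")   # not improved and streak
print(f"P(improved | streak) = {streak_posterior(k):.2f}")
```

Run as-is, it gives a posterior of roughly 0.57 that the person really improved, up from the 0.15 prior.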

Boidhre · Post by **Boidhre** » Wed Dec 12, 2012 4:49 pm

hibbs: Probabilities most likely aren't independent when it comes to a series of go game results for most people given how human psychology makes a difference on the board. If it were bots playing each other you'd be correct and the probabilities would be independent. I'd be very surprised if someone on a winning streak was not more likely to win their next game than someone on a losing streak.

Mef · Post by **Mef** » Wed Dec 12, 2012 5:11 pm

Boidhre wrote:hibbs: Probabilities most likely aren't independent when it comes to a series of go game results for most people given how human psychology makes a difference on the board. If it were bots playing each other you'd be correct and the probabilities would be independent. I'd be very surprised if someone on a winning streak was not more likely to win their next game than someone on a losing streak.

I would take this one step further (especially for games played on the same day). A person on a winning streak is likely well-rested, not distracted, not hungry, etc. (i.e. closer to their peak playing condition, playing at a stronger level even apart from psychology). The same person on a losing streak is more likely tired, nervous, angry/frustrated, thinking more about the problems they had at work that day, and so on (i.e. playing below their average strength). Of course, once the streak starts, the psychological feedback loop you mention is probably only going to amplify whatever effect is already being observed.

daal · Post by **daal** » Thu Dec 13, 2012 12:31 am

hibbs wrote:
Mike Novack wrote:I suspect the subjective problem is unfamiliarity with statistics and the "scientific method".
First, I want to state that I consider this suspicion a bit offensive and condescending, and you appear to be reading assumptions into my OP that are not written there.
First of all, I have no subjective problem. (Actually I doubt that I ever even had a 5 win streak...). I was merely trying to figure out why people complain about the rating system.

I know practically nothing about statistics, but I do know something about being offended, and I think you have little reason to be. You write that you are trying to figure out why people complain about the rating system, and Mike offered an explanation, a reasonable one at that. It doesn't imply that you don't understand statistics, but rather that of the people who have issues with the KGS rating system, some of them (statistically, not all of them are you) don't understand statistics or the scientific method and are therefore basing their criticisms on subjective impressions that are not supported by a more rigorous analysis.

hibbs · Post by **hibbs** » Thu Dec 13, 2012 2:04 am

daal wrote: I know practically nothing about statistics, but I do know something about being offended, and I think you have little reason to be. You write that you are trying to figure out why people complain about the rating system, and Mike offered an explanation, a reasonable one at that. It doesn't imply that you don't understand statistics, but rather that of the people who have issues with the KGS rating system, some of them (statistically, not all of them are you) don't understand statistics or the scientific method and are therefore basing their criticisms on subjective impressions that are not supported by a more rigorous analysis.

Actually, seen from this point of view you might be right, and I should not feel offended. What actually happened is this: I am a scientist, and although I am not a trained statistician, statistics is a major part of my daily work. On occasion I teach statistics, and I have a few publications on the application of certain statistical methods in my particular field. As a little side note, more as a hobby, I am in an organization that specifically deals with the demarcation of science vs. pseudoscience. Therefore, I believe I have dealt more with the concept of scientific methodology than the average scientist.
I actually interpreted the post in question as a direct reply to my OP, implying that I have a subjective problem and do not understand statistics or scientific methodology, so that one sentence challenged almost every bit of my professional self-image. If that interpretation was incorrect, I dutifully apologize for calling the post offensive and condescending.

hibbs · Post by **hibbs** » Thu Dec 13, 2012 2:43 am

Mef wrote:
Boidhre wrote:hibbs: Probabilities most likely aren't independent when it comes to a series of go game results for most people given how human psychology makes a difference on the board. If it were bots playing each other you'd be correct and the probabilities would be independent. I'd be very surprised if someone on a winning streak was not more likely to win their next game than someone on a losing streak.

I would take this one step further (especially for games played on the same day). A person on a winning streak is likely well-rested, not distracted, not hungry, etc. (i.e. closer to their peak playing condition, playing at a stronger level even apart from psychology). The same person on a losing streak is more likely tired, nervous, angry/frustrated, thinking more about the problems they had at work that day, and so on (i.e. playing below their average strength). Of course, once the streak starts, the psychological feedback loop you mention is probably only going to amplify whatever effect is already being observed.

First of all, the statistical independence is a necessary assumption for the various calculations to be meaningful (As I wrote, otherwise these calculations would not be valid).

The question you bring up here is an interesting one: Is this assumption also true in reality?
Against it, you can argue as you do: that psychology undoubtedly plays a role in all kinds of sports, and that feeling stronger actually makes you stronger, so you would have a positive feedback loop from winning streaks.

In favor of it: there are countless examples where people believe in such patterns, and they don't hold up against a critical look at the data. I recently read a nice publication in which all soccer matches since the introduction of the German Bundesliga were analyzed. It turns out that all winning streaks (more precisely: the frequencies of winning or losing streaks) are in perfect agreement with pure chance. A side note: most people believe that in a game with many goals, it is more likely that a team scores another one, because the team is on a run or playing in temporarily perfect rapport. If you watch such a game, it seems totally obvious. However, the frequencies of goals scored per game are in total agreement with the assumption that scoring a goal is an independent random event. (They only deviate from the Poisson distribution for games that end 0:0 or 1:0.) Of course the outcome of soccer games is not entirely random, because better teams have a higher baseline frequency of scoring or winning, but that can be properly modeled.
Baseball also has the common and intuitive phenomenon of a batter being on a streak. As someone mentioned earlier in this thread, this is also a myth that has been debunked (there is no positive feedback from a winning streak).
More generally: humans are masters of pattern recognition, even to the point of seeing patterns where there is only randomness (look at the stars, for example). Someone who is not trained in this will usually see meaningful patterns in random events or, conversely, overlook a real pattern in something that looks random. Most untrained people underestimate the statistical frequency of winning or losing streaks and believe they see something real. So most often, explanations like "I must have been tired" are in fact post-hoc rationalizations of a random event.
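The Poisson comparison mentioned above can be sketched in Python. This is only an illustration of the method, not a reproduction of the study: the average of 3.0 goals per match and the 1000 matches are assumed values, and the observed counts one would compare against are not included.

```python
import math

# Poisson model for goals per match: if goals are independent random events
# occurring at an average rate `lam` per match, P(k goals) = lam**k * e**-lam / k!.
# lam = 3.0 below is an illustrative value, not taken from the Bundesliga study.

def poisson_pmf(k: int, lam: float) -> float:
    return lam ** k * math.exp(-lam) / math.factorial(k)

def expected_match_counts(n_matches: int, lam: float, max_goals: int = 8) -> list[float]:
    """Expected number of matches with 0..max_goals goals under the model."""
    return [n_matches * poisson_pmf(k, lam) for k in range(max_goals + 1)]

for goals, count in enumerate(expected_match_counts(1000, lam=3.0)):
    print(f"{goals} goals: {count:6.1f} matches expected")
```

If real goal counts per match line up with these expected counts, there is no sign that one goal makes the next one more likely.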

For all these reasons, my initial guess would be that the outcomes of games are indeed independent of previous results.

Who would be right? There is no reason to argue about it; all that is needed is to look at the data: find a few players who have played many rated games at the proper handicap without improving in that period, check the frequencies of streaks (how often they won 2, 3, 4, ... games in a row), and compare those frequencies with what would be expected statistically. I would be surprised if no one has done this before, but I feel tempted now to have a look if I have some spare time.
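That check is straightforward to script. A minimal Python sketch, assuming a game history is available as a string of W/L results (draws ignored) and that a single constant win rate, estimated from the same history, applies to every game; the example history and function names are hypothetical.

```python
# Count win streaks in a game history and compare with what independent
# games would produce. `results` is a string like "WLWWWL" (W = win,
# L = loss); this input format is an assumption for illustration.

def streak_lengths(results: str) -> list[int]:
    """Lengths of every maximal run of consecutive wins."""
    return [len(run) for run in results.split("L") if run]

def observed_streaks(results: str, k: int) -> int:
    """Number of win streaks of length k or more."""
    return sum(1 for n in streak_lengths(results) if n >= k)

def expected_streaks(n_games: int, k: int, p: float) -> float:
    """Expected number of win streaks of length >= k in n independent games.
    A streak starts at game 1 with probability p**k, and at any later game i
    (with i + k - 1 <= n) with probability (1 - p) * p**k, because game i-1
    must be a loss."""
    if n_games < k:
        return 0.0
    return p ** k + (n_games - k) * (1 - p) * p ** k

history = "WWWLWWLLWLWWWWLW"  # hypothetical example history
p_hat = history.count("W") / len(history)
for k in (3, 4, 5):
    print(k, observed_streaks(history, k), round(expected_streaks(len(history), k, p_hat), 2))
```

With real histories of hundreds of games, observed counts well above the expected ones would point to streakiness; counts close to them support independence.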

I would like to give one possible explanation why the outcomes could indeed be random in spite of psychological effects: if you feel stronger because of a random streak, you are likely to play a bit more aggressively and are therefore more likely to make overplays that occasionally get punished. So feeling stronger may not necessarily correlate with actually playing stronger.

Mef · Post by **Mef** » Thu Dec 13, 2012 3:26 am

hibbs wrote: First of all, the statistical independence is a necessary assumption for the various calculations to be meaningful (As I wrote, otherwise these calculations would not be valid).
*Snip the rest of the post for length*

I agree with much of what you say, and also agree that, for the most part, each individual game in a series is essentially a random event. I would still contend, though, that there are non-random factors that can increase the likelihood of a winning or losing streak. Once again, you might have a person who is tired, sick or nervous playing a lot of games in a row on a given day, and they are not at their peak performance (hence have a greater than normal chance of losing their games).

While you point to data from sports about "streakiness" (which is in and of itself true), there is also precedent for situations where non-random streaks do exist due to external conditions. One example would be pitcher performance prior to being placed on the disabled list. During the onset of an injury a pitcher will generally see a drop in fastball velocity, greater variation in release point, and other injury-related symptoms, which often manifest themselves as a "bad streak" shortly before he takes time off for surgery, recovery, etc. These are cases of legitimate "losing streaks", so to speak.

Of course you could say that even in these cases your outcomes are still independent events, it's merely the expected value has shifted prior to the streak due to an external condition. When you are coming from the perspective of estimating the expected value though, playing especially well on one day and especially poorly on another would look just about identical to the "psychology of streaks" so to speak.

At the end of the day though, I'm with you, I'd prefer to see someone dig into some data and see if there's anything worthwhile there.

Mef · Post by **Mef** » Thu Dec 13, 2012 5:19 am

Mef wrote: At the end of the day though, I'm with you, I'd prefer to see someone dig into some data and see if there's anything worthwhile there.

All right... since KGS Analytics just spits out a CSV with all the game results, and I ended up having a bit of free time, I made a quick and dirty Excel macro that analyzes streaks in game histories. I looked at 3 players I like to use for KGS statistical data because A: their ratings are fairly consistent, B: they play a ton of games, and C: they are fairly recognizable KGS personalities. Here are my results:

(Result tables for Streak = 3 games, Streak = 4 games and Streak = 5 games; table contents not preserved.)

I need to go to sleep now, but later today I'll double-check my script and make sure there are no glaring errors. I may also rework it to test my "good days / bad days" theory.

Mike Novack · Post by **Mike Novack** » Thu Dec 13, 2012 7:52 am

hibbs wrote: Since all the probabilities are known, the probability for each outcome can be calculated, e.g. the probability for outcome 1 is 0.15 (chance to improve in the workshop) * 0.75^5 (probability to win five games in a row at a 75% average win rate) = 3.5%. Other probabilities can be calculated in a similar way, the probability that the person did not improve in the workshop and got a 5 game win streak is 2.7 %

What is important now: we have observed the outcome “five wins in a row”, which means that under the given assumptions it is actually more likely that the person has really improved than not. So even though winning 5 games in a row is a rather common event that happens by chance in 3% of all cases, and even though it is unlikely to improve by attending the workshop, the person may still correctly feel that he should get a promotion. (Everyone: please do not start a discussion about whether this should be reflected in a ranking system… read the disclaimer above first.)

I think that is perhaps the crux of the disagreement, possibly related to the usual and customary certainty expected before "publication" in different sciences. Yes, .56 (3.5/6.2) is greater than .44 (2.7/6.2), but not a whole lot greater. If the system gave promotions to 100 mythical players based on attending this class and then having a five-game winning streak, it would have been correct to do so 56 times and incorrect 44 times. That's a pretty bad "error rate". The calculation could be redone to determine what length of streak would have been necessary to get the error rate down below 10%, below 5%, etc.
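That redone calculation is a short loop. A minimal Python sketch, keeping the workshop example's assumptions (15% chance of improving from a 50% to a 75% win rate, independent games); the function names are illustrative.

```python
# How long must a post-workshop win streak be before "promote" is wrong
# less than a given fraction of the time? Same assumptions as the example:
# 15% prior chance of improving from a 50% to a 75% win rate.

def error_rate(k: int, prior_improve: float = 0.15,
               p_improved: float = 0.75, p_same: float = 0.5) -> float:
    """P(not improved | won k in a row): the share of promotions that are wrong."""
    improved = prior_improve * p_improved ** k
    same = (1 - prior_improve) * p_same ** k
    return same / (improved + same)

def min_streak(max_error: float) -> int:
    """Shortest streak whose error rate falls below max_error."""
    k = 0
    while error_rate(k) >= max_error:
        k += 1
    return k

for target in (0.10, 0.05, 0.01):
    print(f"error < {target:.0%}: streak of {min_streak(target)} wins")
```

Under these assumptions the loop reports streaks of 10, 12 and 16 wins for error rates below 10%, 5% and 1%.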

Boidhre · Post by **Boidhre** » Thu Dec 13, 2012 8:27 am

hibbs wrote:First of all, the statistical independence is a necessary assumption for the various calculations to be meaningful (As I wrote, otherwise these calculations would not be valid).

There's plenty of maths out there for dealing with non-independent events statistically. I've forgotten most/all of it since college since I no longer work with it, but assuming non-independent events to be independent just so you can use a linear regression or whatever just gives you misleading results.
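As an illustration of modeling dependent results, here is a minimal Python sketch of one standard choice, a first-order Markov chain in which the chance of winning depends on whether the previous game was won. The model and its probabilities are hypothetical, used only as an example. It shows that a "streaky" player can keep exactly the same 50% overall win rate as an independent player while producing longer streaks, so win rate alone cannot reveal the dependence.

```python
# One simple way to model non-independent results: a two-state Markov chain
# where the chance of winning depends on whether the previous game was won.
# The probabilities used below are hypothetical, chosen only to illustrate.

def stationary_win_rate(p_win_after_win: float, p_win_after_loss: float) -> float:
    """Long-run fraction of games won."""
    return p_win_after_loss / (1 - p_win_after_win + p_win_after_loss)

def mean_win_streak(p_win_after_win: float) -> float:
    """Average length of a win streak: each further game extends it with
    probability p_win_after_win, so the length is geometric."""
    return 1 / (1 - p_win_after_win)

# Independent 50% player vs. a "streaky" player with the same 50% overall rate.
for label, ww, lw in (("independent", 0.5, 0.5), ("streaky", 0.6, 0.4)):
    print(label, stationary_win_rate(ww, lw), mean_win_streak(ww))
```

Both players win half their games, but the average win streak grows from 2 to 2.5 games for the streaky one.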

Life In 19x19
