Mike Novack wrote: I suspect the subjective problem is unfamiliarity with statistics and the "scientific method".
To begin with, I want to state that I find this suspicion a bit offensive and condescending, and you appear to be reading assumptions into my OP that are not written there.
Also, I have no subjective problem. (Actually, I doubt that I have ever even had a 5 win streak...) I was merely trying to figure out why people complain about the rating system.
The objective problem (if you would consider it a problem) is the following:
(a) suppose you are correctly ranked at a 50% win rate
(b) You read a spectacular book, or attend a workshop, or there is any other singular event that leads you to believe that you have suddenly improved
(c) You go on a win streak of M games
(d) You set up the null hypothesis "there was no improvement in rank"
Now: The game outcomes are what is called "independent", i.e. the probability of winning a game is not affected by the outcome of previous games. If it were, this type of calculation would not be applicable anyway. Because the outcomes are independent, whenever you start a series of games, the probability of winning M games in a row is 0.5^M.
You might now define a threshold for that probability, say: if it is less than 0.1%, I will reject the null hypothesis in favor of the alternative hypothesis ("There was a real improvement in rank").
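To make the threshold idea concrete, here is a minimal Python sketch that finds the shortest streak whose probability under the null hypothesis drops below the chosen threshold. The function name min_streak_length is only made up for this illustration, and the 50% win rate and 0.1% threshold are the assumptions from above.

```python
def min_streak_length(p_win: float, threshold: float) -> int:
    """Return the smallest M for which p_win**M is below the threshold."""
    m = 1
    # Keep extending the streak until it becomes rarer than the threshold.
    while p_win ** m >= threshold:
        m += 1
    return m


# Assumptions from the post: 50% win rate, reject below 0.1%.
m = min_streak_length(0.5, 0.001)
print(m, 0.5 ** m)
```

With these numbers the answer is 10 games, since 0.5^9 is still about 0.2% while 0.5^10 is just under 0.1%.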
Now to what I consider an objective problem: It will always take the same number of games to come to that conclusion, independent of whether you have played 1000 games in the last month or 10 (because the outcomes are independent).
However: In the KGS ranking system, it does take a different number of games to get a promotion, depending on how often you have played. (As was stated earlier: If you play at a constant rate, it will always take the same time to get the promotion.) So to rephrase a statement from my OP: If you play at a higher frequency, you need longer streaks to get promoted than if you play at a lower frequency. That the frequency of the games affects how much each game "weighs" can be considered a flaw in the ranking system. (As I have learned in this thread, it is probably one that we have to live with because of the mentioned problem with asymmetric weights.)
I would not go as far as to consider this “science”, but I do not see where this reasoning conflicts with or points to an unfamiliarity with either statistics or the scientific method.
Mike Novack wrote: a) Suppose in a prior period you won 50% of N games. You attend a workshop, study a book, etc. and presumably have improved. You now play a sequence of M games winning them all. Should your rank be upped to reflect that? (based upon M)
b) Suppose in a prior period you won 50% of N games ............................ .........................................You now play a sequence of M games winning them all. Should your rank be upped to reflect that? (based upon M)
"b" is the so called "null hypothesis" that the outcome was purely by random chance. Notice that if your rank is upped in case "b" that was the wrong thing to do.
The point I am trying to make here is that the lay person tends to grossly underestimate the size of M required to have it be unlikely that the observed outcome was pure chance. For example, suppose this class is attended by 32 people. More likely than not one of them would come home and win their next five games. That class really helped, didn't it. Nah, it was a class on baking.
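The 32-attendee example in the quote is easy to check. Below is a minimal sketch, assuming every attendee independently wins each game with probability 0.5; the function name prob_at_least_one_streak is hypothetical and only used for this illustration.

```python
def prob_at_least_one_streak(n_people: int, streak_len: int, p_win: float = 0.5) -> float:
    """Probability that at least one of n_people wins their next streak_len games."""
    # Chance that a single person wins all of their next streak_len games.
    p_streak = p_win ** streak_len
    # Complement of "nobody manages the streak".
    return 1.0 - (1.0 - p_streak) ** n_people


# Numbers from the quote: 32 attendees, 5 games each.
p = prob_at_least_one_streak(32, 5)
print(f"{p:.1%}")
```

Under these assumptions the result is roughly 64%, so "more likely than not" holds.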
DISCLAIMER: The following discussion has nothing to do with ranking systems, specifically because a ranking system would not be able to consider whether someone has attended a class, read a book, or had a sudden improvement for some other reason. Probably the only reason I am writing this is that I feel accused of not understanding statistics…
The flaw in the previous null hypothesis calculation is, to some extent, that it does not take into account that there was a class or a book. In fact, any streak of M won games in a row is treated exactly the same. I assume that is what Mike wants to say with (a) and (b).
If we want to account for the fact that there was a singular event that may have led to someone becoming stronger, we have to introduce an additional assumption. For the sake of this discussion let’s assume:
(a) Before the workshop, a person A is correctly rated and plays at an average win rate of 50%
(b) Attending the workshop means a 15% chance to improve to an average win rate of 75% (I guess this is one rank). In other words: Out of 100 attendees, 15 will improve to a 75% win rate and 85 attendees will stay at the 50% win rate.
(c) After the workshop, person A wins 5 games in a row.
The interesting question is now: Based on the observation of 5 won games in a row, did person A actually improve one rank or not? The null hypothesis approach does not help here, but using Bayesian statistics, we can still come to a conclusion. There are 4 possible outcomes after (a) and (b).
1: Person A has improved and goes on a 5 game win streak
2: Person A has improved and does not win 5 games in a row
3: Person A has not improved and goes on a 5 game win streak
4: Person A has not improved and does not win 5 games in a row
Since all the probabilities are known, the probability for each outcome can be calculated, e.g. the probability for outcome 1 is 0.15 (chance to improve in the workshop) * 0.75^5 (probability to win five games in a row at a 75% average win rate) = 3.6%. The other probabilities can be calculated in a similar way: the probability that the person did not improve in the workshop and still got a 5 game win streak is 0.85 * 0.5^5 = 2.7% (and of course there is an 82% probability for not improving and not getting a 5 game win streak).
What is important now: We have observed the outcome "five wins in a row", which means that under the given assumptions it is actually more likely that the person has really improved than not. And even though winning 5 games in a row is a rather common event that happens by chance in 3% of all cases, and even though it is unlikely to improve by attending the workshop, the person may still correctly feel that he should get a promotion. (Everyone: Please do not start a discussion about whether this should be reflected in a ranking system… Read the disclaimer above first.)
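For anyone who wants to play with the numbers, the Bayesian step can be sketched in a few lines of Python. This is only an illustration under the assumptions (a) to (c) above (15% chance to improve, 75% versus 50% win rate, 5 wins in a row); the function name posterior_improved is made up.

```python
def posterior_improved(prior_improved: float, p_win_improved: float,
                       p_win_same: float, streak_len: int) -> float:
    """Probability that the player improved, given an observed win streak."""
    # Outcome 1: improved AND won the whole streak.
    joint_improved = prior_improved * p_win_improved ** streak_len
    # Outcome 3: did not improve AND won the whole streak anyway.
    joint_same = (1.0 - prior_improved) * p_win_same ** streak_len
    # Bayes' rule: restrict to the cases where the streak was observed.
    return joint_improved / (joint_improved + joint_same)


# Assumed numbers from the example: 15% prior, 75% vs 50% win rate, 5 wins.
post = posterior_improved(0.15, 0.75, 0.5, 5)
print(f"{post:.1%}")
```

With the assumed numbers this comes out at roughly 57% for "improved" versus 43% for "not improved". Changing the prior or the streak length shows how sensitive the conclusion is to the assumptions.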