KGS ranking revisited

HermanHiddema · Post by **HermanHiddema** » Tue May 15, 2012 6:31 am

Tami wrote:
HermanHiddema wrote:And it doesn't matter anyway. Your playing strength includes the way you play when tired and when your opponent is tired. It includes stupid blunders and brilliant tesuji. It includes your whole game, not just the parts you like. Many players seem to have this notion of their "real strength", which is usually how strong they would be if they removed all the parts of their play they don't like. And it is nonsense. Being able to play well even when you are tired, even when you are in byoyomi, even when the game is decisive to win a large prize, all of that is part of your playing strength. I know several players that are stronger than myself not because they read deeper, or know more about the game, but because they are able to play more consistently, because they have more stamina, because they never give up, and will grab any chance I give them. That does not mean I am "actually just as strong". No, those players are stronger than me. They win more games. If you win more of your games against the same opponents, you are stronger, it is as simple as that. No excuses, no illusions.
There's a flaw with this argument too.

What if you DO get stronger? Is the past still relevant?

If some 70-year-old retiree, let's call him Trevor, who's been 5k all his life became stronger in his new-found free time, would you insist on basing his rank on all the games he ever played, from up to 60+ years before his improvement? He might be better at go now than his friends Bob and Sosuke...oh, but by your reckoning Bob and Sosuke are stronger because they have won several thousand more games than Trevor. Does Trevor have to win several thousand more games from this point to prove that he has improved?

People CHANGE. When people change, this should be recognised.

An ideal rating system would enable people to rank up on the basis of good results or rank down on bad ones. Exactly how easy or difficult changing rank should be is admittedly hard to determine, but my opinion is KGS errs too far on the conservative side. Your real level would be shown by the level you can maintain, not by your peaks or troughs.

Where did I say that all his old games should be relevant?

Of course your playing strength can change. And of course your rating should be based on more recent data. I would consider that entirely obvious.

All I am arguing against is people who want their "tired" games to somehow not count.

hyperpape · Post by **hyperpape** » Tue May 15, 2012 6:35 am

HermanHiddema wrote:Many players seem to have this notion of their "real strength", which is usually how strong they would be if they removed all the parts of their play they don't like. And it is nonsense.

While we're coming at this from similar points of view, I differ with this point. It is not entirely meaningless. It's just impossible to systematically test (this may be my favorite philosophical distinction, fwiw).

Sometimes we see a player who is almost perfect in most regards, but also has a glaring flaw. We recognize that he's better than his performance in the sense that if he fixes that flaw he will be exceptional. And it does matter in one sense. Suppose A is exceptionally talented, but is also an alcoholic, while B is less talented, but is completely disciplined. A wins when he's been sober, but loses more often because of his personal problems. But if A ever gets his act together, B will never match him, no matter how hard he works.

Less extreme examples are common, and while they're prone to bias and wishful thinking, that doesn't mean they're not sometimes real. They're just not the business of a ratings system.

RobertJasiek · Post by **RobertJasiek** » Tue May 15, 2012 6:37 am

HermanHiddema wrote:Which is impossible, as you cannot know whether your opponent was tired or not.

I can often see this from his moves, as I can see it from my moves years after I played a game.

Your playing strength includes the way you play when tired and when your opponent is tired.

This is a bad excuse for a bad rating system. First it forces players to become tired, then it blames them for losing when tired.

A rating system is not responsible for a player being tired by solely his own responsibility, but a rating system is responsible for making a player tired in more games.

It includes stupid blunders and brilliant tesuji. It includes your whole game, not just the parts you like.

There are three factors for blunders in server games:
1) blunder due to playing strength
2) blunder due to short thinking time
3) blunder due to the rating system having caused tiredness or frustration

Roughly guessed for my games and per game on average, (1) is below 5%, (2) is about 50%, (3) is about 45%. (During the night, (3) can easily dominate (2).)

Many players seem to have this notion of their "real strength", which is usually how strong they would be if they removed all the parts of their play they don't like. And it is nonsense.

It is not nonsense that (3) exists.

Being able to play well even when you are tired, even when you are in byoyomi, even when the game is decisive to win a large prize, all of that is part of your playing strength.

Nobody doubts this. It does not remove (3) though.

I know several players that are stronger than myself not because they read deeper, or know more about the game, but because they are able to play more consistently, because they have more stamina, because they never give up, and will grab any chance I give them. That does not mean I am "actually just as strong". No, those players are stronger than me. They win more games.

Nobody doubts this.

If you win more of your games against the same opponents, you are stronger, it is as simple as that.

Wrong. It is NOT as simple as that. (3) exists. Make the KGS rating system better and I will have a better rating.

No excuses, no illusions.

(3) is not an excuse or illusion but a fact.

RobertJasiek · Post by **RobertJasiek** » Tue May 15, 2012 6:44 am

hyperpape wrote:it just so happens that we have a wonderful french mathematician who's tested these things with statistics.

So what. Has he identified and suggested solutions for players affected by the too static rating system?

RobertJasiek · Post by **RobertJasiek** » Tue May 15, 2012 6:45 am

HermanHiddema wrote:All I am arguing against is people who want their "tired" games to somehow not count.

I want a rating system good enough that it does not push players into playing more tired games than they would if the system were good.

HermanHiddema · Post by **HermanHiddema** » Tue May 15, 2012 7:32 am

hyperpape wrote:
HermanHiddema wrote:Many players seem to have this notion of their "real strength", which is usually how strong they would be if they removed all the parts of their play they don't like. And it is nonsense.
While we're coming at this from similar points of view, I differ with this point. It is not entirely meaningless. It's just impossible to systematically test (this may be my favorite philosophical distinction, fwiw).

Sometimes we see a player who is almost perfect in most regards, but also has a glaring flaw. We recognize that he's better than his performance in the sense that if he fixes that flaw he will be exceptional. And it does matter in one sense. Suppose A is exceptionally talented, but is also an alcoholic, while B is less talented, but is completely disciplined. A wins when he's been sober, but loses more often because of his personal problems. But if A ever gets his act together, B will never match him, no matter how hard he works.

Less extreme examples are common, and while they're prone to bias and wishful thinking, that doesn't mean they're not sometimes real. They're just not the business of a ratings system.

Can't say I really think much of the concept of "unmeasurable but real". And anyway, the example you give is easily measurable. A player's performance could be approximated, statistically, as a distribution. In the example, player A and B have the same mean performance, but player A has a higher standard deviation.

But, until a player fixes that "flaw", I do not think that he is stronger. Flaws are part of the deal. You could say he has a higher peak performance, but that is it.
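A minimal sketch of the distribution picture above, assuming each player's per-game performance is normally distributed (an assumption made for illustration, with made-up numbers in stones). It shows two players with the same mean: their head-to-head win chance is even, while the erratic one has the higher peak.

```python
from statistics import NormalDist

def win_probability(a, b):
    """P(A's performance exceeds B's) for independent normal performances."""
    diff = NormalDist(a.mean - b.mean, (a.stdev ** 2 + b.stdev ** 2) ** 0.5)
    return 1.0 - diff.cdf(0.0)

# Hypothetical players: same mean performance, different consistency.
player_a = NormalDist(mu=0.0, sigma=2.0)   # talented but erratic
player_b = NormalDist(mu=0.0, sigma=0.5)   # disciplined and steady

p_a_wins = win_probability(player_a, player_b)

# "Peak performance" taken here as the 95th percentile of each distribution.
peak_a = player_a.inv_cdf(0.95)
peak_b = player_b.inv_cdf(0.95)

print(f"P(A beats B) = {p_a_wins:.2f}")
print(f"peak A = {peak_a:+.2f} stones, peak B = {peak_b:+.2f} stones")
```

Under these assumptions the results alone cannot separate the two players, which is the sense in which A is not "stronger" until the flaw is fixed.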

hyperpape · Post by **hyperpape** » Tue May 15, 2012 7:50 am

HermanHiddema wrote:Can't say I really think much of the concept of "unmeasurable but real".

Well, a flip answer would be to introduce you to my friend Gödel...A less flip answer would be that there are obvious cases of meaningful statements that can't be tested in any practical way. There is a fact of the matter of how many times the letter 't' has been printed in books since time began. There is obviously no practical way to test this.

And anyway, the example you give is easily measurable. A player's performance could be approximated, statistically, as a distribution. In the example, player A and B have the same mean performance, but player A has a higher standard deviation.

Is this right? Couldn't he have a smaller standard deviation but a larger range--perhaps he is hung-over 95% of the time and plays terribly all of that time?
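A small worked example of this objection, with made-up numbers: a player who performs 4 stones below his best 95% of the time can have a smaller standard deviation than a steadier player whose results are spread evenly over a narrower range. The two-state and uniform models are assumptions chosen only to make the arithmetic easy.

```python
import math

def two_state_sd(low, high, p_high):
    """Standard deviation of a player who performs at `high` with
    probability p_high and at `low` otherwise."""
    return (high - low) * math.sqrt(p_high * (1.0 - p_high))

def uniform_sd(low, high):
    """Standard deviation of performance spread evenly over [low, high]."""
    return (high - low) / math.sqrt(12.0)

# Hypothetical hung-over player: brilliant 5% of the time, 4 stones worse otherwise.
hungover_sd = two_state_sd(low=-4.0, high=0.0, p_high=0.05)
hungover_range = 4.0

# Hypothetical steady player: anywhere within a 3.5-stone window.
steady_sd = uniform_sd(-1.75, 1.75)
steady_range = 3.5

print(f"hung-over: sd = {hungover_sd:.2f}, range = {hungover_range}")
print(f"steady:    sd = {steady_sd:.2f}, range = {steady_range}")
```

So a larger range does not by itself imply a larger standard deviation; the shape of the distribution matters.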

But, until a player fixes that "flaw", I do not think that he is stronger. Flaws are part of the deal. You could say he has a higher peak performance, but that is it.

I do not know what to make of the semantics of "stronger". I do agree a rating system should not try to gauge these facts. Nonetheless, they are important dimensions of evaluating a player. That is the claim that I would like to insist upon.

Edit: removed a dangling [/quote] tag.

jts · Post by **jts** » Tue May 15, 2012 8:15 am

hyperpape wrote:
HermanHiddema wrote:Can't say I really think much of the concept of "unmeasurable but real".
Well, a flip answer would be to introduce you to my friend Gödel...A less flip answer would be that there are obvious cases of meaningful statements that can't be tested in any practical way. There is a fact of the matter of how many times the letter 't' has been printed in books since time began. There is obviously no practical way to test this.

Would you also say that there is just a counter-factual fact about the world, to wit, the number of times 't' would have been printed in books since time began if Cleopatra had had a snub nose? Being a realist about unmeasurable facts is one thing, but being a realist about unmeasurable counterfactuals is quite another.

hyperpape wrote:
But, until a player fixes that "flaw", I do not think that he is stronger. Flaws are part of the deal. You could say he has a higher peak performance, but that is it.
I do not know what to make of the semantics of "stronger". I do agree a rating system should not try to gauge these facts. Nonetheless, they are important dimensions of evaluating a player. That is the claim that I would like to insist upon.

Yes, this may make sense.
"He has a thoughtless fuseki but he finds incredible tesuji in the middlegame."
"Even when he loses all the fighting, he can usually count on making up twenty points in the endgame."
"He memorized all of Dosaku's games and always has beautiful shape, but I wish he would stop to think more at critical junctures; he often makes careless blunders."
"Somehow he has an encyclopedic knowledge of joseki, but when he plays he's usually drunk, or hung over, or both."

These are all ways to qualitatively describe someone's playing strength, but I think what Herman is saying is that whenever he hears someone say "KGS can't measure my real strength because _______" they somehow think that a problem like erratic play or poor use of time or alcoholism can be fixed by an accurate rating system. It would be just as sensible (i.e., not very) to ask for a rating system, or a concept of "real strength", that abstracted away from weak reading, or sloppy endgame, or poor joseki choice.

hyperpape · Post by **hyperpape** » Tue May 15, 2012 9:05 am

jts wrote:
hyperpape wrote:
HermanHiddema wrote:Can't say I really think much of the concept of "unmeasurable but real".
Well, a flip answer would be to introduce you to my friend Gödel...A less flip answer would be that there are obvious cases of meaningful statements that can't be tested in any practical way. There is a fact of the matter of how many times the letter 't' has been printed in books since time began. There is obviously no practical way to test this.
Would you also say that there is just a counter-factual fact about the world, to wit, the number of times 't' would have been printed in books since time began if Cleopatra had had a snub nose? Being a realist about unmeasurable facts is one thing, but being a realist about unmeasurable counterfactuals is quite another.

Ehhh...I'm inclined to hem and haw. I think your example is a clear one that shouldn't have an answer, but I don't know whether "unmeasurable counterfactual" is really the category that explains why. I don't really have a theory of any of this stuff.

jts wrote:I think what Herman is saying is that whenever he hears someone say "KGS can't measure my real strength because _______" they somehow think that a problem like erratic play or poor use of time or alcoholism can be fixed by an accurate rating system. It would be just as sensible (i.e., not very) to ask for a rating system, or a concept of "real strength", that abstracted away from weak reading, or sloppy endgame, or poor joseki choice.

I think this is common ground to the three of us. My point was really about meaninglessness of those questions. It may have been a distraction, but my thought was that by overreaching, and saying it's meaningless, you may confuse the issue. The real point is that meaningless or not, these questions aren't the proper ones for a rating system.

Mef · Post by **Mef** » Tue May 15, 2012 10:13 am

Tami wrote:
I think that hits the nail on the head.

The KGS may be accurate for most players most of the time, but it seems to be based on the assumption that nobody ever improves. Once you have a stable rank, then it becomes extremely hard to change it, no matter how much you win or lose. And, I kind of agree with Robert Jasiek here, it is much easier to play worse than your mark than up to it because on more than one occasion I have been close to a promotion, lost a crucial game and then gone on to lose a string of games out of sheer frustration. I'm sure that experience is not unique. If only it wasn't quite so like climbing a greasy pole, maybe not so many players would go on tilt so often.

The latest adjustment, the downward one that prompted this thread, came as a nasty surprise - I had been nursing my main account, the heavy one, toward 1d by steadfastly resisting tilty emotions whenever I did lose, and the adjustment undid all that. It also brought my 1d account temporarily back to 1k.

And, yes, rank and ratings graph are important to me. I have been putting effort into improving my go, and I was using these things to measure my progress. Maybe I have little talent for the game and I am only improving in small steps, but I still like to see my graph go upwards over the passing months.

For sure, I totally get it that the system is not intended for providing feedback on players' progress, but only for making a roughly 50-50 win/lose balance. Could it not be though that the 50-50 balance is merely an illusion of a mirage? If, in fact, there are many, many players of different strengths crammed into a small ratings band because of heaviness, then might not their mutual scores tend to even out over time, thereby giving the false impression of accuracy? (Strong 3k beats weak 3k, but weak 3k wins against weak 1k, who then narrowly beats strong 2k, who beats strong 3k, who goes on tilt and loses to weak 3k).

Still, if it's never going to change, then that's just too bad. At least it's still fun to play free games and watch broadcasts.

When I get the chance, I like to clear up misconceptions in threads like these. In an earlier thread I've already addressed some of Robert's concerns of heaviness.

Cliff notes:

I don't have the time to do that sort of thing again (I figure that if someone playing 10 games/day for a year in one of the most stable rank ranges on KGS has no issue moving their rank when it is appropriate, the majority of us should be ok). What I have done though is attach an image, assuming that your Universal go server handle is the one you use on KGS:

It shows your rank graph as well as your winning percentage in rated games and total games played. It is broken down month by month with cumulative totals at the bottom. For reference, I added a horizontal line at 1.5k. I used August 1, 2011 as the start of the dataset and April 30, 2012 as the end. I'll let the L19 community decide if they feel the KGS server is doing a good job of handling these results or if there is an issue with "heaviness", or a bias of losses being weighted more than wins.

Regarding the idea of rank "crunching", where you have several ranks in one band: this type of phenomenon is actually something the study that yoyoma linked would have detected (unless all rank systems tested suffer from it). If KGS is grouping people together who shouldn't be, then a superior system would be able to identify these mismatches and easily predict the winner in an even game (a full 1 stone difference should represent something like a 75-80% chance of winning for the stronger player in an even game). In the aggregate, the superior system would be able to correctly predict a much larger percentage of even game winners than KGS.
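As a rough sketch of that test, one could score each rating system by the fraction of even games won by the player it rated higher. Everything below (player names, ratings, results) is invented for illustration and is not KGS data; ratings are numbers where higher means stronger.

```python
def prediction_accuracy(games, ratings):
    """Fraction of even games won by the player the ratings favoured.
    Games between equally rated players are skipped."""
    correct = 0
    counted = 0
    for black, white, winner in games:
        if ratings[black] == ratings[white]:
            continue
        favourite = black if ratings[black] > ratings[white] else white
        counted += 1
        correct += int(winner == favourite)
    return correct / counted if counted else float("nan")

# Hypothetical even-game results: (black, white, winner).
games = [
    ("ann", "bob", "ann"),
    ("bob", "cy", "bob"),
    ("ann", "cy", "ann"),
    ("cy", "bob", "bob"),
]

# Hypothetical "crunched" system: everyone squeezed into one narrow band.
crunched = {"ann": -3.1, "bob": -3.0, "cy": -3.2}
# Hypothetical alternative system that spreads the same players out.
spread = {"ann": -2.0, "bob": -3.0, "cy": -4.0}

acc_crunched = prediction_accuracy(games, crunched)
acc_spread = prediction_accuracy(games, spread)
print(acc_crunched, acc_spread)
```

If crunching were real, a comparison like this on actual game records should show the alternative system predicting clearly more winners.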

Another way to check for this "crunching" would be to look at winning percentages in handicap games. If players within a band actually represent a several stone span, then when you play someone outside the band there is a good chance there is a strength difference that is not being compensated for by the handicap. The result is that the player with black should have a severe disadvantage in these games (note: handicap games on KGS are already a half stone under-handicapped by default, but now they would be even more so.). If 3k was really "2k-3k-4k" and 4k was really "4k-5k-6k", and 5k was really "6k-7k-8k", it would mean your average 2 stone game should really be a 4 stone game, and black would be in trouble. In the end, you'd expect something like less than 20% of handicap games being won by black.
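The arithmetic of that example can be written out as follows. The mapping from nominal rank to "true" midpoint is just the hypothetical from the paragraph above, not measured data, and the default half-stone under-handicap on KGS would come on top of the shortfall computed here.

```python
# Hypothetical crunching from the example: nominal kyu band -> midpoint
# of the span of strengths it would really cover (in kyu).
true_midpoint = {3: 3, 4: 5, 5: 7}

def uncompensated_stones(strong_kyu, weak_kyu):
    """Stones of real strength difference not covered by a handicap
    that is based on the nominal rank difference."""
    nominal_gap = weak_kyu - strong_kyu
    true_gap = true_midpoint[weak_kyu] - true_midpoint[strong_kyu]
    return true_gap - nominal_gap

# Nominal 3k vs nominal 5k: a 2 stone game that should be a 4 stone game.
shortfall = uncompensated_stones(3, 5)
print(f"black is short by {shortfall} stones")
```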

RobertJasiek · Post by **RobertJasiek** » Tue May 15, 2012 12:57 pm

Mef wrote:In an earlier thread I've already addressed some of Robert's concerns of heaviness.

On which I have already answered:
http://www.lifein19x19.com/forum/viewto ... 558#p85558

if you actually look at the data you would see that even as a 4d playing more than 3000 games per year there was not any "heaviness" in his rank. It was still capable of moving 1/3 to 1/2 a stone in less than a week, provided he continued playing games at a similar rate as he had been. The lack of movement was not related to any heaviness in his account, but actually due to the rarity of any streaks where he had a significant deviation from a 45-55% win rate).

Which player does this refer to, when were those 1/3 to 1/2 jumps in less than a week, and are you sure that they were not manual KGS rating shifts?

Rank-Case-Study-2.JPG [...]
if there is an issue with "heaviness"

That KGS player is hardly interesting as a study case for heaviness because she played rather few games in comparison to those players really suffering from heaviness.

averell · Post by **averell** » Tue May 15, 2012 1:23 pm

Mef wrote:[...] (a full 1 stone difference should represent something like a 75-80% chance of winning for the stronger player in an even game). [...]

Is that an assumption, or does that actually hold statistically? Because for example in EGF ranks 1k-1d is still a 40% win chance, and to get below 25 you have to be 6d. Now it could be the KGS ranks are further apart, but i seriously doubt it.

Mef wrote: [...] In the end, you'd expect something like less than 20% of handicap games being won by black.

Same thing here basically.

Another thing that keeps happening is people making new accounts, because it's apparently easy to get a solidly ranked account up to 4 stones higher than your "actual strength" with just a lucky (time-)win and then lose to make it solid. Quite a few of "fake" 8/9ds keep cropping up.

Overall i would agree that the KGS rating system does a "good enough" job however, because individual playing strength will vary over 2 stones easily for 80% of the KGS population from one day to the next, and the current very inaccurate average prediction is the best you can do based on game results only.

stalkor · Post by **stalkor** » Tue May 15, 2012 1:31 pm

I guess like most things "haters gotta hate"

I've been following this thread and it got from a discussion right into kicking something as much and as hard as you can but the thing i'm missing is probably the most important thing, a solution!

Instead of repeating what has been stated in several other threads, come up with ideas and mathematical backup for it.

One of my ideas (of which i dont have any facts to back up) is for example to shorten the history taken into account from 6 to 5 or 4 months, this can possibly result in a less "rigid" rank.

Also to help players understand how much a win or loss is worth i would like to see an addition in the games tab list where its stated how much that game made your rank shift up or down (if im not mistaken a rank is a number so it could be done).

For example +0.20 or -0.33

This will create the opportunity to see the weight of each game, not the increment after a day.
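To make the idea concrete, here is a sketch of what such a per-game column could look like. It uses a plain Elo-style incremental update as a stand-in; this is an assumption for illustration only and is not how KGS computes ranks. The `steepness` and `k` parameters are hypothetical.

```python
import math

def expected_score(own_rank, opp_rank, steepness=1.0):
    """Logistic win expectation from the rank difference (in stones)."""
    return 1.0 / (1.0 + math.exp(-steepness * (own_rank - opp_rank)))

def per_game_deltas(start_rank, games, k=0.4):
    """Return the rank shift caused by each game, applied in order.
    `games` is a list of (opponent_rank, won) pairs."""
    rank = start_rank
    deltas = []
    for opp_rank, won in games:
        delta = k * ((1.0 if won else 0.0) - expected_score(rank, opp_rank))
        deltas.append(delta)
        rank += delta
    return deltas

# Hypothetical history: a win, then a loss, both against an equal-ranked opponent.
deltas = per_game_deltas(0.0, [(0.0, True), (0.0, False)])
labels = [f"{d:+.2f}" for d in deltas]
print(labels)
```

In a system that refits the rank from a whole window of past games, the shift attributed to a single game would have to be defined differently, for example as the rank with and without that game.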

Mef · Post by **Mef** » Tue May 15, 2012 2:02 pm

RobertJasiek wrote:
Mef wrote:In an earlier thread I've already addressed some of Robert's concerns of heaviness.
On which I have already answered:
http://www.lifein19x19.com/forum/viewto ... 558#p85558

Yes, as I understand it, your rank only gets heavy when no one else is watching.

That KGS player is hardly interesting as a study case for heaviness because she played rather few games in comparison to those players really suffering from heaviness.

I will save myself a thousand words.

Mef · Post by **Mef** » Tue May 15, 2012 2:25 pm

averell wrote: Is that an assumption, or does that actually hold statistically? Because for example in EGF ranks 1k-1d is still a 40% win chance, and to get below 25 you have to be 6d. Now it could be the KGS ranks are further apart, but i seriously doubt it.

Another thing that keeps happening is people making new accounts, because it's apparently easy to get a solidly ranked account up to 4 stones higher than your "actual strength" with just a lucky (time-)win and then lose to make it solid. Quite a few of "fake" 8/9ds keep cropping up.

Overall i would agree that the KGS rating system does a "good enough" job however, because individual playing strength will vary over 2 stones easily for 80% of the KGS population from one day to the next, and the current very inaccurate average prediction is the best you can do based on game results only.

It's been a long time since I've looked into calculating the exact values (or even known exactly what constants KGS uses). From my interpretation of the EGF ratings page a 1d vs. a 1k in an even game is between 70% and 75% expected win for the 1d (depending on what you call 1d...Table II lists GoR 2000 as 27.8% chance of beating a 2100). At any rate, I'm not too worried about quibbling over exact percentages for each system, I imagine it would depend on how well you wish your system to scale for handicaps and a variety of other factors.
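For reference, the 27.8% figure can be reproduced with a short calculation, assuming the EGF winning-expectancy formula of that period has the form SE = 1 / (e^(D/a) + 1) with a = 205 - R/20, where R is the lower rating and D the rating difference. That form is stated here from memory and the small epsilon correction is left out, so treat it as a sketch rather than the official definition.

```python
import math

def egf_expected_win(lower_gor, higher_gor):
    """Expected win rate of the lower-rated player in an even game,
    under the assumed formula SE = 1 / (exp(D / a) + 1), a = 205 - R / 20."""
    a = 205.0 - lower_gor / 20.0
    d = higher_gor - lower_gor
    return 1.0 / (math.exp(d / a) + 1.0)

p_weaker = egf_expected_win(2000, 2100)
print(f"GoR 2000 vs GoR 2100: {p_weaker:.1%} for the weaker player")
```

With the same assumed formula a 50-point gap at this level comes out at about 38% for the weaker player, so which percentage you quote for "1k vs 1d" depends heavily on which ratings you take the two ranks to mean.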

The point I was trying to make is that if there are the system anomalies that some suspect, there should be some easily measurable side effects we could predict and identify. I guess at the end of the day, when possible I'd much prefer hard facts to sneaking suspicions and speculation...but then again, I am from Missouri...

Life In 19x19
