Whole History Rating

Harleqin · **#21**

It seems to me that Robert's fears are quite unspecific. I can see how they apply to the current systems, but I do not see them as an impediment to studying better ones.

Of course, looking at how the new system handles sparse local data is an important subtopic.

Perhaps we could now have a more detailed look at the paper and get an impression of the different parts of the system and how they fit together.

Sverre · **#22**

RobertJasiek wrote:

Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.

OK, could you give some precise numbers, for example a minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

topazg · **#23**

Sverre wrote:

RobertJasiek wrote:

Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.

OK, could you give some precise numbers, for example a minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

Well, I'm guessing the 5 or 6 games I play a year would probably get me booted anyway ..

pwaldron · **#24**

RobertJasiek wrote:

pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. Once we have them, one can still come back to the low-confidence, sparse-data noise and see whether one can already explain it well.

Robert, chant ten times: It is always better to have more information.

The very worst you can have is a game prediction algorithm that flips a coin to predict the winner. Every additional game adds more information, and it cannot make a system less accurate. Some game results are more useful than others in pinning down ratings, but they all have value and it is foolish to throw any away. If the information is not useful then it does little to reduce the uncertainty in the resulting estimated parameters (i.e., ratings) but it's never worse to have the information than not to have it.

RobertJasiek · **#25**

Sverre wrote:

OK, could you give some precise numbers, for example a minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?

No. One would have to think about it to set useful values. I have wanted to encourage such thinking; I have not carried it out in detail myself.

RobertJasiek · **#26**

pwaldron wrote:

Robert, chant ten times: It is always better to have more information.

My first statistics book had a nice example: Estimate the distance between two towns. First you take a rough look: "The next town is about 10km away." Then you measure your town's mediaeval wall: "It is 30cm thick". Now you conclude: "The distance is 10km + 30cm = 10.0003km."

Likewise, if you have two isolated players who each claim to be 5k and whose total game data consists of exactly 1 game between themselves, you cannot connect that information to a huge pool of 5k players elsewhere.

Chant ten times: Strongly disconnected data should not be compared. :)

Quote:

Every additional game adds more information, and it cannot make a system less accurate.

The problem lies in the system itself. If it is not good enough, then it does not interpret sparse data correctly. One must not overinterpret such a system by also feeding it the sparse data.

Harleqin · **#27**

Robert, you seem to presume a weakness of the system before you have even looked at it.

In my as yet rough understanding, each game result is a data point. If only a few data points are directly connected to a player, then that player's resulting rating graph will be easily moved by further (even indirect) data. Game results against this player will therefore naturally have little impact on the rating graph of a player with more games.

I understand that you have had bad experiences with Elo-like systems. My impression is that this kind of problem is naturally covered by a WHR-like approach.

We shall keep this potential problem in mind, but I would like to move on to a more detailed look at the algorithm now.

RobertJasiek · **#28**

I have not referred to only one particular rating system but to rating systems in general.

Li Kao · **#29**

One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?

Liisa · **#30**

Li Kao wrote:

One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?

Anchoring is not necessary because we can just let the system float freely. A mathematical rating system should not have any direct and fixed relationship with kyuu-dan ranks (that are subjective honorary titles). If we try to force that relationship, it will just decrease the reliability of the mathematical system. (We play handicap games in tournaments only when we are beginner double digit kyuus!)

And the good thing about plain and simple Elo is that even though we cannot deduce from Elo the exact probability of beating a specific opponent, we can always put players in a very specific order within a certain subpopulation. (This is the reason why GoR works like magic!) And there is always enough traffic between subpopulations (e.g. via EGC) so that we can calibrate them to roughly match each other if that is necessary.

But I agree that the history approach has its merits. The best way is to simultaneously calculate normal Elo and a rating that includes enough history (a year or so into the past) and put both figures on the same graph.

yoyoma · **#31**

RobertJasiek wrote:

My first statistics book had a nice example: Estimate the distance between two towns. First you take a rough look: "The next town is about 10km away." Then you measure your town's mediaeval wall: "It is 30cm thick". Now you conclude: "The distance is 10km + 30cm = 10.0003km."

I would say it this way: The next town is 10km +/- 2km. Then the measured wall is 30cm. Now: The next town is 10.0003km +/- 2km. Mathematically it works just fine.

Turning to examples of go ratings, if a player has only played 2 games, an even game win against a 30k and an even game loss to a 2k, then the rating system can say he is 16k +/- 14 ranks.

If a different player played 1000 games, all even games against 16k players, and won 50% and lost 50%, then the rating system can say he is 16k +/- 0.2 ranks.

So no games are thrown out. Of course the system will have less confidence in the ratings of players with fewer games. AGA's system publishes a number related to the confidence for all players.
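To make the contrast concrete, here is a rough sketch rather than any real rating system. It assumes a logistic (Bradley-Terry style) win probability in the rank difference, a hypothetical scale `k` of 1.0 per rank, a flat prior on a grid, and kyu ranks written as negative numbers (16k = -16). The exact widths depend on these assumptions, so they need not match the figures above; only the difference between the two players matters.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def posterior(games, k=1.0, lo=-40.0, hi=8.0, step=0.01):
    """Posterior mean and standard deviation of a player's rank on a grid.

    games: list of (opponent_rank, wins, losses) for even games.
    k, lo, hi, step are illustrative assumptions, not values from any real system.
    """
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    logp = []
    for r in grid:
        total = 0.0
        for opp, wins, losses in games:
            d = k * (r - opp)
            total += wins * log_sigmoid(d) + losses * log_sigmoid(-d)
        logp.append(total)
    m = max(logp)                          # subtract the maximum before exponentiating
    w = [math.exp(v - m) for v in logp]
    z = sum(w)
    mean = sum(r * wi for r, wi in zip(grid, w)) / z
    var = sum((r - mean) ** 2 * wi for r, wi in zip(grid, w)) / z
    return mean, math.sqrt(var)

# Player with 2 games: a win against a 30k and a loss to a 2k.
sparse = posterior([(-30, 1, 0), (-2, 0, 1)])
# Player with 1000 even games against 16k opponents, 500 won and 500 lost.
dense = posterior([(-16, 500, 500)])

print(f"2 games:    mean {sparse[0]:.1f}, spread {sparse[1]:.2f} ranks")
print(f"1000 games: mean {dense[0]:.1f}, spread {dense[1]:.2f} ranks")
```

Both players come out centred on the same rank, but the spread for the player with 2 games is far wider, which is the point: the sparse data is kept and simply carries little weight.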

RobertJasiek · **#32**

Quote:

the rating system can say he is 16k +/- 14 ranks.

If the rating systems did say it (in terms of rating points), that would be an improvement. Strange confidence values instead say too little to the reader.

Harleqin · **#33**

Liisa wrote:

A mathematical rating system should not have any direct and fixed relationship with kyuu-dan ranks (that are subjective honorary titles). If we try to force that relationship, it will just decrease the reliability of the mathematical system.

The ranks are just labels attached to certain values of the model. They do not change the model.

Quote:

(We play handicap games in tournaments only when we are beginner double digit kyuus!)

I think that this is a very lamentable recent development.

Quote:

And the good thing about plain and simple Elo is that even though we cannot deduce from Elo the exact probability of beating a specific opponent, we can always put players in a very specific order within a certain subpopulation. (This is the reason why GoR works like magic!)

You describe a use and an outcome and that should be a reason? Magic indeed...

Quote:

And there is always enough traffic between subpopulations (e.g. via EGC) so that we can calibrate them to roughly match each other if that is necessary.

What is the average rating improvement of Finnish players at the London Open? If calibration were so fast, it should approach 0, no? The problem is that calibration can only propagate through later games. If a population is 40 Elo points underrated and 5% of them go to a big foreign tournament, they bring home 40 points each. In theory, this should mean that the population is afterwards only 38 points underrated on average, but in order to actually distribute these points, a lot of games have to be played (and the players bringing the points will naturally not be very inclined to do so).
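The arithmetic behind the 38-point figure can be sketched as follows, assuming (as in the example) that every travelling player regains exactly the full 40 points and that nobody has played a game at home yet. The function and parameter names are made up for illustration.

```python
def deficits_after_trip(deficit=40.0, travel_fraction=0.05):
    """Return (mean deficit, deficit of travellers, deficit of those who stayed)."""
    travellers = 0.0          # assumed: travellers regain the full deficit abroad
    stay_at_home = deficit    # no games at home yet, so nothing has propagated
    mean = travel_fraction * travellers + (1 - travel_fraction) * stay_at_home
    return mean, travellers, stay_at_home

mean, travellers, stay_at_home = deficits_after_trip()
print(f"mean deficit: {mean:.1f}, travellers: {travellers:.1f}, others: {stay_at_home:.1f}")
```

The mean drops to about 38 only on paper: 95% of the population is still 40 points off until later games spread the gain.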

Quote:

But I agree that the history approach has its merits. The best way is to simultaneously calculate normal Elo and a rating that includes enough history (a year or so into the past) and put both figures on the same graph.

I think that you have not yet looked at the idea of the algorithm in question. It is not about "including some history".

I guess that I cannot expect everyone who wants to discuss this to read that paper, so an explanation will have to be given in this thread. I shall look into that.

Liisa · **#34**

Harleqin wrote:

Quote:

And there is always enough traffic between subpopulations (e.g. via EGC) so that we can calibrate them to roughly match each other if that is necessary.

What is the average rating improvement of Finnish players at the London Open? If calibration were so fast, it should approach 0, no? The problem is that calibration can only propagate through later games. If a population is 40 Elo points underrated and 5% of them go to a big foreign tournament, they bring home 40 points each. In theory, this should mean that the population is afterwards only 38 points underrated on average, but in order to actually distribute these points, a lot of games have to be played (and the players bringing the points will naturally not be very inclined to do so).

If we see that a subpopulation's rating is off by 38 points after the LOGC, then we can add 38 Elo points to the entire active subpopulation. In practice we can add 12 points (about 30%) and then see how much the subpopulation is still underrated the following year. If this kind of comparison is applied once a year, we will soon enough get acceptable differences between subpopulations or, better yet, the subpopulations will stay in sync. This is not that hard.
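A minimal sketch of this yearly correction, assuming the offset can be measured exactly each year and that no new drift appears in between (both are simplifications, and the names are hypothetical):

```python
def yearly_corrections(initial_offset=38.0, fraction=0.30, years=8):
    """Remaining offset after each yearly correction of `fraction` of the measured offset."""
    remaining = [initial_offset]
    for _ in range(years):
        measured = remaining[-1]
        remaining.append(measured - fraction * measured)
    return remaining

def years_until_below(threshold, initial_offset=38.0, fraction=0.30):
    """Number of yearly corrections needed before the offset drops below `threshold`."""
    offset, years = initial_offset, 0
    while offset >= threshold:
        offset -= fraction * offset
        years += 1
    return years

for year, offset in enumerate(yearly_corrections()):
    print(f"year {year}: {offset:.1f} points underrated")
```

Under these assumptions the offset shrinks by a constant factor of 0.7 per year, so it falls off geometrically without ever reaching zero.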

Quote:

I guess that I cannot expect everyone who wants to discuss this to read that paper, so an explanation will have to be given in this thread. I shall look into that.

That would be nice, because it is difficult to understand the key points of WHR from the paper. What is WHR about in practice? Exactly how many games or months from the past would you like to take into consideration? Examples resembling the real world are always nice.

daniel_the_smith · **#35**

Liisa wrote:

If we see that a subpopulation's rating is off by 38 points after the LOGC, then we can add 38 Elo points to the entire active subpopulation. In practice we can add 12 points (about 30%) and then see how much the subpopulation is still underrated the following year. If this kind of comparison is applied once a year, we will soon enough get acceptable differences between subpopulations or, better yet, the subpopulations will stay in sync. This is not that hard.

Do you... plan to do that by hand? For each subpopulation? And how do you even identify a subpopulation? How can you tell how over/underrated a subpopulation is? And how could you keep such a system impartial? I concede it may be possible, but I don't see how you can claim it isn't hard...

Besides, WHR basically does that for you, only in a much better way than arbitrarily giving subpopulations 30% bonuses...

Liisa · **#36**

daniel_the_smith wrote:

Besides, WHR basically does that for you

how?

daniel_the_smith · **#37**

As I understand it, WHR works backwards as well as forwards in time. So it should distribute those rating points back amongst the isolated pool retroactively, with no further games necessary.

Although, now that I think about it more, I don't know that it would distribute significantly more points than those few players won (as you were suggesting).

Harleqin · **#38**

WHR does not distribute points.

daniel_the_smith · **#39**

Maybe I should read the paper again. I read it a very long time ago...

yoyoma · **#40**

RobertJasiek wrote:

Quote:

the rating system can say he is 16k +/- 14 ranks.

If the rating systems did say it (in terms of rating points), that would be an improvement. Strange confidence values instead say too little to the reader.

AGA already does something like this. My AGA rating is 3.354528, with a sigma of 0.276734.
