Whole History Rating

The home for discussions about the EGF
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: Whole History Rating

Post by pwaldron »

RobertJasiek wrote:It defies your dream but insignificance should be taken into account instead of being overlooked.


It also defies mathematical theorems. It is always better to have more information (in the form of game results). Your belief to the contrary is irrelevant.
prokofiev
Lives with ko
Posts: 223
Joined: Tue Apr 27, 2010 8:03 pm
Rank: decent sdk
GD Posts: 138
Has thanked: 67 times
Been thanked: 10 times

Re: Whole History Rating

Post by prokofiev »

prokofiev wrote:- I'm confused by the example rating graph for CrazyStone in the paper. It seems to predict the large rise in CrazyStone's rating during one period of inactivity but not during another. That is, is that graph not "the rating this system would give CrazyStone at each point in time if we had it running and updating" but rather something bizarre like "what CrazyStone's rating seems to most likely have been at each point in time given the later data as well"? (Is that what is meant by "a posteriori" in the paper?)


Answering my own question (apologies):

The second "quote" above is in fact correct, but this is not a bug, it's a feature. The model seeks better & better approximations of the whole rating graph because it takes into account the likelihood of the ratings varying (e.g. slowly varying is more likely than quickly). To get a better approximation now, a better approximation in the past is desired too.
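A toy sketch of that effect, not the paper's algorithm: one player, two dates, and a brute-force search for the most likely pair of ratings. The logistic win probability on a natural-log scale, the Gaussian penalty on rating change, `drift_var`, the grid, and all function names are assumptions made for illustration.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def log_posterior(r1, r2, games1, games2, drift_var):
    """Unnormalised log-posterior of one player's ratings at two dates.

    games1, games2: lists of (opponent_rating, won) played on each date.
    drift_var: assumed variance of the rating change between the dates.
    """
    # Assumed prior: a small change between dates is more likely than a big one.
    total = -((r2 - r1) ** 2) / (2.0 * drift_var)
    for rating, games in ((r1, games1), (r2, games2)):
        for opponent, won in games:
            diff = rating - opponent
            total += log_sigmoid(diff if won else -diff)
    return total

def best_history(games1, games2, drift_var=1.0):
    """Grid search for the most likely (r1, r2) pair."""
    grid = [i / 10.0 for i in range(-50, 51)]
    return max(
        ((a, b) for a in grid for b in grid),
        key=lambda ab: log_posterior(ab[0], ab[1], games1, games2, drift_var),
    )

early = [(0.0, True), (0.0, False)]   # one win, one loss on the first date
later = [(0.0, True)] * 5             # five straight wins on the second date

r1_alone, _ = best_history(early, [])
r1_revised, r2_revised = best_history(early, later)
print(r1_alone, r1_revised, r2_revised)
```

With only the early games, the best estimate for the first date sits at the opponent's level. Once the later wins are added, the estimate for the first date moves up as well, because a sudden jump between the two dates is penalised.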

(Also, that isn't really what "a posteriori" refers to in the paper.)
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: Whole History Rating

Post by pwaldron »

prokofiev wrote:Also, that isn't really what "a posteriori" refers to in the paper.


The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).
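As a minimal illustration of prior, likelihood and posterior (a sketch, not the paper's method): a belief about one player's rating is stored on a grid and updated game by game. The logistic win probability, the Gaussian prior with standard deviation 2, the grid, and the function names are all assumptions.

```python
import math

def win_prob(rating, opponent):
    """Assumed logistic win probability on a natural-log rating scale."""
    return 1.0 / (1.0 + math.exp(opponent - rating))

def update(grid, prior, opponent, won):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalised = []
    for r, p in zip(grid, prior):
        q = win_prob(r, opponent)
        unnormalised.append(p * (q if won else 1.0 - q))
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def mean(grid, dist):
    return sum(r * p for r, p in zip(grid, dist))

grid = [i / 10.0 for i in range(-50, 51)]
weights = [math.exp(-0.5 * (r / 2.0) ** 2) for r in grid]
weight_sum = sum(weights)
prior = [w / weight_sum for w in weights]   # what we believed before any games

belief = prior
for won in (True, True, False):             # new information: two wins, one loss
    belief = update(grid, belief, opponent=0.0, won=won)

print(mean(grid, prior), mean(grid, belief))
```

The posterior after one game serves as the prior for the next, so the order in which the results are fed in does not matter.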
prokofiev
Lives with ko
Posts: 223
Joined: Tue Apr 27, 2010 8:03 pm
Rank: decent sdk
GD Posts: 138
Has thanked: 67 times
Been thanked: 10 times

Re: Whole History Rating

Post by prokofiev »

pwaldron wrote:
prokofiev wrote:Also, that isn't really what "a posteriori" refers to in the paper.


The posterior function is a statistical term. It represents an updated probability based on what you knew before (called the prior function), modified by some new information (in this case game results).


Thanks. I'd realized the meaning, but still didn't connect the term with prior!
RobertJasiek
Judan
Posts: 6272
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: Whole History Rating

Post by RobertJasiek »

Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.

prokofiev, I want something stronger than weak confidence parameters, which are a makeshift measure.

pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. Once we have them, one can still come back to the low-confidence, sparse-data noise and see whether they already explain it well.
Harleqin
Lives in sente
Posts: 921
Joined: Sat Mar 06, 2010 10:31 am
Rank: German 2 dan
GD Posts: 0
Has thanked: 401 times
Been thanked: 164 times

Re: Whole History Rating

Post by Harleqin »

It seems to me that Robert's fears are quite unspecific. I can see how they apply to the current systems, but I do not see them as an impediment to studying better ones.

Of course, looking at how the new system handles sparse local data is an important subtopic.

Perhaps we could now have a more detailed look at the paper and get an impression of the different parts of the system and how they fit together.
A good system naturally covers all corner cases without further effort.
Sverre
Lives with ko
Posts: 193
Joined: Thu Apr 22, 2010 1:04 pm
Rank: 2d EGF and KGS
GD Posts: 1005
Universal go server handle: sverre
Location: Trondheim, Norway
Has thanked: 76 times
Been thanked: 29 times

Re: Whole History Rating

Post by Sverre »

RobertJasiek wrote:Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.


OK, could you give some precise numbers on, for example, the minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?
topazg
Tengen
Posts: 4511
Joined: Wed Apr 21, 2010 3:08 am
Rank: Nebulous
GD Posts: 918
KGS: topazg
Location: Chatteris, UK
Has thanked: 1579 times
Been thanked: 650 times

Re: Whole History Rating

Post by topazg »

Sverre wrote:
RobertJasiek wrote:Sverre, there are these possibilities: a) yes, b) give them pseudo-ratings that are shown for their pleasure but otherwise ignored, c) use a rating system that calculates only local ratings anyway.


OK, could you give some precise numbers on, for example, the minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?


Well, I'm guessing the 5 or 6 games I play a year would probably get me booted anyway ..
pwaldron
Lives in gote
Posts: 409
Joined: Wed May 19, 2010 8:40 am
GD Posts: 1072
Has thanked: 29 times
Been thanked: 182 times

Re: Whole History Rating

Post by pwaldron »

RobertJasiek wrote:pwaldron, maybe in theory there are "more information is better" theorems, but currently rating systems are so far from perfect that a more modest approach makes it easier to design better systems. Once we have them, one can still come back to the low-confidence, sparse-data noise and see whether they already explain it well.


Robert, chant ten times: It is always better to have more information.

The very worst you can have is a game prediction algorithm that flips a coin to predict the winner. Every additional game adds more information, and it cannot make a system less accurate. Some game results are more useful than others in pinning down ratings, but they all have value and it is foolish to throw any away. If the information is not useful then it does little to reduce the uncertainty in the resulting estimated parameters (i.e., ratings) but it's never worse to have the information than not to have it.
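One way to put numbers on "some results are more useful than others, but all have value" is the Fisher information each game carries about a rating. This is a sketch that assumes a logistic (Bradley-Terry style) win probability on a natural-log scale and assumes the model itself is correct. All function names are made up for the example.

```python
import math

def win_prob(rating, opponent):
    """Assumed logistic win probability on a natural-log rating scale."""
    return 1.0 / (1.0 + math.exp(opponent - rating))

def game_information(rating, opponent):
    """Fisher information about `rating` from one game: p * (1 - p)."""
    p = win_prob(rating, opponent)
    return p * (1.0 - p)

def total_information(rating, opponents):
    """Information adds up over independent games and never goes down."""
    return sum(game_information(rating, o) for o in opponents)

def rough_std_error(rating, opponents):
    """Large-sample approximation: 1 / sqrt(total information)."""
    info = total_information(rating, opponents)
    return math.inf if info == 0.0 else 1.0 / math.sqrt(info)

even_game = game_information(0.0, 0.0)       # evenly matched opponents
lopsided_game = game_information(0.0, 5.0)   # far stronger opponent

few = [0.0, 0.5, -0.5]
more = few + [5.0, -4.0, 1.0]
print(even_game, lopsided_game)
print(rough_std_error(0.0, few), rough_std_error(0.0, more))
```

An even game contributes the most and a badly mismatched game very little, but every term is positive, so under these assumptions the approximate error bar only shrinks as games are added.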
RobertJasiek
Judan
Posts: 6272
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: Whole History Rating

Post by RobertJasiek »

Sverre wrote:OK, could you give some precise numbers on, for example, the minimum number of rated games per year, or objective criteria for when one is in an "isolated subpopulation"? And also an estimate of what percentage of players would be booted from the rating system under these criteria?


No. One would have to think about it to set useful values. I have wanted to encourage such thinking; I have not carried it out in detail myself.
RobertJasiek
Judan
Posts: 6272
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: Whole History Rating

Post by RobertJasiek »

pwaldron wrote:Robert, chant ten times: It is always better to have more information.


My first statistics book had a nice example: Estimate the distance between two towns. First you take a rough look: "The next town is about 10 km away." Then you measure your town's mediaeval wall: "It is 30 cm thick." Now you conclude: "The distance is 10 km + 30 cm = 10.0003 km."

Likewise, if you have two isolated players who each claim to be 5k and their total game data consist of exactly 1 game between themselves, you cannot connect that information to a huge pool of 5k players elsewhere.

Chant ten times: Strongly disconnected data should not be compared. :)

pwaldron wrote:Every additional game adds more information, and it cannot make a system less accurate.


The problem lies in the system itself. If it is not good enough, then it does not interpret sparse data correctly. One must not overinterpret such a system by also feeding it the sparse data.
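The fully disconnected case, such as the two isolated 5k players above, can at least be detected mechanically. A minimal sketch, assuming games are given as pairs of player names (a hypothetical input format): players are grouped into connected components of the "has played against" graph. In a model that only sees game results, ratings in one component say nothing about ratings in another.

```python
from collections import defaultdict

def components(games):
    """Group players into connected components of the 'has played' graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in games:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b        # merge the two groups

    groups = defaultdict(set)
    for player in list(parent):
        groups[find(player)].add(player)
    return sorted(groups.values(), key=len, reverse=True)

games = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
pools = components(games)
print(pools)
```

This only catches complete isolation. Weakly connected groups, joined to the main pool by a handful of games, would need some threshold on the number of linking games, which is the open question in this thread.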
Harleqin
Lives in sente
Posts: 921
Joined: Sat Mar 06, 2010 10:31 am
Rank: German 2 dan
GD Posts: 0
Has thanked: 401 times
Been thanked: 164 times

Re: Whole History Rating

Post by Harleqin »

Robert, you seem to presume a weakness of the system before you have even looked at it.

In my as yet rough understanding, each game result is a data point. If only a few data points are directly connected to a player, then that player's resulting rating graph will be easily moved with further (even indirect) data. Game results against this player will therefore naturally have little impact on the rating graph of a player with more games.

I understand that you have had bad experiences with Elo-like systems. My impression is that this kind of problem is naturally covered by a WHR-like approach.

We shall keep this potential problem in mind, but I would like to move on to a more detailed look at the algorithm now.
A good system naturally covers all corner cases without further effort.
RobertJasiek
Judan
Posts: 6272
Joined: Tue Apr 27, 2010 8:54 pm
GD Posts: 0
Been thanked: 797 times

Re: Whole History Rating

Post by RobertJasiek »

I have not referred to only one particular rating system but to rating systems in general.
Li Kao
Lives in gote
Posts: 643
Joined: Wed Apr 21, 2010 10:37 am
Rank: KGS 3k
GD Posts: 0
KGS: LiKao / Loki
Location: Munich, Germany
Has thanked: 115 times
Been thanked: 102 times

Re: Whole History Rating

Post by Li Kao »

One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?
Sanity is for the weak.
Liisa
Lives with ko
Posts: 129
Joined: Wed Jun 16, 2010 3:30 am
Rank: EGF 1989 KGS 2d
GD Posts: 0
Location: Turku, Finland
Has thanked: 12 times
Been thanked: 21 times

Re: Whole History Rating

Post by Liisa »

Li Kao wrote:One problem with most ranking systems is how to anchor them. On online servers you can anchor some players who don't improve much but are very active, and you can anchor bots. Anchoring RL systems is much harder.
Perhaps define some percentiles for rank intervals?


Anchoring is not necessary because we can just let the system float freely. A mathematical rating system should not have any direct and fixed relationship with kyu-dan ranks (which are subjective honorary titles). If we try to force that relationship, it will just decrease the reliability of the mathematical system. (We play handicap games in tournaments only when we are beginner double-digit kyus!)

And the good thing about plain and simple Elo is that, even though we cannot deduce from it the exact probability of beating a specific opponent, we can always put players in a very specific order within a certain subpopulation. (This is the reason why GoR works like magic!) And there is always enough traffic between subpopulations (e.g. via the EGC) so that we can calibrate them to roughly match each other if that is necessary.

But I agree that the history approach has its merits. The best way is to calculate simultaneously the normal Elo and a rating that includes enough history (a year or so into the past) and put both figures on the same graph.
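For reference, a minimal sketch of the textbook Elo update and of ordering players by rating. This is plain Elo, not the EGF's GoR formula; the K-factor of 20, the player names, and the starting ratings are arbitrary assumptions.

```python
def expected_score(rating, opponent):
    """Textbook Elo expected score on the usual 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def elo_update(rating, opponent, score, k=20.0):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))

ratings = {"anna": 1500.0, "bert": 1600.0, "cleo": 1550.0}

# anna beats bert; both ratings are updated from the pre-game values.
anna_old, bert_old = ratings["anna"], ratings["bert"]
ratings["anna"] = elo_update(anna_old, bert_old, 1.0)
ratings["bert"] = elo_update(bert_old, anna_old, 0.0)

ordering = sorted(ratings, key=ratings.get, reverse=True)
print(ratings, ordering)
```

The update moves the same number of points from loser to winner, so the average rating of the pool stays fixed. That is one sense in which the scale can float without an anchor.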