New ways to rate players

John Fairbairn · #1

Elo ratings may be past their sell-by date even in chess, and certainly appear to cause too many controversies when applied to go. There are two new initiatives in the chess world that may be better suited to go.

One is the Universal Rating System (URS), based on results for classical, rapid and blitz games combined. The guy behind it claims URS predicted the results of the recent World Blitz event better than ratings based on blitz events alone. In any case, the go results used in ratings are an even denser mixture of classical and fast time limits than in chess, so a system that combines them may be superior for go.

The other system may soon be available for go. CAPS is a computer assessment performance system which tests each move by a human in a game by comparing it to the move (or range of top moves) chosen by a computer. The rating is then a percentage score showing what proportion of "perfect" computer moves the player made. This would seem to be the best way now to assess historical players. This has already been done for chess, and the old masters did surprisingly well. I think Capablanca was best of the past masters and he was very high overall. Even Morphy did tolerably well. But all the best players scored very highly, at least as far as I understood it.

Here's a link:

https://www.chess.com/news/grand-chess- ... cards-3703
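To make the CAPS idea described above concrete, here is a minimal sketch of a "percentage of computer moves matched" score. It is only an illustration of the general principle as I understand it, not the actual CAPS formula, which is not given here. The function name `match_rate`, the move notation and the sample data are all made up for the example.

```python
def match_rate(played_moves, engine_top_moves):
    """Percentage of played moves that appear among the engine's top choices.

    played_moves     -- list of moves the human played, one per turn
    engine_top_moves -- list of sets; each set holds the engine's accepted
                        "top" moves for the same turn (hypothetical data)
    """
    if len(played_moves) != len(engine_top_moves):
        raise ValueError("need one set of engine moves per played move")
    if not played_moves:
        return 0.0
    # Count the turns where the human's move is in the engine's accepted set.
    hits = sum(1 for move, top in zip(played_moves, engine_top_moves) if move in top)
    return 100.0 * hits / len(played_moves)


# Invented four-move example: the third move is not an engine choice.
played = ["Q16", "D4", "Q3", "D17"]
engine = [{"Q16", "D16"}, {"D4"}, {"R4"}, {"D17", "C16"}]
print(match_rate(played, engine))  # 75.0
```

Note that the score depends entirely on which engine supplies the sets of top moves, so two different engines can give the same game two different percentages.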

Monkey · #2

I don't like the CAPS idea because computers, even in chess, do not have "perfect" evaluation. For example, in chess it's well known that a computer does not correctly evaluate situations where one player has more material than the other, but some of those pieces are almost useless because of their positioning. In practice, it's hard for a human to exploit this; however, in human vs. human games these situations occur frequently, so the problem of evaluation becomes more apparent. Moreover, even top computers are beatable (mostly by other top computers), so this method raises the question of which computer's evaluation to use as the perfect move. If a computer stronger than the one used for evaluation plays, it would not get a perfect score, because it would sometimes choose moves that differ from the ones considered perfect.

dfan · #3

I look forward to the release of the full specification of the Universal Rating System. The extent to which I take it seriously is limited while the formula is secret, although the general outline is interesting, and I respect both Sonas and Glickman, who seem to be the main people behind it.

In fact in spirit it seems pretty similar to Rémi Coulom's Whole History Rating, which is used at goratings.org. If they publish a paper about it I hope that there will be an extensive section on comparisons with other rating systems.

I'm not that surprised that URS predicted the FIDE Blitz Championship results better than FIDE blitz Elo, due to the fact that FIDE blitz ratings are based on very little data. (There's no way that Artemiev was actually the second best blitz player in the world before the tournament, for example.) Thus I would assume that even if you posit some qualitative difference in people's blitz strengths relative to their classical strengths, you would get better predictive results for now by augmenting the small data set of blitz results with classical games. In fact I wonder what accuracy one would have gotten by using FIDE classical ratings, or by averaging classical and blitz ratings.

The CAPS system builds on Kenneth Regan's IPR research, and I'm surprised and disappointed that they didn't credit him (unless I missed it). Searching for "regan ipr" will turn up some of his work; here's one paper. His work is the main evidence against the claim of rating inflation over the years. Of course while a system like this is very interesting and informative for player evaluation, especially historically when we can't pit them head to head, what really matters is over-the-board performance. It makes me nervous when I see systems like this proposed as an element of actual ratings. Though if they turn out to predict actual results better than just using played games does, it would be hard to argue against them...

gowan · #4

Ratings have different purposes. One purpose is to make it possible to estimate the probability that a player will win. A second purpose is to provide an absolute measure of strength. CAPS seems to aim at an absolute estimate of strength. An important part of games of chess and go is psychological. This aspect has been used deliberately, as Fischer did in the world championship match with Spassky, but psychological issues also affect individual players due to circumstances, e.g. kadoban games. Emanuel Lasker, the chess world champion who held the title the longest, was famous for playing his opponent rather than objectively playing the position on the board. As for go, we hear of people choosing openings because their opponent likes the particular opening, another effort to psych out the opponent. I suspect that some people play better against human opponents than against computers, and we know many people who play less well online than in face-to-face games. All this speaks to the effect of being separated from the opponent. I wonder whether ratings can ever be really accurate in measuring strength, given the psychological side of games.

dfan · #5

I feel that one's strength really fundamentally is how well one performs in games, and so estimating the probability that players will win (Arpad Elo's original goal when he created his rating system) is inherently estimating strength. We all know people who play better in theory than in practice; I'm probably one of them myself. I think it's useful to be aware of one's theoretical ability (not to mention that it can be broken down into aspects such as opening, endgame, tactics, etc.) but I would not want to confuse it with "actual strength". For example, the idea (not that anyone has proposed it here) of awarding a rank certificate based on passing a test of problems rather than on game results gives me the heebie-jeebies.
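For reference, the win-probability estimate mentioned above can be sketched with the standard logistic Elo formula as commonly used in chess. The 400-point scale is the usual chess convention; go rating systems may use different parameters, so treat this as an illustration rather than any particular federation's formula. The function name `expected_score` is my own.

```python
def expected_score(rating_a, rating_b):
    """Expected score (roughly, win probability) of player A against player B
    under the standard logistic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


print(expected_score(1500, 1500))  # 0.5: equal ratings give an even game
print(expected_score(1900, 1500))  # about 0.91 for a 400-point advantage
```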

Matti · #6

Maybe we could throw away the current rating systems and just see how well a player does against AlphaGo. The program would only need to be modified so that it becomes meaningful to measure who loses the fewest points against it.

Javaness2 · #7

I don't think either of these is particularly new; at least I seem to remember reading about such ideas years ago. Elo-based systems are so simple to implement, and reasonably accurate, that they seem to be by far the most popular choice. The general problem with them in Europe is that associations do not implement the rating system in the same way. Yes, the same algorithm is run each time we submit results to the database, but given an initial set of tournament results, there are 3 different ways in which that set of results can be submitted to the EGD. All 3 produce different exit ratings.
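A rough sketch of why the way results are submitted can matter. The three EGD submission methods are not spelled out above, so this does not reproduce them; it only compares two hypothetical ways of feeding the same three games into a plain Elo update. The K-factor of 32, the 400-point scale, the function names and the sample ratings are all assumptions for illustration, and the opponents' ratings are held fixed.

```python
K = 32  # assumed K-factor, not the EGD's actual parameter


def expected_score(rating_a, rating_b):
    """Standard logistic Elo expectation with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_as_batch(entry_rating, games):
    """Submit all games at once: every expectation uses the entry rating."""
    delta = sum(K * (score - expected_score(entry_rating, opp)) for opp, score in games)
    return entry_rating + delta


def rate_game_by_game(entry_rating, games):
    """Submit games one at a time: the rating is updated after each game."""
    rating = entry_rating
    for opp, score in games:
        rating += K * (score - expected_score(rating, opp))
    return rating


# Invented results: (opponent rating, score), with 1 = win and 0 = loss.
games = [(1600, 1), (1500, 1), (1400, 0)]
batch = rate_as_batch(1500, games)
sequential = rate_game_by_game(1500, games)
print(round(batch, 1), round(sequential, 1))  # the two exit ratings differ
```

Same games, same algorithm, different exit rating, simply because the intermediate rating changes the expectations in the game-by-game case.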
