Life In 19x19

Posted: **Wed Aug 22, 2012 7:32 am**

HermanHiddema wrote:
topazg wrote:I'm arguing that the supporting data collected points towards a relationship between rating and performance where, with a large enough rating gap, the chance of the weaker player winning is 0%. There is limited but existent data supporting this by demonstrating that, for chess at least, the relationship is closer to linear than logistic.

You are arguing that the relationship between rating and performance exists, but the lower and upper bounds of winning chance never reach 100% or 0% - a true logistic function. I'm asking for some data that supports the view that a true logistic function is a more reliable model than a function that is linear with logistic elements.
Oh, I see.

Well, the basis of the Elo rating system and similar systems is logistic. There is no rating difference for which the formula returns 0 or 1. The data fits that curve reasonably well, AFAIK. The fact that a certain result, which according to the formula should have a very small but non-zero chance, has not in fact happened, does not in any way constitute proof that it cannot happen.
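For reference, here is a minimal sketch of the Elo expected-score curve being discussed. It assumes the chess convention of a 400-point scale; the function name and the sample rating gaps are illustrative and not from the thread. Mathematically the result stays strictly between 0 and 1 for any finite gap, although floating-point arithmetic will eventually round it for extreme gaps.

```python
def elo_expected_score(rating_diff: float, scale: float = 400.0) -> float:
    """Expected score for a player rated `rating_diff` points above the opponent.

    scale=400 is the chess convention (an assumption here); other rating
    systems use different constants.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


# Illustrative gaps only: the value shrinks fast but never hits exactly 0.
for diff in (0, 200, 800, 2000, -2000):
    print(f"{diff:+5d} -> {elo_expected_score(diff):.8f}")
```

A gap of 0 gives 0.5, and the curve is symmetric: the two players' expected scores always sum to 1.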

My original reference to the chess world revolved around an appeal from mathematicians (led, IIRC, by Jeff Sonas) that FIDE change their rating formula, precisely because the Elo rating system does not, in fact, fit the data very well, the data being closer to a linear model. Elo's model, whilst a really nice way of having a decently designed rating system at the time, is a very crude model, and was never originally based on supporting data (and with k values and rating difference / sd type values arbitrarily assigned so the model could be refitted to be as closely predictive as possible).

I fully understand that I have a rather sad life to find this so interesting, but I do

(and have spent quite a lot of time trawling through data surrounding it)

Because everyone loves graphs, and because some people probably don't have the faintest idea what we're talking about, I've made some very crappily drawn graphical examples:

Herman's theory (standard logistic function):

My theory (somewhat logistic function, but with upper and lower floors), please ignore the fact I can't draw freehand curves very well:

EDIT: p is the probability of the player beating the reference player (1 being 100%), and r is the amount by which that player's rating differs from the reference player's.
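The two shapes described above can also be sketched numerically. This is only one possible reading of "logistic with floors": the logistic curve is stretched so that it reaches exactly 0 and 1 at a finite rating gap and is clamped beyond it. The 400-point scale, the 1200-point cutoff, and both function names are assumptions chosen for illustration, not values from the thread.

```python
def logistic_p(r: float, scale: float = 400.0) -> float:
    """Pure logistic curve: approaches 0 and 1 but never reaches them."""
    return 1.0 / (1.0 + 10.0 ** (-r / scale))


def floored_p(r: float, cutoff: float = 1200.0, scale: float = 400.0) -> float:
    """Logistic in the middle, but exactly 0 or 1 once |r| >= cutoff.

    The logistic curve is shifted and stretched so it passes through
    0 at r = -cutoff and 1 at r = +cutoff, then clamped outside that range.
    The cutoff is a hypothetical parameter.
    """
    low = logistic_p(-cutoff, scale)
    high = logistic_p(cutoff, scale)
    stretched = (logistic_p(r, scale) - low) / (high - low)
    return min(1.0, max(0.0, stretched))


# Compare the two curves at a few illustrative rating gaps.
for r in (-1600, -1200, -400, 0, 400, 1200, 1600):
    print(f"r={r:+5d}  logistic={logistic_p(r):.5f}  floored={floored_p(r):.5f}")
```

Both curves agree at r = 0 (p = 0.5) and are close in the middle; they differ only in the tails, which is exactly where real game data is scarcest.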

Posted: **Wed Aug 22, 2012 8:02 am**

HermanHiddema wrote:In fact, I think there is no meaningful support for my position. And that there is also none for yours. I do not think that thought experiments have value in a discussion like this. I do not think there is any meaningful way to extrapolate from the data we have.

The Monte Carlo bots, which effectively play random moves and random games in order to pick the best move, are surely a proof of some form that random playing monkeys can play at least 5d moves.

Posted: **Wed Aug 22, 2012 8:04 am**

daniel_the_smith wrote:What you all are saying amounts to a claim that 20k players are *worse than random*. I can't understand this. The random player is going to have to play like 10¹⁰⁰ games just to get one as good as an average 20k game...

Numbers scientifically pulled out of a donkey.

It is not equivalent to the claim that 20k players are worse than random. What people are claiming is that random play is vastly worse than the 20k, but that the range of play that the random player has is wider. Thus, the chances of a 20k player to beat a 20k player are 50-50, decreasing as you go up to 10k and onwards. For all these ranks, the random player is vastly worse, but at some point there is a crossover, where both the 20k and random player have absurdly small chances. Then the claim is that at some point, the 20k player reaches zero, while the random player continues to have an extremely small chance.

This is impossible according to our ratings math, but so what? You're confusing the model imposed by our rankings with reality.

Posted: **Wed Aug 22, 2012 8:08 am**

hyperpape wrote:This is impossible according to our ratings math, but so what? You're confusing the model imposed by our rankings with reality.

Exactly this ^^

Ranking models are exactly that, models, and it's not valid to assume that they are accurate across ranks. Models improve over time, because we all want better models. Why should we assume that a logistic model is definitive?

I think it's also fairly crucial for daniel's specific point not to assume that a random playing bot is just an absurdly weak human player.

Posted: **Wed Aug 22, 2012 8:11 am**

quantumf wrote:
HermanHiddema wrote:In fact, I think there is no meaningful support for my position. And that there is also none for yours. I do not think that thought experiments have value in a discussion like this. I do not think there is any meaningful way to extrapolate from the data we have.
The Monte Carlo bots, which effectively play random moves and random games in order to pick the best move, are surely a proof of some form that random playing monkeys can play at least 5d moves.

That random play can produce 5d moves is rather obvious. Random play can play any move on the board.

Posted: **Wed Aug 22, 2012 8:14 am**

topazg [76]:

while i am not completely decided which side of this argument i support, your claims based on the statistics are interesting, i shall look more into those systems proposed as a substitution for ELO (or GoR)

Posted: **Wed Aug 22, 2012 8:18 am**

I can see Topazg's argument, but I think one pretty important factor has been omitted from the analysis. The chance for blunders by the 9p/1d/higher ranked player. It's much more likely for a 20k player to punish such a blunder in comparison to a random play bot. I don't know how much this would shift the odds in favour of the 20k player in comparison to the random bot, but I suspect the effect will be quite large because the only realistic way for 20k to beat 1d/9p will be from blunders. This is partly corroborated by my experience against much weaker players - my losses against them are almost always due to blunders.

In other words, I think the discussion so far has been about the chance of the 20k/random bot playing near 1d/9p strength instead of the chance for 1d/9p to play at the 20k level (if only for one crucial move) :p

Posted: **Wed Aug 22, 2012 8:20 am**

HermanHiddema wrote:That random play can produce 5d moves is rather obvious. Random play can play any move on the board.

Exactly. But a 20k player, in my experience, will NOT play any move on the board. They will only play 20k moves, and will frequently/usually not play the right moves.

Posted: **Wed Aug 22, 2012 8:23 am**

Laman wrote:topazg [76]:

while i am not completely decided which side of this argument i support, your claims based on the statistics are interesting, i shall look more into those systems proposed as a substitution for ELO (or GoR)

I've just had a look for the source material that supports my argument, and it's gone from Jeff Sonas' Chessmetrics site

There's still a fair bit on the 'net from him about the discussions over appropriate K values and fudge factors (FIDE decided to keep Elo in the end, but made a few adjustments to the way it worked), but not the full broad spectrum results data that showed such a clear trend line.

Much of his work now seems to be around standardising chess ratings over time so you can compare players from different eras, which whilst interesting, isn't the point of this discussion.

I corresponded with him a couple of times a while back, so if I can't find it with a bit of further digging, I'll send him an email. What we really need is actual statistics of serious even games of Go between widely varying strengths of players. Play 1000 swiss tournaments with anyone from 6d to 20k and you should get enough data for trend analysis to support one side or the other - we just need to organise those tournaments
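If such results were ever collected, the first step of the trend analysis could look like the sketch below: group even games by rating gap and compute the observed win rate of the stronger player in each group. The input format, the function name, the 100-point bin width, and the sample games are all made up for illustration.

```python
from collections import defaultdict


def win_rate_by_gap(games, bin_width=100):
    """Group games by rating gap and return {bin_start: (wins, total, win_rate)}.

    `games` is an iterable of (rating_gap, stronger_won) pairs, where
    rating_gap >= 0 is the stronger player's rating minus the weaker
    player's, and stronger_won is True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [wins, total]
    for gap, stronger_won in games:
        bin_start = int(gap // bin_width) * bin_width
        counts[bin_start][0] += int(stronger_won)
        counts[bin_start][1] += 1
    return {
        start: (wins, total, wins / total)
        for start, (wins, total) in sorted(counts.items())
    }


# Made-up results, purely to show the shape of the output.
sample_games = [(50, True), (80, False), (120, True), (150, True), (430, True)]
for start, (wins, total, rate) in win_rate_by_gap(sample_games).items():
    print(f"gap from {start}: {wins}/{total} = {rate:.2f}")
```

The interesting bins are the ones with very large gaps, and they are also the ones that need by far the most games before an observed 0% or 100% means anything.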

illluck wrote:I can see Topazg's argument, but I think one pretty important factor has been omitted from the analysis. The chance for blunders by the 9p/1d/higher ranked player. It's much more likely for a 20k player to punish such a blunder in comparison to a random play bot. I don't know how much this would shift the odds in favour of the 20k player in comparison to the random bot, but I suspect the effect will be quite large because the only realistic way for 20k to beat 1d/9p will be from blunders. This is partly corroborated by my experience against much weaker players - my losses against them are almost always due to blunders.

In other words, I think the discussion so far has been about the chance of the 20k/random bot playing near 1d/9p strength instead of the chance for 1d/9p to play at the 20k level (if only for one crucial move) :p

I agree, and this is definitely an important point. I'm pretty sure that this is precisely why losses between 1d and 7k players happen. I think the course of a Go game is probably too long for a 20k to manage to secure a victory due to a 1d blunder or 5 (unless the 1d is hopelessly drunk or something), but I'm pretty convinced a 9p wouldn't make enough of the sorts of blunders required to lose to a 20k. The stronger the player, the fewer (and smaller) the blunders there are.

For the sake of argument, if we modelled Go to be a 0.5 win for White with perfect play on both sides, the argument still holds with a perfect-bot playing black and the chance of losing to a 20k or a random-bot.

quantumf wrote:Exactly. But a 20k player, in my experience, will NOT play any move on the board. They will only play 20k moves, and will frequently/usually not play the right moves.

I agree. I suspect it is precisely the blinkered tunnel vision, that partial knowledge and the attempt to apply it provide, that allows a random bot to occasionally succeed where the 20k can't. I have never met a player, even of mid dan strength, that doesn't concede that they frequently simply have blind spots to certain moves and ideas. Random bots have no blind spots.

Posted: **Wed Aug 22, 2012 9:28 am**

HermanHiddema wrote:Well, the basis of the Elo rating system and similar systems is logistic. There is no rating difference for which the formula returns 0 or 1. The data fits that curve reasonably well, AFAIK. The fact that a certain result, which according to the formula should have a very small but non-zero chance, has not in fact happened, does not in any way constitute proof that it cannot happen.

Saying it fits the curve well is actually a pretty weak statement. One would need to know a lot about what tests were used, whether alternative curves were checked, and so on.
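As a rough illustration of what "checking alternative curves" could mean, the sketch below scores two candidate models against the same set of results using the Brier score (mean squared error of the predicted probability; lower is better). The choice of Brier score, the model parameters, and the game list are all assumptions made for the example. A real comparison would need real data and ideally results held out from the fitting.

```python
def brier_score(model, games):
    """Mean squared error between predicted win probability and actual result."""
    return sum((model(gap) - float(won)) ** 2 for gap, won in games) / len(games)


def logistic_model(gap, scale=400.0):
    """Elo-style logistic curve; scale=400 is the chess convention."""
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


def linear_model(gap, slope=1.0 / 1000.0):
    """Straight line through 0.5 at gap 0, clamped to [0, 1] (slope is arbitrary)."""
    return min(1.0, max(0.0, 0.5 + slope * gap))


# Invented games: (rating gap seen from player A, did A win?)
games = [(0, True), (0, False), (200, True), (200, False), (600, True), (-600, False)]
for name, model in (("logistic", logistic_model), ("linear", linear_model)):
    print(f"{name}: {brier_score(model, games):.4f}")
```

With only a handful of games the two scores say nothing; the point is only that "fits well" becomes a meaningful claim once it is stated as a number and compared against a rival curve.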

Posted: **Wed Aug 22, 2012 10:14 am**

Some of you might want to read this: http://lesswrong.com/lw/mp/0_and_1_are_ ... abilities/

It's not possible for a 20k to have a literally 0 chance of beating a 9p. We know 9p players occasionally keep the game close and occasionally self-atari near the end. This is vastly more probable than random play producing a 1d game.

Posted: **Wed Aug 22, 2012 10:35 am**

daniel_the_smith wrote:It's not possible for a 20k to have a literally 0 chance of beating a 9p.

Proof?

EDIT: Probabilities of 0 do exist aplenty. What's the probability of rolling a 7 on a standard 6-sided die?

Posted: **Wed Aug 22, 2012 11:22 am**

It's on the same order as random play beating a 1d.

(You could be mistaken about what kind of die you're rolling, or many other similar weird cases.... You could even be hallucinating it.)

Posted: **Wed Aug 22, 2012 12:04 pm**

daniel_the_smith wrote:(You could be mistaken about what kind of die you're rolling, or many other similar weird cases.... You could even be hallucinating it.)

Hahaha, oh come on, even you know that's close to trolling

I spoke to Jeff, and he said he'd get back to me on that data, but in the meantime, he's running the following:

http://www.kaggle.com/c/ChessRatings2

Apparently FIDE intend on considering adopting the winner (by which I suspect they mean not adopting) - perhaps the EGF could for Go?

EDIT: And an interesting quote attributed to Arpad Elo (interesting given the debate over rank deflation / inflation):

"The true challenge (of the rating system administrator) is maintenance of the integrity of the ratings in his pool, so that from one year to the next, or from one decade to the next, a given rating will represent essentially the same level of chess proficiency." Arpad Elo - "The Rating of Chessplayers, Past and Present" 1978.

Posted: **Wed Aug 22, 2012 12:18 pm**

daniel_the_smith wrote:It's on the same order as random play beating a 1d.

(You could be mistaken about what kind of die you're rolling, or many other similar weird cases.... You could even be hallucinating it.)

To quote from that article:

"However, in the real world, when you roll a die, it doesn't literally have infinite certainty of coming up some number between 1 and 6. The die might land on its edge; or get struck by a meteor; or the Dark Lords of the Matrix might reach in and write "37" on one side."

This argument can quickly get into the ridiculous, and rarely does it not involve one side trolling the other. You can end up arguing that there is a probability of 1 that there are a negative number of possible outcomes from rolling said die, but seriously, what kind of theoretical abstraction is this?

The Probability of a Monkey Defeating Yi Chang-ho
