moha wrote:
The last example seems to have returned to a game without draws. Almost impossible to draw before the final transform, but even a non-perfect player can beat perfect play after it. I find it hard to see this as a good model for go with perfect komi, from what we know from smaller boards, solvable positions etc. Could you change it to match a perfect-komi game with several potential sub-point (CGT or other) mistakes for both sides but with integer rounding, according to your interpretation?
Yep! The game is exactly the same except now the grid is on all the integers (...-2,-1,0,1,2,...) and the game starts at 0.499999 (so with perfect play, i.e. both players always flipping zeros on their cards, the game is a draw).
The flip-flopping that you get is now between wins and draws, rather than wins and losses. But that's fine: in variant C it's still pretty easy to fit multiple "75%" classes within the span of the last point, many more than 2 classes.
Let's ignore whether that many consecutive 9s is realistic as a model for Go - I'll settle for simply having exhibited a game where:
* Mistakes behave cumulatively/additively.
* The final result is a discrete integer or half-integer score in a way that seems on the larger scale to be stochastic and roughly linear in cumulative mistakes.
* The discretization of the score in this way does NOT tightly bound the number of possible "classes", in variant C (not even in the version with draws).
* Variants B and C are very hard to distinguish from only observing data from players whose error distributions are many integers wide, as they currently are even for superhuman Go bots.
* None of it depends on any special pathologies between particular players or sets of players when matched versus each other.
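The properties above can be sketched with a toy Monte Carlo. All the specific numbers here (per-move mistake sizes, move count, sample sizes) are my own illustrative assumptions, not claims about real Go; the point is only that integer rounding of a cumulative count does not collapse nearby players into one class:

```python
import random

def play_game(my_mean, opp_mean, moves=200, start=0.499999, rng=random):
    """One game of the toy model: each side's mistakes shift the running
    count against it; the final count is rounded to the nearest integer,
    and its sign decides win / draw / loss for 'me' (1 / 0.5 / 0)."""
    count = start
    for _ in range(moves):
        count -= rng.uniform(0.0, 2.0 * my_mean)   # my mistake (mean my_mean)
        count += rng.uniform(0.0, 2.0 * opp_mean)  # opponent's mistake
    final = round(count)
    return 1.0 if final > 0 else (0.5 if final == 0 else 0.0)

def expected_score(my_mean, opp_mean, games=20000, seed=0):
    """Average result of 'me' over many games, fixed seed for repeatability."""
    rng = random.Random(seed)
    return sum(play_game(my_mean, opp_mean, rng=rng)
               for _ in range(games)) / games

# Players whose cumulative mistake totals differ by only a fraction of one
# point (0.1, 0.2, 0.3 points over a whole game here) still land at clearly
# separated expected scores, despite the whole-point rounding at the end:
for m in (0.0100, 0.0095, 0.0090, 0.0085):
    print(m, expected_score(m, 0.0100))
```

Note that mistakes behave purely additively here, nothing depends on which particular players are matched, and yet several distinguishable "classes" fit inside the span of the last point.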
Even if one doubts this is a great match for Go, because of more Go-specific details besides the above properties, isn't it fascinating that this is all explicitly possible at the same time?
As I've said many times before, I don't want to try to argue this is a great match, because I don't think it is myself, at least not to this extreme. So I agree the specific numbers are doubtful; they're chosen just to make the point that this is possible in theory... and in a way that's still consistent with current observational evidence, which is too coarse to have actually ruled it out!
moha wrote:
I'm reluctant to use it as is, but some of my doubts would translate here as: suppose the opponent is a bit weaker, and has already moved the count to a significant bit in their losing way (the most common occurrence). Our variance is almost negligibly small (you assumed an almost constant 0.1 avg mistake - a bit doubtful imo). Can we be sure that we can perform whole classes better or worse from these winning positions than our (near-perfect or perfect) neighbours, even if our and their mistakes will almost never amount to anything near a single point, and the final granularity is whole points, so some of our tiny advantage over neighbours will surely get canceled?
Yep! If you take the model at face value. Still yes, but only to a smaller degree (e.g. maybe you only squeeze in one extra class) if you think the behavior is much closer to B but could still have a small component of C-like behavior, in addition to "noise" of other sorts.
moha wrote:
hyperpape wrote:
I think that's ok. These players are nearly perfect. They'll win against almost all other players. But since they reliably beat each other, each of them should end up >= 400 points better than the previous one. Otherwise, playing against weaker opponents whom you always beat would lower your Elo rating.
I don't think so. Any player can win against any other player except a perfect one, but no player can always win (assuming a normal distribution and perfect komi). The difference in that tiny fraction of upset losses matters IMO, exactly because strength and performance are to be considered across all opponents and situations.
Oh, I didn't realize you also wanted to consider vastly differently-skilled players when measuring how a model corresponds to Elo. I guess I had implicitly assumed that you meant a variety of not-too-distant players (eliminating all rock-paper-scissors effects, but still giving sane differences).
We already know that real life often deviates increasingly from Elo in the tails. And real ranking systems sometimes do use different tail models, and therefore do make predictions in the extremes that differ from each other by orders of magnitude! Yet mostly it doesn't matter - there isn't a presumption that the model should be used for, or should precisely match reality in, estimating such tiny probabilities.
For what it's worth, the earlier example I had with a normally distributed sum of errors will err on the side of the tails being *too thin*. Strong players will win *too often* against weaker players, so the rating difference as measured by a strong vs. a vastly weaker player will actually be *even larger* than with chains of closer players. That's the opposite of the worry, hopefully making it more obvious that no matter how you slice it, you have more than 2 classes per point, not fewer. But if you wanted precisely Elo-like tails, you could posit a logistic distribution, and if you thought that large upsets were actually more common in real life than merely logistic (plausibly true in some activities, deviating from Elo the other way), you could try a t-distribution.
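To make the thin-tails point concrete, here's a small sketch (the calibration of "1 skill unit = 200 Elo" is my own illustrative choice): fit a normal performance-difference model to agree with the Elo logistic curve at a moderate gap, then compare what the two imply at a large gap.

```python
import math
from statistics import NormalDist

def p_win_logistic(gap_elo):
    """Standard Elo expected score for a gap_elo rating advantage."""
    return 1.0 / (1.0 + 10.0 ** (-gap_elo / 400.0))

def p_win_normal(gap, sigma):
    """P(A beats B) if the single-game performance difference ~ N(gap, sigma)."""
    return NormalDist().cdf(gap / sigma)

def implied_elo(p):
    """Rating gap that the logistic (Elo) model would infer from win rate p."""
    return 400.0 * math.log10(p / (1.0 - p))

# Calibrate sigma so that 1 "skill unit" of normal gap matches a 200-Elo gap.
sigma = 1.0 / NormalDist().inv_cdf(p_win_logistic(200))

# At 5 units, the normal model's thinner tails make upsets much rarer, so the
# directly measured strong-vs-weak gap comes out well above the chained
# 5 * 200 = 1000 Elo that closer pairwise comparisons would sum to.
print(implied_elo(p_win_normal(1, sigma)))  # ~200 by construction
print(implied_elo(p_win_normal(5, sigma)))  # well above 1000
```

Swapping `p_win_normal` for a logistic CDF would reproduce Elo exactly, and a Student-t performance difference (e.g. via `scipy.stats.t`) would fatten the tails in the other direction, as mentioned above.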