Life In 19x19 (http://lifein19x19.com/)
Strength as error distribution (http://lifein19x19.com/viewtopic.php?f=10&t=16470)

Author: moha [ Fri Feb 22, 2019 2:10 pm ]
Post subject: Strength as error distribution

This came up a few times recently, so here are some random thoughts.

The basic idea is that a player's strength can be described by the errors he makes. For simplicity I'd define an error as a move that loses points compared to the minimax solution (a bit doubtful, see *1). Such errors should be somewhat normal-ish (many small errors, fewer large errors, see *2), and after 100-200 moves the sum of these errors may be even more so (central limit theorem).

Overall, I think assigning a mean and a deviation to a player's per-game error total could offer a decent model. This is actually not much different from the fundamentals of Elo (performance = -errors, and the deviation may even be guessable from the mean). Except that in go there is a more tangible meaning behind these numbers: when two players play, the winning margin is the actual sum of the opponent's errors minus the actual sum of the player's own errors (assuming correct komi).

So for each game we have two such distributions, one per player. The player wins if his "random sample" turns out higher than the opponent's (i.e. he gives up fewer points in the game than the opponent).

Player A has a distribution described by [Aev,Asd] and the opponent has [Bev,Bsd]. For simple cases the distribution of the difference can be constructed directly, but here is a more general way to get A's winning probability: for each point on A's distribution, take its density multiplied by B's cumulative distribution from -infinity to that point (the cases where B made more errors than A at that point), and integrate over all points.

Since only the relative width and position matter, B's distribution can be normalized, leaving only A's shifted and scaled one to work with: A becomes [Aev',Asd'] and B becomes [0,1].
This means A's numbers are expressed in units of B's original deviation: we are only interested in where our distribution lies relative to the opponent's, and how its shape aligns with his (how much wider or narrower it is). So Aev'=(Aev-Bev)/Bsd and Asd'=Asd/Bsd. With these, the winning probability can be approximated (*2) as:

$$P(\text{A wins}) \approx \frac{1}{\sqrt{2\pi\,(\mathrm{Asd}')^2}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mathrm{Aev}')^2}{2\,(\mathrm{Asd}')^2}} \cdot \frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)\,dx$$

Here is a Wolfram example to calculate such win probabilities (variable substitution would make it too complex for the free version, so the Aev' and Asd' occurrences need to be replaced manually inside the square brackets).

Although the absolute position of a distribution doesn't really matter, a very rough guess is that strong pro level is somewhere around -50 (komi = 7 and 1 stone = 2*komi, so 3-5 stones from perfect play). Two players are 1 stone apart if their ev difference is roughly 14 (supposedly giving a 50% winrate with 1 extra stone or with reverse komi).

More interesting is the question of deviation. There is a known problem in translating Elo-like ratings to stones: the EGF win% table predicts that the winrate against a 1 stone stronger opponent is ~33% at 9k, ~25% at 1d, and only ~20% at 7d. Using the above function in reverse hints that at 1d the deviation may be a bit less than 1 stone (<14 points). For stronger levels the deviation decreases: making fewer and smaller errors means not only a higher ev, but less absolute variance as well.

EGF handles these rank-dependent winrate differences with an extra (deviation-like) variable term. This approach offers a natural explanation instead: A's distribution is shifted and scaled against B's normalized one, and for stronger players the relative (scaled) position of a one stone (14 points) stronger opponent's distribution is significantly farther away, since the deviations are smaller.
I think this is the real reason behind the rank-dependent winrate differences observed in practice.

*1 This ignores that a deliberate safety move that trades points for consolidating a winning position is not the same kind of error as, for example, points lost by misplaying a local fight.

*2 In go the actual error values and sums are integers, so something like a binomial distribution would probably be best. But approximating with other distributions, like normal or logistic, should also be OK, except maybe near perfect play (performance = -errors cannot be positive, so the distribution has no positive side).
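A minimal Python sketch of the integral above, assuming both per-game totals are normal. It normalizes A against B, evaluates the density-times-CDF integral with a plain midpoint rule (truncated at ±10 of A's normalized deviations, an arbitrary choice), and compares the result with the closed form Φ(Aev'/√(1+Asd'²)), which holds because the difference of two independent normals is itself normal. The helper names and the example values are illustrative only.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def normalize(a_ev, a_sd, b_ev, b_sd):
    """Express A in units of B's deviation, so that B becomes [0, 1]."""
    return (a_ev - b_ev) / b_sd, a_sd / b_sd


def win_probability_integral(a_ev_n, a_sd_n, steps=20000, width=10.0):
    """P(A's sample > B's sample): A's normal density times B's standard
    normal CDF, integrated with the midpoint rule over a_ev_n +/- width*a_sd_n."""
    lo = a_ev_n - width * a_sd_n
    h = 2 * width * a_sd_n / steps
    norm = 1 / sqrt(2 * pi * a_sd_n ** 2)
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        density = norm * exp(-(x - a_ev_n) ** 2 / (2 * a_sd_n ** 2))
        b_cdf = 0.5 * (1 + erf(x / sqrt(2)))  # P(B's performance < x)
        total += density * b_cdf * h
    return total


def win_probability_closed_form(a_ev_n, a_sd_n):
    """Same quantity for two normals: Phi(Aev' / sqrt(1 + Asd'^2))."""
    return NormalDist().cdf(a_ev_n / sqrt(1 + a_sd_n ** 2))


# Example: A = [-100, 10] against B = [-110, 15]
a_ev_n, a_sd_n = normalize(-100, 10, -110, 15)
print(round(win_probability_integral(a_ev_n, a_sd_n), 4))
print(round(win_probability_closed_form(a_ev_n, a_sd_n), 4))
```

The numerical route is the one that generalizes: replacing the density or the CDF with another shape (for example a logistic one, as mentioned in *2) needs no closed form.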

Author: Joaz Banbeck [ Fri Feb 22, 2019 5:38 pm ]
Post subject: Re: Strength as error distribution

moha wrote:
> ...errors should be somewhat normal-ish (many small errors, fewer large errors ...

I'm suspicious of this assumption. The availability of errors of different sizes varies throughout the game. (The largest error that can be made on the first move should be no more than komi*2, and in the last few moves it is usually a point or two. But in the middle game a bad move can sometimes throw away 100+ points.)

I suspect that it will not be a normal distribution: small errors will be over-represented.

Author: Bill Spight [ Fri Feb 22, 2019 6:44 pm ]
Post subject: Re: Strength as error distribution

To me the fact that errors are non-negative integers suggests a Poisson distribution.

Author: Bill Spight [ Fri Feb 22, 2019 6:46 pm ]
Post subject: Re: Strength as error distribution

Joaz Banbeck wrote:
> moha wrote:
>> ...errors should be somewhat normal-ish (many small errors, fewer large errors ...
> I'm suspicious of this assumption. The availability of errors of different sizes varies throughout the game. (The largest error that can be made on the first move should be no more than komi*2, and in the last few moves it is usually a point or two. But in the middle game a bad move can sometimes throw away 100+ points.)
> I suspect that it will not be a normal distribution: small errors will be over-represented.

For many amateurs the error distribution may be bimodal, with better amateurs making fewer large errors.

Author: moha [ Fri Feb 22, 2019 9:49 pm ]
Post subject: Re: Strength as error distribution

Joaz Banbeck wrote:
> The availability of errors of different sizes varies throughout the game. (The largest error that can be made on the first move should be no more than komi*2, and in the last few moves it is usually a point or two. But in the middle game a bad move can sometimes throw away 100+ points.)

Right, the scales of individual errors likely correlate with temperature changes throughout the game. And a large per-game error total, for example, may have more to do with a single middlegame blunder than with dozens of smaller errors.

This in itself doesn't exclude normality for the total, though (e.g. the sum of a few independent normals is still normal, even if one of them is on an orders-of-magnitude larger scale). But the normality of individual errors is, of course, even more questionable.

Another possible consequence, verifiable from actual result data: if the largest errors come from the middlegame, the deviation of the total can significantly depend on the character of the player as well (so it is not guessable from the mean, as EGF tries to do). Someone who has a strong middlegame likely makes fewer errors there, so he likely has a smaller deviation for his total than others with the same rank (mean). This still leaves him at 50% against them, but it should have noticeable and consistent effects on his chances against 1 stone stronger opponents (similar to the 9k-1d-7d anomalies above).

Joaz Banbeck wrote:
> I suspect that it will not be a normal distribution: small errors will be over-represented.

This is why the longer route with the double integral seems preferable: it works for a wider range of distributions.

Bill Spight wrote:
> To me the fact that errors are non-negative integers suggests a Poisson distribution.

In a few years the newer bots (with multi-komi NNs or the SAI fork) may be able to provide actual data on this.
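To make the remark about sums of normals concrete: for independent normal components, means and variances add, so the total stays normal and its deviation is dominated by the widest component. The split below into two quiet phases with deviation 2 and a middlegame with deviation 30 is a made-up illustration, not data.

$$X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)\ \text{independent} \;\Longrightarrow\; \sum_i X_i \sim \mathcal{N}\Big(\sum_i \mu_i,\ \sum_i \sigma_i^2\Big)$$

$$\sigma_{\text{total}} = \sqrt{2^2 + 30^2 + 2^2} = \sqrt{908} \approx 30.13$$

Almost all of the total's deviation comes from the middlegame term, which is why a player's middlegame strength could shape the deviation of his per-game total.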

Author: moha [ Mon Feb 25, 2019 6:20 pm ]
Post subject: Re: Strength as error distribution

Some further thoughts in comparison to 1-dimensional (mean only) systems:

When two players play, the side with the higher mean always has the upper hand. How large his advantage is, however, depends on the deviations nearly as much as on the means.

Balancing a matchup to a 50% winrate with handicap or komi needs the means only: this basically shifts the means to be identical, and then the deviations don't matter anymore.

Partially non-transitive situations are possible, even practical ones (with no special correlations, each player showing the same performance distribution against all opponents). For example, A is [-100,10], B is [-110,15], and C is [-115,30]. Then A>B (71%), B>C (56%), but A wins less often against C (68%) than against B.

So it may be better to exclude non-handicapped games between players of different ranks from 1-dimensional systems. Otherwise deviations may get measured/smeared into the ratings, which should approximate means only (e.g. rating C higher than B because A scores worse against C would be incorrect).
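The percentages in this example can be reproduced with a short Python sketch, assuming independent normal per-game performance totals, so that A's winning probability is Φ((Aev-Bev)/√(Asd²+Bsd²)), the closed form of the integral from the first post. The last lines illustrate the point about handicap/komi: once the means are equalized, the deviations no longer matter. The function name is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def win_probability(a, b):
    """P(A outperforms B) for players given as (mean, sd) of per-game
    performance (= -errors), assuming independent normal totals."""
    (a_ev, a_sd), (b_ev, b_sd) = a, b
    return NormalDist().cdf((a_ev - b_ev) / sqrt(a_sd ** 2 + b_sd ** 2))


A, B, C = (-100, 10), (-110, 15), (-115, 30)
for name, (x, y) in {"A vs B": (A, B), "B vs C": (B, C), "A vs C": (A, C)}.items():
    print(name, f"{win_probability(x, y):.0%}")

# Giving B 10 points (handicap/komi) equalizes the means: 50% regardless of sd.
B_even = (B[0] + 10, B[1])
print("A vs B with 10 points for B", f"{win_probability(A, B_even):.0%}")
```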

Author: moha [ Sun Apr 14, 2019 3:30 pm ]
Post subject: Re: Strength as error distribution

Out of curiosity I tried to use this approach on the relation between points (early mistakes / advantages) and winning probabilities.

This is well defined if we know the shapes of the players' distributions (or a good approximation), and we have at least a single data point to establish the scale (the distance between the two distributions in deviations). So if we know the winrate value of X points, we can calculate the value of Y points and so on (by shifting the distributions).

And we do have one data point: a whole stone. This is the case if one player passes his first move, or if there is a 1 stone strength difference between the players. And we can guess that the point value of this is twice komi, roughly 14 points.

For human ranks I took the winrates against a 1 stone stronger opponent from the above EGF table (adjusting down by half a rank, and with some guessing for pro levels / 9d, since the table only goes up to 7d). I experimented with bots as well, but their winrate gains fluctuate wildly and sometimes inconsistently (even for smaller mistakes), so I could only roughly conclude that one move for LZ (Leela Zero) is about a 35-40% gain. This is not much different from my 9d approximation, so I made no column for it. Instead I include the idea of 2pt=10%, which can also be used as an anchor.

So, using the above Wolfram calculator in both directions, I get the following values:

Code:
                              1 dan   7 dan   9 dan   2pt=10%?
--------------------------------------------------------------
winrate vs 1 extra stone      27.4%   20.2%   ~16%    3.7%?
equiv. distance in sd-s       0.85    1.18    1.41
1 point distance in sd-s      .0607   .0843   .1007   .1800
sd in points                  16.47   11.86   9.93    5.56
--------------------------------------------------------------
winrate gain for 1 pt (%)     1.71    2.38    2.84    5.06
winrate gain for 2 pts (%)    3.42    4.74    5.66    10.0
winrate gain for 3 pts (%)    5.12    7.10    8.46    14.9
winrate gain for 5 pts (%)    8.50    11.7    13.9    23.8
winrate gain for 7 pts (%)    11.8    16.2    19.1    31.4
winrate gain for 14 pts (%)   22.6    29.8    34.0    46.3

This applies to the early game only, of course; similar results can be obtained with an odds-based approach as well.
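A Python sketch that recomputes the table, under the assumptions the numbers appear to follow: both players have normal per-game totals with the same deviation σ (so their difference has deviation σ√2), one stone is 14 points, and an early lead of k points moves a 50% game to Φ(k/(σ√2)). The function names are hypothetical. Small last-digit differences (e.g. 9.95 instead of 9.93 for 9 dan) come from the table rounding the intermediate sd-distances.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: mean 0, sd 1
STONE = 14.0        # assumed point value of one stone (2 x komi of 7)


def sd_from_stone_winrate(p_weaker):
    """Per-player sd (points) such that the weaker player wins p_weaker
    against an opponent one stone (STONE points) stronger."""
    distance_in_sds = -sqrt(2) * STD.inv_cdf(p_weaker)  # "equiv. distance in sd-s"
    return STONE / distance_in_sds


def sd_from_anchor(points, gain):
    """Per-player sd such that a lead of `points` turns 50% into 50% + gain."""
    return points / (sqrt(2) * STD.inv_cdf(0.5 + gain))


def winrate_gain(points, sd):
    """Winrate gain in percentage points for an early lead of `points`."""
    return 100 * (STD.cdf(points / (sd * sqrt(2))) - 0.5)


columns = {
    "1 dan": sd_from_stone_winrate(0.274),
    "7 dan": sd_from_stone_winrate(0.202),
    "9 dan": sd_from_stone_winrate(0.16),
    "2pt=10%?": sd_from_anchor(2, 0.10),
}

for name, sd in columns.items():
    gains = [winrate_gain(k, sd) for k in (1, 2, 3, 5, 7, 14)]
    print(f"{name:>8}  sd={sd:5.2f}  " + "  ".join(f"{g:5.2f}" for g in gains))
```

Changing STONE or the input winrates shows how sensitive the per-point values are to the one-stone anchor.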

All times are UTC - 8 hours [ DST ]