This came up a few times recently - some random thoughts:

The basic idea is that a player's strength can be described by the errors he makes. For simplicity I'd define an error as a move that loses points compared to the minimax solution (a bit doubtful

^{*1}). Such errors should be somewhat normal-ish (many small errors, fewer large errors

^{*2}), and after playing 100-200 moves the sum of these errors may be even more so (central limit).

Overall I think assigning a mean and a deviation to a player's per-game error total could offer a decent model. This is not much different to Elo fundaments actually (performance = -errors, and deviation may even be guessable from the mean). Except in go, there is a more tangible meaning behind these numbers. When two players play, the winning margin is the actual sum of the errors of the opponent, minus actual sum of errors of the player (assuming correct komi).

So for each game we have

two distributions similar to this plot. The player wins if his "random sample" turns out to be higher than the opponent's (= he gives up less points in the game than the opponent).

Player A has a distribution described by [Aev,Asd] and opponent has [Bev,Bsd]. For simple cases the distribution of the difference can be constructed, but a more general way of getting A's winning probability: for each point on A's distribution, we take its density multiplied by B's cumulative distribution from -infinity to that point (the cases where B made more errors than A's error point in question).

Since only the relative width and position matters, B's distribution can be normalized, to use only A's shifted and scaled one afterwards: A becomes [Aev',Asd'] and B is [0,1]. This means A's numbers are expressed using B's original deviation as unit: we are only interested in where our distribution lies relative to opponent's one, and how it's shape aligns with his (how much wider/narrower it is).

So Aev'=(Aev-Bev)/Bsd and Asd'=Asd/Bsd. With these, the winning probability can be approximated (

^{*2}):

1/(sqrt(pi*2*Asd'^2)) * int_x_-inf_inf( e^(-(x-Aev')^2/(2*Asd'^2)) * 0.5*(1+erf(x/sqrt(2))))

Here is a wolfram example to calculate such win probabilities (variable substitution would make it too complex for the free version, so Aev' and Asd' occurrences need to be replaced manually inside square brackets).

Although the absolute position of a distribution doesn't really matter, a very rough guess is that pro level is somewhere around -50 (komi = 7, 1 stone = 2*komi, so 3-5 stones to perfect play). Two players are 1 stone apart if their ev difference is roughly 14 (supposedly 50% winrate with 1 extra stone or with reverse komi).

More interesting is the question of deviation. There is a known problem in translating Elo-like ratings to stones:

EGF win% table predicts that winrate against 1 stone stronger opponents is ~33% at 9k, ~25% at 1d, and only ~20% at 7d levels. Using the above function in reverse hints that at 1d the deviation may be a bit less than 1 stone (<14 points). For stronger levels the deviation decreases - making fewer and smaller errors not only means higher ev, but less absolute variance as well.

These rank-dependent winrate differences are handled by EGF using an extra (deviation-like) variable term. This approach offers a natural explanation, from where A's distribution is shifted and scaled against B's normalised one. For stronger players the relative/scaled position of a one stone (14 points) stronger opponent's distribution is significantly farther (since the deviations are smaller). I think this is the real reason behind those differences observed in practice.

^{*1} This ignores that a deliberate safety move that trades points for consolidation of a winning position is not the same kind of error as points lost on misplaying a local fight for example.

^{*2} In go the actual error values and sums are integer, so something like a binomial distribution would probably be best. But approximating with other distributions like normal or logistic should also be ok, except maybe at near-perfect play (no positive values / side).