lightvector wrote: I'm still doubtful.
moha - if I understand correctly, you argue that the Elo distance to perfect play can be very roughly bounded by 400 Elo times the number of points difference to perfect play, on the grounds that 1 point is the smallest discrete unit that can be lost or gained.
This would seem to imply that there should be no more than an 800 Elo range in chess, since there are only two units of difference in total: define the "score difference" of a chess game as 1, 0, or -1 points for a white win, a draw, or a black win. But in actuality we know that the Elo range in chess is wider than 800 points, even if we exclude very bad and very pathological players. Start with some decent 1400 Elo casual amateur and there are still something like 1400 more Elo points up to the top humans, and many more beyond that as you push into levels only achieved by bots. So the argument can't be correct.
Winning or losing the game may not be the smallest discrete error that can be made in chess. Errors may be smaller, such as losing a pawn (often considered a basic unit in chess), losing the right to castle, or exchanging a bishop for a knight, which may be only a fraction of this pawn unit. So I think chess is not really suitable for this approach of considering a smallest discrete error size.
But you could still use an error approach in chess and in go by ignoring the size of the error and only counting the number of mistakes that potentially affect the game result. A near-perfect player (one "class" below perfect play) would make one such mistake every two games, losing that game and drawing the other, so he would score 25% against a perfect player and maintain a rating 200 Elo below the perfect player. The size of the game-losing error doesn't matter, nor whether the error has some discrete fundamental unit; for this near-perfect player, only the frequency matters.
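The 25% / 200 Elo arithmetic can be checked with the standard logistic Elo model (a sketch; the numbers are just the ones from the argument above):

```python
import math

def elo_gap_from_score(expected_score: float) -> float:
    """Elo difference implied by an expected score under the standard
    logistic Elo model: E = 1 / (1 + 10**(-gap / 400))."""
    return -400.0 * math.log10(1.0 / expected_score - 1.0)

# One loss and one draw per two games -> (0 + 0.5) / 2 = 25% expected score.
gap = elo_gap_from_score(0.25)
print(round(gap))  # -191, i.e. roughly 200 Elo below the perfect player
```

So "one class" is the usual approximation: a 25% score corresponds to an Elo gap of about 191, conventionally rounded to 200.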
lightvector wrote: I think this is because as players get better, even though any mistake objectively can cost you only a whole discrete unit or not at all, there are worlds of complexity and gradations hidden within the fight for that discrete unit, such that advantages and disadvantages and gains and losses from mistakes can happen at levels much finer than one whole unit for practical players, even if an omniscient objective view would not care until they finally "accumulated" to actually a whole unit of effect.
So the big flaw in the model is to treat mistakes as also being discretized to units, such as having a probability of losing a point on each move or not. In practice there exist "mistakes" that even if objectively they lose nothing, each successive one makes it increasingly "easier" to actually then make a move that does objectively lose later, unless the opponent's own mistakes cancel them out, and depending on the distributions of these "sub-unit" mistakes, this means you can get much more than a 75% winrate difference within the fight for that unit.
Now of course, there does seem to be some suggestiveness that the chess Elo range is finite - top level games have increasingly high draw percentages, particularly in the TCEC matches. But 400 Elo per "unit" is clearly too tight a bound - exactly what the bound is should also depend on how much "subunit" gradation there is for practical real players and the mechanics of how it changes or builds up (or not), which seems very hard to objectively pin down.
Using this error frequency approach, I think 400 Elo per unit is not too tight. Perhaps top chess engines are fairly close to making only one potentially game-losing error per game (400 Elo below perfect play), while casual chess players probably make many such errors per game. As a first-order approximation, taking 4000 Elo as a top engine rating and 1200 Elo as a casual player rating, the casual player would be separated from the top engines by 14 "classes" of 200 Elo, i.e. 7 increments of 400 Elo, so the casual player might make roughly 7 more potentially game-losing errors per game than the engines.
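The back-of-the-envelope arithmetic above, spelled out (the two ratings are assumed example numbers, not measurements):

```python
top_engine_elo = 4000      # assumed top-engine rating
casual_elo = 1200          # assumed casual-player rating
elo_per_class = 200        # one "class" ~ 75% expected score
elo_per_extra_error = 400  # model assumption: one more error per game costs 400 Elo

gap = top_engine_elo - casual_elo          # 2800 Elo
classes = gap // elo_per_class             # 14 classes
extra_errors = gap // elo_per_extra_error  # 7 extra potentially game-losing errors per game
print(gap, classes, extra_errors)  # 2800 14 7
```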
lightvector wrote:
Maybe if players got within a point or two of optimal in Go it would turn out that there's no "sub-point" complexity in Go, but even only in the endgame you have things like fractional miai values, all those messy CGT problems whose only result is tedomari to round the score in your favor, and stuff like that - where one can make mistakes that are *much less* than one point - which makes me doubtful that this is true. And that's only the endgame.
So I would strongly suspect that the Elo range in Go is still finite (and not in some silly way like being exponentially large, but actually something not too unreasonable), but also that the last few points could quite easily be much more than 400 Elo per point.
Using the error frequency approach still leaves the question of what qualifies as a potentially game-losing error. Perhaps in chess and go we could use this first approximation: count the number of times a player makes a mistake that potentially changes the game result (like failing to convert a decisive advantage into a win, failing to hold a draw, or outright losing a game in which you had a decisive advantage).
For go, there probably exists a correlation between the frequency of potentially game-losing moves and the average error size per move. This would allow an approximate conversion between the error size distribution (in units of komi points, 14-point ranks, or handicap stones) and the error frequency (assuming 400 Elo points for each increment in errors per game).
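As a toy illustration of such a conversion - everything here is an assumption: the exponential tail for per-move error sizes, the move count, the swing size, and the 400-Elo-per-error-per-game rule from above:

```python
import math

def swing_frequency(moves_per_game, mean_error_size, swing_size):
    """Expected number of errors per game of at least `swing_size` points,
    if per-move error sizes were i.i.d. exponential with the given mean
    (an assumed distribution, purely for illustration)."""
    return moves_per_game * math.exp(-swing_size / mean_error_size)

def elo_below_perfect(errors_per_game):
    """Model assumption from the discussion: 400 Elo per potentially
    game-swinging error per game."""
    return 400.0 * errors_per_game

# Illustrative numbers: 120 moves per game, 1.0-point average error size,
# and a 7-point swing needed to potentially change the result.
freq = swing_frequency(120, 1.0, 7.0)
print(round(freq, 2), round(elo_below_perfect(freq)))  # 0.11 44
```

Under these made-up numbers a player with a 1-point average error size would make a game-swinging error only about once per nine games, placing him a few dozen Elo below perfect play; the point is only the shape of the conversion, not the specific values.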
Such a conversion might also increase the accuracy of measuring the quality of play from a small number of games, because the signal-to-noise ratio of the frequency measure would be poor at low frequencies. Also, when the opponents are badly matched (an even game with a large skill gap), the frequency approach would be unreliable: chances are the stronger player quickly builds up such a big lead that none of his later mistakes is large enough to potentially swing the result, which underestimates the error frequency and thus overestimates the quality of play. Perhaps we should then use a less conservative measure of what counts as a potentially game-swinging error (e.g. also including errors above some size threshold, like a 14-point mistake in go or hanging a knight in chess, even when they don't affect the game result).
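A minimal sketch of the two counting rules (conservative: only moves that change the game-theoretic result; less conservative: also moves above a size threshold). The interface and the toy numbers are made up for illustration:

```python
def count_swing_errors(result_before, result_after, point_loss, size_threshold=None):
    """Count potentially game-swinging errors for one player.
    result_before[i] / result_after[i]: game-theoretic result (+1/0/-1,
    from that player's perspective) before and after their i-th move;
    point_loss[i]: points the move throws away. If size_threshold is set,
    also count errors at least that big even when the result is unchanged."""
    count = 0
    for before, after, loss in zip(result_before, result_after, point_loss):
        swung = after != before
        big = size_threshold is not None and loss >= size_threshold
        if swung or big:
            count += 1
    return count

# Toy game: the second move loses 15 points, but the lead is so large that
# the game-theoretic result stays a win (the "big lead hides errors" case);
# the fourth move finally throws the win away to a draw.
before = [+1, +1, +1, +1, 0]
after  = [+1, +1, +1, 0, 0]
loss   = [0.0, 15.0, 0.5, 6.0, 0.0]
print(count_swing_errors(before, after, loss))                     # 1
print(count_swing_errors(before, after, loss, size_threshold=14))  # 2
```

The toy example shows exactly the failure mode described above: the strict rule counts only one error, while the thresholded rule also catches the 15-point mistake that the big lead masked.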
A correlation between error frequency (represented here by Elo rating) and error size (average centipawn loss per move) also seems to exist in chess. See average centipawn loss per move vs. Elo rating:
https://chess-db.com/public/research/qualityofplay.html.