"Indefinite improvement" for AlphaZero-like engines
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
Hm, maybe it's not just about rounding after all, but the prob mass of draws (achievable with perfect komi) also matters? So the hypothesis about class bounds is true for integer komi, but for half point komi it depends on where the initial subpoint balance (error margins before next worse rounding) is?
The particular worst case seems to be when the initial fractionals extremely favor B, so it is very hard to draw with W (0.001-0.999 balance). So basically only perfect play can draw with W vs perfect play. This doesn't break the Elo bound with perfect integer komi because now B can draw easily, even if relatively weaker.
But if you add half point komi in this scenario (or the rule that W wins ties), making the game a theoretical W win, this only affects perfect play (vs perfect play), and pushes him several subpoint classes ahead [-1], for example. This is because the other 0.999 balance mass is now lost, torn out of the prob space. It doesn't help weaker players and has no effect, as drawing is useless with B.
Or am I hallucinating? Are those subpoint classes real or imaginary? They perform increasingly better against perfect play, catching more and more of those (now) very hard W draws. But maybe this only manifests vs perfect play (local chain again)? Can they also perform increasingly more classes better against weaker opponents as well?
Edit: Maybe what happens here is the hard-to-achieve W draw forms a new very small "smallest scoring unit" (without the B 0.999 part of a point)?
Last edited by moha on Thu Apr 23, 2020 1:05 pm, edited 1 time in total.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
I remember being surprised in my undergraduate seminar on psychological research that, when comparing two treatments or conditions, you ignore results that show no difference. I was accustomed to thinking of rewarding a draw as ½ pt. On the basis of that reasoning, which I think is sound, rating systems should ignore draws. For chilled go a 0 score is a win for the player who got the last play.
lightvector wrote: Sure, just focus on C then, ignore A and B.
moha wrote: Thanks! I'm not sure how you meant your last comment? It seems variants A and B collapse immediately as it is now possible to beat perfect play, even for nonperfect players (and the game is NOT always draw even between perfect players). Variant C remains.
lightvector wrote: The game is exactly the same except now the grid is on all the integers (...-2,-1,0,1,2,...) and the game starts at 0.499999 (so with perfect play with both players always flipping zeros on cards, the game is a draw).
Oh, I think I see what you're getting at, and why you've been insistent on adding draws. Yes, you're right about this objection. I hadn't considered enough the difference between draws and drawless games, thanks for pushing on this detail.
moha wrote: But can they demonstrate their class advantage/differences over each other against the -1 pt player?
But back to my main point. Really, draws should not count for Elo ratings.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
On the question of half point komi, let me take the opportunity to plug button go, where a player can take the button, worth ½ pt. by area scoring. The main effect of the button is that it does not matter who gets the last dame.
moha wrote: But there IS actual, meaningful rounding in go. As I wrote earlier, there is no real half point komi in integer games. Chinese with 7.5 komi is actually komi 7 with W winning ties. What happens here is that we play the game, THEN the score gets rounded (with ties retaining their prob mass), THEN we add the komi (which is integer), THEN we decide to treat final draws as W wins. Order matters.
lightvector wrote: maybe by some chance Go with half-integer komi could have some partial element of this - or not, it's hard to tell
As for rounding, I am not sure what moha means. The only rounding I am aware of in go is what David Wolfe explained to me long ago, that a fractional score in chilled go gets rounded up or down to a territory integer score, depending on who has the move. (Ignoring ko complications, OC.)
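The claim quoted above, that 7.5 komi on integer board results is the same as komi 7 with W winning ties, can be checked with a minimal Python sketch. It assumes the board result is already an integer margin for Black; the function names are hypothetical and only for illustration, and the fractional rounding step moha describes before this point is not modeled.

```python
def winner_half_point_komi(board_margin: int, komi: float = 7.5) -> str:
    """Winner when a half-integer komi is subtracted from an integer board margin."""
    return "B" if board_margin - komi > 0 else "W"


def winner_integer_komi_w_wins_ties(board_margin: int, komi: int = 7) -> str:
    """Winner with integer komi, where a tie (margin == komi) is awarded to W."""
    result = board_margin - komi
    if result > 0:
        return "B"
    return "W"  # covers both a real W win and the tie that is given to W


# Compare the two rules on every integer margin a 19x19 board could produce.
agree = all(
    winner_half_point_komi(m) == winner_integer_komi_w_wins_ties(m)
    for m in range(-361, 362)
)
```

Under these assumptions the two rules never disagree, so any difference between them has to come from what happens before the integer board result is fixed.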
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
Rounding came up in different contexts, the general sense that since board results are integer, subpoint mistakes will disappear - one way or another. So the resolution of performance measurement in a game is a whole point ("smallest scoring unit").
But if draws are not treated as draws, there may be two new smaller granulated performance units (draw with W and draw with B - these are, at least vs perfect play, narrower than a point), and these may affect class bounds (which are related to the smallest unit).
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
OK, thanks.
moha wrote: Rounding came up in different contexts, the general sense that since board results are integer, subpoint mistakes will disappear - one way or another. So the resolution of performance measurement in a game is a whole point ("smallest scoring unit").
BTW, that's one reason that I like button go, because, like chilling, it takes account of such tiny errors. And why I suggested chilled go for nearly perfect play. It seems to me that, since correct play in chilled go is also correct play in territory go and also in area go, a perfect player should be able to play a perfect game of chilled go.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
What did I misunderstand then? I mean here:
Bill Spight wrote: since correct play in chilled go is also correct play in territory go and also in area go
moha wrote: If I understood Bill correctly a chilled score of 6.8 could be seen as better than 6.7 (and it actually is if we stop chilled), but two chilled scores of 6.6 are the same. But since the rounding direction will matter for territory (an insanely lot at these levels), chilled 6.6 with W to move is different to chilled 6.6 with B to move - exactly what CGT wanted to avoid
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
I don't know if you misunderstood anything, except that I was not intending to distinguish between identical chilled scores depending upon who had the move, except for who gets the last play in case of a 0 result after adjusting for komi. Or not, if you want to have ties.
moha wrote: What did I misunderstand then? I mean here:
Bill Spight wrote: since correct play in chilled go is also correct play in territory go and also in area go
moha wrote: If I understood Bill correctly a chilled score of 6.8 could be seen as better than 6.7 (and it actually is if we stop chilled), but two chilled scores of 6.6 are the same. But since the rounding direction will matter for territory (an insanely lot at these levels), chilled 6.6 with W to move is different to chilled 6.6 with B to move - exactly what CGT wanted to avoid
Anyway, if the chilled komi in the 19x19 is 7, and the board score is 7, that means that the territory board score is also 7, which normally means that there are an even number of dame, and so White got the last play and White wins. It is theoretically possible that a perfect player might be able to engineer a seki such that Black would win, as lightvector was speculating at one point, I think.
-
EricBackus
- Dies with sente
- Posts: 83
- Joined: Sun May 09, 2010 10:28 pm
- Rank: 2 kyu
- GD Posts: 109
- Universal go server handle: EricBackus
- Has thanked: 4 times
- Been thanked: 29 times
Re: "Indefinite improvement" for AlphaZero-like engines
I'm getting off track from the main discussion, but I don't understand these remarks.
Bill Spight wrote: On the basis of that reasoning, which I think is sound, rating systems should ignore draws. ... Really, draws should not count for Elo ratings.
If all you want to know is which of two players is better than the other, clearly draws can be ignored. But if you want some understanding of how far apart in ability two players are, it seems like draws provide some information that would be better used than ignored.
For example, if two players play 100 games, and get a draw on 99 of them, the winner of the non-drawn game is more likely the stronger player. But the 99 draws give some indication that these players are relatively close in ability. Compare with two players playing 100 games and one player wins 99 of them, which gives some indication that the winning player is relatively much stronger than the other.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
I think that the proper comparison is between equivalent results. Suppose that two players play 100 games, presumably 50 with each player going first, and they draw 99 games and player A wins 1; or the 100 game match ends with one draw, while player A wins 50 games and player B wins 49 games. Does one draw mean that they are more closely matched than 99 draws? Or perhaps it is the other way around?
EricBackus wrote: I'm getting off track from the main discussion, but I don't understand these remarks.
Bill Spight wrote: On the basis of that reasoning, which I think is sound, rating systems should ignore draws. ... Really, draws should not count for Elo ratings.
If all you want to know is which of two players is better than the other, clearly draws can be ignored. But if you want some understanding of how far apart in ability two players are, it seems like draws provide some information that would be better used than ignored.
For example, if two players play 100 games, and get a draw on 99 of them, the winner of the non-drawn game is more likely the stronger player. But the 99 draws give some indication that these players are relatively close in ability. Compare with two players playing 100 games and one player wins 99 of them, which gives some indication that the winning player is relatively much stronger than the other.
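To make the two 100-game matches in this post concrete, here is a rough Python sketch comparing them with draws counted as ½ and with draws ignored. It uses the standard logistic Elo expected-score curve on the 400-point scale; the helper names and the way draws are dropped are assumptions for illustration, not anyone's proposed rating system.

```python
import math


def score_fraction(wins: int, draws: int, losses: int, count_draws: bool = True) -> float:
    """Player A's score fraction, with draws worth 1/2 or dropped entirely."""
    if count_draws:
        return (wins + 0.5 * draws) / (wins + draws + losses)
    return wins / (wins + losses)  # draws ignored: only decisive games count


def elo_gap(score: float) -> float:
    """Rating difference implied by a score fraction on the logistic Elo curve."""
    if score >= 1.0:
        return math.inf
    if score <= 0.0:
        return -math.inf
    return 400.0 * math.log10(score / (1.0 - score))


# (wins, draws, losses) for player A in the two matches from the post.
match_many_draws = (1, 99, 0)
match_one_draw = (50, 1, 49)

for name, match in (("99 draws", match_many_draws), ("1 draw", match_one_draw)):
    with_draws = elo_gap(score_fraction(*match))
    without_draws = elo_gap(score_fraction(*match, count_draws=False))
    print(f"{name}: draws counted {with_draws:.2f}, draws ignored {without_draws:.2f}")
```

With draws counted as ½ the two matches give player A the same score fraction, so they are equivalent results in that sense. With draws ignored the single decisive game in the first match leaves the gap unbounded, while the second match stays at a few rating points.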
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
I think what matters is how hard it is to draw with either color, and how much harder it is to win than to draw. Cf this with the earlier problems of the distorted chess example (metrics, distances between results, 0.5+0.5<>1?) and the potentially differing reward for B and W draws below.
One problem with ignoring draws is you cannot measure performance vs perfect play (which may exist in practice). Two almost perfect players (like 1-2 pts away from it) would be seen as performing equally poorly - even though in reality they didn't, you just ignored the evidence.
Button go: Unlike the similar drawless/C example which rounds away from 0 so logically doesn't, button go - depending on its initial error margin balance - can "round" small losses to wins (-0.1 directly to +). So perfect play is beatable even when playing nonperfectly. In this case this rounding size seems to be the limiting factor for Elo (the "smallest unit" - the margin within which you need to be to perfect play for class differences to reduce to/around 1). And consequently, what allows performance to be measured even vs perfect play.
With half point komi / W wins ties, the rounding is done to integer first on the board (so small losses are at most rounded to a draw), then only in a separate big swing, all draws are (or may be) treated as W wins. This can matter for metrics/distances reasons like above.
With perfect integer komi, we don't know if the initial error margins favor one side or not. So with W this margin to perfect play (class-1 boundary) may be closer than 0.5 while with B may be farther. But since W draws = B draws, this will in any case average to class-1 = [-0.5]. But if W and B draws are rewarded differently, the smallest unit of distance may become the smaller of the two cases without averaging.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
Well, you can argue the other way around. Draws hide the difference between players, so it is accounting for them rather than ignoring them that makes you think that they are performing equally poorly.
moha wrote: I think what matters is how hard it is to draw with either color, and how much harder it is to win than to draw. Cf this with the earlier problems of the distorted chess example (metrics, distances between results, 0.5+0.5<>1?) and the potentially differing reward for B and W draws below.
One problem with ignoring draws is you cannot measure performance vs perfect play (which may exist in practice). Two almost perfect players (like 1-2 pts away from it) would be seen as performing equally poorly - even though in reality they didn't, you just ignored the evidence.
How? It depends, I suppose, on what you mean by a fractional error. I can define it for chilled go as the difference between the final score and the komi, whatever that is. (It doesn't actually need to be an integer for chilled go, since the chilled go scores are not necessarily integers. They are rational fractions, so you could avoid draws with an irrational komi.)
moha wrote: Button go: Unlike the similar drawless/C example which rounds away from 0 so logically doesn't, button go - depending on its initial error margin balance - can "round" small losses to wins (-0.1 directly to +).
There is another kind of rounding between territory scoring and area scoring, where an even territory score is typically "rounded" to the next higher area score for Black, leading to a greater difference between area scores. What the button normally does is to simply add ½ pt. to territory scores, with no rounding at all. Without the button the usual effect of this rounding is to turn a territory score of +6 to +7, which would be a zero score after a 7 pt. komi is subtracted. With the button the score of +6 becomes +6½ which becomes -½ after subtracting the 7 pt. komi. The loss stays a loss. Likewise, a 7 pt. territory score normally becomes +½ after subtracting komi, and stays a win. In effect, the button subtracts ½ pt. from an integer territory komi. (Since the 7½ pt. komi seems to favor White, that might be a good thing, I dunno.)
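The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only the typical case described there (no seki, no ko): an even territory margin is bumped to the next higher area margin for Black, while the button just adds ½ pt. to the territory margin. The function names are hypothetical.

```python
from fractions import Fraction

KOMI = 7  # integer komi used in the examples above


def area_from_territory(territory: int) -> int:
    """Typical area margin for Black: an even territory margin is rounded up by one."""
    return territory + 1 if territory % 2 == 0 else territory


def with_button(territory: int) -> Fraction:
    """Button go in the typical case: add 1/2 pt. to the territory margin, no rounding."""
    return territory + Fraction(1, 2)


for territory in (6, 7):
    no_button = area_from_territory(territory) - KOMI
    button = with_button(territory) - KOMI
    print(f"territory +{territory}: no button {no_button}, with button {button}")
```

For a territory margin of +6 this gives 0 without the button and -½ with it, and for +7 the button result is +½, matching the numbers in the post. Ko, as noted right after, can break this simple picture.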
With all this rounding, what can cause a loss at one level to become a win by yielding a larger difference between scores at the next lower level is ko. The button does not stop that from happening.
Not with the definition of fractional errors in terms of chilled go scores. Maybe with a different definition.
moha wrote: So perfect play is beatable even when playing nonperfectly.
So don't round. I.e., use chilled go for potentially a countable infinity of classes of play. If that's what you want, OC.
moha wrote: In this case this rounding size seems to be the limiting factor for Elo (the "smallest unit" - the margin within which you need to be to perfect play for class differences to reduce to/around 1).
No comprende. I thought that not rounding accounted better for small differences in play.
moha wrote: And consequently, what allows performance to be measured even vs perfect play.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
I probably guess what you mean but it doesn't seem to apply here: vs perfect play draws (their frequency) are your ONLY source of information.
Bill Spight wrote: Draws hide the difference between players, so it is accounting for them rather than ignoring them that makes you think that they are performing equally poorly.
Yes I didn't mean in strict CGT terms. So far it was assumed that there (may) exist minor mistakes or inaccuracies in various sense. I suppose button go (with integer komi) is a theoretical win for B or W, and I didn't see a reason for the winning player's initial advantage (however measured) to completely disappear on the smallest mistake (instead of some margin). By small minus I meant negative performance (from the optimum) not theoretical score - but OC then the rounding is done to 0 (perfect play) not positive. My bad. Rounding scores from negative to 0 on the board, then to positive/win with half point komi may be possible in non-button go though.
Bill Spight wrote: How? It depends, I suppose, on what you mean by a fractional error.
moha wrote: button go - depending on its initial error margin balance - can "round" small losses to wins (-0.1 directly to +).
Bill Spight wrote: Not with the definition of fractional errors in terms of chilled go scores. Maybe with a different definition.
moha wrote: So perfect play is beatable even when playing nonperfectly.
Button go vs perfect play: what if the optimal play and win involves rounding up a certain chilled score, but the player chooses a line where he rounds up a bit smaller chilled score instead? IIRC it was possible that the button is not the last play.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
Do you mean that perfect play never loses? That may be so in chess, but I don't think that it is a general rule.
moha wrote: I probably guess what you mean but it doesn't seem to apply here: vs perfect play draws (their frequency) are your ONLY source of information.
Bill Spight wrote: Draws hide the difference between players, so it is accounting for them rather than ignoring them that makes you think that they are performing equally poorly.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: "Indefinite improvement" for AlphaZero-like engines
Yes, with certain kos taking the button will not be the last play. Kos can invalidate rounding, button or no.
moha wrote: Button go vs perfect play: what if the optimal play and win involves rounding up a certain chilled score, but the player chooses a line where he rounds up a bit smaller chilled score instead? IIRC it was possible that the button is not the last play.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: "Indefinite improvement" for AlphaZero-like engines
This came back to me half-asleep this morning, and some of my remaining doubts were resolved.
It seems best to consider games with W and B as two different games played alternately, and always use the weaker player's view. If performances were on a continuous scale, normals like usual, their difference is another normal, now with a negative mean. Sd usually relatively small (reason), except when BOTH players are weak. Here are some typical cases (performance differences, uniform y scale for simplicity).
Now either we use integer komi, or W wins draws (0.5 komi). And either the game is roughly balanced, same difficulty for both players (0.5-0.5 pts initial error margin balance) or is significantly harder to play / draw for a side (like 0.1-0.9 pts balance, ie. one of W or B can only afford 0.1 pts worth of subpoint mistakes before letting his draw slip, the other has 0.9). The raw theoretical result is always draw (komi).
Without discrete units these performance differences could be getting arbitrarily narrow. But the easiest point earning result puts a marker (the minimum needed for not losing) on the X-axis on the above plot. Anything to its right is treated as equally good (rounded up). So (loosely speaking) once the distribution has its peak near the marker (and we are close enough to perfect play, the ultimate opponent), we are good enough to score enough in half cases with this color and not to be outclassed by anybody anymore.
The marker is at the relevant half of the initial error margin balance (which in turn sums up to the smallest scoring unit). It is normally negative (tiny errors are allowed and still draw) with the two colors averaging to -0.5 (the 2 classes per point). But if draws are treated as losses (thus even perfect play "loses" some to nonperfect play) the marker is positive for a color, which case can practically be missing (unreachable vs significantly better opponent). So we can end up with the remaining case, either the smaller or larger half point of the draw balance (more or less than 2 classes per point) depending on which side draws are awarded to.
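A minimal Python sketch of the marker idea, assuming (as the post does loosely) that the weaker player's performance difference is normally distributed. The mean, sd and marker values below are made-up illustration numbers, and `p_not_losing` is a hypothetical helper, not something from the thread.

```python
from statistics import NormalDist


def p_not_losing(mean_diff: float, sd: float, marker: float) -> float:
    """Probability that the performance difference lands at or right of the marker."""
    return 1.0 - NormalDist(mu=mean_diff, sigma=sd).cdf(marker)


SD = 0.3           # assumed spread of the performance difference, in points
MEAN_DIFF = -0.5   # assumed mean: the weaker player is half a point behind

# Balanced game (0.5-0.5 error margin balance): marker at -0.5 for either color.
balanced = p_not_losing(MEAN_DIFF, SD, marker=-0.5)

# Lopsided game (0.1-0.9 balance): the hard color may slip 0.1, the easy color 0.9.
hard_color = p_not_losing(MEAN_DIFF, SD, marker=-0.1)
easy_color = p_not_losing(MEAN_DIFF, SD, marker=-0.9)

print(f"balanced {balanced:.3f}, hard color {hard_color:.3f}, easy color {easy_color:.3f}")
```

When the peak of the distribution sits exactly on the marker, the probability of not losing is one half, which is the "score enough in half cases" situation described above. Moving the marker to -0.1 or -0.9 with the same distribution pushes that probability below or above one half.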