moha wrote:
Bill Spight wrote:
Drift is still possible, as is the accumulation of deleterious changes in the successive winners. A skill which an earlier winner had can be lost along the way; that skill alone would not be enough for the earlier winner to beat the current winner, unless the current winner is required to show sufficient superiority (as in giving a handicap).
The only difference I see from the simple, common case where a net is continuously trained on existing data is a potential negative feedback loop through the self-play games (generated by the current, partially trained net).
I don't want to strain a metaphor too far, but Uberdude's post exemplifies the potential problem, which might mean that LeelaZero is making less progress than it appears to be. Different players have different weaknesses, and it is possible for successive winners to cycle between different strengths and weaknesses without making overall progress. I don't mean that the cycle is only three winners long, but the accumulation of small errors in exchange for small advantages elsewhere can produce the effect. Both randomness and multiple skills make this phenomenon possible.
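To make the cycling concrete, here is a toy sketch (Python, with made-up win probabilities, not LeelaZero's actual numbers or gating rule) of an intransitive trio: each challenger beats the incumbent in its promotion match, yet after three promotions we are back where we started.

Code:
import random

# WIN[(x, y)] = assumed probability that x beats y; the cycle B>A, C>B, A>C
# means each promotion looks like progress while overall strength goes nowhere.
WIN = {("B", "A"): 0.6, ("C", "B"): 0.6, ("A", "C"): 0.6}

def p_beats(x, y):
    return WIN.get((x, y), 1.0 - WIN[(y, x)])

def promotion_match(challenger, incumbent, games=400, threshold=0.55):
    # promote the challenger if it wins at least threshold of the match games
    wins = sum(random.random() < p_beats(challenger, incumbent)
               for _ in range(games))
    return wins / games >= threshold

incumbent = "A"
for challenger in ("B", "C", "A"):
    if promotion_match(challenger, incumbent):
        print(challenger, "replaces", incumbent)
        incumbent = challenger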
Quote:
But apparently such a thing didn't happen with AlphaZero (which did not use promotion matches), at least not to an extent that caused real problems.
In hill-climbing this kind of phenomenon tends to happen near the top of a hill. Perhaps we have not seen it with AlphaGo Zero because it is not near the hilltop for go.
However, I suspect that it did happen with AlphaZero (chess), which is why they played against a hobbled version of Stockfish. Considering the rapid initial progress of AlphaZero, reaching top-level play in only a few hours, why did they not run it for a few more days and take on the best, including an opening book and endgame tablebases? My guess is that AlphaZero stalled out. That does not minimize their accomplishment, nor does it alter the fact that the way AlphaZero plays chess is more humanlike than the play of other chess engines. But stalling out is not so good from a PR standpoint. {shrug}
Quote:
Quote:
While a single player's variation in overall skill may be roughly normal (bell shaped), that is not the shape of the presumed "fitness landscape" for advancing players. Both Elo and I (when I set up a ratings system for New Mexico years ago) assumed a kind of power-law shape, which is decidedly not bell shaped.
Could you elaborate on the "fitness landscape for advancing players" and its role in the A>B>C case?
Consider the case of pool (pocket billiards). One test of skill in straight pool, where you pick the ball and call the shot, is the average length of a run: how many balls, on average, you can sink in a row. If the probability of sinking each ball is constant (not true, but perhaps approximately so), then in a sense the gain in skill needed to raise an average run from 1 to 2 is approximately the same as the gain needed to raise an average run from 50 to 51. OC, the much better player has a harder time improving by one ball, in general, because he is much nearer the limit of the skills needed to play pool than the poor player (nearer the top of the hill).
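A quick sketch of that arithmetic (Python), under the constant-probability assumption above: if each shot sinks with probability p, the expected run length is r = p/(1 - p), so p = r/(r + 1).

Code:
# Expected run r = p / (1 - p) for a constant per-shot probability p,
# so the p needed for a given average run is p = r / (r + 1).
for r in (1, 2, 50, 51):
    p = r / (r + 1)
    print("average run %2d needs per-shot probability %.4f" % (r, p))

Going from a run of 1 to 2 takes the per-shot probability from 0.5000 to 0.6667; going from 50 to 51 takes it only from 0.9804 to 0.9808, a tiny step, but one squeezed against the ceiling of 1.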
Let us say that if Player B is one "level" better than Player A, he can beat Player A with a win/loss ratio of 1.5, and that Player C is one level better than Player B. Then, based upon the structure of the levels (the "fitness landscape"), and not upon the shape of the variation in each player's play, we may, with simplifying assumptions, expect Player C to beat Player A with a win/loss ratio of 1.5^2 = 2.25. The less the variation in each player's play, the more accurate that estimate will be. (Edit: But, as both of us have pointed out, it is more likely to be an overestimate than an underestimate.)
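The same multiplicative assumption in a few lines (Python; the ratio of 1.5 per level is just the number from the example):

Code:
# One "level" = win/loss ratio 1.5; under the simplifying assumption,
# ratios multiply across levels, so k levels give ratio 1.5**k.
def win_prob(levels, ratio_per_level=1.5):
    r = ratio_per_level ** levels
    return r / (1.0 + r)  # a win/loss ratio of r means r wins per loss

for k in (1, 2, 3):
    print("%d level(s): ratio %.4g, win probability %.3f"
          % (k, 1.5 ** k, win_prob(k)))

So one level means winning 60% of games, two levels about 69%, three levels about 77%.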
Quote:
I think go is a bit different from chess (Elo) in that the accumulation of those tiny errors (which produces some normality in a player's performance) is actually visible (in points) and verifiable here (with a strong enough program, and enough match samples). Which distribution would we see for expected points dropped (the sum of single errors), and for the expected match score between two players (the difference of the sums of single errors)?
I took advantage of that in my rating system by basing ratings on the ability to give handicaps, not simply upon win/loss ratios of even games.
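As for the distributions asked about above: if single-move errors are many, small, and roughly independent, the central limit theorem suggests both are approximately normal — the sum of one player's errors, and the match margin as a difference of two such sums. A quick Monte Carlo sketch (Python; the per-move error model is an assumption for illustration, not a claim about real games):

Code:
import random
import statistics

def points_dropped(moves=250, mean_err=0.2):
    # assumed error model: exponential per-move point losses, mostly tiny
    return sum(random.expovariate(1.0 / mean_err) for _ in range(moves))

# match margin = difference of the two players' summed errors
margins = [points_dropped() - points_dropped() for _ in range(10000)]
print("mean %+.2f, stdev %.2f" % (statistics.mean(margins),
                                  statistics.stdev(margins)))

Even with skewed single-move errors, the summed and differenced totals come out looking close to a bell curve.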
Quote:
One distorting factor I see is that winning players (like programs) trade margin for safety and simplicity, intentionally dropping some points.
Right. That is one reason to use handicap stones or variable komi to measure progress, so that the winner cannot afford to slack off against a weaker player.
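A toy illustration of that idea (Python; the penalty and threshold are hypothetical numbers, not any project's actual gating rule): count the challenger's games as wins only when the margin survives an extra komi, so a bare 50%+ edge is not enough.

Code:
# Promote only if the challenger wins often enough while giving extra komi,
# i.e. its margins must clear komi_penalty points (hypothetical numbers).
def promote(margins, komi_penalty=3.5, min_winrate=0.55):
    wins = sum(m > komi_penalty for m in margins)
    return wins / len(margins) >= min_winrate

# e.g. eight even-game margins (points), from the challenger's perspective
print(promote([5.5, 1.5, 7.5, -2.5, 4.5, 6.5, 0.5, 8.5]))  # True (5/8 clear it)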
Edit: Since I based ratings on the ability to give handicaps and komi, I did not follow in Elo's footsteps and had no reason to study that system. I infer Elo's "fitness landscape" from Vargo's remarks. I may well be mistaken about that.