 Post subject: LZ's progression
Post #1 Posted: Wed May 09, 2018 11:00 pm 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
In 3 weeks, LZ has made great progress between networks e860 (04/18) and b633 (05/09).
The chain of networks is as follows:
04/22 : 3f6c beat e860, winning 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428

So, b633 should win 91.94% of its games against e860, not so far from the win percentage of LZ_ELF against LZ_best_normal_network (93-94%)
Maybe it's partly because the ELF weights loaded the dice for the most recent networks, but still, in just 3 weeks, it's an impressive progression :clap:

 Post subject: Re: LZ's progression
Post #2 Posted: Wed May 09, 2018 11:19 pm 
Lives in gote

Posts: 427
Liked others: 39
Was liked: 135
LZ_ELF (62b5417b) won 94.20% of its games against b6337c69, so LeelaZero is still far from ELF.

 Post subject: Re: LZ's progression
Post #3 Posted: Thu May 10, 2018 12:33 am 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
You're right.
I was just saying that the 3 weeks difference between e860 and b633 is comparable to the difference between b633 and 62b5.
If the progression rate doesn't drop too much, in 3-4 weeks, LZ's "normally promoted network" could surpass 62b5.

 Post subject: Re: LZ's progression
Post #4 Posted: Thu May 10, 2018 1:38 am 
Dies in gote

Posts: 40
Liked others: 2
Was liked: 14
Rank: EGF 1 kyu
KGS: finity
What is also interesting is that Leela with ELF weights won 93% of its games against LZ #132. After that, stronger networks appeared at a skyrocketing pace in just a few days, with LZ #136 being 150 Elo stronger than #132 in self-play (I think it's actually cumulative Elo, so 136vs135 + 135vs134 + ... + 133vs132).

Now that they tried the stronger #136 against the ELF network again, ELF won 94% of the games. So the Elo difference jumped from 450 to 490! I wonder if this is:

1) Statistical variance -- the winrate of LZ networks is small, so random chance plays a role
2) Another change made when ELF was introduced, "t=1" (whatever that means), changed the playing conditions, and the quick Elo leap of the networks is related to the network adjusting to "new possibilities"
3) Once the LZ playing style comes closer to ELF's, the wins become rarer

The last option also seems possible. With humans, if A is a strong moyo-oriented player and B is weaker but territory-oriented, B may win more games against A than a slightly stronger player C who also plays for moyo. It's a similar effect to the inflated Elo differences in self-play, because a minuscule advantage with the same playing style may mean an 80% win rate against the weaker version.

I'm hoping the next few weeks will show that LZ is narrowing the gap to the ELF network. I like the LZ networks better, not least because they play handicap games. Just yesterday I tried playing LZ #136 with 1 playout (so it just picks the top move without any search), and got crushed by move #100 even with 4 handicap stones. :D (I'm EGF 1 kyu, so not a strong dan).
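
For reference, the Elo gaps quoted in this post can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the standard logistic Elo relation (400 points per factor of 10 in the odds), which the thread does not state explicitly, and `elo_gap` is a name made up for the example.

```python
import math

def elo_gap(win_rate: float) -> float:
    """Elo difference implied by a win rate.

    Assumption: the usual logistic Elo model, gap = 400 * log10(p / (1 - p)).
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

# Win rates mentioned in the thread for ELF against LZ networks.
for rate in (0.93, 0.94, 0.942):
    print(f"{rate:.1%} -> {elo_gap(rate):.0f} Elo")
```

Under that assumption, 93% comes out at roughly 450 Elo and 94% at roughly 480, so a single percentage point near the top of the scale already moves the gap by about 30 Elo.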


 Post subject: Re: LZ's progression
Post #5 Posted: Tue May 15, 2018 5:19 am 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
There's a new best network (90560), and it should win 93.24% of its games against the old e860.
I've run a 100-game TWOGTP match between these two (--visits=3201 --noponder).
I find the result surprising: 90560 won "only" 78-22 (e860 won 8 games as B, and 14 as W).
78 seems a bit far from 93...
Maybe 100 games isn't enough, or am I missing something ?
Anyway, I'll set up another match, maybe with more games ;-)
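
One way to judge whether 100 games is enough is to put a confidence interval around the observed 78%. The sketch below uses the Wilson score interval, assuming the games are independent and share one fixed win probability (an idealisation; colour and opening variety are ignored). The function name is invented for the example.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

low, high = wilson_interval(78, 100)
print(f"78 wins in 100 games: {low:.3f} .. {high:.3f}")
```

Under these assumptions the interval runs from roughly 69% to 85%, so the predicted 93.24% lies outside it.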

 Post subject: Re: LZ's progression
Post #6 Posted: Tue May 15, 2018 5:50 am 
Gosei

Posts: 1368
Liked others: 664
Was liked: 441
Rank: AGA 3k KGS 1k
GD Posts: 61
KGS: dfan
Why do you think it should win 93.24% of its games against e860?

If it's because of the supposed Elo difference on the web page graph, be aware that cumulative strength increases are smaller than they look there. Leela has not really gained 11000 Elo since it started. I forget what the ratio is, but when you compare two historical networks in a match, their results are significantly closer than you would expect just by looking at their places on that graph.

 Post subject: Re: LZ's progression
Post #7 Posted: Tue May 15, 2018 7:00 am 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
04/22 : 3f6c beat e860, winning 243 games out of a total of 437 (55.61%), then
04/25 : 1586-3f6c 235/433
04/29 : cfb2-1586 254/433
05/02 : 18e6-cfb2 239/413
05/04 : ecab-18e6 226/412
05/06 : 3737-ecab 253/429
05/07 : 4be6-3737 239/433
05/08 : 2fb0-4be6 235/426
05/08 : 05b7-2fb0 240/426
05/09 : b633-05b7 235/428
05/14 : 9056-b633 232/424

A wins wa games out of a total of t1 games against B
B wins wb games out of t2 against C
C wins wc games out of t3 against D
D wins wd games out of t4 against E
(etc)

z = wa/(t1-wa) * wb/(t2-wb) * wc/(t3-wc) * wd/(t4-wd)

A should win a fraction z/(z+1) of its games against E (multiply by 100 for a percentage)
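
The formula above can be sketched in Python as follows. It multiplies the win/loss odds of each link in the chain and converts the product back to a win fraction; `chained_win_rate` is a hypothetical helper name, and the calculation inherits the formula's assumption that match results are perfectly transitive and free of sampling error.

```python
from fractions import Fraction

def chained_win_rate(matches):
    """Apply z = prod(w / (t - w)) and return z / (z + 1).

    matches: list of (wins, total) pairs for A vs B, B vs C, and so on.
    A link with zero losses raises ZeroDivisionError, since its odds are infinite.
    """
    z = Fraction(1)
    for wins, total in matches:
        z *= Fraction(wins, total - wins)
    return z / (z + 1)

# Single link from the list above: 3f6c vs e860.
print(f"{float(chained_win_rate([(243, 437)])):.2%}")
# Two 60% links in a row.
print(f"{float(chained_win_rate([(60, 100), (60, 100)])):.2%}")
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare a long chain against hand calculations.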

Here, the "cumulative" percentages from e860 to 9056 are

55.61%
59.78%
66.64%
73.29%
76.92%
82.73%
85.51%
87.90%
90.36%
91.94%
93.24%

But maybe I'm missing something, because the 78 wins seem low.

 Post subject: Re: LZ's progression
Post #8 Posted: Tue May 15, 2018 7:39 am 
Gosei

Posts: 1368
Liked others: 664
Was liked: 441
Rank: AGA 3k KGS 1k
GD Posts: 61
KGS: dfan
Yeah, the Elo model notwithstanding, it turns out that you can't just concatenate a string of self-play rating differences like that; as you observed, it will always be too optimistic. I'm not sure whether this is purely the result of trying to accumulate a bunch of small rating differences, or if it has to do with self-play match results being less generalizable than a dataset with games against multiple opponents. It is well known that the rating graph on http://zero.sjeng.org/ is far too optimistic as far as "actual Elo" goes.

 Post subject: Re: LZ's progression
Post #9 Posted: Tue May 15, 2018 8:38 am 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".

 Post subject: Re: LZ's progression
Post #10 Posted: Tue May 15, 2018 9:18 am 
Honinbo

Posts: 8486
Liked others: 2481
Was liked: 2943
Vargo wrote:
You're right about the Elo model, but I don't use Elo differences, I only use win percentages from actual matches, and that should be "transitive".


Instead of percentages, take a look at the log of the odds. IMX, that's more informative. (In the human sense of the term. ;))

_________________
There is one human race.
----------------------------------------------------

The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

 Post subject: Re: LZ's progression
Post #11 Posted: Tue May 15, 2018 9:19 am 
Judan

Posts: 5871
Location: Cambridge, UK
Liked others: 335
Was liked: 3137
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
If Andrew beats Bob 60% of the time and Bob beats Charlie 60% of the time, what do you think Andrew's win rate against Charlie is? I don't think you can really say much; it might even be less than 50%, though in general it will be >60% (how much more I've no idea, but I've a feeling something more like a geometric than an arithmetic mean is likely to be less wrong).

 Post subject: Re: LZ's progression
Post #12 Posted: Tue May 15, 2018 9:46 am 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious :
A wins 50% against B, who wins 50% against C (--> A wins 50% against C)
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C)
So, it seems right to me, but I'm looking forward to setting up further matches to verify this :)

 Post subject: Re: LZ's progression
Post #13 Posted: Tue May 15, 2018 11:38 am 
Gosei

Posts: 1368
Liked others: 664
Was liked: 441
Rank: AGA 3k KGS 1k
GD Posts: 61
KGS: dfan
There is no particular reason that winning percentages have to be related in this exact mathematical way.

For example, Alice, Bob and Carol all play the classic game "Whose random number is bigger?". Alice is a beginner and picks integers from 1 to 100 uniformly at random. Bob is more experienced and picks integers from 51 to 150 uniformly at random. Carol is an expert and picks integers from 101 to 200 uniformly at random (she's very good at this game, though you can probably imagine even better strategies).

How often does Bob beat Alice? How often does Carol beat Bob? How often does Carol beat Alice?
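
The three questions can be answered exactly by enumerating every pair of picks. This is a small sketch; `win_probability` is a name invented here, and it counts a tie as "not a win" for the player being asked about, which is an assumption since the post does not say how ties are scored.

```python
from fractions import Fraction
from itertools import product

def win_probability(player, opponent):
    """Exact probability that `player` draws a strictly larger number.

    Both arguments are collections of equally likely integers.
    """
    player, opponent = list(player), list(opponent)
    wins = sum(1 for x, y in product(player, opponent) if x > y)
    return Fraction(wins, len(player) * len(opponent))

alice = range(1, 101)    # 1..100
bob = range(51, 151)     # 51..150
carol = range(101, 201)  # 101..200

print("Bob beats Alice:  ", win_probability(bob, alice))
print("Carol beats Bob:  ", win_probability(carol, bob))
print("Carol beats Alice:", win_probability(carol, alice))
```

Comparing the third result with what multiplying the odds of the first two would predict shows how far a real game can sit from the chained-odds model.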

 Post subject: Re: LZ's progression
Post #14 Posted: Tue May 15, 2018 11:54 am 
Honinbo

Posts: 8486
Liked others: 2481
Was liked: 2943
Vargo wrote:
Andrew would win 69.23% of his games against Charlie, I think.

One can see that this formula works in cases where the outcome is obvious :
A wins 50% against B, who wins 50% against C (--> A wins 50% against C)
or A wins 1 game out of 3 against B, who wins 2 games out of 3 against C (A wins 50% against C)
So, it seems right to me, but I'm looking forward to setting up further matches to verify this :)


Using odds, (3/2) (3/2) = 9/4. :)

However, in a multi-skill game like go, I would expect the odds to be less than that.

 Post subject: Re: LZ's progression
Post #15 Posted: Tue May 15, 2018 12:05 pm 
Lives with ko

Posts: 234
Liked others: 0
Was liked: 30
Rank: 2d
Vargo wrote:
Andrew would win 69.23% of his games against Charlie, I think.
This assumes that the observed winrates are equal to their theoretical values (without sampling errors), and also that there are no distorting factors (like various correlations).

Both assumptions seem wrong here, the first one in particular. Consider the extreme: a program has a bug and it plays randomly with all networks. You would still see a climbing Elo graph (in a few percent of matches one side would go above the promotion threshold by pure luck), but the latest net would not do well against the first one.
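
The thought experiment is easy to simulate. In the sketch below every candidate is exactly as strong as the current best network (each game is a coin flip), matches are a fixed 400 games, and a candidate is promoted on 220 or more wins. Those match rules are simplifying assumptions for illustration, not a description of the actual Leela Zero pipeline, and all names are made up.

```python
import math
import random

def simulate_random_promotions(candidates=2000, games=400, wins_needed=220, seed=0):
    """Count lucky promotions and the Elo a cumulative graph would credit.

    Every game is a fair coin flip, so true strength never changes.
    """
    rng = random.Random(seed)
    promotions = 0
    graph_elo = 0.0
    for _ in range(candidates):
        wins = sum(rng.random() < 0.5 for _ in range(games))
        # Promote on reaching the threshold; skip the (practically
        # impossible) all-wins case to avoid dividing by zero.
        if wins_needed <= wins < games:
            promotions += 1
            graph_elo += 400.0 * math.log10(wins / (games - wins))
    return promotions, graph_elo

promotions, graph_elo = simulate_random_promotions()
print(f"{promotions} promotions, graph shows +{graph_elo:.0f} Elo, true gain is 0")
```

Each lucky promotion adds about 35 Elo or more to the running total, because only matches that cleared the threshold are ever counted.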

 Post subject: Re: LZ's progression
Post #16 Posted: Tue May 15, 2018 4:08 pm 
Gosei

Posts: 2125
Location: Tokyo, Japan
Liked others: 1948
Was liked: 1202
Rank: Jp 6 dan
KGS: ez4u
The test matches are only 400 games long. As a result, they still reflect a good deal of luck. The project appears to use a threshold win rate of 55% in selecting the next “best network”. That 55% does not represent a reliable measurement of the difference in strength. It simply signals that ‘probably’ the new “best” is stronger than the old one. There are undoubtedly any number of candidates that have a sub-50% win rate in a 400-game match that would be over 50% in a 10,000-game match. However, that 55% threshold has worked to support an automated process of developing stronger and stronger networks.
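
To put numbers on how much luck a 400-game match contains, here is a sketch of the exact binomial probability that a candidate clears the bar. It assumes a fixed 400-game match with promotion on at least 220 wins (55%) and independent games; the function name and the sample win rates are illustrative.

```python
from math import comb

def promotion_probability(true_win_rate, games=400, wins_needed=220):
    """Probability of scoring at least `wins_needed` wins in `games` games."""
    return sum(
        comb(games, k) * true_win_rate**k * (1 - true_win_rate) ** (games - k)
        for k in range(wins_needed, games + 1)
    )

# How often candidates of various true strengths would be promoted.
for p in (0.48, 0.50, 0.52, 0.55, 0.58):
    print(f"true win rate {p:.0%}: promoted {promotion_probability(p):.1%} of the time")
```

Under these assumptions an equally strong candidate still gets through a few percent of the time, and a candidate whose true win rate is exactly 55% passes only about half the time.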

If you think about it (I didn’t until now), the graph showing the strength progression is just cute PR. To check the real change in strength, we would need to go back and test across different ranges in the progression of best nets. However, we are much more interested in the ability of LZ to beat humans or other AI’s so why waste the time?

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

 Post subject: Re: LZ's progression
Post #17 Posted: Tue May 15, 2018 8:40 pm 
Lives with ko

Posts: 236
Liked others: 4
Was liked: 57
Quote:
Bill Spight wrote:
Using odds, (3/2) (3/2) = 9/4. :)
A is weaker than B
C is weaker than B by the exact same ratio,
A and C must be the same strength.

But you're right, the mathematical model can probably not be a perfect fit here.

Quote:
Moha wrote :
This assumes that the observed winrates are equal to their theoretical values (without sampling errors), and also that there are no distorting factors (like various correlations).

Both assumptions seem wrong here, the first one in particular. Consider the extreme: a program has a bug and it plays randomly with all networks. You would still see a climbing Elo graph (in a few percent of matches one side would go above the promotion threshold by pure luck), but the latest net would not do well against the first one.


Sampling errors on a too-small sample (100 games) are the most likely cause, but they should go both ways, and I would have hoped that they more or less cancel each other out. Your bugged program is a good example, but the progression would be so slow (probably logarithmic) as to be almost nonexistent.

Quote:
ez4u wrote :
To check the real change in strength, we would need to go back and test across different ranges in the progression of best nets.
I'll do that !

For the rest, you're right, and my much too small sample must be the principal cause.
But you're definitely WRONG ;-) about the waste of time, I find it fascinating to pit different programs or different networks against each other.


And also, I like the feeling of being part of the LZ experiment by using autogtp and contributing to better networks.

 Post subject: Re: LZ's progression
Post #18 Posted: Wed May 16, 2018 1:50 am 
Judan

Posts: 5871
Location: Cambridge, UK
Liked others: 335
Was liked: 3137
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Something I saw raised on the Leela Zero github pages was whether a broader test than ">55% win vs last best network" would be better (such as a league against several previous versions). One could imagine 3 versions of Leela that form a cycle of A beats B, B beats C, C beats A, each by >55% (e.g. maybe B sucks at ladders and A doesn't so A can often win, but B is good at semeai and C sucks, and C has the best positional judgement and manages to make that count against A). If the training happens to select these in order then the self-play Elo will keep going up and up when really it's not getting stronger, just going round in circles.
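
A toy illustration of such a cycle, using non-transitive dice rather than Go networks: the three dice below are a standard textbook-style construction, chosen here only because each one beats the next with probability 5/9, which happens to be just over 55%. Nothing in it models Leela Zero itself.

```python
from fractions import Fraction
from itertools import product

def beats(x, y):
    """Exact probability that a roll of die x is higher than a roll of die y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))

# Three six-sided dice with no shared face values, so ties cannot occur.
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

print("A beats B:", beats(A, B))
print("B beats C:", beats(B, C))
print("C beats A:", beats(C, A))
```

If a promotion process picked C, then B, then A, then C again, every step would clear a 55% gate and a cumulative rating would keep rising, even though the same three dice are being reused.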


 Post subject: Re: LZ's progression
Post #19 Posted: Wed May 16, 2018 3:10 am 
Honinbo

Posts: 8486
Liked others: 2481
Was liked: 2943
Vargo wrote:
Quote:
Bill Spight wrote:
Using odds, (3/2) (3/2) = 9/4. :)
A is weaker than B
C is weaker than B by the exact same ratio,
A and C must be the same strength.

But you're right, the mathematical model can probably not be a perfect fit here.


Let's back up. :)

Vargo wrote:
Andrew would win 69.23% of his games against Charlie, I think.


69.23% ≅ 9/13 , so the win/loss odds are 9/4.
60% = ⅗, so the win/loss odds are 3/2.

Andrew beats Bob 60% of the time, with win/loss odds of 3/2; Bob beats Charlie 60% of the time, with win/loss odds of 3/2. Assuming transitivity and no error, Andrew beats Charlie with win/loss odds of (3/2) (3/2) = 9/4, or 9/13 of the time.

In terms of the log of the odds, log(3/2) + log(3/2) = log(9/4). :)

Quote:
A is weaker than B
C is weaker than B by the exact same ratio,
A and C must be the same strength.


In that case, using odds, the odds that A beats B are p/q and the odds that B beats C are q/p; assuming transitivity and no error, the odds that A beats C are (p/q) (q/p) = 1. Or log(p/q) + log(q/p) = log(p/q) - log(p/q) = 0. (Obviously, if A always loses to B and C always loses to B, A and C do not have to be the same strength. ;))

Bill Spight wrote:
However, in a multi-skill game like go, I would expect the odds to be less than {9/4}.


Vargo wrote:
But you're right, the mathematical model can probably not be a perfect fit here.


It is not clear to me that you get my point. Lack of transitivity is a well known phenomenon where Andrew usually beats Bob, Bob usually beats Charlie, and Charlie usually beats Andrew. This lack of transitivity is not just a question of errors. Each player has a number of go skills, at different strengths. Thus a comparison of their strength at go is multi-dimensional, even though any one on one comparison reduces to a win/loss ratio. The win/loss ratio does not tell the whole story. That means that we cannot derive the win/loss ratio of Andrew vs. Charlie from the win/loss ratios of Andrew vs. Bob and the win/loss ratio of Bob vs. Charlie. Fortunately, however, in both chess and go transitivity approximately holds. I have never heard of a case where, except perhaps for short periods of time, Andrew can give two stones to Bob, who can give two stones to Charlie, who can give two stones to Andrew.

It is not just that the model, which assumes transitivity, is not a perfect fit, we are lucky that it is a fit at all. ;) My point is that the model will overestimate the win/loss ratio of A vs. C, as calculated from the win/loss ratios of A vs. B and B vs. C, when each of the ratios is greater than 1. The reason has to do with multi-dimensionality, and is similar to the phenomenon of regression to the mean.

Darwin's cousin Galton discovered that tall fathers had tall sons, on average, but the sons were not as tall, on average, as the fathers of a particular height. At first he thought that he had discovered a law of evolution, whereby the height of the sons approached average height over time. But it is actually a phenomenon of reducing a two-dimensional plot of father-son heights to one dimension, a line of regression. That becomes obvious when you notice that it works the other way. Given sons of a particular height, the fathers are not as tall as the sons, on average.

In such a case you cannot predict the height of the grandsons from the difference in average height of the sons, given the height of the fathers. It is not like the sons are on average 1" shorter, so the grandsons are 2" shorter. Ceteris paribus, the grandsons are probably only 1" shorter, as well.
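
Galton's observation can be reproduced with a short simulation. The sketch assumes father and son heights are standardised (mean 0, standard deviation 1) and jointly normal with a correlation of 0.5; that correlation is picked purely for illustration, not taken from Galton's data, and the helper name is invented.

```python
import random
from statistics import mean

def simulate_heights(pairs=100_000, correlation=0.5, seed=1):
    """Return (father, son) pairs of standardised heights with the given correlation."""
    rng = random.Random(seed)
    noise_scale = (1.0 - correlation**2) ** 0.5
    data = []
    for _ in range(pairs):
        father = rng.gauss(0.0, 1.0)
        son = correlation * father + noise_scale * rng.gauss(0.0, 1.0)
        data.append((father, son))
    return data

data = simulate_heights()

# Condition on tall fathers, then on tall sons (more than 1 SD above average).
tall_fathers = [(f, s) for f, s in data if f > 1.0]
tall_sons = [(f, s) for f, s in data if s > 1.0]

fathers_avg = mean(f for f, _ in tall_fathers)
their_sons_avg = mean(s for _, s in tall_fathers)
sons_avg = mean(s for _, s in tall_sons)
their_fathers_avg = mean(f for f, _ in tall_sons)

print(f"tall fathers average {fathers_avg:.2f}, their sons {their_sons_avg:.2f}")
print(f"tall sons average {sons_avg:.2f}, their fathers {their_fathers_avg:.2f}")
```

The effect shows up in both directions, which is what marks it as a property of the regression line and not a trend over generations.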

Anyway, multi-dimensionality not only destroys (perfect) transitivity, it tends to do so in one direction. moha gives the example of pure drift, where successive winners in the contests are not actually better than the losers. This phenomenon does not depend upon the number of games played. ez4u points out that the assumption of progress needs to be checked out by play against more than the previous winner. Ideally you would play against all previous winners, but playing against the previous 3 or 4 is probably good enough.

Drift is a real concern, especially with self-play, where the players have similar strengths in all dimensions. You can see drift with hill-climbing, near the top of the hill. Randomness may be enough to stall progress, so that successive winners are no closer to the hilltop.
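
The "play against the previous 3 or 4 winners" idea could be expressed as a gate like the one sketched here. Everything in it is hypothetical: the network names, the win counts, the fixed 400-game match length and the 220-win bar are placeholders for illustration, not the project's actual rules or results.

```python
def passes_league(candidate_wins, recent_champions, wins_needed=220):
    """Promote only if the candidate reaches the bar against every recent champion.

    candidate_wins maps a champion's name to the candidate's win count in a
    fixed-length match against it; a missing entry counts as zero wins.
    """
    return all(candidate_wins.get(name, 0) >= wins_needed for name in recent_champions)

# Made-up example: the last four promoted networks, oldest first.
recent_champions = ["net_B", "net_C", "net_D", "net_E"]

steady = {"net_B": 262, "net_C": 251, "net_D": 240, "net_E": 228}
drifting = {"net_B": 205, "net_C": 231, "net_D": 236, "net_E": 229}

print(passes_league(steady, recent_champions))    # beats all four
print(passes_league(drifting, recent_champions))  # fails against net_B
```

A candidate that only beats its immediate predecessor would pass the usual single-match test but be rejected here, which is the kind of drift the gate is meant to catch.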



Last edited by Bill Spight on Wed May 16, 2018 3:50 am, edited 1 time in total.
 Post subject: Re: LZ's progression
Post #20 Posted: Wed May 16, 2018 3:33 am 
Honinbo

Posts: 8486
Liked others: 2481
Was liked: 2943
Uberdude wrote:
Something I saw raised on the Leela Zero github pages was whether a broader test than ">55% win vs last best network" would be better (such as a league against several previous versions). One could imagine 3 versions of Leela that form a cycle of A beats B, B beats C, C beats A, each by >55% (e.g. maybe B sucks at ladders and A doesn't so A can often win, but B is good at semeai and C sucks, and C has the best positional judgement and manages to make that count against A). If the training happens to select these in order then the self-play Elo will keep going up and up when really it's not getting stronger, just going round in circles.


An excellent example of drift. :)

I would go further, and, since we are talking go, not just require the candidate winner to beat the previous version, but to beat each previous version by a greater margin. The reason is that the winner can play worse in some aspects than the loser, if it plays better in other aspects. Thus, skills can be lost in succeeding winners. Let's say that winning by 55% is roughly equivalent to taking White with no komi to produce an even game.

Then say that we have previous winners, A - E in alphabetical order. Suppose candidate F beats E 55% of the time. Now we have F take White vs. E with no komi. If F wins approximately 50% of the time, then let F play vs. D, taking White and giving (reverse) komi. Let's say that again, F wins about 50% of the time. Then let F give 2 stones to C. Now, surprise(!), C wins more than 50% of the time. That may well mean that C is stronger than D and E in some regard, to the detriment of F. At this point we train F against C until we get a version, F', that plays even with C at 2 stones. Now we go back and play F' versus E, taking White. Etc., etc.
