LZ's progression

And · **#301**

Does anyone know where to download LZ ZQ elf-2 and LZ ZQ elf-5?
https://github.com/breakwa11/GoAIRatings

Vargo · **#302**

20 game match at time parity between
LZ0.16 #204 and LZ0.16 Elfv2
1x1080, twogtp 1.5.0, 5min per side and per game.

Elfv2 wins 13-7
All games by resignation, no error, no duplicate game.
Stats :

Uberdude · **#303**

Vargo, about how many playouts per move is this? The official LZ test was 1600 each and LZ won 65%.

Vargo · **#304**

Uberdude wrote:

how many playouts per move is this?

With 5 min per side per game, only ~3.5 min/game is effectively used, which works out to ~2 s/move. That's similar to -v 1600 for #204 and -v 3000 for Elfv2, all on 1x1080.

Vargo · **#305**

Another 10 game match between LZ0.16_#204 and LZ0.16_Elfv2 at time parity
2x1080Ti, 5 minutes per side per game (probably similar to -v 5000 for #204 and to -v 9000 for Elfv2)
twogtp 1.5.0, no pondering, komi 7.5, no duplicate game, no error.
Result : Elfv2 wins 7-3.

The games :

Attachment:

204_Elfv2.zip [9.67 KiB]

I've used "-alternate", so, #204 is B in the even numbered games, and #204 is W for the odd numbers.
(#204 only won the games numbered 1, 3, and 6)

The command lines and the stats :

nbc44 · **#306**

LZ0.16_#204 vs LZ0.16_Elfv2 2x1080Ti, 3s per move:

Nothing interesting:

Code:

+28-72=0 (as black)
+34-66=0 (as white)

Total: +62-138=0

Vargo · **#307**

nbc44 wrote:

Nothing interesting:

Why do you say that? I find it very interesting, particularly considering it's 200 games! :tmbup:
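To put a rough number on why 200 games carry more weight than 10 or 20, here is a minimal sketch that computes an approximate 95% Wilson score interval for Elfv2's win rate in the three #204 matches reported in this thread. The helper name `wilson_interval` and the choice of the Wilson interval are assumptions made for illustration, not something used by the posters.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

# Elfv2's wins against #204 as reported in this thread
matches = [("20 games", 13, 20), ("10 games", 7, 10), ("200 games", 138, 200)]
for label, wins, games in matches:
    low, high = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} = {wins / games:.0%}, "
          f"95% interval {low:.0%} to {high:.0%}")
```

Under this approximation the 20-game interval still includes 50%, while the 200-game interval does not.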
________________________________________________________________________________________

New network #205

40 game match #205 v. Elfv2
1x1080, 5min per side and per game, no pondering, komi 7.5

Elfv2 wins 25-15 (62.5 %)

40 games :

Attachment:

elfV2_205.zip [34.9 KiB]

Command lines and stats (205 is B) :

Command lines and stats (205 is W) :

nbc44 · **#308**

Vargo wrote:

Why do you say that? I find it very interesting, particularly considering it's 200 games! :tmbup:

I suppose the test result is predetermined.

Long test now:
LZ0.16_#205 vs LZ0.16_Elfv2 - 2x1080Ti, 120s (wow!) per move, (it will be 10 games):

+1-4=0 (#205 is black)
+1-4=0 (#205 is white)

Elfv2 wins 8-2 (80 %)

P.S.
Dragon tail loss

:

Vargo · **#309**

In another thread, @jlt wrote an interesting comment :

Quote:

... I would be surprised if, for some n, LeelaZero(n) didn't beat LeelaZero(n-10) more than 50% of the time.

The last 40b network is #207, it's now 50 networks away from the last 15b, and 30+ networks from the last 20b.

20 game matches LZ(n) v. LZ(n-10) at time parity, 3 min/game and /side, 1x1080, komi 7.5, no pondering, LZ0.16, twogtp 1.5.0.

#207 v. #197 --> 12-8 (40b v. 40b)
#197 v. #187 --> 12-8 (40b v. 40b)
#187 v. #177 --> 15-5 (40b v. 40b)
#177 v. #167 --> 13-7 (40b v. 20b)
#167 v. #157 --> 5-15 (20b v. 15b)

And one more match : LZ(n) v. LZ(n-50)

#207 v. #157 --> 15-5 (40b v. 15b)

All games by resignation, no error, no duplicate game.

Average time was around 1.3 sec/move.

Below, the little hands point to the networks #157, #167, #177, etc.

If someone wants the games or the stats, I'll upload them.

jlt · **#310**

Yes, I should have added the condition "if LZ(n) and LZ(n-10) are networks of the same size". Changing the network size introduces some discontinuity. When 20-block networks were introduced, results were disappointing, that's why the LeelaZero project shifted to 40 blocks rather quickly.

Vargo · **#311**

You're right, 15b #157 was a turning point, and 20b #158 was weaker.

Another 20 game match (just finished, with the same parameters) :
LZ(n) v. LZ(n-49)

#207 v. #158 --> 19-1 (40b v. 20b)

Not very surprising, but still... it's hard to pretend that LZ doesn't progress anymore ;-)

Vargo · **#312**

100 game match : LZ(today) v. LZ(1 year ago)

One year ago, the best LZ network was #90 (6x128)
2 minutes per game and side, LZ0.16, twogtp 1.5.0 no pondering, komi 7.5, gpu : 1x1080

Try to guess the result :scratch:

NB. Because of the "-alternate" command, #207 is always named B, even though it was W 50 times.

Vargo · **#313**

What's the effect of the number of visits on a given network ?
For example, what would be the score of LZ#207 --visits=801 v. LZ#207 --visits=1601 ?

I ran such a match yesterday (#207 with --visits=1, --visits=401, --visits=801, --visits=1601, --visits=3201) but... the results were inconclusive, more than half the games were duplicates :sad:

probably because #207 knows all the tricks of #207 ;-)

______________________________________________________________________________

Anyway, there's a new network, #208.
20 game matches : #208 with various visit counts, and -m 40
-m 40 is used to add a bit more randomness in the first 40 moves, and so avoid duplicate games.

Code:

 gogui-twogtp -black "C:\PATH TO LZ\leelaz.exe --gtp --weights=C:\PATH TO NETWORKS\208.gz --noponder -m 40 -v yyyy" -white "C:\PATH TO LZ\leelaz.exe --gtp --weights=C:\PATH TO NETWORKS\208.gz --noponder -m 40 -v zzzz" -games 20 -sgffile XXX -auto -komi 7.5 -alternate

twogtp 1.5.0, LZ0.16, gpu:1x1080
no duplicate game, no error.
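For anyone who wants to repeat the series, a small sketch like the following could generate one command line per pairing. It only reuses the flags shown in the command above. The paths are the same placeholders as in the post, the `-sgffile` prefix is guessed from the game names in the attachment, and the assumption that every pair of visit counts is played is mine.

```python
from itertools import combinations

# Placeholder paths, exactly as in the post; replace with real ones.
LEELAZ = r"C:\PATH TO LZ\leelaz.exe"
WEIGHTS = r"C:\PATH TO NETWORKS\208.gz"

def engine(visits, randomized_moves=40):
    """Engine command for one side, using only the flags from the post."""
    return (f"{LEELAZ} --gtp --weights={WEIGHTS} --noponder "
            f"-m {randomized_moves} -v {visits}")

def twogtp_command(low, high, games=20):
    """Full gogui-twogtp command; the sgffile prefix is an assumed naming scheme."""
    return (f'gogui-twogtp -black "{engine(low)}" -white "{engine(high)}" '
            f"-games {games} -sgffile 208_{low}_{high} -auto -komi 7.5 -alternate")

visit_counts = [1, 401, 801, 1601, 3201]
# Assumption: each pair of visit counts plays one 20 game match.
commands = [twogtp_command(low, high) for low, high in combinations(visit_counts, 2)]
for command in commands:
    print(command)
```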

time/move seems to scale linearly :
-v 1 : ~0 sec/move
-v 401 : ~0.8
-v 801 : ~1.5
-v 1601 : ~3
-v 3201 : ~5 to 6
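The "seems to scale linearly" impression can be checked with a plain least-squares line through the five timings above. This is only a sketch: the timings are rough, and "~5 to 6" is taken as 5.5 for the fit, which is an assumption.

```python
# (visits, seconds per move) as listed above; "~5 to 6" is taken as 5.5
samples = [(1, 0.0), (401, 0.8), (801, 1.5), (1601, 3.0), (3201, 5.5)]

def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares(samples)
print(f"about {slope * 1000:.2f} s per 1000 visits, offset {intercept:.2f} s")
for visits, seconds in samples:
    fitted = slope * visits + intercept
    print(f"-v {visits}: measured ~{seconds}, fitted {fitted:.2f}")
```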

Results :

Attachment:

208.gif [9.4 KiB]

If someone wants all the stats (times, lengths, etc.), I'll upload them.

All the games :
The smallest number of visits is always B in the even numbered games (and W in the odd ones)
For example, 208_401_801-17 is game number 17 between #208 with 401 visits and #208 with 801 visits; 401 visits is W.
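The naming rule above can be written down as a small helper. This is a sketch that assumes every game name has the form `network_visits_visits-number`, as in the example. The function name `colours` is made up for illustration.

```python
import re

def colours(game_name):
    """Return (low_visits, high_visits, game_number, colour_of_low_visits)
    for a name such as '208_401_801-17'."""
    match = re.fullmatch(r"(\d+)_(\d+)_(\d+)-(\d+)", game_name)
    if match is None:
        raise ValueError(f"unexpected game name: {game_name}")
    _network, first, second, number = map(int, match.groups())
    low, high = sorted((first, second))
    # Rule from the post: the smaller visit count is B in even numbered games.
    colour_of_low = "B" if number % 2 == 0 else "W"
    return low, high, number, colour_of_low

print(colours("208_401_801-17"))
```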

Attachment:

games.zip [143.93 KiB]

maf · **#314**

Did a quick test using LZ207, p100 vs p1000, got 0:20. Nothing surprising, just fyi.

And · **#315**

Several 25x25 matches, with nets obtained with the program https://drive.google.com/open?id=1bgkVB ... oXHUdDuqt7,
https://github.com/leela-zero/leela-zero/issues/2240, 10 sec/move, CPU only, gogui-twogtp:
LM 192x15 GX89(25x25) - LZ 40x256 #205(25x25) 25:15
LZ 192x15 f438268e(25x25) - LZ 40x256 #205(25x25) 18:22
elf v2 256x20(25x25) - LZ 40x256 #205(25x25) 17:23; all 11 games that elf won as black were decided by a ladder
The converted minigo (25x25) nets 000930-goliath and 000990-cormorant do not work in gogui and sabaki.
Can someone with a powerful GPU run a couple of matches?

Vargo · **#316**

Here are some more 20 game matches of #208 v. #208, with --visits=6401

Same parameters, except for --gpu 0 --gpu 1 (2x1080Ti)
It shouldn't change anything.
No error , no duplicate game.

So, same table as before, with an extra line (6401 → ...)

Attachment:

6401.gif [12.09 KiB]

Seems like more visits really make a difference; I find the score of 6401 v. 801 especially harsh!

The games between -v 6401 and -v 3201 (3201 is B in the even numbered games):

Attachment:

208_3201v6401.zip [17.77 KiB]

The stats for -v 6401 vs -v 3201 :

If someone wants the other stats or games, I can upload them.

moha · **#317**

Vargo wrote:

Seems like more visits really make a difference; I find the score of 6401 v. 801 especially harsh!

IIRC similar tests were posted on GitHub a year ago, and at that time doubling the playouts seemed to give a roughly 75% winrate. This corresponds to performance distributions about one standard deviation apart, which in turn can explain the behaviour at quadruple and octuple visits (3 sd -> 98%, though doubling visits is not the same as doubling playouts, and at high visits the relations may change as well).
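The figures in this argument can be reproduced with a short sketch. It assumes each side's performance in a game is normally distributed with the same standard deviation, that the two performances are independent, and that each doubling shifts the mean by one standard deviation. These modelling choices and the name `win_probability` are illustrative assumptions.

```python
import math

def win_probability(gap_in_sd):
    """P(stronger side performs better) when both performances are normal
    with equal sd and means gap_in_sd standard deviations apart."""
    # The difference of two independent normals has sd * sqrt(2),
    # so P = Phi(gap / sqrt(2)) = 0.5 * (1 + erf(gap / 2)).
    return 0.5 * (1.0 + math.erf(gap_in_sd / 2.0))

for doublings in (1, 2, 3):
    print(f"{2 ** doublings}x visits, {doublings} sd apart: "
          f"{win_probability(doublings):.1%}")
```

With one standard deviation the model gives about 76%, and with three about 98%, in line with the numbers quoted.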

nbc44 · **#318**

Time parity match.
LZ0.16 XXX and LZ0.16 Elfv2
2x1080ti, 60s per move.

1). #205

Code:

#205 v elfv2 ( 26 games)
           wins        black       white
#205    12 46.15%    2 50.00%   10 45.45%
elfv2   14 53.85%    2 50.00%   12 54.55%
                     4 15.38%   22 84.62%

2). #207

Code:

#207 v elfv2 ( 26 games)
           wins        black       white
#207    13 50.00%    7 53.85%    6 46.15%
elfv2   13 50.00%    6 46.15%    7 53.85%
                    13 50.00%   13 50.00%

3). #208

Code:

#208 v elfv2 ( 26 games)
           wins         black      white
#208     4 15.38%    1  9.09%    3 20.00%
elfv2   22 84.62%   10 90.91%   12 80.00%
                    11 42.31%   15 57.69%

4). #210
in progress...

Vargo · **#319**

New network #212

Quick test about @jlt's law ;-)

(reminder : LZ#(n) is stronger than LZ#(n-10) at blocks and time parity)

added parameters -m 20, to avoid duplicate games, and -v 1601, to "standardize" the test.

50 games, no duplicate, no error.
Result : #212 wins 32-18 (64%)
__________________________________________________________________________

And now, how about a little controversy...

If #n wins 55% of its games against #n-1, and
If #n-1 wins 55% of its games against #n-2, and
...
and #n-9 wins 55% of its games against #n-10

#n should win 88% of its games against #n-10, but in this test, it wins only 64%...

In this case, it's as if the real average winrate of #n against #n-1 was only ~51.5% , and not 55%

Some caveats : -m 20 can alter results, and 50 games is not enough, but still, I remember @moha spoke about the primary source of Elo inflation being the amount of luck accumulated by the new networks in test matches. I think he was right.
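The 88% and ~51.5% figures follow if win odds multiply from one network to the next, which is what the Elo (logistic) model assumes. A minimal sketch of that arithmetic, with function names chosen only for illustration:

```python
def chained_winrate(step_winrate, steps):
    """Win rate across `steps` generations if the odds multiply at each step."""
    odds = (step_winrate / (1 - step_winrate)) ** steps
    return odds / (1 + odds)

def implied_step_winrate(total_winrate, steps):
    """Per-step win rate that would produce `total_winrate` after `steps` steps."""
    odds = (total_winrate / (1 - total_winrate)) ** (1 / steps)
    return odds / (1 + odds)

# 55% per step over 10 steps, then the reverse question for the observed 64%
print(f"expected vs #n-10: {chained_winrate(0.55, 10):.1%}")
print(f"implied per step:  {implied_step_winrate(0.64, 10):.1%}")
```

The whole calculation stands or falls with the multiplicative-odds assumption; nothing in the match data itself guarantees it.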

Code:

gogui-twogtp -black "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\212.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -white "C:\Users\jm\Desktop\gogui150\leela-zero-0.16-win64OK\leelaz.exe --gtp --weights=C:\Users\jm\Desktop\LZ_networks\202.gz --noponder --gpu 0 --gpu 1 -m 20 -v 1601" -games 50 -sgffile 212_202 -auto -komi 7.5 -alternate

The 50 games :

Attachment:

212_202.zip [43.7 KiB]

EDIT : #212 is B in the even numbered games, and W in the odd ones.

Uberdude · **#320**

Vargo wrote:

If #n wins 55% of its games against #n-1, and
If #n-1 wins 55% of its games against #n-2, and
...
and #n-9 wins 55% of its games against #n-10

#n should win 88% of its games against #n-10, but in this test, it wins only 64%...

In this case, it's as if the real average winrate of #n against #n-1 was only ~51.5% , and not 55%

Why should it? That's an assumption that e.g. Elo rating systems make to keep the problem simple enough to tackle, but there's no logical 'should' about it. If Man City beat Arsenal 3-0 and Arsenal beat Chelsea 2-0, we can't say Man City should beat Chelsea 5-0.
