Author:  Uberdude [ Tue Aug 14, 2018 2:35 pm ]  
Well, I played a game (it took 2 hours) at high visit counts and didn't get the result I expected: 15-block #157 as White with 80k playouts beat the 40-block 40b_157_360k e2be with 20k playouts, which is very close to equal time (15b took 3114s, 40b 3201s). It looks like it was a half-pointer, but Black played a load of inside-territory nonsense instead of resigning; should I adjust some parameter? In a 2nd game the 40-block won: same opening for the first 27 moves, then #157 diverged.

Author:  Tryss [ Tue Aug 14, 2018 3:52 pm ] 
Here is a 40k vs 10k match I just did: 40b wins as White against #157 (and it's close to time parity, with 2407s for B and 2559s for W).
Author:  Vargo [ Wed Aug 15, 2018 3:59 am ] 
Uberdude wrote: should I adjust some parameter?
You could maybe add -r 10.
Mini 4-game match at time parity (40b_e2be48 v. #157): 5000 visits for e2be48, 20000 visits for #157. #157 wins 4-0. I was very surprised, I would have bet 40b would win, go figure. Stats: 40b is B and 40b is W.
@Uberdude @Tryss: games at 40k or 80k visits... wow
Author:  Tryss [ Wed Aug 15, 2018 5:36 am ] 
In fact, I was mistaken: it's not a 40k/10k playouts match, because the LZ bots were not running at full playouts. By default, LZ thinking time is capped, so the real playout numbers were lower (I'm not sure by how much, probably a little higher than 10k/2.5k). I suggest that everyone doing this verify that they're indeed running at the correct number of playouts.
Author:  Vargo [ Wed Aug 15, 2018 6:55 am ] 
Tryss wrote: ...verify if you're indeed running at the correct amount of playouts
My 4-game match was around 5 sec/move, far from the limit.

Author:  splee99 [ Wed Aug 15, 2018 4:47 pm ] 
That's indeed the big issue with a large network. Not only is more time needed per playout, but more training games (possibly many, many more) are also needed to get a stable, well-performing network.
Author:  moha [ Thu Aug 16, 2018 2:03 am ] 
Even if #157 could win a statistically significant (20+ games) match at 10k visits vs 40k (which still seems unlikely), there surely is a visit limit where it falls apart. This is a race between a linear speed advantage and an exponential search advantage. That's why LZ always entered competitions with experimental 20- and 40-block networks. The only question is whether the turning point is within the reach of an average user (which may be around 10-20 sec per move on a single 1080 Ti, both for playing and for analysis/review). But I would still bet on 40 blocks at 10k vs 40k visits already.
One could also test if everything is ok (wrt settings / corrupted network files / etc) by a quick test at equal 1600 visits: this should reproduce the official result of 85+%. (High visits might also need timemanage off.)
Also remember this 40b is just a supervised network that hasn't done any self-play improvement yet. This will also change sooner or later.
Author:  Vargo [ Thu Aug 16, 2018 6:45 am ] 
moha wrote: One could also test if everything is ok (wrt settings / corrupted network files / etc) by a quick test at equal 1600 visits
Everything seems ok, I did a 20-game check, #157 v 1fdfb1 (40b), at visits=1601. 1fdfb1 won 75%, which is reasonable (the "official" score it got was 84.25%: 2018-08-02). 40b is W, 40b is B.
And 4 additional e2be48 v #157 games at visits=1601. As expected, e2be48 won them (taking 3 times more time than #157). 40b is W, 40b is B.
Seeing this, I'm still surprised that 40b doesn't perform better at time parity.
Author:  Vargo [ Mon Sep 03, 2018 11:37 am ] 
20-game match between LZ0.15#157 and LZ0.15#173.
twogtp 1.4.10, 5 min per side per game, no ponder, komi 7.5 (GPU 1x1080).
#157 wins 14:6 (8 wins as W, 6 wins as B), all games by resignation.
Still a lot of catching up to do for the new networks... If someone wants the games, I can upload them.
Author:  Vargo [ Tue Sep 04, 2018 10:42 am ] 
20-game match between LZ0.15#157 and LZ0.15#174 (the new official best network is 256x40).
twogtp 1.4.10, time parity: 5 min per side per game, no ponder, komi 7.5 (GPU 1x1080).
#157 v #174: 10:10 (7 wins as W, 3 wins as B), all games by resignation.
A good surprise, 256x40 seems much stronger than the 256x20 series... I'll run some more tests tomorrow, to be sure it's not a fluke.

Author:  Vargo [ Wed Sep 05, 2018 12:30 am ] 
40 more games between #157 and #174. In all, it's a 60-game match at time parity (5 min per game, GPU: 1x1080, komi 7.5, no pondering).
Final result: #157 wins 35:25 (58%, 17 wins as W, 18 wins as B).
So, maybe #174 (256x40) is not as strong as #157, but it seems stronger than the 256x20 networks.
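How much weight a 35:25 score can carry is easy to check. The following is a minimal sketch, not part of the original post, assuming the 60 games are independent and that there are no draws (reasonable with komi 7.5). The function names are made up for illustration. It computes an exact two-sided binomial p-value against the "equal strength" hypothesis and the Elo gap implied by the score under the standard logistic Elo model.

```python
from math import comb, log10


def two_sided_binomial_p(wins: int, games: int) -> float:
    """Exact two-sided p-value under the null hypothesis of equal strength (p = 0.5)."""
    # With p = 0.5 the distribution is symmetric, so double the larger tail.
    k = max(wins, games - wins)
    tail = sum(comb(games, i) for i in range(k, games + 1)) / 2 ** games
    return min(1.0, 2 * tail)


def elo_gap_from_score(wins: int, games: int) -> float:
    """Elo difference implied by a match score (standard logistic Elo model)."""
    score = wins / games
    return 400 * log10(score / (1 - score))


# The 60-game result reported above: #157 wins 35:25.
p_value = two_sided_binomial_p(35, 60)
elo_gap = elo_gap_from_score(35, 60)
print(f"p-value: {p_value:.3f}, implied Elo gap: {elo_gap:.0f}")
```

Under these assumptions the p-value comes out well above the conventional 0.05 threshold, so a 35:25 result alone does not rule out two networks of equal strength at this time control.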

Author:  Marcel Grünauer [ Wed Sep 05, 2018 6:27 am ]
Vargo wrote: #157 wins 35:25
On http://zero.sjeng.org/ #157 has Elo 11806 but #174 has 12463. How is it possible for the last 15-block network to still win against all the networks with higher Elo? And almost all of the networks in between had a win rate of at least 55% (some win rates dropped later) against the next-lower network. Does this mean that even within LZ's Elo scale the rating is not so meaningful?
Author:  Gomoto [ Wed Sep 05, 2018 6:38 am ] 
Vargo is doing time parity. Matches on https://zero.sjeng.org/ are played at 1600 visits.
Author:  Uberdude [ Wed Sep 05, 2018 7:26 am ] 
Even without the extra time advantage the larger networks get with equal playouts in test matches, the Leelo scale from successive 55% promotions is highly inflated. Based on comparisons to Elf I estimated it as around a factor of 5. So if one network is 500 above another the Elo formula says it'd win 95%, but in reality it's more likely a 100 difference for 65% (I don't have the resources to actually do a test; it's possible LZ is particularly bad against Elf compared to old versions of itself). Another way to get a similar ballpark figure: top pros are 3600 on goratings, which is kind of a continuation of EGF ratings where a beginner is about 0, whilst LZ is about 12000 now and started at 0 for random (lower than beginner), and 12000 / 3600 is about 5 too.
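The 95% and 65% figures follow from the standard Elo expected-score formula. Here is a small sketch of that arithmetic; the function name and the `INFLATION` constant are introduced for illustration only, and the factor of 5 is the estimate from the post above, not a measured value.

```python
def expected_score(elo_diff: float) -> float:
    """Expected win rate of the higher-rated side under the standard Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


# Assumption: the inflation factor of about 5 estimated in the post above.
INFLATION = 5

nominal_gap = 500                        # gap on the LZ self-play scale
deflated_gap = nominal_gap / INFLATION   # roughly 100 "real" Elo

nominal_rate = expected_score(nominal_gap)
deflated_rate = expected_score(deflated_gap)
print(f"nominal:  {nominal_rate:.1%}")
print(f"deflated: {deflated_rate:.1%}")

# Same treatment for the ratings quoted earlier in the thread
# (#174 at 12463, #157 at 11806, both measured at visit parity).
lz_gap = 12463 - 11806
print(f"#174 over #157, deflated: {expected_score(lz_gap / INFLATION):.1%}")
```

Note that this only rescales the visit-parity ratings; it says nothing about time parity, where the smaller network gets several times more visits.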
Author:  Gomoto [ Wed Sep 05, 2018 7:41 am ] 
Have a look at the recent matches: the network is now the official top dog.
Author:  Vargo [ Wed Sep 05, 2018 7:42 am ] 
A little postscriptum: 5 min per game with 1x1080 is roughly equivalent to visits=3201 for #157, and to visits=801 for #174.
At time parity, #157 has 4 times more visits and wins.
At visits parity, #174 takes 4 times more time than #157 and wins.
I still feel it would be more natural to determine Elo at time parity (but maybe it would be difficult to do?)
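The relation between the two kinds of parity is simple arithmetic. A minimal sketch, with hypothetical helper names, assuming time per move scales linearly with the number of visits and ignoring any fixed overhead:

```python
def slowdown(fast_visits: int, slow_visits: int) -> float:
    """How many times slower per visit the larger network is, from equal-time visit counts."""
    return fast_visits / slow_visits


def visits_at_time_parity(fast_visits: int, slowdown_factor: float) -> int:
    """Visits the slower network can afford in the time the faster one spends on fast_visits."""
    return round(fast_visits / slowdown_factor)


# Figures from the post above: about 3201 visits for #157 vs 801 for #174 in the same time.
ratio = slowdown(3201, 801)
print(f"#174 is about {ratio:.1f}x slower per visit than #157")

# Example: what a 1600-visit budget for #157 corresponds to for #174 at time parity.
print(visits_at_time_parity(1600, ratio))
```

The ratio depends on the hardware and the network sizes, so it would have to be measured again for any other setup.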
Author:  Gomoto [ Wed Sep 05, 2018 8:19 am ] 
Does anybody know why visits are used instead of time? I think because different hardware does not matter this way. I also think this is a possible error source for further improvement of the networks. 
Author:  jokkebk [ Thu Sep 06, 2018 2:25 pm ] 
Match games are run with visit parity because the architecture was originally designed for a fixed-size net, so time and visit parity would be essentially the same. With the Leela Zero project, there has been a size upgrade every few months or so, and as the change is usually done manually, it doesn't matter, since all games after that are again at time parity. It breaks the Elo graph though (or not, if you want visit parity). The self-play Elo graph is not absolute in any case but kind of relative, so this is probably not seen as such a huge issue.
Author:  Vargo [ Wed Sep 12, 2018 9:52 am ] 
40 games between #157 and #176. Time parity, 5 min per game, GPU: 1x1080, komi 7.5, no pondering.
#157 wins 29:11 (17 wins as W, 12 wins as B).
Well, almost 2 months since #157... am I the only one to be so disappointed in the new networks? Could someone run a #157 v #176 match (at time parity, with no pondering), just to be sure of these results?

Author:  moha [ Wed Sep 12, 2018 1:50 pm ] 
Vargo wrote: Well, almost 2 months since #157... am I the only one to be so disappointed in the new networks ?
You probably won't see a fast improvement in these 1s/move games, even if the slower networks are getting stronger and stronger, because that strength still needs a meaningfully sized search tree to do its work. Below a certain limit more search beats smarter search, no way around that. But this doesn't mean the new networks are weaker at "time parity" in general, just in these very fast games.
