Re: LZ's progression
Posted: Thu Apr 25, 2019 1:04 am
by nbc44
hoa803 wrote: NBC, if you're using visit parity you shouldn't use ponder. The time to reach that number of visits varies by position. Time parity matches can use ponder on separate hardware though, similar to how AlphaGo was tested.
Why not? I'm using a separate GPU for each net.
hoa803 wrote: There's a thread on GitHub with a visit "parity" (1600 vs 3200) match between 220 and elfv2. The result was inconclusive, which seems to indicate they're about the same strength at that visit count.
1). Wow. And you are right.
Code:
#222 v elfv2 (414 games)
       wins          black         white
#222   203 49.03%     86 48.86%    117 49.16%
elfv2  211 50.97%     90 51.14%    121 50.84%
                     176 42.51%    238 57.49%
2). Hmm. Are you right?
Code:
#222 v elfv2 (400 games)
       wins          black         white
#222   174 43.50%     82 43.16%     92 43.81%
elfv2  226 56.50%    108 56.84%    118 56.19%
                     190 47.50%    210 52.50%
3). Oh my God. You are definitely wrong.
Code:
#222 v elfv2 (400 games)
       wins          black         white
#222   143 35.75%     58 33.53%     85 37.44%
elfv2  257 64.25%    115 66.47%    142 62.56%
                     173 43.25%    227 56.75%
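A minimal sketch of how one might check whether each of the three results above is distinguishable from a 50% score. It assumes the games are independent and uses a normal approximation for the 95% interval; the function name `summarize` is hypothetical, and the Elo figure comes from the usual logistic formula rather than anything the match tool reports.

```python
import math

def summarize(wins, games, z=1.96):
    """Win rate, normal-approximation 95% interval, and implied Elo gap."""
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)   # standard error of a proportion
    elo = 400 * math.log10(p / (1 - p))   # logistic Elo model (assumption)
    return p, p - z * se, p + z * se, elo

# #222's wins and total games, copied from the three tables above
matches = {"1)": (203, 414), "2)": (174, 400), "3)": (143, 400)}

for label, (wins, games) in matches.items():
    p, lo, hi, elo = summarize(wins, games)
    verdict = "inconclusive" if lo < 0.5 < hi else "differs from 50% at ~95%"
    print(f"{label} {p:.1%} [{lo:.1%}, {hi:.1%}]  Elo {elo:+.0f}  {verdict}")
```

Under these assumptions only the first interval straddles 50%; the other two sit below it.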
Re: LZ's progression
Posted: Thu Apr 25, 2019 4:15 am
by Aram
So you have shown that by increasing the number of threads manually way above the default of the program (which should be the optimum in most cases) you make it play worse?
EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?
All in all, it seems a bit confusing that the thread count makes such a difference in play quality when you are using a fixed number of visits.
Re: LZ's progression
Posted: Thu Apr 25, 2019 5:31 am
by nbc44
Aram wrote: So you have shown that by increasing the number of threads manually way above the default of the program (which should be the optimum in most cases) you make it play worse?
EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?
All in all, it seems a bit confusing that the thread count makes such a difference in play quality when you are using a fixed number of visits.
I don't know, it's very strange, but it's a fact.
Re: LZ's progression
Posted: Thu Apr 25, 2019 6:42 am
by Uberdude
I've not been following this thread for a while, so I don't know if this is relevant, but I do recall that when Facebook ran Elf it used more threads or batches than I did when trying to reproduce things. Elf is observed to be quite blind-spotty in not considering enough choices, and more threads means more independent randomness in choosing which variations to explore, so it wouldn't surprise me if Elf benefitted more than LZ from more threads.
Re: LZ's progression
Posted: Thu Apr 25, 2019 7:18 am
by Vargo
Aram wrote: All in all, it seems a bit confusing that the thread count makes such a difference
Maybe I can bring a little more confusion here.
I never thought to make this little experiment, but maybe there's something wrong here, the numbers seem weird.
Win 10, i9-12 core, 2x1080Ti
Code:
leelaz --gtp --benchmark -t XXX -w ...\223.gz --gpu 0 --gpu 1
XXX=1    --->   214 n/s
XXX=4    --->   610 n/s
XXX=12   --->   731 n/s
XXX=36   --->  1091 n/s
XXX=48   --->   990 n/s
XXX=136  --->   958 n/s
XXX=200  --->   793 n/s
The maximum seems to be around -t 36, but does it prove anything?
[Attachments: t 1, t 36, t 200]
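A small sketch that tabulates the numbers above as speedup over a single thread and as throughput per thread. The figures are copied from the benchmark output; the variable names are only for illustration.

```python
# Threads -> nodes per second, copied from the benchmark above
nps = {1: 214, 4: 610, 12: 731, 36: 1091, 48: 990, 136: 958, 200: 793}

best_threads = max(nps, key=nps.get)   # thread count with the highest n/s
base = nps[1]                          # single-thread throughput

for threads, rate in nps.items():
    speedup = rate / base
    per_thread = rate / threads
    print(f"-t {threads:>3}: {rate:>4} n/s  x{speedup:.2f} vs -t 1  "
          f"{per_thread:6.1f} n/s per thread")

print("peak at -t", best_threads)
```

This only describes search throughput. By itself it says nothing about playing strength at a fixed number of visits.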
Re: LZ's progression
Posted: Thu Apr 25, 2019 2:37 pm
by nbc44
The main question is what is more important to us - victory or honesty?

Re: LZ's progression
Posted: Thu Apr 25, 2019 6:06 pm
by iopq
Did you set the batch number to half of the threads? You can get better perf.
Re: LZ's progression
Posted: Thu Apr 25, 2019 10:21 pm
by Vargo
Three last benchmarks:
Not specifying -t XXX seems to give slightly fewer n/s.
iopq wrote: Did you set the batch number to half of the threads?
I'm not sure it's better... but again, I find the effect of -t XXX rather bizarre, and these benchmarks are maybe flawed, one way or another...
--precision half seems to give about the same result as not specifying the precision.

Re: LZ's progression
Posted: Fri Apr 26, 2019 12:36 am
by iopq
Benchmark with batching; it should be faster than threading alone.
Re: LZ's progression
Posted: Fri Apr 26, 2019 2:20 am
by Uberdude
LZ just beat Golaxy in the Fuzhou AI tournament:
https://home.yikeweiqi.com/#/live/board/17523
Re: LZ's progression
Posted: Fri Apr 26, 2019 4:12 am
by Amtiskaw
Is there a way to download SGF from that site, and if yes, can someone post it here?
Alright I think I found them...
Re: LZ's progression
Posted: Fri Apr 26, 2019 5:44 am
by Uberdude
I've previously hacked the SGF out of Yike using browser dev tools; I don't know if there's an easier way.
Inline sgf players:
Re: LZ's progression
Posted: Fri Apr 26, 2019 6:38 pm
by splee99
iopq wrote: Benchmark with batching; it should be faster than threading alone.
Could you please show me the command-line option for batching? It seems that Sabaki always chooses batch size 1 by default, while autogtp chooses something different.
Re: LZ's progression
Posted: Sat Apr 27, 2019 4:45 am
by Amtiskaw
Leela lost both its semi-final games. I enjoyed watching the second one live; it had a rather drastic semeai, which sadly became 1-eye vs 0-eye...
Re: LZ's progression
Posted: Sat Apr 27, 2019 2:29 pm
by hoa803
nbc44 wrote: hoa803 wrote: NBC, if you're using visit parity you shouldn't use ponder. The time to reach that number of visits varies by position. Time parity matches can use ponder on separate hardware though, similar to how AlphaGo was tested.
Why not? I'm using a separate GPU for each net.
Think about what you are trying to do in terms of mathematics. LZ has a chance to win a random game, let us call that probability P.
In a match with ponder turned on, you've introduced another variable: the total thinking time permitted for each engine due to the use of ponder. In a given game either LZ or Elf is likely to get more overall thinking time. Since we already know that strength is directly related to thinking time, the chance of LZ winning a particular game is now the function P(x), where x is a random variable related to the strength at different thinking times.
That means that the statistical basis being used to evaluate strength is no longer valid, because with fixed visit count and ponder the result is a function of another random variable that we don't know anything about. The function P(x) is most likely Gaussian, but we don't know the standard deviation or anything along those lines. I'm not enough of a mathematician to know what that does to the conclusion over a 400 game match.
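To make the argument concrete, here is a toy simulation, a sketch under stated assumptions rather than a model of either engine. It assumes x is the log2 ratio of LZ's thinking time to Elf's, drawn from a Gaussian for each game, and that each doubling of time is worth a hypothetical 100 Elo. The names `win_prob`, `simulate` and `elo_per_doubling` are made up for illustration, and none of the numbers are measured.

```python
import random

def win_prob(x, elo_per_doubling=100.0):
    """Hypothetical P(x): x is the log2 ratio of LZ's thinking time to Elf's."""
    return 1.0 / (1.0 + 10 ** (-elo_per_doubling * x / 400.0))

def simulate(games, mu, sigma, seed=0):
    """Fraction of games won when each game draws its own x ~ N(mu, sigma)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        x = rng.gauss(mu, sigma)            # this game's time advantage
        wins += rng.random() < win_prob(x)  # play one game at that advantage
    return wins / games

# Two engines that are exactly equal at equal thinking time
balanced = simulate(20000, mu=0.0, sigma=0.5)
skewed = simulate(20000, mu=0.5, sigma=0.5)
print(f"no systematic time edge: {balanced:.3f}")
print(f"LZ averages ~1.4x time:  {skewed:.3f}")
```

In this toy model the match score estimates the average of P(x) over whatever thinking times actually occurred, so two equal engines can still end up away from 50% if ponder gives one side more time on average.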
Also - you should put your queries about thread and batch count to the actual programmers on GitHub. Again you are introducing variables that you don't understand. I think I've seen some discussion about both batch size and number of threads having an impact on performance. You should definitely ask if you want to understand what is going on. Maybe post your results and see what GCP says about it.