
Re: LZ's progression

Posted: Thu Apr 25, 2019 1:04 am
by nbc44
hoa803 wrote: NBC, if you're using visit parity, you shouldn't use ponder. The time needed to reach that number of visits varies by position. Time-parity matches can use ponder on separate hardware, though, similar to how AlphaGo was tested.
Why not? I'm using a separate GPU for each net.
hoa803 wrote: There's a thread on GitHub with a visit "parity" (1600 vs 3200) match between 220 and elfv2. The result was inconclusive; it seems to indicate they're about the same strength at that visit count.
1). Wow. And you are right.
C:\APPS\l0gpu17\validation.exe -k 222-elfv2 -s "0:1" -g 7 -n C:\APPS\net\0407e5b5.gz -o "-g -v 1600 --gpu 0 --gpu 1 -t 1 --noponder -q -d --timemanage off --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g -v 3200 --gpu 0 --gpu 1 -t 1 --noponder -q -d --timemanage off --precision single -w " -- C:\APPS\l0gpu17\leelaz -- C:\APPS\l0gpu17\leelaz

#222 v elfv2 ( 414 games)
           wins        black       white
#222   203 49.03%   86 48.86%  117 49.16%
elfv2  211 50.97%   90 51.14%  121 50.84%
                   176 42.51%  238 57.49%
2). Hmm. Are you right?
C:\APPS\l0gpu17\validation.exe -k 222-elfv2 -s "0:1" -g 7 -n C:\APPS\net\0407e5b5.gz -o "-g -v 1600 --gpu 0 --gpu 1 -t 12 --noponder -q -d --timemanage off --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g -v 3200 --gpu 0 --gpu 1 -t 12 --noponder -q -d --timemanage off --precision single -w " -- C:\APPS\l0gpu17\leelaz -- C:\APPS\l0gpu17\leelaz

#222 v elfv2 ( 400 games)
           wins        black       white
#222   174 43.50%   82 43.16%   92 43.81%
elfv2  226 56.50%  108 56.84%  118 56.19%
                   190 47.50%  210 52.50%
3). Oh my God. You are definitely wrong.
C:\APPS\l0gpu17\validation.exe -k 222-elfv2 -s "0:1" -g 6 -n C:\APPS\net\0407e5b5.gz -o "-g -v 1600 --gpu 0 --gpu 1 -t 24 --noponder -q -d --timemanage off --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g -v 3200 --gpu 0 --gpu 1 -t 24 --noponder -q -d --timemanage off --precision single -w " -- C:\APPS\l0gpu17\leelaz -- C:\APPS\l0gpu17\leelaz

#222 v elfv2 ( 400 games)
           wins        black       white
#222   143 35.75%   58 33.53%   85 37.44%
elfv2  257 64.25%  115 66.47%  142 62.56%
                   173 43.25%  227 56.75%
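A quick way to read the three tables is as a win rate with an error bar. The sketch below is an illustration, not how validation.exe reports results: it assumes a normal-approximation 95% confidence interval and the standard logistic Elo formula, and it ignores draws (the tables report none). In every match elfv2 had 3200 visits against 1600 for #222, so the Elo figure compares those two settings, not the bare networks.

```python
import math

def match_summary(wins, games, z=1.96):
    """Win rate, normal-approximation confidence interval, and Elo difference.

    Elo difference uses the logistic model d = -400 * log10(1/p - 1).
    Assumes 0 < wins < games and no draws.
    """
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    elo = -400 * math.log10(1 / p - 1)
    return p, (p - half_width, p + half_width), elo

# elfv2 (3200 visits) wins against #222 (1600 visits), copied from the tables above.
matches = {"-t 1": (211, 414), "-t 12": (226, 400), "-t 24": (257, 400)}
for label, (wins, games) in matches.items():
    p, (lo, hi), elo = match_summary(wins, games)
    print(f"{label}: elfv2 {p:.2%}, 95% CI [{lo:.1%}, {hi:.1%}], about {elo:+.0f} Elo")
```

The -t 1 interval straddles 50%, which matches the "inconclusive" GitHub result, while the -t 24 interval lies clearly above it.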

Re: LZ's progression

Posted: Thu Apr 25, 2019 4:15 am
by Aram
So you have shown that by increasing the number of threads manually way above the default of the program (which should be the optimum in most cases) you make it play worse?

EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?


All in all, it seems a bit confusing that the number of threads makes such a difference in play quality when you are using a fixed number of visits?

Re: LZ's progression

Posted: Thu Apr 25, 2019 5:31 am
by nbc44
Aram wrote: So you have shown that by increasing the number of threads manually way above the default of the program (which should be the optimum in most cases) you make it play worse?

EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?


All in all, it seems a bit confusing that the number of threads makes such a difference in play quality when you are using a fixed number of visits?
I don't know. It's very strange, but it's a fact.

Re: LZ's progression

Posted: Thu Apr 25, 2019 6:42 am
by Uberdude
I've not been following this thread for a while, so I don't know if this is relevant, but I do recall that when Facebook ran Elf, it used more threads or batches than I did when trying to reproduce things. Given that Elf has been observed to be quite prone to blind spots (not considering enough candidate moves), and that more threads mean more independent randomness in choosing which variations to explore, it wouldn't surprise me if Elf benefited more than LZ from more threads.
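One concrete mechanism that fits this intuition is virtual loss, a technique commonly used in parallel MCTS: each search thread still in flight temporarily counts its chosen move as a loss, steering the next thread elsewhere. The sketch below is generic and assumes that scheme; it is not Leela Zero's or Elf's actual code, and the priors, values, visit counts, `c_puct=1.5` and `virtual_loss=3` are hypothetical. With one thread every pick goes to move 0; with eight threads in flight the picks spread over several moves. So at a fixed visit count, more threads change which nodes get visited, not how many. Here the spread comes from the pending-visit penalty rather than randomness, although real multithreaded searches also vary with thread timing.

```python
import math

def puct(q, n, prior, parent_n, c_puct=1.5):
    """PUCT score: value estimate plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_n) / (1 + n)

def select_in_flight(children, threads, virtual_loss=3, c_puct=1.5):
    """Return the child index picked by each of `threads` concurrent searches.

    children: list of (prior, value_sum, visits) tuples with values in [0, 1].
    Each pick adds `virtual_loss` pending visits that count as losses
    (value 0) until the network result would come back, so the same child
    looks worse to the next thread. Assumes every child has visits > 0.
    """
    pending = [0] * len(children)
    picks = []
    for _ in range(threads):
        parent_n = sum(n for _, _, n in children) + sum(pending)
        scores = []
        for (prior, w, n), p in zip(children, pending):
            n_eff = n + p  # real visits plus pending (virtual) losses
            scores.append(puct(w / n_eff, n_eff, prior, parent_n, c_puct))
        best = max(range(len(children)), key=scores.__getitem__)
        pending[best] += virtual_loss
        picks.append(best)
    return picks

# Hypothetical root: (prior, value_sum, visits) for three candidate moves.
children = [(0.6, 30.0, 50), (0.3, 9.0, 20), (0.1, 2.0, 5)]
print(select_in_flight(children, threads=1))  # [0]
print(select_in_flight(children, threads=8))
```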

Re: LZ's progression

Posted: Thu Apr 25, 2019 7:18 am
by Vargo
Aram wrote: All in all, it seems a bit confusing that the number of threads makes such a difference
Maybe I can bring a little more confusion here ;-)

I never thought of running this little experiment before, but maybe there's something wrong here; the numbers seem weird.

Win 10, 12-core i9, 2x 1080 Ti

leelaz --gtp --benchmark -t XXX -w ...\223.gz --gpu 0 --gpu 1
XXX=1 ---> 214 n/s
XXX=4 ---> 610 n/s
XXX=12 ---> 731 n/s
XXX=36 ---> 1091 n/s
XXX=48 ---> 990 n/s
XXX=136 ---> 958 n/s
XXX=200 ---> 793 n/s

The maximum seems to be around -t 36, but does it prove anything? :scratch:
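The benchmark numbers above can be turned into a speedup table. This sketch only reuses the figures from the post: it shows throughput peaking at -t 36 and the speedup per thread falling steadily. Keep in mind that nodes per second measures throughput only; it says nothing about how well those nodes are chosen, which is what the fixed-visit matches are sensitive to.

```python
# Vargo's results: thread count (-t) -> nodes per second (n/s).
results = {1: 214, 4: 610, 12: 731, 36: 1091, 48: 990, 136: 958, 200: 793}

def scaling_table(results):
    """Return (threads, n/s, speedup vs. -t 1, speedup per thread) rows."""
    base = results[1]
    rows = []
    for t in sorted(results):
        speedup = results[t] / base
        rows.append((t, results[t], speedup, speedup / t))
    return rows

for t, nps, speedup, per_thread in scaling_table(results):
    print(f"-t {t:>3}: {nps:>5} n/s  speedup {speedup:4.2f}x  per thread {per_thread:.3f}")

best_t = max(results, key=results.get)
print("peak throughput at -t", best_t)
```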

[Attachment t1.gif: -t 1]
[Attachment t36.gif: -t 36]
[Attachment t200.gif: -t 200]

Re: LZ's progression

Posted: Thu Apr 25, 2019 2:37 pm
by nbc44
The main question is: what is more important to us, victory or honesty? :salute:

Re: LZ's progression

Posted: Thu Apr 25, 2019 6:06 pm
by iopq
Did you set the batch size to half the number of threads? You can get better performance that way.

Re: LZ's progression

Posted: Thu Apr 25, 2019 10:21 pm
by Vargo
Three final benchmarks:

Not specifying -t XXX seems to give slightly fewer n/s:
[Attachment 1.gif]
iopq wrote: Did you set the batch size to half the number of threads?
I'm not sure it's better... but again, I find the effect of -t XXX rather bizarre, and these benchmarks may be flawed one way or another...
[Attachment 3.gif]

--precision half seems to give about the same result as not specifying the precision:
[Attachment 2.gif]
:scratch: :scratch: :scratch:

Re: LZ's progression

Posted: Fri Apr 26, 2019 12:36 am
by iopq
Benchmark with batching; it should be faster than threading alone.
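The threads-versus-batching suggestion can be turned into a benchmark sweep. This sketch only builds the argument lists and sets the batch size to half the thread count, as suggested above. The `--batchsize` option name is an assumption about Leela Zero 0.17 (check `leelaz --help` for your build), and `223.gz` is a placeholder for the elided weights path.

```python
def benchmark_commands(weights, thread_counts, leelaz="leelaz"):
    """Build one benchmark command (as an argument list) per thread count.

    Batch size = half the thread count, at least 1. The --batchsize flag
    name is assumed, not confirmed by this thread.
    """
    cmds = []
    for t in thread_counts:
        batch = max(1, t // 2)
        cmds.append([leelaz, "--gtp", "--benchmark", "-w", weights,
                     "--gpu", "0", "--gpu", "1",
                     "-t", str(t), "--batchsize", str(batch)])
    return cmds

# "223.gz" is a placeholder path, not the original one.
for cmd in benchmark_commands("223.gz", [4, 12, 36, 48]):
    print(" ".join(cmd))
```

Each list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`, which avoids shell quoting, and the n/s figure read from the output.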

Re: LZ's progression

Posted: Fri Apr 26, 2019 2:20 am
by Uberdude
LZ just beat Golaxy in the Fuzhou AI tournament :tmbup:
https://home.yikeweiqi.com/#/live/board/17523

Re: LZ's progression

Posted: Fri Apr 26, 2019 4:12 am
by Amtiskaw
Is there a way to download the SGF from that site? If so, can someone post it here? :study:

Alright, I think I found them...

Re: LZ's progression

Posted: Fri Apr 26, 2019 5:44 am
by Uberdude
I've previously hacked out the SGF from Yike using browser dev tools; I don't know if there's an easier way.

Re: LZ's progression

Posted: Fri Apr 26, 2019 6:38 pm
by splee99
iopq wrote: Benchmark with batching; it should be faster than threading alone.
Could you please show me the command option for batching? It seems that Sabaki always chooses a batch size of 1 by default, while autogtp chooses something different.

Re: LZ's progression

Posted: Sat Apr 27, 2019 4:45 am
by Amtiskaw
Leela lost both its semi-final games. I enjoyed watching the second one live; it had a rather drastic semeai, which sadly became 1-eye vs 0-eye...


Re: LZ's progression

Posted: Sat Apr 27, 2019 2:29 pm
by hoa803
nbc44 wrote:
hoa803 wrote: NBC, if you're using visit parity, you shouldn't use ponder. The time needed to reach that number of visits varies by position. Time-parity matches can use ponder on separate hardware, though, similar to how AlphaGo was tested.
Why not? I'm using a separate GPU for each net.
Think about what you are trying to do in terms of mathematics. LZ has some chance of winning a random game; let us call that probability P.

In a match with ponder turned on, you've introduced another variable: the total thinking time each engine gets because of pondering. In any given game, either LZ or Elf is likely to get more overall thinking time. Since we already know that strength is directly related to thinking time, LZ's chance of winning a particular game is now a function P(x), where x is a random variable related to the strength at different thinking times.

That means the statistical basis being used to evaluate strength is no longer valid, because with a fixed visit count and ponder, the result depends on another random variable that we don't know anything about. The distribution of x is most likely roughly Gaussian, but we don't know its standard deviation or anything along those lines. I'm not enough of a mathematician to know what that does to the conclusion over a 400-game match.
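To see what a random x does to the match statistics, here is a toy simulation under loudly hypothetical assumptions: x is LZ's extra thinking time in a game (made-up units), drawn independently per game from a Gaussian with mean 0.3 and standard deviation 0.5, and the hypothetical P(x) is a logistic curve with P(0) = 0.5, i.e. the engines are equally strong at equal time. Under these assumptions each game is still a win/loss coin flip, but with probability equal to the average of P(x) over x, so the match estimates that average. It comes out well above 50% even though the engines are equal at equal time, which illustrates the confound: the result reflects the pondering conditions as much as the networks.

```python
import math
import random

def lz_win_prob(x):
    """Hypothetical P(x): LZ's win chance given x, its extra thinking time
    relative to Elf in that game. P(0) = 0.5 means equal strength at equal time."""
    return 1 / (1 + math.exp(-2.0 * x))

def simulate_match(games, seed=0, mean=0.3, sd=0.5):
    """Play `games` games, drawing a fresh Gaussian x for each one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        x = rng.gauss(mean, sd)
        if rng.random() < lz_win_prob(x):
            wins += 1
    return wins / games

def average_win_prob(samples, seed=1, mean=0.3, sd=0.5):
    """Monte Carlo estimate of E[P(x)] under the same distribution of x."""
    rng = random.Random(seed)
    return sum(lz_win_prob(rng.gauss(mean, sd)) for _ in range(samples)) / samples

print("observed win rate:", simulate_match(100_000))
print("average of P(x):  ", average_win_prob(100_000))
print("P(0):             ", lz_win_prob(0.0))
```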

Also, you should put your questions about thread and batch counts to the actual programmers on GitHub. Again, you are introducing variables that you don't understand. I think I've seen some discussion about both batch size and thread count affecting performance. You should definitely ask if you want to understand what is going on. Maybe post your results and see what GCP says about them.