LZ's progression

nbc44 · Post by **nbc44** » Thu Apr 25, 2019 1:04 am

hoa803 wrote: NBC, if you're using visit parity, you shouldn't use ponder. The time needed to reach that number of visits varies by position. Time-parity matches can use ponder on separate hardware, though, similar to how AlphaGo was tested.

Why not? I'm using a separate GPU for each net.

hoa803 wrote: There's a thread on GitHub with a visit "parity" (1600 vs 3200) match between 220 and elfv2. The result was inconclusive; it seems to indicate they're about the same strength at that visit count.

1) Wow. And you are right.

#222 v elfv2 ( 414 games)
           wins        black       white
#222   203 49.03%   86 48.86%  117 49.16%
elfv2  211 50.97%   90 51.14%  121 50.84%
                   176 42.51%  238 57.49%

2) Hmm. Are you right?

#222 v elfv2 ( 400 games)
           wins        black       white
#222   174 43.50%   82 43.16%   92 43.81%
elfv2  226 56.50%  108 56.84%  118 56.19%
                   190 47.50%  210 52.50%

3) Oh my God. You are definitely wrong.

#222 v elfv2 ( 400 games)
           wins        black       white
#222   143 35.75%   58 33.53%   85 37.44%
elfv2  257 64.25%  115 66.47%  142 62.56%
                   173 43.25%  227 56.75%
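To put the three results above on a common footing, here is a minimal sketch that computes #222's win rate, an approximate 95% interval, a two-sided p-value against a 50% win rate, and the implied Elo difference. It assumes the games are independent and uses the normal approximation to the binomial; `match_summary` is a hypothetical helper, not part of any Leela Zero tool.

```python
import math

def match_summary(wins, games, z=1.96):
    """Win rate, approximate 95% interval, two-sided p-value vs. 50%,
    and the Elo difference implied by the win rate."""
    p = wins / games
    half_width = z * math.sqrt(p * (1 - p) / games)
    # Normal approximation to Binomial(games, 0.5) under the "equal strength" hypothesis
    z_score = (wins - games / 2) / math.sqrt(games / 4)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # two-sided tail probability
    elo = -400 * math.log10(1 / p - 1)
    return p, (p - half_width, p + half_width), p_value, elo

# #222's wins against elfv2 in the three matches above
matches = {"match 1": (203, 414), "match 2": (174, 400), "match 3": (143, 400)}

if __name__ == "__main__":
    for name, (w, n) in matches.items():
        p, (lo, hi), pv, elo = match_summary(w, n)
        print(f"{name}: {p:.1%} [{lo:.1%}, {hi:.1%}], p = {pv:.3g}, {elo:+.0f} Elo")
```

Only the first match has an interval that still contains 50%, which fits hoa803's "inconclusive" description for that kind of result; the other two are well outside it.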

Aram · Post by **Aram** » Thu Apr 25, 2019 4:15 am

So you have shown that by manually increasing the number of threads way above the program's default (which should be the optimum in most cases) you make it play worse?

EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?

All in all, it seems a bit confusing that the thread count makes such a difference in playing strength when you are using a fixed number of visits.

nbc44 · Post by **nbc44** » Thu Apr 25, 2019 5:31 am

Aram wrote: So you have shown that by manually increasing the number of threads way above the program's default (which should be the optimum in most cases) you make it play worse?

EDIT:
Or do you want to say that the 20-ish block ELF2 network scales better with threads? Does the 40b network regress with more threads or stay the same?

All in all, it seems a bit confusing that the thread count makes such a difference in playing strength when you are using a fixed number of visits.

I don't know. It's very strange, but it's a fact.

Uberdude · Post by **Uberdude** » Thu Apr 25, 2019 6:42 am

I've not been following this thread for a while, so I don't know if this is relevant, but I do recall that when Facebook ran Elf it used more threads or batches than I did when trying to reproduce things. Given that Elf is observed to be quite blind-spotty (it doesn't consider enough choices), and more threads mean more independent randomness in choosing which variations to explore, it wouldn't surprise me if Elf benefited more than LZ from more threads.
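A related mechanism may help explain why thread count matters even at a fixed number of visits. Parallel MCTS engines commonly apply a "virtual loss": while a thread is still descending through a move, that move is temporarily scored as if it had lost, so other threads are pushed toward different moves. To my understanding Leela Zero does something along these lines, but the toy below is a generic sketch, not Leela Zero's code; the priors, values, `c_puct = 0.8` and virtual loss of 3 are made-up assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    prior: float        # policy prior for this move
    visits: int         # completed playouts through this move
    value_sum: float    # sum of backed-up values, each in [0, 1]
    in_flight: int = 0  # searchers currently descending through this move

def puct(child: Child, parent_visits: int, c_puct: float = 0.8, virtual_loss: int = 3) -> float:
    # Each in-flight searcher counts as `virtual_loss` extra visits that all lost (value 0).
    n = child.visits + virtual_loss * child.in_flight
    q = child.value_sum / n if n > 0 else 0.5  # hypothetical first-play value
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + n)
    return q + u

def select_batch(children: list[Child], threads: int) -> list[int]:
    """Indices chosen by `threads` searchers that all select before any result comes back."""
    parent_visits = sum(c.visits for c in children)
    picks = []
    for _ in range(threads):
        best = max(range(len(children)), key=lambda i: puct(children[i], parent_visits))
        children[best].in_flight += 1
        picks.append(best)
    for c in children:
        c.in_flight = 0  # drop the virtual losses (real results are not backed up in this toy)
    return picks

def example_children() -> list[Child]:
    # Made-up numbers: move 0 is the favourite after 140 playouts.
    return [Child(0.60, 100, 55.0), Child(0.25, 30, 15.0), Child(0.15, 10, 4.5)]

if __name__ == "__main__":
    print(select_batch(example_children(), 1))  # [0]
    print(select_batch(example_children(), 4))  # [0, 0, 2, 1]
```

With one thread, every selection goes to the favourite; with four threads in flight, the same tree spreads its next visits over all three moves. So the same visit budget can be spent differently depending on the thread count.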

Vargo · Post by **Vargo** » Thu Apr 25, 2019 7:18 am

Aram wrote: All in all, it seems a bit confusing that the thread count makes such a difference

Maybe I can bring a little more confusion here.

I had never thought of running this little experiment, but maybe there's something wrong here; the numbers seem weird.

Windows 10, 12-core i9, 2x GTX 1080 Ti

leelaz --gtp --benchmark -t XXX -w ...\223.gz --gpu 0 --gpu 1

XXX=1 ---> 214 n/s
XXX=4 ---> 610 n/s
XXX=12 ---> 731 n/s
XXX=36 ---> 1091 n/s
XXX=48 ---> 990 n/s
XXX=136 ---> 958 n/s
XXX=200 ---> 793 n/s

The maximum seems to be around -t 36, but does it prove anything?
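A small sketch for reading the numbers above: it picks the fastest thread count and expresses each result relative to the single-thread rate. The data are Vargo's figures; the helper names are hypothetical, and since each setting appears to have been measured once, differences of a few percent may just be noise.

```python
# Nodes per second from Vargo's `leelaz --benchmark` runs
# (Windows 10, 12-core i9, 2x GTX 1080 Ti, network 223).
results = {1: 214, 4: 610, 12: 731, 36: 1091, 48: 990, 136: 958, 200: 793}

def best_thread_count(nps_by_threads):
    """Thread count with the highest nodes per second."""
    return max(nps_by_threads, key=nps_by_threads.get)

def speedup_table(nps_by_threads):
    """Throughput relative to the smallest thread count tested."""
    base = nps_by_threads[min(nps_by_threads)]
    return {t: nps / base for t, nps in sorted(nps_by_threads.items())}

if __name__ == "__main__":
    best = best_thread_count(results)
    print(f"best: -t {best} ({results[best]} n/s)")
    for t, s in speedup_table(results).items():
        print(f"-t {t:>3}: {s:.2f}x the single-thread rate")
```

The peak at -t 36 is roughly 5x the single-thread rate, and throughput falls off again above that, so beyond some point extra threads cost speed rather than add it.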

[Screenshots: benchmark output for -t 1, -t 36 and -t 200]

nbc44 · Post by **nbc44** » Thu Apr 25, 2019 2:37 pm

The main question is: what is more important to us, victory or honesty?

iopq · Post by **iopq** » Thu Apr 25, 2019 6:06 pm

Did you set the batch size to half the number of threads? You can get better performance.

Vargo · Post by **Vargo** » Thu Apr 25, 2019 10:21 pm

Three last benchmarks:

Not specifying -t XXX seems to give slightly fewer n/s.

iopq wrote: Did you set the batch number to half of the threads?

I'm not sure it's better... but again, I find the effect of -t XXX rather bizarre, and these benchmarks may be flawed one way or another...

Using --precision half seems to give about the same result as not specifying the precision.

iopq · Post by **iopq** » Fri Apr 26, 2019 12:36 am

Benchmark with batching; it should be faster than threading alone.
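One way to act on iopq's suggestion is to script a sweep that pairs each thread count with a batch size of half the threads. This is a sketch under assumptions: the flags are those in Vargo's command plus `--batchsize`, which I believe leelaz builds from v0.17 onward accept (check `leelaz --help` on your build), and "path/to/223.gz" is a placeholder for your own weights file. It only prints the command lines.

```python
import shlex

def benchmark_commands(weights, gpus, thread_counts, leelaz="leelaz"):
    """Build one `leelaz --benchmark` command line per thread count."""
    commands = []
    for threads in thread_counts:
        batch = max(1, threads // 2)  # iopq's rule of thumb: batch size = half the threads
        args = [leelaz, "--gtp", "--benchmark",
                "-t", str(threads), "--batchsize", str(batch),
                "-w", weights]
        for gpu in gpus:
            args += ["--gpu", str(gpu)]  # one --gpu flag per device
        commands.append(args)
    return commands

if __name__ == "__main__":
    # "path/to/223.gz" is a placeholder, not a real path.
    for cmd in benchmark_commands("path/to/223.gz", [0, 1], [4, 12, 36, 48]):
        print(shlex.join(cmd))
```

`shlex.join` quotes for a POSIX shell; on Windows, paste the arguments into cmd.exe by hand or pass the lists to `subprocess.run` directly.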

Uberdude · Post by **Uberdude** » Fri Apr 26, 2019 2:20 am

LZ just beat Golaxy in the Fuzhou AI tournament.

https://home.yikeweiqi.com/#/live/board/17523

Amtiskaw · Post by **Amtiskaw** » Fri Apr 26, 2019 4:12 am

Is there a way to download the SGF from that site? If so, can someone post it here?

Alright, I think I found them...

Uberdude · Post by **Uberdude** » Fri Apr 26, 2019 5:44 am

I've previously hacked the SGF out of Yike using browser dev tools; I don't know if there's an easier way.


splee99 · Post by **splee99** » Fri Apr 26, 2019 6:38 pm

iopq wrote: Benchmark with batching, it would be faster than just threading

Could you please show me the command option for batching? It seems that Sabaki always uses batch size 1 by default, while autogtp chooses something different.

Amtiskaw · Post by **Amtiskaw** » Sat Apr 27, 2019 4:45 am

Leela lost both its semi-final games. I enjoyed watching the second one live: it had a rather drastic semeai, which sadly became 1-eye vs 0-eye...

hoa803 · Post by **hoa803** » Sat Apr 27, 2019 2:29 pm

nbc44 wrote:
hoa803 wrote: NBC, if you're using visit parity, you shouldn't use ponder. The time needed to reach that number of visits varies by position. Time-parity matches can use ponder on separate hardware, though, similar to how AlphaGo was tested.
Why not? I'm using a separate GPU for each net.

Think about what you are trying to do in terms of mathematics. LZ has some chance of winning a random game; let us call that probability P.

In a match with ponder turned on, you've introduced another variable: the total thinking time each engine gets because of pondering. In a given game, either LZ or Elf is likely to get more overall thinking time. Since we already know that strength is directly related to thinking time, LZ's chance of winning a particular game is now a function P(x), where x is a random variable related to the strength at different thinking times.

That means the statistical basis being used to evaluate strength is no longer valid, because with a fixed visit count and ponder, the result depends on another random variable that we don't know anything about. The function P(x) is most likely Gaussian, but we don't know its standard deviation or anything along those lines. I'm not enough of a mathematician to know what that does to the conclusion over a 400-game match.
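For anyone who wants to explore the question in the last sentence, here is a rough Monte Carlo sketch. Everything in it is a made-up modelling assumption: each game draws a thinking-time imbalance x from a normal distribution, independently of other games, and LZ wins with probability P(x) = logistic(base_logit + time_effect * x). The function names and parameters are hypothetical.

```python
import math
import random
import statistics

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def match_win_rate(rng, games, base_logit, time_sigma, time_effect=1.0):
    """One simulated match. For each game a hypothetical thinking-time
    imbalance x ~ Normal(0, time_sigma) is drawn, and LZ wins with
    probability P(x) = logistic(base_logit + time_effect * x)."""
    wins = 0
    for _ in range(games):
        x = rng.gauss(0.0, time_sigma)
        if rng.random() < logistic(base_logit + time_effect * x):
            wins += 1
    return wins / games

def summarize(base_logit, time_sigma, matches=500, games=400, seed=1):
    """Mean and standard deviation of the win rate over many simulated matches."""
    rng = random.Random(seed)
    rates = [match_win_rate(rng, games, base_logit, time_sigma) for _ in range(matches)]
    return statistics.mean(rates), statistics.stdev(rates)

if __name__ == "__main__":
    for sigma in (0.0, 0.5, 1.5):
        mean, sd = summarize(base_logit=1.0, time_sigma=sigma)
        print(f"sigma={sigma}: mean win rate {mean:.3f}, spread {sd:.3f}")
```

Under these assumptions each game is still a single win/loss draw, so the match estimates the average of P(x) over the time imbalances rather than P at equal thinking time. With an engine that is genuinely stronger (base_logit > 0), that average is pulled toward 50%, which is one concrete way the extra variable can distort the comparison.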

Also, you should put your questions about thread and batch counts to the actual programmers on GitHub. Again, you are introducing variables that you don't understand. I think I've seen some discussion of both batch size and number of threads having an impact on performance. You should definitely ask if you want to understand what is going on. Maybe post your results and see what GCP says about them.

Life In 19x19
