A new run of KataGo released - strength comparable to ELFv2

lightvector · Post by **lightvector** » Tue Dec 31, 2019 2:13 pm

How many playouts did KataGo typically get during that amount of time each move?

If you haven't tuned KataGo's settings, you should be aware that the default config is set for fairly low resource usage. The main things to adjust are the number of threads, and enabling FP16 if you are using the CUDA version and have an NVIDIA graphics card with FP16 tensor cores. Both can give huge boosts if you do have a high-end graphics card.

For tuning threads, you can try running "./katago benchmark" using the exact same arguments you use to run "./katago gtp". It will try a bunch of different numbers of threads on test positions and report the speed of each one, and you can pick the best one (although err on the low side if the speeds are similar, since with speed and/or visits held fixed, more threads usually hurts strength).
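The "err on the low side" rule can be sketched in a few lines of Python. This is only an illustration: the function name `pick_threads`, the 5% tolerance, and the speed figures are assumptions made up for the example, not real benchmark output or anything KataGo provides.

```python
def pick_threads(results, tolerance=0.05):
    """Return the smallest thread count whose speed is within
    `tolerance` (as a fraction) of the fastest measured speed.

    results: dict mapping thread count -> visits per second.
    """
    if not results:
        raise ValueError("no benchmark results given")
    best_speed = max(results.values())
    threshold = best_speed * (1.0 - tolerance)
    # Among all "fast enough" settings, prefer the fewest threads,
    # since extra threads tend to cost strength at equal speed.
    return min(t for t, speed in results.items() if speed >= threshold)


# Hypothetical numbers, NOT real KataGo benchmark output.
example = {8: 900.0, 12: 1150.0, 16: 1280.0, 20: 1300.0, 24: 1295.0}
chosen = pick_threads(example)
print("suggested thread count:", chosen)
```

Here 20 threads is nominally fastest, but 16 is within 5% of it, so the sketch settles on 16. How wide the tolerance should be is a judgment call.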

For example, on a V100 GPU with somewhere from 16 to 24 threads and FP16 enabled, KataGo should be at least equally matched with ELF and possibly a little stronger at fixed time settings that reach single-digit thousands of playouts.

splee99 · Post by **splee99** » Tue Dec 31, 2019 9:08 pm

I have attached my configuration file and I'm happy to hear any suggestions to make it better. During that game the root visits were around 10000 to 20000, while the ELF playouts were around 2000 to 5000.

Attachment: gtp_example .cfg.txt (7.85 KiB)

lightvector · Post by **lightvector** » Wed Jan 01, 2020 12:35 am

Thanks for sharing/testing.

You still haven't said if you're using the OpenCL or the CUDA version, but if you're using the CUDA version with a GPU that has tensor cores (such as RTX2080), you want to set cudaUseFP16 and cudaUseNHWC both to true - they currently are not set in your config.

But if you're using the CUDA version on a modern but not quite as cutting-edge GPU that doesn't have tensor cores but still has some FP16 support (for example GTX 10**, I think?), then setting them either won't work or won't help much, I think. And if you're using the OpenCL version, that version doesn't have FP16 support at all. It would be straightforward to implement, I've just never gotten around to doing so. So assuming you're running ELF on Leela Zero's engine, I would expect ELF to be a little better in these cases, particularly because Leela Zero's engine has code that takes advantage of limited FP16 support even when tensor cores are not available.

Your dynamicScoreUtilityFactor has been raised quite a bit from the default - I'm not entirely sure what effect that will have. The default GTP config should have come with 0.2 and 0.2 for static and dynamic, but you can also try 0.0 and 0.4, which is actually what is used in training. You have 0.2 and 0.5, which puts a lot of weight on score compared to winning/losing.

(Edit: Also numNNServerThreadsPerModel = 2 is interesting if you only have one GPU. If you've specifically benchmarked the difference between setting it to 2 instead of the default of 1, and found it better, great! If you haven't - then I'm not sure why you have a non-default value here).
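One way to check a config against the suggestions above is a small script like the following. It is a rough sketch, assuming the config uses plain `key = value` lines with `#` comments; the helper names and the sample excerpt are hypothetical, and the FP16/NHWC suggestions only apply to the CUDA version on a GPU with tensor cores.

```python
def parse_cfg(text):
    """Parse simple `key = value` lines, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Settings suggested in this thread for CUDA + tensor-core GPUs only.
SUGGESTED = {"cudaUseFP16": "true", "cudaUseNHWC": "true"}


def differences(settings, suggested=SUGGESTED):
    """Map each key that is unset or different to (current, suggested)."""
    return {
        key: (settings.get(key), want)
        for key, want in suggested.items()
        if settings.get(key) != want
    }


# Hypothetical excerpt, loosely modeled on the values discussed above.
sample = """
numNNServerThreadsPerModel = 2   # default is 1
staticScoreUtilityFactor = 0.2
dynamicScoreUtilityFactor = 0.5
"""

report = differences(parse_cfg(sample))
for key, (current, want) in report.items():
    print(f"{key}: currently {current}, suggested {want}")
```

The same `differences` call could be pointed at other key/value pairs, such as the score utility factors, by passing a different `suggested` dict.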

Besides that, your config looks okay. It's hard to compare the numbers you gave due to the visits versus playouts difference, assuming you do mean "visits" vs "playouts" the way LZ people usually mean - tree reuse can cause the relationship to vary wildly. But I'd guess both ELF and KataGo should each be able to win a decent number of games against the other. At fixed playouts and smaller numbers of threads on each side, I know they are generally fairly similar. Which one is better at fixed time is then a matter of things like the hardware and implementation details above, which can make as much as a factor of 2 difference in performance one way or another. That is not small: a factor of 2 is easily more than 100 Elo.
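To put "more than 100 Elo" in perspective, the standard logistic Elo formula converts a rating gap into an expected score. The snippet below is just that textbook formula, not anything taken from KataGo; the function name is chosen for the example, and the factor-of-2 to 100 Elo mapping itself is the estimate given above, not something the code derives.

```python
def expected_score(elo_diff):
    """Expected score (win probability, counting a draw as half)
    for the side that is `elo_diff` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (0, 50, 100, 200):
    print(f"{diff:>3} Elo -> expected score {expected_score(diff):.3f}")
```

A 100 Elo edge works out to an expected score of roughly 64%, so a hardware or implementation difference of that size can easily decide which bot looks stronger in a short match.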

When bots are otherwise close, it's hard to make a blanket statement about what will be best or which bot "is stronger" - messy configuration and hardware details on both sides and simple statistical noise can have a pretty big effect case by case.

Hope that helps?

Maharani · Post by **Maharani** » Wed Jan 01, 2020 1:45 am

So interesting. I was wondering about that during the recent AI championship streamed by Stephen Hu from the AGA. Does everyone bring their own hardware etc?

And · Post by **And** » Wed Jan 08, 2020 12:11 pm

splee99 wrote: I was running a match between Katago(B) and ELF(W). Yes I used time parity with 20 seconds per move. The playouts are variable, but ELF used roughly 5000 playouts per move. It looks like ELF was ahead from the beginning to the end. I didn't set any resign threshold but at the end both sides indicated that Katago was lost.

Which KataGo network did you use, 20x256 or 10x128? If 10x128, was it the latest one, s458837800-d26065887?

And · Post by **And** » Sat Jan 11, 2020 9:40 am

New 10x128 network: https://github.com/lightvector/KataGo/i ... -573324889

And · Post by **And** » Mon Jan 20, 2020 4:13 am

KataGo 10x128 g170e - ELF v2 (gt 610, 120 sec/move) 1:1

TIME_B 20630.4 (5h 43m) TIME_W 19088.7 (5h 18m)

TIME_B 16603.3 (4h 36m) TIME_W 19787.4 (5h 29m)
