
Re: KataGo Distributed Training and new networks

Posted: Sat Apr 03, 2021 4:32 am
by And

Re: KataGo Distributed Training and new networks

Posted: Mon Apr 19, 2021 6:06 am
by And
KataGo v1.8.2
"This is a minor release mainly of interest to contributors to KataGo's distributed run, or to users who run KataGo self-play training on their own GPUs. This release doesn't make any changes to the KataGo engine itself, but it does fix an issue that was believed to be limiting the diversity in KataGo's self-play games. Switching to this version should, over the long term of training, improve KataGo's learning, particularly on small boards, as well as enable a few further parameter changes in the future once most people have upgraded, which should also further improve opening diversity."
https://github.com/lightvector/KataGo/releases

Re: KataGo Distributed Training and new networks

Posted: Sun Apr 25, 2021 4:58 am
by And

Re: KataGo Distributed Training and new networks

Posted: Mon Apr 26, 2021 8:09 am
by And
KataGo b60c320-20210424 playouts 1, komi 0 - CS Zero 9d:

Re: KataGo Distributed Training and new networks

Posted: Mon Apr 26, 2021 2:45 pm
by And
KataGo b60c320-s275 playouts 1 - CS Zero 9d H2:

Re: KataGo Distributed Training and new networks

Posted: Wed Apr 28, 2021 5:34 am
by ez4u
And wrote: KataGo v1.8.2
"This is a minor release mainly of interest to contributors to KataGo's distributed run, or to users who run KataGo self-play training on their own GPUs. This release doesn't make any changes to the KataGo engine itself, but it does fix an issue that was believed to be limiting the diversity in KataGo's self-play games. Switching to this version should, over the long term of training, improve KataGo's learning, particularly on small boards, as well as enable a few further parameter changes in the future once most people have upgraded, which should also further improve opening diversity."
https://github.com/lightvector/KataGo/releases
The new v1.8.2 seems to run much slower than 1.8.1 on my machine. I am running contribute on Windows 10 with an Nvidia GTX 1650 Super (no tensor cores, no FP16), using the OpenCL version. Moving from 1.8.1 to 1.8.2, I am seeing -33% plays per second, -30% neural-net evals per second, and +61% seconds per game, based on the figures in the log files.
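For reference, percentage changes like the ones above follow directly from before/after throughput figures. A minimal sketch (the numbers below are placeholders, not the actual log values) also shows that a -33% drop in throughput alone only accounts for about +49% seconds per game, so the reported +61% hints at some extra per-game overhead:

```python
def pct_change(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0

# Placeholder values, NOT actual log figures.
plays_per_sec_old, plays_per_sec_new = 100.0, 67.0
print(round(pct_change(plays_per_sec_old, plays_per_sec_new)))  # -33

# Time per unit of work scales inversely with throughput:
# 1 / 0.67 is about 1.49, i.e. roughly +49% from throughput alone.
print(round(pct_change(1 / plays_per_sec_old, 1 / plays_per_sec_new)))  # 49
```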

Re: KataGo Distributed Training and new networks

Posted: Wed Apr 28, 2021 6:27 am
by And
I compared the eigenavx2 versions: 1.8.2 is about 10% slower, and on several moves 20% slower, watched over 100 moves. The default_gtp.cfg files are identical; 20b network, 10 playouts. :sad:

Re: KataGo Distributed Training and new networks

Posted: Wed Apr 28, 2021 6:50 am
by And
The OpenCL version of 1.8.2 is 2.5 times slower! But there are differences in the opencltuning files (GT 610).
EDIT: I replaced the opencltuning file; nothing changed. 1.8.2 is still 2.5x slower. The default_gtp.cfg files are identical; 20b network, 10 playouts.

Re: KataGo Distributed Training and new networks

Posted: Wed Apr 28, 2021 6:58 am
by ez4u
When I run the CUDA version on Google Colab, I do not see any difference; if anything, 1.8.2 might be a little faster. However, those were quite short runs, versus over 24 hours when I looked at the OpenCL versions on my machine.

Re: KataGo Distributed Training and new networks

Posted: Wed Apr 28, 2021 7:01 am
by And
Obviously something is wrong. :)

Re: KataGo Distributed Training and new networks

Posted: Thu Apr 29, 2021 4:06 pm
by lightvector
We've figured out what's going on. It's not a matter of the version: apparently, on some GPUs, KataGo's tuning is noisy enough that it has an occasional chance of choosing not to use FP16 storage even when FP16 storage is beneficial. So on those GPUs, every time you manually re-tune, you have a chance of getting different performance.

I'm not entirely sure how to fix it without making the OpenCL tuning process heavier and slower. But definitely something I'll think about for a future release - @ez4u thanks for the report and being so careful to check and notice things. :tmbup:

Also, is there a place in the docs I could edit to avoid giving the impression that people have to manually retune for each version? Although I guess it was really good that you did anyway, otherwise we wouldn't have known about this quirk in the tuning. :)
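One standard way to make a noisy auto-tuner more robust, without making every candidate run much longer, is to repeat each candidate's timing trial a few times and compare medians rather than single samples. A minimal sketch of that idea (hypothetical timing harness, not KataGo's actual tuner code):

```python
import statistics
import time

def time_once(run):
    """Wall-clock time of a single invocation of `run`, in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start

def median_time(run, trials=5):
    """Median over several trials; robust to the occasional noisy sample."""
    return statistics.median(time_once(run) for _ in range(trials))

def pick_faster(candidates, trials=5):
    """Return the name of the (name, fn) candidate with the lowest median time."""
    return min(candidates, key=lambda c: median_time(c[1], trials))[0]

# Hypothetical stand-ins for an FP16-storage kernel vs an FP32 one:
fast = lambda: sum(range(10_000))
slow = lambda: sum(range(100_000))
print(pick_faster([("fp16", fast), ("fp32", slow)]))  # fp16
```

The trade-off is exactly the one mentioned above: each extra trial multiplies tuning time, so the median only needs enough samples to mask an occasional outlier, not to eliminate noise entirely.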

Re: KataGo Distributed Training and new networks

Posted: Mon May 03, 2021 6:58 am
by And
KataGo s790 playouts 1 - CS Zero 9d H2:

Re: KataGo Distributed Training and new networks

Posted: Tue May 11, 2021 12:10 am
by And
KataGo s800 playouts 1 - CS Zero 9d H3:

Re: KataGo Distributed Training and new networks

Posted: Thu May 20, 2021 3:49 am
by And

Re: KataGo Distributed Training and new networks

Posted: Sat Jun 19, 2021 5:57 am
by And