KataGo Distributed Training and new networks

For discussing go computing, software announcements, etc.
And
Gosei
Posts: 1464
Joined: Tue Sep 25, 2018 10:28 am
GD Posts: 0
Has thanked: 212 times
Been thanked: 215 times

Re: KataGo Distributed Training and new networks

Post by And »

KataGo v1.8.2
"This is a minor release mainly of interest to contributors to KataGo's distributed run, or to users who run KataGo self-play training on their own GPUs. This release doesn't make any changes to the KataGo engine itself, but it does fix an issue that was believed to be limiting the diversity in KataGo's self-play games. Switching to this version should, over the long term of training, improve KataGo's learning, particularly on small boards, as well as enable a few further parameter changes in the future once most people have upgraded, which should also further improve opening diversity."
https://github.com/lightvector/KataGo/releases

Re: KataGo Distributed Training and new networks

Post by And »

KataGo b60c320-20210424 playouts 1, komi 0 - CS Zero 9d:
Attachment: CS Zero - KataGo.sgf (1.7 KiB)

Re: KataGo Distributed Training and new networks

Post by And »

KataGo b60c320-s275 playouts 1 - CS Zero 9d H2:
Attachment: CS Zero - KataGo.sgf (1.78 KiB)
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: KataGo Distributed Training and new networks

Post by ez4u »

And wrote: KataGo v1.8.2
"This is a minor release mainly of interest to contributors to KataGo's distributed run, or to users who run KataGo self-play training on their own GPUs. This release doesn't make any changes to the KataGo engine itself, but it does fix an issue that was believed to be limiting the diversity in KataGo's self-play games. Switching to this version should, over the long term of training, improve KataGo's learning, particularly on small boards, as well as enable a few further parameter changes in the future once most people have upgraded, which should also further improve opening diversity."
https://github.com/lightvector/KataGo/releases
The new v1.8.2 seems to run much slower than 1.8.1 on my machine. I am running the contribute command on Windows 10 with an Nvidia GTX 1650 Super (no tensor cores, no FP16) using the OpenCL version. Moving from 1.8.1 to 1.8.2, the log files show: -33% plays per second, -30% NN evals per second, and +61% seconds per game.
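The figures above are percent changes between two log readings. A minimal sketch of that arithmetic in Python also shows why a throughput drop and a time increase are not the same size: a 33% drop in plays per second means each play takes roughly 49% longer. The function names and log values here are hypothetical, not taken from KataGo or its logs, and seconds per game also depends on how long the self-play games run, so it need not follow this formula exactly.

```python
def pct_change(old: float, new: float) -> float:
    """Percent change from old to new (negative means a decrease)."""
    return (new - old) / old * 100.0


def time_change_for_rate_change(rate_change_pct: float) -> float:
    """Given a percent change in a rate (e.g. plays/sec), return the
    percent change in time per unit of work (e.g. seconds per play)."""
    factor = 1.0 + rate_change_pct / 100.0
    if factor <= 0.0:
        raise ValueError("rate cannot drop by 100% or more")
    return (1.0 / factor - 1.0) * 100.0


# Hypothetical log readings, for illustration only.
old_plays_per_sec, new_plays_per_sec = 150.0, 100.5
print(round(pct_change(old_plays_per_sec, new_plays_per_sec), 1))  # -33.0
print(round(time_change_for_rate_change(-33.0), 1))  # 49.3
```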
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

Re: KataGo Distributed Training and new networks

Post by And »

I compared the eigenavx2 builds: 1.8.2 is about 10% slower overall, and some moves are 20% slower (I watched over 100 moves). The default_gtp.cfg files are the same; network 20b, 10 playouts :sad:
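An overall slowdown and a per-move slowdown answer different questions. A minimal sketch, assuming you have recorded the time each version spent on the same sequence of moves (the timings and the helper names per_move_slowdowns and summarize are made up for illustration), separates the total slowdown, the median per-move slowdown, and how many moves exceed a threshold such as 20%:

```python
from statistics import median


def per_move_slowdowns(old_times, new_times):
    """Percent slowdown of the new version on each move (positive = slower)."""
    if len(old_times) != len(new_times):
        raise ValueError("need one timing per move for both versions")
    return [(n - o) / o * 100.0 for o, n in zip(old_times, new_times)]


def summarize(old_times, new_times, threshold=20.0):
    """Total slowdown, median per-move slowdown, and moves at/above threshold."""
    slowdowns = per_move_slowdowns(old_times, new_times)
    total_old, total_new = sum(old_times), sum(new_times)
    return {
        "total_slowdown_pct": (total_new - total_old) / total_old * 100.0,
        "median_slowdown_pct": median(slowdowns),
        "moves_over_threshold": sum(1 for s in slowdowns if s >= threshold),
    }


# Hypothetical seconds per move for the same four moves.
old = [1.0, 1.0, 2.0, 1.0]
new = [1.1, 1.25, 2.2, 1.1]
print(summarize(old, new))
```

The median is less sensitive than the mean to a few unusually slow moves, which is why it is reported alongside the total.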

Re: KataGo Distributed Training and new networks

Post by And »

The OpenCL version of 1.8.2 is 2.5 times slower! But the opencltuning files differ (GT 610).
EDIT: I replaced the opencltuning file and nothing changed; 1.8.2 is still 2.5x slower. The default_gtp.cfg files are the same; network 20b, 10 playouts.

Re: KataGo Distributed Training and new networks

Post by ez4u »

When I run the CUDA version on Google Colab, I do not see any difference; if anything, 1.8.2 might be a little faster. However, those were quite short runs, whereas I looked at the OpenCL versions on my machine over more than 24 hours.

Re: KataGo Distributed Training and new networks

Post by And »

Obviously something is wrong :)
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: KataGo Distributed Training and new networks

Post by lightvector »

We've figured out what's going on: it's not a matter of the version. Apparently, on some GPUs KataGo's tuning is noisy enough that it has an occasional chance of choosing not to use FP16 storage even when FP16 storage is beneficial, so on those GPUs, every time you manually re-tune, you have a chance of getting different performance.

I'm not entirely sure how to fix it without making the OpenCL tuning process heavier and slower, but it's definitely something I'll think about for a future release. @ez4u, thanks for the report and for being so careful to check and notice things. :tmbup:
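To see why noisy tuning can occasionally pick the slower option, and why fixing it makes tuning heavier, here is a toy simulation in Python. It is not KataGo's OpenCL tuner: the two candidate settings, their true timings, the Gaussian noise model, and every function name below are hypothetical. Each setting is timed several times and the one with the lowest median wins; more trials make a wrong pick rarer but multiply the number of timing runs.

```python
import random
from statistics import median

# Hypothetical true cost (arbitrary units) of each candidate storage setting.
TRUE_TIME = {"fp16_storage": 1.0, "fp32_storage": 1.2}


def measure(config, rng, noise_sd=0.3):
    """One noisy timing run: true cost plus Gaussian noise (toy model)."""
    return TRUE_TIME[config] + rng.gauss(0.0, noise_sd)


def pick_config(rng, trials):
    """Time every candidate `trials` times and keep the lowest median."""
    scores = {c: median([measure(c, rng) for _ in range(trials)]) for c in TRUE_TIME}
    return min(scores, key=scores.get)


def wrong_pick_rate(trials, sims=2000, seed=0):
    """Fraction of simulated tuning runs that do not pick the faster setting."""
    rng = random.Random(seed)
    wrong = sum(pick_config(rng, trials) != "fp16_storage" for _ in range(sims))
    return wrong / sims


for k in (1, 3, 9):
    print(k, "trials:", wrong_pick_rate(k))
```

In this model the wrong-pick rate falls as trials increase, while the tuning cost grows linearly with the number of trials per candidate, which is the trade-off between reliability and a heavier, slower tuning pass.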

Also, is there a place in the docs that I should write or fix to avoid giving the impression that people have to manually re-tune for each version? Although I guess it was really good that you did anyway; otherwise we wouldn't have known about this quirk in tuning. :)

Re: KataGo Distributed Training and new networks

Post by And »

KataGo s790 playouts 1 - CS Zero 9d H2:
Attachment: CS Zero - KataGo.sgf (1.48 KiB)

Re: KataGo Distributed Training and new networks

Post by And »

KataGo s800 playouts 1 - CS Zero 9d H3:
Attachment: CS Zero - KataGo.sgf (2.46 KiB)