KaTrain Questions
Posted: Thu Jun 01, 2023 1:10 am
by RobertJasiek
Having installed the very convenient Baduk AI Megapack 4.18.0 x64, I have KaTrain 1.7.2.0, some instances of KataGo etc. I use a Ryzen 7700 (8 cores, 16 threads), an RTX 4070 and HWiNFO64 7.46-5110 for monitoring CPU and GPU loads, temperatures and fan RPMs. Therefore, I know in particular that the AI does run on the dGPU with 94% load while the CPU load is 16% +- 1%. Meanwhile, the stupid Windows Task Manager notices some increased temperature but claims 0% load (apparently it does not notice Tensor core usage but would notice a Furmark 100% CUDA load). So far, so good.
However, I am unsure whether I have set KaTrain correctly. Do I run the right KataGo instance and KataGo model? In KaTrain, my settings are:
KataGo = ...\lizzie\katago.exe
Config = analysis_config.cfg
Model = /lizzie/KataGo40b.gz
Override =
Are these three values for the file paths and names those that are applied? (They also appear in ...\.katrain\config.json .) Is this so because Override is empty?
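One way to check which values are stored is to read the engine entries from config.json. The following is only a minimal Python sketch. It assumes that the settings sit under a top-level "engine" object with keys named "katago", "model", "config" and "altcommand"; these key names are assumptions and not taken from this post, and the sample text is made up for illustration.

```python
import json


def engine_settings(config_text):
    """Return the engine-related entries of a KaTrain-style config.json text."""
    data = json.loads(config_text)
    # Assumption: the engine paths live under a top-level "engine" object.
    engine = data.get("engine", {})
    # Assumption: these are the key names; missing keys are reported as "".
    wanted = ("katago", "model", "config", "altcommand")
    return {key: engine.get(key, "") for key in wanted}


# Hypothetical sample; for the real file, read the text of
# ...\.katrain\config.json and pass it to engine_settings().
sample = '{"engine": {"katago": "katago.exe", "model": "KataGo40b.gz"}}'
for key, value in engine_settings(sample).items():
    print(f"{key}: {value!r}")
```

An empty value for the override-like key would then suggest that the three separate path settings are the ones in effect.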
The newest KataGo is 1.13.0 and I think that Baduk AI Megapack 4.18.0 comes with it. However, is each instance of KataGo this newest KataGo version? In particular, is ...\lizzie\katago.exe this newest KataGo version?
KaTrain says that one can choose from seven KataGo instances. I can find these six, which have different file sizes and dates 22nd or 23rd May:
...\lizzie\katago.exe
...\lizzie\katago_cuda.exe
...\_mydata\lizzie\katago.exe
...\LizGoban\resources\external\katago\katago-opencl.exe
...\LizGoban\resources\external\katago\katago-eigenavx2.exe
...\LizGoban\resources\external\katago\katago-eigen.exe
Obviously, I must avoid the eigen versions for the CPU.
Which of the other KataGo instances should I use to get the strongest play? Can I use all together with the KataGo40b.gz model?
Since I use an RTX 4070, I can use CUDA or Tensor cores. Am I right that Tensor cores tend to result in stronger play? There used to be KataGo downloads for CUDA, Tensor and Eigen but now there are none for Tensor; does this mean that now the CUDA and / or OpenCL variants of KataGo always also offer Tensor core use as an option? During the Baduk AI Megapack installation, the query seems to enable Tensor for the installed KataGo instances, right?
Besides, the ...\.katrain\opencltuning files set 1 for canUseFP16Storage, canUseFP16TensorCores, shouldUseFP16Storage, shouldUseFP16TensorCores and 0 for canUseFP16Compute, canUseFP16TensorCoresFor1x1, shouldUseFP16Compute, shouldUseFP16TensorCoresFor1x1. I guess this indicates that Tensor cores are used, right?
Is KataGo40b.gz the strongest model? I guess it means 40 blocks and such is supposed to be the strongest. Otherwise, what strongest model should I set?
In analysis_config.cfg, the Baduk AI Megapack installation has created numSearchThreads = 8 and nnMaxBatchSize = 96. Are these values appropriate for my RTX 4070 or does every user need to experiment and find out the best tuned parameters for his GPU?
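To compare such values between several config files, the "key = value" lines can be read programmatically. A minimal Python sketch, assuming the file consists of "key = value" lines with "#" starting a comment; the sample text only repeats the two values quoted above.

```python
def parse_cfg(text):
    """Parse 'key = value' lines, ignoring blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        # Drop everything after a '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """
# The two values quoted in this post
numSearchThreads = 8
nnMaxBatchSize = 96
"""
cfg = parse_cfg(sample)
print(cfg["numSearchThreads"], cfg["nnMaxBatchSize"])
```

Values are kept as strings on purpose, because the sketch makes no assumption about which settings are numeric.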
Re: KaTrain Questions
Posted: Thu Jun 01, 2023 6:55 am
by pwaldron
Congratulations on the new setup. It looks like a beast of a system and I'm sure you'll enjoy it.
With regards to which back end is best, you can check out the KataGo project page at
https://github.com/lightvector/KataGo#w ... load-stuff. There is a section called "OpenCL vs CUDA vs TensorRT vs Eigen" that goes through the various options. It looks like TensorRT is the best of the bunch, but you may have to install the libraries from NVIDIA yourself. My experience is that the graphics cards generate analysis faster than my brain can handle, but you may have different applications in mind.
Like you, I would think the 40b model is a 40 block model, and will therefore be strong. If you're looking for other options, the networks can be found at
https://katagotraining.org/networks/kata1/. They are listed by estimated Elo strength; there is a 60 block option there as well as another strong one. If you can't figure out how to install a new network, it is possible just to delete an older installed network and rename the new network in its place. At least, I did that for Lizzie without trouble.
I think your interpretation of the opencltuning file is correct. It looks like the Tensor cores are available (canUse...) and used (shouldUse...).
Not sure about the installation parameters. The author comments about it at
https://github.com/lightvector/KataGo/issues/28. It looks like numSearchThreads is based on the number of *CPU* threads you have available. With a Ryzen 7700 you have a lot to play with, so you may benefit from higher numbers. Note that the message in the link also talks about the size of your memory cache if you intend to do long searches. That may be worth checking out based on your available RAM and previous comments in the forum.
Good luck.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 4:08 am
by thirdfogie
Robert,
I run KataGo under Linux, but there may be some useful information for you.
You asked about the many files called katago.exe on your system. Under Linux,
one can simply type "./katago version" and the program will tell you its version number.
You may need to type "katago.exe version" under Windows. Obviously you must
first "cd" to the correct directory, if that is still how things work in Windows.
The other thing to look at is the file size. Version 1.10.0 was 133386512 bytes under Linux
and version 1.11.0 was 145386512 bytes. Any file called katago which is much smaller
than those sizes is probably some kind of link or start-up script or configuration file.
Hope this helps.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 4:51 am
by Cassandra
pwaldron wrote: It looks like numSearchThreads is based on the number of *CPU* threads you have available.
As far as I know, numSearchThreads should be larger than the number of *CPU* threads on your system.
Please remember that the main work is done by the GPU, not by the CPU. The more the GPU can work in parallel, the more you use its special advantages.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 7:14 am
by RobertJasiek
Sysinternals ProcessExplorer shows KaTrain.exe's child process KataGo.exe with this process information (where ... are my abbreviations):
Path:
...\lizzie\katago.exe
Command line:
...\lizzie\katago.exe analysis -model ...\lizzie\KataGo40b.gz -config ...\KaTrain\analysis_config.cfg -analysis-threads 12 -override-config homeDataDir=C:\Users\.../.katrain
Column 'Version':
Therefore, I do not know the version yet but now I know that my current settings
KataGo = ...\lizzie\katago.exe
Config = analysis_config.cfg
Model = .../lizzie/KataGo40b.gz
Override =
are used indeed. I guess I can also derive and learn the basic syntax of the Override command line.
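The command line shown by ProcessExplorer can be reproduced as an argument list. A minimal Python sketch, using only the sub-command and flags visible above ("analysis", "-model", "-config", "-analysis-threads", "-override-config"); the function name and the file names are hypothetical placeholders, and joining several overrides with commas is my assumption about the expected format.

```python
def build_analysis_command(katago, model, config, threads, overrides=None):
    """Assemble an argument list in the shape of the command line above."""
    cmd = [
        katago, "analysis",
        "-model", model,
        "-config", config,
        "-analysis-threads", str(threads),
    ]
    if overrides:
        # Assumption: several overrides are passed as "key=value,key=value".
        joined = ",".join(f"{key}={value}" for key, value in overrides.items())
        cmd += ["-override-config", joined]
    return cmd


# Placeholder file names; substitute the real (elided) paths.
cmd = build_analysis_command(
    "katago.exe", "KataGo40b.gz", "analysis_config.cfg", 12,
    overrides={"homeDataDir": ".katrain"},
)
print(" ".join(cmd))
```

Keeping the arguments as a list avoids quoting problems if a path contains spaces, for example when the list is later passed to subprocess.run.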
***
In PowerShell, I have used the following commands to find out any version numbers of KataGo:
(Get-Item -Path ...\lizzie\katago.exe).VersionInfo | Format-List -Force
(Get-Item -Path ...\lizzie\katago_cuda.exe).VersionInfo | Format-List -Force
(Get-Item -Path ...\_mydata\lizzie\katago.exe).VersionInfo | Format-List -Force
(Get-Item -Path ...\LizGoban\resources\external\katago\katago-opencl.exe).VersionInfo | Format-List -Force
(Get-Item -Path ...\LizGoban\resources\external\katago\katago-eigen.exe).VersionInfo | Format-List -Force
(Get-Item -Path ...\LizGoban\resources\external\katago\katago-eigenavx2.exe).VersionInfo | Format-List -Force
Each time, I get in particular this result:
FileVersion :
ProductVersion :
FileVersionRaw : 0.0.0.0
ProductVersionRaw : 0.0.0.0
Therefore, I have to conclude that the programmer of KataGo has not stored any version information in the executables' file information. Hence, I cannot know which KataGo executable has which version unless the programmers of KataGo or Baduk AI Megapack 4.18.0 x64 tell us!
Note that ProcessExplorer does show the version number of KaTrain.exe.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 8:33 am
by Cassandra
In Windows CMD line (opens after "WINDOWS + R" ==> "cmd" ==> enter):
Go to the subdirectory with KataGo's executable(s) ...
C:\Users\thoma>cd c:/baduk4152/lizzie
Run the respective KataGo executable with the parameter "version" ...
c:\baduk4152\lizzie>katago version
KataGo v1.11.0
Git revision: d8d0cd76cf73df08af3d7061a639488ae9494419
Compile Time: Mar 20 2022 16:15:40
Using CUDA backend
Compiled with CUDA version 11.2.67
Compiled to support contributing to online distributed selfplay
There you are!!!
++++++++++++++++++++++++
You will also get KataGo's version by typing "version" in the command line of e.g. Sabaki (but without all the additional stuff above) after having attached it.
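If there are many executables to check, the same "version" call can be scripted. A minimal Python sketch, assuming the output has the shape shown above (a "KataGo v..." line and a "Using ... backend" line); the function names are hypothetical.

```python
import re
import subprocess


def parse_version_output(text):
    """Extract version and backend from the text printed by 'katago version'."""
    info = {"version": None, "backend": None}
    for line in text.splitlines():
        line = line.strip()
        match = re.match(r"KataGo v(\S+)", line)
        if match:
            info["version"] = match.group(1)
        match = re.match(r"Using (.+) backend", line)
        if match:
            info["backend"] = match.group(1)
    return info


def katago_version(executable):
    """Run '<executable> version' and parse what it prints."""
    result = subprocess.run(
        [executable, "version"], capture_output=True, text=True, check=False
    )
    return parse_version_output(result.stdout)


# Demonstration on the output quoted above, without starting any process.
sample = "KataGo v1.11.0\nGit revision: d8d0cd76\nUsing CUDA backend\n"
print(parse_version_output(sample))
```

Calling katago_version() with the path of each executable would then list version and backend side by side; an executable that fails to start because of missing DLLs would simply yield empty fields.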
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 9:48 am
by RobertJasiek
I see. This I get:
...\lizzie>katago version
KataGo v1.13.0
Git revision: 8bebc35ed0bbf3a9b11ed429bb90ad5928d79f12
Compile Time: May 23 2023 00:42:52
Using OpenCL backend
...\lizzie>katago_cuda version
KataGo v1.13.0
Git revision: 8bebc35ed0bbf3a9b11ed429bb90ad5928d79f12
Compile Time: May 23 2023 00:42:37
Using CUDA backend
Compiled with CUDA version 11.2.67
...\_mydata\lizzie>katago version
Execution of the code cannot be continued because... is not found.
libz.dll
libz.dll
libssl-1_1-x64.dll
libcrypto-1_1-x64.dll
...\LizGoban\resources\external\katago>katago-eigen version
KataGo v1.12.4
Git revision: 75280bf26582090dd4985dca62bc7124116c856d
Compile Time: Feb 17 2023 22:59:12
Using Eigen(CPU) backend
...\LizGoban\resources\external\katago>katago-eigenavx2 version
KataGo v1.12.4
Git revision: 75280bf26582090dd4985dca62bc7124116c856d
Compile Time: Feb 17 2023 22:59:27
Using Eigen(CPU) backend
Compiled with AVX2 and FMA instructions
...\LizGoban\resources\external\katago>katago-opencl version
KataGo v1.12.4
Git revision: 75280bf26582090dd4985dca62bc7124116c856d
Compile Time: Feb 17 2023 22:58:58
Using OpenCL backend
So ...\lizzie has the newest versions. My remaining questions are:
- Where / which is the 7th KataGo executable?
- Does it matter that ...\_mydata\ cannot be executed?
- Should I use ...\lizzie>katago or ...\lizzie>katago_cuda and why?
- Do one or both of them use my tensor cores?
- Since I have installed Nvidia's Studio driver for Windows 11 x64 for my RTX 4070, should this not mean that its tensor cores are enabled by the driver? Why all this talk about having to install additional tensor drivers?! If the drivers are ready for using tensor cores, does KataGo simply use them given my previously mentioned configuration settings?
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 11:43 am
by Cassandra
RobertJasiek wrote:- Since I have installed Nvidia's Studio driver for Windows 11 x64 for my RTX 4070, should this not mean that its tensor cores are enabled by the driver? Why all this talk about having to install additional tensor drivers?! If the drivers are ready for using tensor cores, does KataGo simply use them given my previously mentioned configuration settings?
Remembering the great difficulties I had in getting my training environment ("only" CUDA, not the faster TensorRT) to work, I would think it extremely likely that your system does not yet have ALL the necessary TensorRT drivers.
In addition, it should hardly hurt to install the NVIDIA original for it.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 12:01 pm
by kvasir
TLDR
But maybe it is helpful that with newer versions of katago it is possible to run a command like this to generate a good config file.
.\katago-v1.13.0-trt8.5-cuda11.2-windows-x64\katago.exe genconfig -model .\models\b18c384nbt-optimisticv13-s5971M.bin.gz -output .\katago-v1.13.0-cuda11.2-windows-x64\analysis_config.cfg
In this case I used the katago executable at ".\katago-v1.13.0-trt8.5-cuda11.2-windows-x64\katago.exe" to generate a config file at ".\katago-v1.13.0-cuda11.2-windows-x64\analysis_config.cfg" using the model at ".\models\b18c384nbt-optimisticv13-s5971M.bin.gz". This worked very well. The new config file can then be referenced from katrain.
Before using this command I had to set up some things in the PATH environment variable and more. It was not entirely trouble-free to get that particular TensorRT version to work for me, but I knew how to do it. I am not really recommending the TensorRT version over the CUDA version, though. It was just the command line that I actually used, I think last week, to make a config file.
The TensorRT version often takes too long to start; there seem to be some improvements in this regard on the command line, but it still takes forever in the old katrain that I am using. The performance gain over the CUDA version is sometimes there and sometimes not (maybe this depends on the exact settings?).
Which are the strongest models? You can download the strongest model in katrain. Or download them yourself, I believe the new b18 models are already strongest now (hope I am not wrong about this, not double checking right now), before it was usually a b40 model.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 5:38 pm
by ez4u
With your hardware, you presumably will end up running the tensorRT version. Version 1.13.0 has a bug in that version that was fixed in version 1.13.1. See the katago releases page at
https://github.com/lightvector/KataGo/releases/. It seems that KaTrain has not caught up with this.
Re: KaTrain Questions
Posted: Fri Jun 02, 2023 6:48 pm
by xela
pwaldron wrote:Like you, I would think the 40b model is a 40 block model, and will therefore be strong. If you're looking for other options, the networks can be found at
https://katagotraining.org/networks/kata1/. They are listed by estimated Elo strength; there is a 60 block option there as well as another strong one.
Models get updated all the time. Don't stress too much over finding "the strongest": they are all superhuman. You'll get good results with any network from the last few years.
It's not always the case that more blocks = stronger. It depends on two other factors:
- Time. Fewer blocks give you more playouts per second. If you're spending only a few seconds per move on analysis, a smaller number of blocks may actually be better; as the analysis time gets longer, the larger networks will overtake the smaller ones. (I used to compile some data on this sort of thing, but I don't have numbers for the newer KataGo networks.)
- Training method and architecture. KataGo release 1.12.0 included a "uec" network which only had 18 blocks, but performed as well as 40-block networks because of improved design.
Re: KaTrain Questions
Posted: Sat Jun 03, 2023 9:38 am
by lightvector
At this point, the 18 block "b18c384nbt" networks on
https://katagotraining.org/networks/ should be the best for pretty much any normal usage, both short and long time controls, both weak and strong hardware. There's some automated testing for finding the "strongest confident", which is fine for general use, but the difference between that and any of the absolute most recent b18c384nbt nets by date is very small. The recent ones also have an advantage that the automated test is incapable of discriminating: they are partially resistant to the cyclic group exploit and/or a little less bad at evaluating the rare pro game that contains such a group, although there's still a lot of room for more improvement. So I would just go with the absolute most recent.
The 60 block networks ("b60c320") are slightly stronger on a per-playout basis, but are enough slower on almost all hardware that the above nets should dominate at any practical time control.
In a couple of weeks the 18 block nets will get another update that should result in a noticeable jump in the strength of the most recent ones, closer to the 60 block ones even on a per-playout basis.
Re: KaTrain Questions
Posted: Sat Jun 03, 2023 6:15 pm
by RobertJasiek
The newest net has the name
kata1-b18c384nbt-s6386600960-d3368371862
I can decipher "kata" = "KataGo" and "b18" = "18 blocks". What do "1", "c384nbt", "s6386600960" and "d3368371862" tell us?
In earlier years, stronger nets had larger blocks. Why do current nets have almost the same playing strengths regardless of the number of blocks?
Should we use the same net for all purposes (KaTrain, Lizzie, others and OpenCL, CUDA, Tensor, Eigen)?
Are your opinion / experience and the Elo ratings (with confidence and number of played test games) at
https://katagotraining.org/networks/ our only information for which net is the strongest at any particular time or can we find it out ourselves without extensive testing? I guess, currently we would not notice any strength difference by using the nets for play against them or positional analysis, right?
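For what it is worth, the name can at least be split mechanically into its parts. A minimal Python sketch; the field meanings in the comments are my understanding of the naming convention and should be read as assumptions: "kata1" as the name of the training run, "c384" as 384 channels, "nbt" as an architecture tag, and the "s" and "d" numbers as counters of training samples and generated data rows.

```python
import re

NAME_PATTERN = re.compile(
    r"^(?P<run>[^-]+)"            # e.g. "kata1" (assumed: training run name)
    r"-b(?P<blocks>\d+)"          # number of blocks
    r"c(?P<channels>\d+)"         # assumed: number of channels
    r"(?P<variant>[a-z]*)"        # assumed: architecture tag such as "nbt"
    r"-s(?P<s_counter>\d+)"       # assumed: training samples seen
    r"-d(?P<d_counter>\d+)$"      # assumed: data rows generated
)


def split_net_name(name):
    """Split a network name into its parts, or return None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    for key in ("blocks", "channels", "s_counter", "d_counter"):
        parts[key] = int(parts[key])
    return parts


print(split_net_name("kata1-b18c384nbt-s6386600960-d3368371862"))
```

If the counters are what I assume, sorting names by the "s" value would order the nets of one run from oldest to newest.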
Re: KaTrain Questions
Posted: Sat Jun 03, 2023 6:31 pm
by xela
If your purpose in using KataGo is to set up a bot that can outperform other people's bots (e.g. climb to the top of KGS rankings, win computer go tournaments), then these questions matter. If your purpose, as I suspect, is to support your research into go theory and improve the accuracy of your books, then it makes very little practical difference which network you use. Just follow the advice above from the author of the software, and get on with the real work. You could potentially spend many months understanding all the details of how it works, and it's an interesting way to spend your time, but possibly not a good step towards your main goals.
I believe most of the filename is a "randomly" generated string to ensure unique names per network. This is easier than sequential numbering, because sequential relies on the software keeping track of the most recent network, or on human intervention, while hashing or pseudo-randomness can be entirely automated. I would guess that the "random" characters are a hash of the filename.
I've already hinted at the tradeoff between work per playout and playouts per second, which means that bigger is not necessarily better, but depends on your hardware configuration and time settings. And also the fact that b18 for KataGo is not a minified version of b60, but the start of a new training run with different architecture.
Re: KaTrain Questions
Posted: Sat Jun 03, 2023 11:00 pm
by Cassandra
RobertJasiek wrote:Should we use the same net for all purposes (KaTrain, Lizzie, others and OpenCL, CUDA, Tensor, Eigen)?
Like xela, I think you should start with your proper work instead of getting lost in irrelevant mysteries of KataGo.
I assume that your potential use cases will (presumably to a not really small extent) be decidedly specific.
So it wouldn't hurt to put your questions to BOTH the 18b network (as recommended by lightvector) AND the 60b network, which have been trained INDEPENDENTLY of each other.
In the games between my 60b IH20-network in training and Karl's 40b IH120-network from 2022, which I let play for further analysis, I notice that 60b has developed different preferences in some places than 40b, although both can (by now) largely agree on what the solution to IH 120 basically is.
And much more important than the technical questions you are currently concerned with is the credibility / reliability of KataGo's answers. Because ...
The more "unrealistic" the positions to be examined are (in the sense of "was not even remotely encountered during the entire training"), the more time you will have to put into analysing the answers given. And you will also have to deal with how to give KataGo the impression that the position to be examined comes from a "close" game.