I have a new PC, bought for the purpose of running KataGo.
The CPU is an i7-14700K and the GPU is a GeForce RTX 4060.
Once again, my son built the PC and we installed Debian 12 together.
(I was previously stuck on Debian 11.) Video output goes
through the GPU and everything seems to be working.
The current build instructions are very different to what I used before. The advice
(I have forgotten where I read it) was to build leela-zero, then lizzie and finally KataGo,
so I tried that.
I believe all the required Debian packages were installed.
The attempt to build leela-zero resulted in various incomprehensible errors, of
which the last is shown below. A leelaz binary is left behind, but cannot be
tested.
Code:
In file included from /home/john/leela-zero/gtest/googletest/src/gtest-all.cc:43:
/home/john/leela-zero/gtest/googletest/src/gtest-death-test.cc: In function ‘bool testing::internal::StackGrowsDown()’:
/home/john/leela-zero/gtest/googletest/src/gtest-death-test.cc:1012:24: error: ‘dummy’ may be used uninitialized [-Werror=maybe-uninitialized]
1012 | StackLowerThanAddress(&dummy, &result);
| ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/john/leela-zero/gtest/googletest/src/gtest-death-test.cc:1002:13: note: by argument 1 of type ‘const void*’ to ‘void testing::internal::StackLowerThanAddress(const void*, bool*)’ declared here
1002 | static void StackLowerThanAddress(const void* ptr, bool* result) {
| ^~~~~~~~~~~~~~~~~~~~~
/home/john/leela-zero/gtest/googletest/src/gtest-death-test.cc:1010:7: note: ‘dummy’ declared here
1010 | int dummy;
| ^~~~~
cc1plus: all warnings being treated as errors
gmake[2]: *** [gtest/googlemock/gtest/CMakeFiles/gtest.dir/build.make:76: gtest/googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:309: gtest/googlemock/gtest/CMakeFiles/gtest.dir/all] Error 2
gmake: *** [Makefile:156: all] Error 2
-bash: ./tests: No such file or directory
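The error above is a compiler warning in the bundled googletest code that gets promoted to an error by -Werror. One possible workaround, untested on this machine, is to demote that single warning again when configuring the build. The sketch below assumes the usual out-of-source build directory; the function name build_leelaz is made up for illustration, and GCC should let the more specific -Wno-error=... option take priority over the general -Werror.

```shell
# Hypothetical helper: configure and build leela-zero with the
# maybe-uninitialized warning kept as a warning rather than an error.
# $1 is the path to the leela-zero checkout (an assumption about your layout).
build_leelaz() {
  src="$1"
  if [ ! -d "$src" ]; then
    echo "no such directory: $src" >&2
    return 1
  fi
  (
    # Run in a subshell so the caller's working directory is left alone.
    mkdir -p "$src/build" && cd "$src/build" &&
    cmake -DCMAKE_CXX_FLAGS="-Wno-error=maybe-uninitialized" .. &&
    cmake --build .
  )
}

# Usage:
# build_leelaz ~/leela-zero && ~/leela-zero/build/tests
```

If the build completes, the tests binary that bash could not find should then exist in the build directory.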
The attempt to build lizzie got nowhere. The current instructions are
Code:
git clone --recursive --branch next http://github.com/gcp/leela-zero.git
mvn package
It is unclear in which directory to issue the mvn (Maven) command. It is also
unclear whether 'package' is to be interpreted literally or is a place-holder for
an input or output name. Running the command as it stands caused Maven to
complain about a missing BOM (Bill of Materials?) and give up.
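For future reference: in Maven, package is typed literally, because it is the name of a build lifecycle phase and not a place-holder. The command has to be issued in the directory that contains the project's pom.xml file. A minimal sketch of that check follows; the function name run_mvn_package and the directory name lizzie are assumptions for illustration, and it says nothing about the cause of the BOM complaint.

```shell
# Hypothetical helper: run "mvn package" only where a pom.xml is present.
# $1 is the directory holding the project source (for example a lizzie checkout).
run_mvn_package() {
  dir="$1"
  if [ ! -f "$dir/pom.xml" ]; then
    echo "no pom.xml in $dir" >&2
    return 1
  fi
  # "package" is a Maven lifecycle phase, so it is typed exactly as shown.
  (cd "$dir" && mvn package)
}

# Usage:
# run_mvn_package ~/lizzie
```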
I haven't tried to build KataGo. I may now be too old and sick to succeed in
the installation. I can no longer play over a real board, but do not seem to
have lost any strength playing online.
Reverting to Debian 11 is not an option because it predates the introduction of
the GeForce 4060 and is unlikely to have the correct drivers.
I retried importing the binaries from Debian 11 for lizzie, leela-zero and KataGo.
Lizzie runs and can load Leela-zero, but the engine fails to analyse anything.
KataGo does not run. This is no surprise because the old engines probably embed GPU
libraries that do not work for the 4060.
I then built KataGo afresh using these instructions and
placed it under lizzie/target.
Code:
git clone https://github.com/lightvector/KataGo.git
cd KataGo/cpp
# If you get missing library errors, install the appropriate packages using your system package manager and try again.
# -DBUILD_DISTRIBUTED=1 is only needed if you want to contribute back to public training.
cmake . -DUSE_BACKEND=OPENCL
make -j 4
This worked, and the analysis now runs 3 times faster than on the old PC, without
any explicit optimisation. Eventually the number of threads was doubled by setting
numSearchThreads = 24 in the file gtp_custom.cfg.
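A quick way to confirm which thread setting is active is to search the config file for it. The sketch below assumes plain "key = value" lines, as in the stock KataGo configs; the function name show_threads is made up for illustration. KataGo also has a benchmark subcommand that tries several thread counts and suggests one, shown here as a comment with place-holder file names.

```shell
# Hypothetical helper: print the numSearchThreads line(s) from a KataGo config.
# $1 is the path to the config file, e.g. gtp_custom.cfg.
show_threads() {
  grep -E '^[[:space:]]*numSearchThreads[[:space:]]*=' "$1"
}

# Usage:
# show_threads gtp_custom.cfg
#
# Let KataGo suggest a thread count for this hardware
# (<NEURALNET> stands for whichever model file is in use):
# ./katago benchmark -model <NEURALNET>.bin.gz -config gtp_custom.cfg
```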
EDITED TWICE TO REFLECT "IMPORT OLD BINARIES".
EDITED AGAIN FOR FUTURE REFERENCE AND TYPOS. 22 June 2024
Issuing the command
watch -n 1 nvidia-smi
generates a display like the following.
Code:
Every 1.0s: nvidia-smi
Sat Jun 22 00:26:32 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 32%   58C    P2   111W / 115W |    784MiB /  8188MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1164      G   /usr/lib/xorg/Xorg                222MiB |
|    0   N/A  N/A      1709      G   xfwm4                               2MiB |
|    0   N/A  N/A      5008      G   ...b/firefox-esr/firefox-esr      120MiB |
|    0   N/A  N/A      5761      C   ./katago                          434MiB |
+-----------------------------------------------------------------------------+
The aim is to monitor the temperature of the GPU. In practice, it does not
exceed 60 Celsius when analysing at 100% GPU load, which is good.
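If only the temperature matters, nvidia-smi can print just the fields of interest as one CSV line instead of the full table. The sketch below uses the --query-gpu and --format options of nvidia-smi; the helper names gpu_line and too_hot are made up for illustration, and the 80 Celsius limit in the usage note is an arbitrary assumption, not a recommendation.

```shell
# Print temperature (C), GPU load (%) and power draw (W) as one CSV line,
# for example "58, 99, 111.20". Needs the NVIDIA driver to be installed.
gpu_line() {
  nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,power.draw \
             --format=csv,noheader,nounits
}

# Hypothetical helper: succeed if the temperature in a CSV line exceeds a limit.
# $1 is a line as printed by gpu_line; $2 is the limit in Celsius.
too_hot() {
  temp=${1%%,*}          # keep only the text before the first comma
  [ "$temp" -gt "$2" ]
}

# Usage (poll once a second, flag readings above 80 C):
# while sleep 1; do
#   line=$(gpu_line); echo "$line"
#   if too_hot "$line" 80; then echo "GPU is hot"; fi
# done
```

The same query can also be repeated by nvidia-smi itself by adding -l 1 to the command, which avoids the need for watch.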
The bottom right corner shows memory used in the GPU's other role of driving the
display. When first running KataGo under lizzie, some memory was also taken by
leelaz, which was not doing anything. It may not matter much, but
killing that instance of leelaz removed it from the list. A better solution
is to use the Settings/Engine menu item in lizzie to blank the engine
parameters for leelaz in lizzie/target/config.txt so that leelaz never starts.
I used
Code:
kata1-b18c384nbt-s9996604416-d4316597426.bin.gz
not
Code:
kata1-b28c512nbt-s7168446720-d4316919285.bin.gz
I don't need the super strength, and the b28 model runs much slower on my PC.
Whether b28 at 3000 visits would be stronger than b18 at 7000 is not a question
I can answer. I always used 7000 visits with the old installation.