After a break, I have picked up KaTrain again and changed the executable from <path>\lizzie\katago.exe to <path>\lizzie\katago_cuda.exe. That change alone was enough to run the latter. Judging by the name, I guess it is the CUDA version of KataGo. Of course, I do not trust names, so I have studied the running files, processes and drivers in Process Explorer and Explorer, as listed below.
However, let me first describe how the graphics card behaves with respect to GPU and VRAM load. The CUDA version uses noticeably more VRAM (1.8 GB instead of 1.15 GB) but loads the GPU more efficiently (81 to 90% instead of 94%).
Item        katago.exe   katago_cuda.exe
GPU load    94%          81-90%
VRAM load   1.15 GB      1.8 GB
Note what I have and have not installed, as listed further below. I could not know in advance whether the Nvidia Studio drivers and the Baduk AI Megapack had already installed CUDA drivers, and experienced users had not said so. I have therefore had to find out by trial and error that, apparently (I cannot be sure yet), the CUDA drivers and CUDA libraries are already installed. Judging by the file names, it also appears, though again I cannot be sure yet, that the cuDNN libraries are already installed.
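One way to reduce the trial and error is to tell the CUDA layers apart. nvcuda.dll in System32 is the CUDA driver library, installed together with the display driver. The separate CUDA Toolkit installer normally sets the CUDA_PATH environment variable. The cuDNN archive is only a ZIP of files, so there is no comparable marker for it. Below is a minimal sketch built on these two indicators. The function names are hypothetical, and a missing CUDA_PATH is a hint, not proof.

```python
import os
from pathlib import Path
from typing import Callable, Mapping


def cuda_status(
    system32: Path,
    env: Mapping[str, str],
    is_file: Callable[[Path], bool] = Path.is_file,
) -> dict[str, bool]:
    """Report which CUDA layers appear to be present.

    driver:  nvcuda.dll in System32 comes with the Nvidia display driver.
    toolkit: the CUDA Toolkit installer normally sets CUDA_PATH; its absence
             suggests the separate toolkit installer has not been run.
    """
    return {
        "driver": is_file(system32 / "nvcuda.dll"),
        "toolkit": "CUDA_PATH" in env,
    }


def cuda_status_here() -> dict[str, bool]:
    """Run the check against the current Windows installation."""
    system_root = Path(os.environ.get("SystemRoot", r"C:\Windows"))
    return cuda_status(system_root / "System32", os.environ)
```

On a setup like mine, I would expect `cuda_status_here()` to report the driver as present and the toolkit as absent.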
I do not know whether some of the libraries must be in the same directory as the katago_cuda.exe being used. They happen to be there for the one I have been using so far, but I cannot tell whether this is necessary. Is it?
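Part of the answer lies in the standard Windows DLL search order for desktop applications, as Microsoft documents it. With SafeDllSearchMode enabled (the default), Windows looks in the following places, in this order:

1. the executable's own directory
2. System32
3. the 16-bit system directory
4. the Windows directory
5. the current directory
6. the directories on PATH

So DLLs placed next to katago_cuda.exe are found without any system-wide install. In principle, the same DLLs elsewhere on PATH (for example, a CUDA Toolkit bin folder) could also satisfy the program. That assumes katago_cuda.exe loads them by name in the ordinary way.

The sketch below models a simplified version of that order. It ignores KnownDLLs, already-loaded modules, manifests and SetDllDirectory/AddDllDirectory. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, Optional


def search_dirs(exe_dir: Path, system_root: Path, cwd: Path, path_env: str) -> list[Path]:
    """Simplified standard DLL search order (SafeDllSearchMode enabled)."""
    dirs = [
        exe_dir,                   # 1. directory of the executable
        system_root / "System32",  # 2. system directory
        system_root / "System",    # 3. 16-bit system directory
        system_root,               # 4. Windows directory
        cwd,                       # 5. current directory
    ]
    # 6. directories on PATH, skipping empty entries
    dirs += [Path(p) for p in path_env.split(os.pathsep) if p]
    return dirs


def resolve_dll(name: str, dirs: Iterable[Path],
                is_file: Callable[[Path], bool] = Path.is_file) -> Optional[Path]:
    """Return the first candidate path that exists, or None if none does."""
    for d in dirs:
        candidate = d / name
        if is_file(candidate):
            return candidate
    return None
```

For example, `resolve_dll("cudnn64_8.dll", search_dirs(Path(r"<path>\lizzie"), Path(r"C:\Windows"), Path.cwd(), os.environ.get("PATH", "")))` would show which copy such a lookup picks first. Here `<path>` is the Megapack folder.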
On the KataGo web page, and in statements by some experienced users, there has been a strong recommendation to install cuda_12.1.1_531.14_windows.exe and cudnn-windows-x86_64-8.9.2.26_cuda12-archive.zip for the CUDA and cuDNN libraries. However, I have three reasons to doubt it:

- I am not about to program my own neural net.
- I have experienced that unnecessary driver or library installations can corrupt a system. In fact, on my very new PC, I have already experienced this with the AMD iGPU drivers.
- katago_cuda.exe seems to run without the extra 4 GB of installers.

Therefore, at least until I move on to tensor cores, recommending their installation seems to have been bad advice. And this is what I call alchemy: forcing each go AI user to find out the correct installation procedure by trial and error.
NOT INSTALLED YET
cuda_12.1.1_531.14_windows.exe                     Nvidia CUDA installer        3.3 GB
cudnn-windows-x86_64-8.9.2.26_cuda12-archive.zip   Nvidia cuDNN installer ZIP   0.7 GB
INSTALLED
Baduk_AI_Megapack_v4.18.0_x64.exe Baduk AI Megapack
KaTrain Command
<path>\lizzie\katago_cuda.exe analysis -model <path>\lizzie\KataGo40b.gz -config <path>\KaTrain\analysis_config.cfg -analysis-threads 12 -override-config homeDataDir=C:\Users\<username>/.katrain
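The KaTrain command starts KataGo's JSON analysis engine, which reads one JSON query per line on stdin and writes one JSON response per line on stdout. To test katago_cuda.exe without KaTrain, one could send it a single query. Below is a minimal sketch. The query fields follow KataGo's analysis engine documentation as I understand it. The function names, the rules, komi and visit count are my own choices, and the file arguments stand in for the `<path>` ones above.

```python
import json
import subprocess


def analysis_command(exe: str, model: str, config: str, threads: int) -> list[str]:
    """Argument list mirroring the KaTrain command (without -override-config)."""
    return [exe, "analysis", "-model", model, "-config", config,
            "-analysis-threads", str(threads)]


def empty_board_query(query_id: str = "test1", max_visits: int = 100) -> str:
    """One JSON query line asking for analysis of the empty 19x19 board."""
    query = {
        "id": query_id,
        "moves": [],
        "rules": "japanese",
        "komi": 6.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [0],
        "maxVisits": max_visits,
    }
    return json.dumps(query)


def run_one_query(cmd: list[str]) -> dict:
    """Start the engine, send one query and return the first response.

    Loading a 40-block model can take a while, so the answer may be slow.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    try:
        proc.stdin.write(empty_board_query() + "\n")
        proc.stdin.flush()
        # Responses arrive on stdout as one JSON object per line.
        return json.loads(proc.stdout.readline())
    finally:
        # Stop the engine once the answer (or an early exit) has been seen.
        proc.terminate()
        proc.wait()
```

If the CUDA build and its DLLs are in order, the returned dictionary should contain a `moveInfos` list. If the engine exits early, for example because a DLL cannot be found, the empty output raises a `json.JSONDecodeError` here, and the console messages are the place to look.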
Instead, I want to contribute information that lets future AI newbies make better-informed decisions than I could. On my new PC I have observed the following processes, libraries and other files, many of which point to CUDA and cuDNN (and OpenCL). So if you do not yet know whether you have already installed them or still need to, check for the following files and processes while katago_cuda.exe is running:
Go AI 64b Processes
<path>\KaTrain\KaTrain.exe KaTrain 1.7.2.0
<path>\lizzie\katago_cuda.exe katago_cuda.exe
C:\Windows\System32\conhost.exe Console Window Host 10.0.22621.1194
Lizzie Katago
<path>\lizzie\katago_cuda.exe katago_cuda
Lizzie NVIDIA DLLs
<path>\lizzie\cublas64_11.dll NVIDIA CUDA BLAS Library 11.7.3.1
<path>\lizzie\cublasLt64_11.dll NVIDIA CUDA BLAS Light Library 11.7.3.1
<path>\lizzie\cudnn_cnn_infer64_8.dll NVIDIA CUDA CUDNN_CNN_INFER Library 11.4.128
<path>\lizzie\cudnn_ops_infer64_8.dll NVIDIA CUDA CUDNN_OPS_INFER Library 11.4.128
<path>\lizzie\cudnn64_8.dll NVIDIA CUDA CUDNN Library 6.5.0
<path>\lizzie\ <contains more CUDA files>
Lizzie OpenSSL DLLs
<path>\lizzie\libcrypto-1_1-x64.dll OpenSSL library The OpenSSL Project
<path>\lizzie\libssl-1_1-x64.dll OpenSSL library The OpenSSL Project
Lizzie Misc
<path>\lizzie\libz.dll zlib data compression library
<path>\lizzie\libzip.dll libzip for Windows
<path>\lizzie\ <contains more library files>
Nvidia Studio Driver DLLs
C:\Windows\System32\nvapi64.dll NVIDIA NVAPI Library 531.61
C:\Windows\System32\nvcuda.dll NVIDIA CUDA Driver 531.61
C:\Windows\System32\ <contains more Nvidia (CUDA) files>
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\nvcuda64.dll NVIDIA CUDA Driver 531.61
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\<contains more Nvidia (CUDA) files>
Nvidia System Services
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\nvcubins.bin NVIDIA
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\NVDisplay.Container.exe NVIDIA Container 1.37.3103.4323
"C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\NVDisplay.Container.exe" -f %ProgramData%\NVIDIA\DisplaySessionContainer%d.log -d C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\plugins\Session -r -l 3 -p 30000 -cfg NVDisplay.ContainerLocalSystem\Session -c
RTX 4070 driver files (extract)
C:\Windows\System32\drivers\NVIDIA Corporation\Drs\dbInstaller.exe
C:\Windows\System32\drivers\NVIDIA Corporation\Drs\nvdrsdb.bin
C:\Windows\System32\drivers\NVIDIA Corporation\license.txt
C:\Windows\System32\lxss\lib\libcuda.so
C:\Windows\System32\lxss\lib\libcuda.so.1
C:\Windows\System32\lxss\lib\libcuda.so.1.1
C:\Windows\System32\lxss\lib\libnvcuvid.so
C:\Windows\System32\lxss\lib\libnvcuvid.so.1
C:\Windows\System32\lxss\lib\libnvidia-ml.so.1
C:\Windows\System32\lxss\lib\<various>
C:\Windows\System32\MCU.exe
C:\Windows\System32\nvapi64.dll
C:\Windows\System32\nvcpl.dll
C:\Windows\System32\nvcuda.dll
C:\Windows\System32\nvcuvid.dll
C:\Windows\System32\OpenCL.dll
C:\Windows\System32\<various>
C:\Windows\SysWow64\<various>
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\<various>
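Instead of browsing for the files above by hand, one could check a subset of them with a short script. Below is a minimal sketch. It only tests whether the files exist. Process Explorer's DLL view remains the way to see what katago_cuda.exe has actually loaded. The file lists are taken from the listing above, `<path>` still stands for the Megapack folder, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable

# Subsets of the files listed above.
LIZZIE_DLLS = ["cublas64_11.dll", "cublasLt64_11.dll", "cudnn64_8.dll",
               "cudnn_cnn_infer64_8.dll", "cudnn_ops_infer64_8.dll"]
SYSTEM32_FILES = ["nvcuda.dll", "nvapi64.dll", "OpenCL.dll"]


def missing_files(folder: Path, names: Iterable[str],
                  is_file: Callable[[Path], bool] = Path.is_file) -> list[str]:
    """Return the names that are not present as files in `folder`."""
    return [n for n in names if not is_file(folder / n)]


def report(lizzie_dir: Path, system32: Path = Path(r"C:\Windows\System32")) -> None:
    """Print, per folder, either OK or the list of missing files."""
    for folder, names in ((lizzie_dir, LIZZIE_DLLS), (system32, SYSTEM32_FILES)):
        missing = missing_files(folder, names)
        print(folder, "OK" if not missing else "missing: " + ", ".join(missing))
```

Usage would be `report(Path(r"<path>\lizzie"))` with the real folder filled in.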
Needless to say, I still have questions about tensor cores, starting with: have I already been using them?
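A first, partial step towards that question is to check whether the card has tensor cores at all. Tensor cores first appeared with CUDA compute capability 7.0 (Volta). The RTX 4070 reports 8.9. The GeForce GTX 16xx cards are an exception, reporting 7.5 without having tensor cores. Whether KataGo actually uses them is a separate matter. As far as I understand, its CUDA backend relies on them only when computing in FP16, which I believe is governed by a `cudaUseFP16` setting in the config. Check the comments in your own analysis_config.cfg rather than trusting this. Below is a sketch, assuming an nvidia-smi recent enough to know the `compute_cap` query field. The function names are hypothetical.

```python
import subprocess


def parse_gpu_info(csv_line: str) -> tuple[str, float]:
    """Parse a line like 'NVIDIA GeForce RTX 4070, 8.9' into (name, compute capability)."""
    name, cap = (field.strip() for field in csv_line.rsplit(",", 1))
    return name, float(cap)


def likely_has_tensor_cores(name: str, compute_cap: float) -> bool:
    """Heuristic only: tensor cores arrived with compute capability 7.0,
    but GeForce GTX 16xx cards report 7.5 without having them."""
    if "GTX 16" in name:
        return False
    return compute_cap >= 7.0


def query_gpu_info() -> tuple[str, float]:
    """Name and compute capability of the first GPU listed by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_info(out.splitlines()[0])
```

`likely_has_tensor_cores(*query_gpu_info())` answers only "could they be used", not "are they being used".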