 Post subject: Re: KaTrain Questions
Post #21 Posted: Sun Jun 04, 2023 3:58 am 
Lives in sente

Posts: 1308
Liked others: 14
Was liked: 153
Rank: German 1 Kyu
RobertJasiek wrote:
I prefer the best AI play immediately. This is the attitude of high dans.
(As a mathematician, only 100% correctness is valid. 97% is not mathematics but statistics;) )

Dear Robert,

You do not really understand!
The percentages referred to how far your presumed requirements / needs would be realised, not to the quality of KataGo's statements (which also rely on statistics, by the way :D ).

Even the "best" AI play will NEVER ALWAYS deliver "100% correctness", whatever you may understand by "correctness" (the best move in relation to the prospect of winning, the best move in relation to the size of the game's outcome, ...).
In the majority of cases, KataGo's objectives will NOT be yours.

And please always remember that KataGo (no matter how strong the network used is) will most likely NOT play "correctly" in positions that the network has not worked through sufficiently during training (at least you cannot be sure that it does). And I firmly believe that you have many positions of this type up your sleeve, waiting to be investigated.

IH120 is the most striking example.
For example, my IH120-60b (currently still) hallucinates changes in the order of moves where in fact there are none (i.e. no valid ones). And often it requires the gracious assistance of Karl's IH120-40b to determine that.
Do you know a valid solution for all the positions you want to investigate? For IH120 we already know some, which is incredibly helpful in the current phase of my IH120-60b training.

First of all, it is eminently important for you to gain experience. And that will be sufficiently possible with ANY strong KataGo network, even with the parameters' DEFAULT values!

_________________
The really most difficult Go problem ever: https://igohatsuyoron120.de/index.htm
Igo Hatsuyōron #120 (really solved by KataGo)

 Post subject: Re: KaTrain Questions
Post #22 Posted: Sun Jun 04, 2023 4:25 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Cassandra wrote:
Even the "best" AI play will NEVER ALWAYS deliver "100% correctness"


The joke was beyond your perception...

Quote:
First of all, it is eminently important for you to gain experience.


Your approach is not mine: I prefer good settings before gaining experience. Besides, I would prefer not to reinvent the wheel by rediscovering everything other KataGo users might already have discovered about executables, drivers and settings.

AI usage should not be alchemy but should be readily applicable.

 Post subject: Re: KaTrain Questions
Post #23 Posted: Sun Jun 04, 2023 4:53 am 
Lives in sente

Posts: 1308
Liked others: 14
Was liked: 153
Rank: German 1 Kyu
RobertJasiek wrote:
Quote:
First of all, it is eminently important for you to gain experience.
Your approach is not mine. For me, I prefer good settings before gaining experience. Besides, I would prefer not having to reinvent the wheel by rediscovering everything other Katago users might already have discovered as to executables, ...

You could already read that the benefit of a TensorRT installation may not justify the additional effort required for it.
You should start with CUDA anyway, so why not stick with it (for now)?

Quote:
... drivers ...

If you are missing any, KataGo will not run. Usually there are also corresponding messages on the screen.

Quote:
... and settings.

Do you know anyone else in this world who shares your needs / demands?
So try it first and then let us know what didn't go as planned.

_________________
The really most difficult Go problem ever: https://igohatsuyoron120.de/index.htm
Igo Hatsuyōron #120 (really solved by KataGo)

 Post subject: Re: KaTrain Questions
Post #24 Posted: Sun Jun 04, 2023 5:08 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Cassandra wrote:
You could already read that the benefit of a TensorRT installation may not justify the additional effort required for it.


No, but I could read two types of opinions, the other being: TensorRT is about 1.5 times as fast on my GPU, and may or may not be stronger depending on the position and net.

Quote:
You should start with CUDA anyway


Will do.

Quote:
so why not stick with it (for now)?


Because TensorRT can be about 1.5 times as fast. If it were 1.005 times, I might not care, but 1.5 times is extraordinarily better.

Quote:
Quote:
... drivers ...

If you are missing any, KataGo will not run. Usually there are also corresponding messages on the screen.


Ok, this is a useful hint.

Quote:
Do you know anyone else in this world who shares your needs / demands?


Personalised meta-discussion leads nowhere.

As a broader meta-discussion: every user needs to decide between OpenCL, CUDA, TensorRT and Eigen, would profit from 1.5x speed, and would profit from choosing good settings and nets with ease and with the confidence of sufficient understanding.

 Post subject: Re: KaTrain Questions
Post #25 Posted: Mon Jun 05, 2023 2:53 am 
Lives in gote

Posts: 580
Location: Adelaide, South Australia
Liked others: 207
Was liked: 264
Rank: Australian 2 dan
GD Posts: 200
Robert, you're recapitulating a debate that's been going on in academic circles since the 1980s. Unfortunately, "statistics and alchemy" is actually a pretty good description of machine learning. It's an empirical practice, not an exact science. It certainly isn't a branch of pure mathematics. See https://projecteuclid.org/journals/stat ... 13726.full if you're interested in the philosophy.

I'd strongly recommend that you get KataGo using your GPU in the simplest way possible, spend a few weeks using it to explore go positions, and then make a decision on whether it's worth optimising the performance.

RobertJasiek wrote:
1) Have I already installed CUDA drivers as part of the Nvidia Studio driver?

"Drivers" isn't quite the right term. CUDA is a software library for GPUs. No, most likely you will need to install it. Type "install CUDA windows" or similar into your favourite search engine, and find a set of instructions that make sense to you. Ditto cuDNN and TensorRT, if you want/need to go ahead and install them. But I'd leave this for later.

RobertJasiek wrote:
9) (How) can I see in process or Nvidia tools whether a running Katago process uses OpenCL, CUDA or tensor cores?

I'm on Linux, and I believe you're on Windows, so I can't give you the exact answer. But you probably can't, because OpenCL/CUDA/TensorRT refer to software, while the Nvidia tools will only tell you what the hardware is doing. I expect the graphics card drivers would include some sort of status monitor to show which processes are using the GPU, how much GPU memory is allocated to each one, etc. It probably won't be in the Windows task manager, but somewhere else. Browse through your system tray and control panels. Or google for advice :-)
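
One concrete thing to try (assuming the Nvidia driver installed its usual command-line tool, which I believe it also does on Windows): nvidia-smi lists GPU utilisation and the processes currently using the GPU, together with their memory use.

Code:
rem nvidia-smi ships with the Nvidia driver (the exact path may vary);
rem it shows GPU utilisation and which processes are using the GPU.
nvidia-smi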

If you run katago from a command line, it will display a lot of status information, including which backend it's using. This may or may not be useful!

RobertJasiek wrote:
10) How do I assess whether some Katago tuning parameters are better than others on my PC?

Using the same network and the same board position, different tuning parameters will give you different numbers of playouts per second. Higher numbers are better.
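
If it helps, KataGo's built-in benchmark automates exactly this comparison. A minimal sketch from memory (run "katago benchmark -help" to confirm the exact flags on your version; the model and config names here are placeholders):

Code:
rem Compare several thread counts on the same net and config;
rem the output reports visits/s for each setting tested.
katago.exe benchmark -model your_model.bin.gz -config your_config.cfg -t 8,16,32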

RobertJasiek wrote:
11) How do I see numbers of playouts?

Yes, good question. I just opened up katrain for the first time in a while, and remembered why I prefer to use Lizzie. The Lizzie status bar shows both total playouts and playouts per second. I haven't found how to do this in katrain. Hopefully someone else is reading this and can answer.
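
One GUI-independent workaround (a sketch; kata-analyze is a documented GTP extension): run KataGo in GTP mode, ask it to analyze, and read the visit counts from its output lines.

Code:
rem Start KataGo in GTP mode (placeholder model/config names), then type:
rem   kata-analyze interval 100
rem Each reported candidate move includes a "visits <n>" field.
katago.exe gtp -model your_model.bin.gz -config your_config.cfg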

 Post subject: Re: KaTrain Questions
Post #26 Posted: Mon Jun 05, 2023 3:57 am 
Lives in sente

Posts: 1308
Liked others: 14
Was liked: 153
Rank: German 1 Kyu
xela wrote:
CUDA is a software library for GPUs. No, most likely you will need to install it. Type "install CUDA windows" or similar into your favourite search engine, and find a set of instructions that make sense to you. Ditto cuDNN and TensorRT, if you want/need to go ahead and install them.

According to my experience, you will have to make sure that ALL directories that contain the DLLs from these software libraries are known to your system via the PATH environment variable.
You will have to add some of these manually (Google will help you with this) if this has not already been done by the respective installation programme.
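
For example, from a command prompt (a crude sketch with a hypothetical path; adjust it to your actual cuDNN directory, and note that the dialog under System Properties | Environment Variables is the safer way):

Code:
rem setx writes the USER environment; open a NEW command prompt afterwards.
setx PATH "%PATH%;C:\Program Files\NVIDIA\CUDNN\v8.9\bin"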

Quote:
But I'd leave this for later.

I'd strongly recommend that you get KataGo using your GPU in the simplest way possible, spend a few weeks using it to explore go positions, and then make a decision on whether it's worth optimising the performance.

I second this.

+ + + + + + + + + +

Most likely, you will (also) want to use KataGo to confirm your theories or to gain new insights into them.
In the process, as I have already mentioned, you will run into challenges that have NOTHING to do with performance. The more "unusual" the position you want to investigate, the higher the probability that KataGo will have to rely on its "global" Go knowledge only. KataGo will still play super-strong, but NOT as well and reliably as if the network used had sufficiently researched this (type of) position in its training phase.

In the games of my IH120-60b against Karl's IH120-40b, it may well happen that 60b has (sometimes noticeably) more playouts than 40b (per adjusted time unit, because 60b is in principle only half as fast as 40b).
In my estimation, this is an indication that 60b is more familiar (from the training) with the corresponding position than 40b.
In individual cases, this may well result in 40b being "fooled".
And this happens even though 60b is little trained compared to 40b and usually has no chance against 40b. Mainly because 60b currently still has the assessment that White will clearly win, and consequently tries avoidance strategies that naturally end in disaster.

During Karl's 40b project, we noticed that KataGo seems to have a strange preference for triple ko, which is of course not so optimal in cases where one should correctly settle for a double ko.
(Disclaimer: we do not know to what extent this is specific to IH120 only.)

_________________
The really most difficult Go problem ever: https://igohatsuyoron120.de/index.htm
Igo Hatsuyōron #120 (really solved by KataGo)

 Post subject: Re: KaTrain Questions
Post #27 Posted: Mon Jun 05, 2023 4:58 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
xela wrote:
CUDA is a software library for GPUs. [...] OpenCL/CUDA/TensorRT refer to software


For three years, I have watched YouTube videos and read tech webpages on graphics cards. Everybody said that they have CUDA cores, tensor cores and other cores. Nobody there ever mentioned that these cores could not simply be used by any software. Therefore, perhaps naively, I expected that the Nvidia Gaming or Studio drivers would simply enable all software to use whichever cores it prefers to use.

On my ordinary PC, the iGPU uses several drivers, among which is C:\Windows\System32\OpenCL.dll. Therefore, OpenCL can be both a driver and a software library.

Now, from your explanations, I start to understand that there are CUDA cores, and CUDA or cuDNN software, which Nvidia also refers to as containing further drivers. Similarly, there seem to be tensor cores and TensorRT software.

Hence I guess that KataGo might rely on drivers and dynamically linked libraries from Nvidia's software packages for CUDA cores and / or tensor cores.

I see: the software mentioned in YouTube videos and on ordinary webpages might not be advanced enough to need Nvidia's software packages, while machine learning software, such as KataGo, is more advanced in its design and needs them.

I hope I understand this roughly right.

 Post subject: Re: KaTrain Questions
Post #28 Posted: Mon Jun 05, 2023 5:19 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Cassandra wrote:
Most likely, you will (also) want to use KataGo to confirm your theories or to gain new insights into them.


You must distinguish my theories by their natures.

1) Mathematical theory established as proved theorems: I do not need any AI ever to confirm such theory because it is already established truth due to the proofs. We can only study whether AI is able to reach the same level of 100% correctness for those situations to which the theorems apply or whether AI does (much) worse.

2) Other go theory formulated as principles, methods or values for which there is a high correlation to professional / strong players' play: We can study whether AI agrees or how / when it disagrees.

3) Other go theory formulated as principles, methods or values that is useful to some extent but does not have a high correlation to professional / strong players' play: Our study of human or AI play might enable improved theory.

Apart from theory, there are also example positions for which I want to see whether AI "analyses" better than I have done. In particular, I am curious about my triple ladder analysis for a pro game's position. Will AI even be able to construct the complete ladders?

 Post subject: Re: KaTrain Questions
Post #29 Posted: Tue Jun 06, 2023 5:49 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
After a break, I have touched KaTrain again and changed the executable from <path>\lizzie\katago.exe to <path>\lizzie\katago_cuda.exe. This has been enough to run the latter. By name, I guess this means it is the CUDA version of KataGo. Of course, I do not trust names and have studied running files, processes and drivers in ProcessExplorer and Explorer as below.

However, first let me note the different behaviour of the graphics card with respect to GPU and VRAM load, as follows. The CUDA version uses a bit more VRAM but loads the GPU more efficiently.

Code:
Item       katago   katago_CUDA

GPU load   94%      81 ~ 90%

VRAM load  1.15GB   1.8GB



Note what I have and have not installed, as listed below. Since I could not know in advance whether the Nvidia Studio drivers and the Baduk AI Megapack had already installed CUDA drivers, and since statements by experienced users were missing, I had to find out by trial and error that - apparently, I cannot be sure yet - CUDA drivers and CUDA libraries have already been installed. Judging by the file names, it appears - but again I cannot be sure yet - that CUDNN libraries have also been installed.

I do not know whether some of the libraries must be in the same directory as the katago_cuda.exe being used. It just happens to be so for the one I have been using so far. However, I cannot know whether this is a necessity. Is it?
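
At least I can check from where a DLL would be picked up (assuming where.exe behaves as documented, searching the current directory and then PATH):

Code:
rem Run from the directory containing katago_cuda.exe, since the directory
rem of an exe is searched first when Windows loads its DLLs.
where cudnn64_8.dll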

From the KataGo webpage and statements by some experienced users, there has been a strong recommendation to install cuda_12.1.1_531.14_windows.exe and cudnn-windows-x86_64-8.9.2.26_cuda12-archive.zip for the CUDA and CUDNN libraries. However, I am not about to program my own neural net, I have experienced that unnecessary driver / library installations can corrupt a system (in fact, on my very new PC, I have already experienced this with the AMD iGPU drivers), and it seems that katago_cuda.exe runs without the extra 4GB of installers. Therefore, at least before proceeding to tensor cores, their installation seems to have been a bad recommendation. And this is what I call alchemy: forcing each go AI user to find out the correct installation procedure by trial and error.

Code:
NOT INSTALLED YET

cuda_12.1.1_531.14_windows.exe        Nvidia CUDA installer               3.3GB
cudnn-windows-x86_64-8.9.2.26_cuda12-archive.zip
                                      Nvidia CUDNN installer-ZIP          0.7GB

INSTALLED

Baduk_AI_Megapack_v4.18.0_x64.exe     Baduk AI Megapack

KaTrain Command

<path>\lizzie\katago_cuda.exe analysis -model <path>\lizzie\KataGo40b.gz
-config <path>\KaTrain\analysis_config.cfg -analysis-threads 12
-override-config homeDataDir=C:\Users\<username>/.katrain



Instead, I want to contribute information with which future AI newbies can make more informed decisions than mine. I have observed the following processes, libraries etc. on my new PC, many of which indicate (OpenCL and) CUDA and CUDNN. So if you do not yet know whether you have already installed such components, or still need to install them, check for the following files or processes when running katago_cuda.exe:

Code:
Go AI 64b Processes

<path>\KaTrain\KaTrain.exe            KaTrain                              1.7.2.0
<path>\lizzie\katago_cuda.exe         katago_cuda.exe
C:\Windows\System32\conhost.exe       Console Window Host                  10.0.22621.1194

Lizzie Katago

<path>\lizzie\katago_cuda.exe         katago_cuda

Lizzie NVIDIA DLLs

<path>\lizzie\cublas64_11.dll         NVIDIA CUDA BLAS Library             11.7.3.1
<path>\lizzie\cublasLt64_11.dll       NVIDIA CUDA BLAS Light Library       11.7.3.1
<path>\lizzie\cudnn_cnn_infer64_8.dll NVIDIA CUDA CUDNN_CNN_INFER Library  11.4.128
<path>\lizzie\cudnn_ops_infer64_8.dll NVIDIA CUDA CUDNN_OPS_INFER Library  11.4.128
<path>\lizzie\cudnn64_8.dll           NVIDIA CUDA CUDNN Library            6.5.0
<path>\lizzie\                        <contains more CUDA files>

Lizzie OpenSSL DLLs

<path>\lizzie\libcrypto-1_1-x64.dll   OpenSSL library The OpenSSL Project
<path>\lizzie\libssl-1_1-x64.dll      OpenSSL library The OpenSSL Project

Lizzie Misc

<path>\lizzie\libz.dll                zlib data compression library
<path>\lizzie\libzip.dll              libzip for Windows
<path>\lizzie\                        <contains more library files>

Nvidia Studio Driver DLLs

C:\Windows\System32\nvapi64.dll       NVIDIA NVAPI Library                 531.61
C:\Windows\System32\nvcuda.dll        NVIDIA CUDA Driver                   531.61
C:\Windows\System32\                  <contains more Nvidia (CUDA) files>
C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\nvcuda64.dll
                                      NVIDIA CUDA Driver                   531.61
C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\
                                      <contains more Nvidia (CUDA) files>

Nvidia System Services

C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\nvcubins.bin

NVIDIA C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\NVDisplay.Container.exe
                                      NVIDIA Container                     1.37.3103.4323
"C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\NVDisplay.Container.exe"
-f %ProgramData%\NVIDIA\DisplaySessionContainer%d.log
-d C:\Windows\System32\DriverStore\FileRepository\
nv_dispsig.inf_amd64_89cdd9f6f9724565\Display.NvContainer\plugins\Session
-r -l 3 -p 30000 -cfg NVDisplay.ContainerLocalSystem\Session -c


RTX 4070 drivers, extract

C:\Windows\System32\drivers\NVIDIA Corporation\Drs\dbInstaller.exe
C:\Windows\System32\drivers\NVIDIA Corporation\Drs\nvdrsdb.bin
C:\Windows\System32\drivers\NVIDIA Corporation\license.txt
C:\Windows\System32\lxss\lib\libcuda.so
C:\Windows\System32\lxss\lib\libcuda.so.1
C:\Windows\System32\lxss\lib\libcuda.so.1.1
C:\Windows\System32\lxss\lib\libnvcuvid.so
C:\Windows\System32\lxss\lib\libnvcuvid.so.1
C:\Windows\System32\lxss\lib\libnvidia-ml.so.1
C:\Windows\System32\lxss\lib\<various>
C:\Windows\System32\MCU.exe
C:\Windows\System32\nvapi64.dll
C:\Windows\System32\nvcpl.dll
C:\Windows\System32\nvcuda.dll
C:\Windows\System32\nvcuvid.dll
C:\Windows\System32\OpenCL.dll
C:\Windows\System32\<various>
C:\Windows\SysWow64\<various>
C:\Windows\System32\DriverStore\FileRepository\nv_dispsig.inf_amd64_89cdd9f6f9724565\<various>



Needless to say, I still have my questions about tensor cores, starting with: have I already been using them?

 Post subject: Re: KaTrain Questions
Post #30 Posted: Tue Jun 06, 2023 10:05 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
For possible later use, I have downloaded Nvidia's installers / ZIPs for CUDA, CuDNN and TensorRT. The files are not always labelled Windows 11 but sometimes Windows 10. The installation instructions are often outdated, referring to older file versions in various combinations. Since I cannot know yet which combination of installers / ZIPs will work, I have downloaded several versions of each file. For once in a lifetime, Nvidia is not at full fault but only at partial fault, for its outdated installation instructions and cryptic installer versions. With only 14 minutes of download time left, the download of 5 of some dozen files stopped. It turned out that this was my fault: my download partition had reached 0KB of remaining space - it was full! At least Windows kept working flawlessly (maybe because it is not my system partition). So I had to clean up the partition, log in to Nvidia's webpage again and restart the 5 remaining downloads, which took another half an hour or so.

Meanwhile, I have noticed that KataGo 1_13_0 of the Baduk AI Megapack is reported to have a bug for tensor cores. So I have also downloaded the install ZIP of KataGo 1_13_1, plus every set of installation instructions I could get hold of, some of which address the PATH jobs. Furthermore, I will install a tool that lists the files of directories, so that I can record any damage a further installation might do when older installers replace newer files with older ones, because exactly that can happen (and has happened in the past). Therefore, I guess I might have every installer / ZIP I could possibly need.

I think the order of installation is:
1. Nvidia GPU driver (done)
2. Nvidia CUDA version x
3. Nvidia CuDNN for CUDA version x
4. Nvidia TensorRT for CUDA version x
5. If necessary, set Windows system settings | environmental PATHs.

Independently, install KataGo 1_13_1.
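
To verify each step, I intend to check roughly as follows (a sketch; nvcc exists only once the CUDA toolkit is installed, and the DLL names are those mentioned so far in this thread):

Code:
rem 1. Driver: shows the driver version and the maximum supported CUDA version
nvidia-smi
rem 2. CUDA toolkit: shows the installed toolkit version
nvcc --version
rem 3. / 4. CuDNN / TensorRT: check whether the DLLs are found via PATH
where cudnn64_8.dll
where nvinfer.dll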

Am I on the right track?

Even so, I still do not understand what in KataGo 1_13_1 enables the use of tensor cores instead of CUDA cores. In earlier days, there were KataGo installers / ZIPs specific to Tensor[RT], but currently there are not. Is the KataGo 1_13_1 for CUDA also for Tensor[RT]? Is it sufficient to have told Baduk AI Megapack to create the right configuration files for using tensor cores? Or what, besides the command (line) calling KataGo in KaTrain or another GUI program, do I also need to do so that KataGo actually uses tensor cores instead of CUDA cores?

 Post subject: Re: KaTrain Questions
Post #31 Posted: Tue Jun 06, 2023 10:42 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
* You don't need to do anything to make KataGo use tensor cores, and there's not a great way to be absolutely sure whether it does or not, unless you use OpenCL. The OpenCL version will tell you if it's using tensor cores or not as it runs the tuning the first time you run it. Look at the tuning output as it tunes each operation, reporting various performance stats, and you'll see a section where it tunes (or fails to tune) for tensor cores and whether or not it decides to use them. The CUDA and TensorRT versions just call Nvidia's libraries, and those libraries determine what to do by their own underlying magic; you just have to trust whatever they are doing. Unlike with OpenCL, KataGo has little to no control over it. If the CUDA or TensorRT version runs and works without crashing, then that's probably what you're going to get and that's it. So I recommend not worrying about whether it uses tensor cores, or even trying to find that out, because *you* also have little to no control over such low-level details; just benchmark each thing and see what finally gives the best visits/s (./katago.exe benchmark).

* KataGo has *always* historically had separate TensorRT and CUDA versions, and the same is true now. It's just that v1.13.0 had a bug with TensorRT (*NOT* a bug with tensor cores; tensor cores and TensorRT are completely different things with no particular relationship to each other), so it got its own release v1.13.1 with a fix, whereas all the other versions (OpenCL, Eigen, CUDA) did not, because they had no difference. Use TensorRT if you want to attempt Nvidia black magic, which goes beyond even the CUDA version by doing some secret Nvidia proprietary magic optimization of different layers of operations and such, which neither you nor I have control over, and which, if it works, might squeeze out a bit more performance at the cost of much longer startup and loading times. Otherwise, just use whatever version works for you. CUDA is fine if you've installed CUDA. OpenCL is fine too, and has a decent chance of working right out of the box; it comes zipped with all the DLLs it should need.

* It's been a long time since I tried to install this stuff on Windows. I think that once you have the right drivers, "installation" consists of just having the right DLLs in your path, which are some CUDA and CUDNN DLLs for CUDA, or some DLL with "nvinfer" in the name for TensorRT ("nvinfer" is Nvidia's technical name for TensorRT that it uses for filenames and some technical docs). So, don't trust me on this too much since it's been a while, but I think for example that one could even crudely find the appropriate DLLs by digging into the installation folders and just copy them into the katago executable directory (since Windows also normally considers the local directory of an exe to be a search location for DLLs for that exe).
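
Something like the following, with made-up paths that you'd have to adjust to wherever the installers actually put things:

Code:
rem Hypothetical paths -- adjust them to your actual install locations.
copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn64_8.dll" C:\katago\
copy "C:\TensorRT-8.5\lib\nvinfer.dll" C:\katago\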

 Post subject: Re: KaTrain Questions
Post #32 Posted: Wed Jun 07, 2023 1:06 am 
Gosei

Posts: 1348
Liked others: 202
Was liked: 203
In order to run katago-v1.13.1-trt8.5-cuda11.2-windows-x64, I downloaded Lizzieyzy and copied nvinfer.dll and nvinfer_builder_resource.dll from there. I didn't install anything else. I checked in Sabaki with different networks.
It works about twice as fast as OpenCL for version v1.12.4; for v1.13 I haven't compared yet.

Lizzieyzy https://github.com/yzyray/lizzieyzy/releases
2023-01-30-windows64+katago.zip ~1.8gb
https://drive.google.com/file/d/1fhad97 ... drive_link


This post by And was liked by: Dragon
 Post subject: Re: KaTrain Questions
Post #33 Posted: Wed Jun 07, 2023 2:38 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Thank you both! Now I have something to look for, try and benchmark. If I should install all the Nvidia stuff, I might then fetch the suitable DLLs and put them in KataGo's directory or see if setting PATH does the job.

 Post subject: Re: KaTrain Questions
Post #34 Posted: Wed Jun 07, 2023 4:14 pm 
Gosei

Posts: 1348
Liked others: 202
Was liked: 203
v1.13 katago_tensorRT ~2.2 times faster than katago_opencl (GeForce GTX 1650, b18)

 Post subject: Re: KaTrain Questions
Post #35 Posted: Fri Jun 09, 2023 2:31 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Suppose C:\katago is my directory to KataGo OpenCL, C:\baduk\katrain is my directory to KaTrain and C:\baduk is my directory of Baduk AI Megapack.

Now that I could run KataGo OpenCL, CUDA and TensorRT each in its directory as

katago benchmark

on the command line, I want to start with KataGo OpenCL in KaTrain in Windows. In C:\katago I run

katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

and answer the questions as follows:

Code:
rules = new-zealand
search limits | limit = n
pondering = y
max num seconds =
Found OpenCL Device 0 = AMD
Found OpenCL Device 1 = RTX 4070
GPUs to use = 1
max cache up to ~3GB in addition to whatever current search is using =
number of visits to test/tune performance with = 10000
number of seconds/move to optimise performance (default 5) =


KataGo creates gtp_custom.cfg:

Code:
# Config for KataGo C++ GTP engine, i.e. "./katago.exe gtp"

# In this config, when a parameter is given as a commented out value,
# that value also is the default value, unless described otherwise. You can
# uncomment it (remove the pound sign) and change it if you want.

# ===========================================================================
# Command-line usage
# ===========================================================================
# All of the below values may be set or overridden via command-line arguments:
#
# -override-config KEY=VALUE,KEY=VALUE,...

# ===========================================================================
# Logs and files
# ===========================================================================
# This section defines where and what logging information is produced.

# Each run of KataGo will log to a separate file in this dir.
# This is the default.
logDir = gtp_logs
# Uncomment and specify this instead of logDir to write separate dated subdirs
# logDirDated = gtp_logs
# Uncomment and specify this instead of logDir to log to only a single file
# logFile = gtp.log

# Logging options
logAllGTPCommunication = true
logSearchInfo = true
logToStderr = false

# KataGo will display some info to stderr on GTP startup
# Uncomment the next line and set it to false to suppress that and remain silent
# startupPrintMessageToStderr = true

# Write information to stderr, for use in things like malkovich chat to OGS.
# ogsChatToStderr = false

# Uncomment and set this to a directory to override where openCLTuner files
# and other cached data is written. By default it saves into a subdir of the
# current directory on windows, and a subdir of ~/.katago on Linux.
# homeDataDir = PATH_TO_DIRECTORY

# ===========================================================================
# Analysis
# ===========================================================================
# This section configures analysis settings.
#
# The maximum number of moves after the first move displayed in variations
# from analysis commands like kata-analyze or lz-analyze.
# analysisPVLen = 15

# Report winrates for chat and analysis as (BLACK|WHITE|SIDETOMOVE).
# Most GUIs and analysis tools will expect SIDETOMOVE.
# reportAnalysisWinratesAs = SIDETOMOVE

# Extra noise for wider exploration. Large values will force KataGo to
# analyze a greater variety of moves than it normally would.
# An extreme value like 1 distributes playouts across every move on the board,
# even very bad moves.
# Affects analysis only, does not affect play.

# analysisWideRootNoise = 0.04

# ===========================================================================
# Rules
# ===========================================================================
# This section configures the scoring and playing rules. Rules can also be
# changed mid-run by issuing custom GTP commands.
#
# See https://lightvector.github.io/KataGo/rules.html for rules details.
#
# See https://github.com/lightvector/KataGo/blob/master/docs/GTP_Extensions.md
# for GTP commands.

koRule = SITUATIONAL  # options: SIMPLE, POSITIONAL, SITUATIONAL

scoringRule = AREA  # options: AREA, TERRITORY

taxRule = NONE  # options: NONE, SEKI, ALL

multiStoneSuicideLegal = true

hasButton = false

whiteHandicapBonus = 0  # options: 0, N, N-1

friendlyPassOk = true

# ===========================================================================
# Bot behavior
# ===========================================================================

# ------------------------------
# Resignation
# ------------------------------

# Resignation occurs if for at least resignConsecTurns in a row, the
# winLossUtility (on a [-1,1] scale) is below resignThreshold.
allowResignation = true
resignThreshold = -0.90
resignConsecTurns = 3

# By default, KataGo may resign games that it is confidently losing even if they
# are very close in score. Uncomment and set this to avoid resigning games
# if the estimated difference is points is less than or equal to this.
# resignMinScoreDifference = 10

# ------------------------------
# Handicap
# ------------------------------
# Assume that if black makes many moves in a row right at the start of the
# game, then the game is a handicap game. This is necessary on some servers
# and for some GUIs and also when initializing from many SGF files, which may
# set up a handicap game using repeated GTP "play" commands for black rather
# than GTP "place_free_handicap" commands; however, it may also lead to
# incorrect understanding of komi if whiteHandicapBonus is used and a server
# does not have such a practice. Uncomment and set to false to disable.
# assumeMultipleStartingBlackMovesAreHandicap = true

# Makes katago dynamically adjust in handicap or altered-komi games to assume
# based on those game settings that it must be stronger or weaker than the
# opponent and to play accordingly. Greatly improves handicap strength by
# biasing winrates and scores to favor appropriate safe/aggressive play.
# Does not affect analysis (lz-analyze, kata-analyze, used by programs like
# Lizzie) so analysis remains unbiased. Uncomment and set this to 0 to disable
# this and make KataGo play the same always.
# dynamicPlayoutDoublingAdvantageCapPerOppLead = 0.045

# Instead of "dynamicPlayoutDoublingAdvantageCapPerOppLead", you can comment
# that out and uncomment and set "playoutDoublingAdvantage" to a value between
# from -3.0 to 3.0 to set KataGo's aggression to a FIXED level. This affects
# analysis tools (lz-analyze, kata-analyze, used by programs like Lizzie).
# Negative makes KataGo behave as if it is much weaker than the opponent,
# preferring to play defensively. Positive makes KataGo behave as if it is
# much stronger than the opponent, preferring to play aggressively or even
# overplay slightly.
#
# If this and "dynamicPlayoutDoublingAdvantageCapPerOppLead" are both set
# then dynamic will be used for all games and this fixed value will be used
# for analysis tools.
# playoutDoublingAdvantage = 0.0

# Uncomment one of these when using "playoutDoublingAdvantage" to enforce
# that it will only apply when KataGo plays as the specified color and will be
# negated when playing as the opposite color.
# playoutDoublingAdvantagePla = BLACK
# playoutDoublingAdvantagePla = WHITE

# ------------------------------
# Passing and cleanup
# ------------------------------
# Make the bot never assume that its pass will end the game, even if passing
# would end and "win" under Tromp-Taylor rules. Usually this is a good idea
# when using it for analysis or playing on servers where scoring may be
# implemented non-tromp-taylorly. Uncomment and set to false to disable.
# conservativePass = true

# When using territory scoring, self-play games continue beyond two passes
# with special cleanup rules that may be confusing for human players. This
# option prevents the special cleanup phases from being reachable when using
# the bot for GTP play. Uncomment and set to false to enable entering special
# cleanup. For example, if you are testing it against itself, or against
# another bot that has precisely implemented the rules documented at
# https://lightvector.github.io/KataGo/rules.html
# preventCleanupPhase = true

# ------------------------------
# Miscellaneous behavior
# ------------------------------
# If the board is symmetric, search only one copy of each equivalent move.
# Attempts to also account for ko/superko; will not be theoretically perfect for
# superko. Uncomment and set to false to disable.
# rootSymmetryPruning = true

# Uncomment and set to true to avoid a particular joseki that some networks
# misevaluate, and also to improve opening diversity versus some particular
# other bots that like to play it all the time.
# avoidMYTDaggerHack = false

# Prefer to avoid playing the same joseki in every corner of the board.
# Uncomment to set to a specific value. See "Avoid SGF patterns" section.
# By default: 0 (even games), 0.005 (handicap games)
# avoidRepeatedPatternUtility = 0.0

# Experimental logic to fight against mirror Go even with unfavorable komi.
# Uncomment to set to a specific value to use for both playing and analysis.
# By default: true when playing via GTP, but false when analyzing.
# antiMirror = true

# Enable some hacks that mitigate rare instances when passing messes up deeper searches.
# enablePassingHacks = true


# ===========================================================================
# Search limits
# ===========================================================================

# Terminology:
# "Playouts" is the number of new playouts of search performed each turn.
# "Visits" is the same as "Playouts" but also counts search performed on
# previous turns that is still applicable to this turn.
# "Time" is the time in seconds.

# For example, if KataGo searched 200 nodes on the previous turn, and then
# after the opponent's reply, 50 nodes of its search tree was still valid,
# then a visit limit of 200 would allow KataGo to search 150 new nodes
# (for a final tree size of 200 nodes), whereas a playout limit of 200
# would allow KataGo to search 200 nodes (for a final tree size of 250 nodes).

# Additionally, KataGo may also move before the limit in order to
# obey time controls (e.g. byo-yomi, etc) if the GTP controller has
# told KataGo that the game is being played with a given time control.

# Limits for search on the current turn.
# If commented out or unspecified, the default is to have no limit.
# maxVisits = 500
# maxPlayouts = 300
# maxTime = 10.0

# Ponder on the opponent's turn?
ponderingEnabled = true
# maxTimePondering = 60.0

# ------------------------------
# Other search limits and behavior
# ------------------------------

# Approx number of seconds to buffer for lag for GTP time controls - will
# move a bit faster assuming there is this much lag per move.
lagBuffer = 1.0

# Number of threads to use in search
numSearchThreads = 40

# Play a little faster if the opponent is passing, for human-friendliness.
# Comment these out to disable them, such as if running a controlled match
# where you are testing KataGo with fixed compute per move vs other bots.
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25

# Play a little faster if super-winning, for human-friendliness.
# Comment these out to disable them, such as if running a controlled match
# where you are testing KataGo with fixed compute per move vs other bots.
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

# ===========================================================================
# GPU settings
# ===========================================================================
# This section configures GPU settings.
#
# Maximum number of positions to send to a single GPU at once. The default
# value is roughly equal to numSearchThreads, but can be specified manually
# if running out of memory, or using multiple GPUs that expect to share work.
# nnMaxBatchSize = <integer>

# Controls the neural network cache size, which is the primary RAM/memory use.
# KataGo will cache up to (2 ** nnCacheSizePowerOfTwo) many neural net
# evaluations in case of transpositions in the tree.
# Increase this to improve performance for searches with tens of thousands
# of visits or more. Decrease this to limit memory usage.
# If you're happy to do some math - each neural net entry takes roughly
# 1.5KB, except when using whole-board ownership/territory
# visualizations, where each entry will take roughly 3KB. The number of
# entries is (2 ** nnCacheSizePowerOfTwo). (E.g. 2 ** 18 = 262144.)
# You can compute roughly how much memory the cache will use based on this.
nnCacheSizePowerOfTwo = 20
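
# For example: with nnCacheSizePowerOfTwo = 20 as above, the cache holds up to
# 2 ** 20 = 1048576 entries; at roughly 1.5KB each that is about 1.5GB of RAM
# (or about 3GB with whole-board ownership visualizations at 3KB per entry).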

# Size of mutex pool for nnCache is (2 ** this).
nnMutexPoolSizePowerOfTwo = 16

numNNServerThreadsPerModel = 1
openclDeviceToUseThread0 = 1


# ===========================================================================
# Root move selection and biases
# ===========================================================================
# Uncomment and edit any of the below values to change them from their default.

# If provided, force usage of a specific seed for various random things in
# the search. The default is to use a random seed.
# searchRandSeed = hijklmn

# Temperature for the early game, randomize between chosen moves with
# this temperature
# chosenMoveTemperatureEarly = 0.5

# Decay temperature for the early game by 0.5 every this many moves,
# scaled with board size.
# chosenMoveTemperatureHalflife = 19

# At the end of search after the early game, randomize between chosen
# moves with this temperature
# chosenMoveTemperature = 0.10

# Subtract this many visits from each move prior to applying
# chosenMoveTemperature (unless all moves have too few visits) to downweight
# unlikely moves
# chosenMoveSubtract = 0

# The same as chosenMoveSubtract but only prunes moves that fall below
# the threshold. This setting does not affect chosenMoveSubtract.
# chosenMovePrune = 1

# Number of symmetries to sample (without replacement) and average at the root
# rootNumSymmetriesToSample = 1

# Using LCB for move selection?
# useLcbForSelection = true

# How many stdevs a move needs to be better than another for LCB selection
# lcbStdevs = 5.0

# Only use LCB override when a move has this proportion of visits as the
# top move.
# minVisitPropForLCB = 0.15

# ===========================================================================
# Internal params
# ===========================================================================
# Uncomment and edit any of the below values to change them from their default.

# Scales the utility of winning/losing
# winLossUtilityFactor = 1.0

# Scales the utility for trying to maximize score
# staticScoreUtilityFactor = 0.10
# dynamicScoreUtilityFactor = 0.30

# Adjust dynamic score center this proportion of the way towards zero,
# capped at a reasonable amount.
# dynamicScoreCenterZeroWeight = 0.20
# dynamicScoreCenterScale = 0.75

# The utility of getting a "no result" due to triple ko or other long cycle
# in non-superko rulesets (-1 to 1)
# noResultUtilityForWhite = 0.0

# The number of wins that a draw counts as, for white. (0 to 1)
# drawEquivalentWinsForWhite = 0.5

# Exploration constant for mcts
# cpuctExploration = 1.0
# cpuctExplorationLog = 0.45

# Parameters that control exploring more in volatile positions, exploring
# less in stable positions.
# cpuctUtilityStdevPrior = 0.40
# cpuctUtilityStdevPriorWeight = 2.0
# cpuctUtilityStdevScale = 0.85

# FPU reduction constant for mcts
# fpuReductionMax = 0.2
# rootFpuReductionMax = 0.1
# fpuParentWeightByVisitedPolicy = true

# Parameters that control weighting of evals based on the net's own
# self-reported uncertainty.
# useUncertainty = true
# uncertaintyExponent = 1.0
# uncertaintyCoeff = 0.25

# Explore using optimistic policy
# rootPolicyOptimism = 0.2
# policyOptimism = 1.0

# Amount to apply a downweighting of children with very bad values relative
# to good ones.
# valueWeightExponent = 0.25

# Slight incentive for the bot to behave human-like with regard to passing at
# the end, filling the dame, not wasting time playing in its own territory,
# etc., and not play moves that are equivalent in terms of points but a bit
# more unfriendly to humans.
# rootEndingBonusPoints = 0.5

# Make the bot prune useless moves that are just prolonging the game to
# avoid losing yet.
# rootPruneUselessMoves = true

# Apply bias correction based on local pattern keys
# subtreeValueBiasFactor = 0.45
# subtreeValueBiasWeightExponent = 0.85

# Use graph search rather than tree search - identify and share search for
# transpositions.
# useGraphSearch = true

# How much to shard the node table for search synchronization
# nodeTableShardsPowerOfTwo = 16

# How many virtual losses to add when a thread descends through a node
# numVirtualLossesPerThread = 1

# Improve the quality of evals under heavy multithreading
# useNoisePruning = true

# ===========================================================================
# Avoid SGF patterns
# ===========================================================================
# The parameters in this section provide a way to avoid moves that follow
# specific patterns based on a set of SGF files loaded upon startup.
# Uncomment them to use this feature. Additionally, if the SGF file
# contains the string %SKIP% in a comment on a move, that move will be
# ignored for this purpose.

# Load SGF files from this directory when the engine is started
# (only on startup, will not reload unless engine is restarted)
# avoidSgfPatternDirs = path/to/directory/with/sgfs/
# You can also surround the file path in double quotes if the file path contains trailing spaces or hash signs.
# Within double quotes, backslashes are escape characters.
# avoidSgfPatternDirs = "path/to/directory/with/sgfs/"

# Penalize this much utility per matching move.
# Set this negative if you instead want to favor SGF patterns instead of
# penalizing them. This number does not need to be large, even 0.001 will
# make a difference. Values that are too large may lead to bad play.
# avoidSgfPatternUtility = 0.001

# Optional - load only the newest this many files
# avoidSgfPatternMaxFiles = 20

# Optional - Penalty is multiplied by this per each older SGF file, so that
# old SGF files matter less than newer ones.
# avoidSgfPatternLambda = 0.90

# Optional - pay attention only to moves made by players with this name.
# For example, set it to the name that your bot's past games will show up
# as in the SGF, so that the bot will only avoid repeating moves that itself
# made in past games, not the moves that its opponents made.
# avoidSgfPatternAllowedNames = my-ogs-bot-name1,my-ogs-bot-name2

# Optional - Ignore moves in SGF files that occurred before this turn number.
# avoidSgfPatternMinTurnNumber = 0

# For more avoid patterns:
# You can also specify a second set of parameters, and a third, fourth,
# etc. by numbering 2,3,4,...
#
# avoidSgf2PatternDirs = ...
# avoidSgf2PatternUtility = ...
# avoidSgf2PatternMaxFiles = ...
# avoidSgf2PatternLambda = ...
# avoidSgf2PatternAllowedNames = ...
# avoidSgf2PatternMinTurnNumber = ...


KataGo creates this LOG file:

Code:
2023-06-09 07:28:08+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 6
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 07:28:08+0200: Loading model and initializing benchmark...
2023-06-09 07:28:08+0200: nnRandSeed0 = 4385763048445920344
2023-06-09 07:28:08+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:28:08+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:28:08+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:28:08+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:28:08+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:09+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:28:09+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false
2023-06-09 07:36:57+0200: GPU 1 finishing, processed 456614 rows 74761 batches
2023-06-09 07:36:57+0200: nnRandSeed0 = 495893077473133403
2023-06-09 07:36:57+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:36:57+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:36:58+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:36:58+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:36:58+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:36:58+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:36:59+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false
2023-06-09 07:39:46+0200: GPU 1 finishing, processed 191655 rows 7196 batches


The command line LOG is:

Code:
C:\katago>katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

=========================================================================
RULES

What rules should KataGo use by default for play and analysis?
(chinese, japanese, korean, tromp-taylor, aga, chinese-ogs, new-zealand, bga, stone-scoring, aga-button):
new-zealand

=========================================================================
SEARCH LIMITS

When playing games, KataGo will always obey the time controls given by the GUI/tournament/match/online server.
But you can specify an additional limit to make KataGo move much faster. This does NOT affect analysis/review,
only affects playing games. Add a limit? (y/n) (default n):
n

NOTE: No limits configured for KataGo. KataGo will obey time controls provided by the GUI or server or match script
but if they don't specify any, when playing games KataGo may think forever without moving. (press enter to continue)


When playing games, KataGo can optionally ponder during the opponent's turn. This gives faster/stronger play
in real games but should NOT be enabled if you are running tests with fixed limits (pondering may exceed those
limits), or to avoid stealing the opponent's compute time when testing two bots on the same machine.
Enable pondering? (y/n, default n):y

Specify max num seconds KataGo should ponder during the opponent's turn. Leave blank for no limit:


=========================================================================
GPUS AND RAM

Finding available GPU-like devices...
Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)

Specify devices/GPUs to use (for example "0,1,2" to use devices 0, 1, and 2). Leave blank for a default SINGLE-GPU config:
1

By default, KataGo will cache up to about 3GB of positions in memory (RAM), in addition to
whatever the current search is using. Specify a different max in GB or leave blank for default:


=========================================================================
PERFORMANCE TUNING

Specify number of visits to use test/tune performance with, leave blank for default based on GPU speed.
Use large number for more accurate results, small if your GPU is old and this is taking forever:
10000

Specify number of seconds/move to optimize performance for (default 5), leave blank for default:

2023-06-09 07:28:08+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 6
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 07:28:08+0200: Loading model and initializing benchmark...

2023-06-09 07:28:08+0200: nnRandSeed0 = 4385763048445920344
2023-06-09 07:28:08+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:28:08+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:28:08+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:28:08+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:28:08+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:09+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:28:09+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

=========================================================================
TUNING NOW
Tuning using 10000 visits.
Automatically trying different numbers of threads to home in on the best (board size 19x19):


Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads =  5: 10 / 10 positions, visits/s = 839.35 nnEvals/s = 563.10 nnBatches/s = 225.34 avgBatchSize = 2.50 (119.2 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 nnEvals/s = 877.08 nnBatches/s = 146.43 avgBatchSize = 5.99 (75.2 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1251.00 nnEvals/s = 797.66 nnBatches/s = 159.74 avgBatchSize = 4.99 (80.0 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1582.83 nnEvals/s = 1014.95 nnBatches/s = 101.76 avgBatchSize = 9.97 (63.3 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1458.56 nnEvals/s = 947.94 nnBatches/s = 118.78 avgBatchSize = 7.98 (68.7 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1662.11 nnEvals/s = 1076.06 nnBatches/s = 89.90 avgBatchSize = 11.97 (60.3 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1683.90 nnEvals/s = 1099.53 nnBatches/s = 68.60 avgBatchSize = 16.03 (59.6 secs)


Optimal number of threads is fairly high, increasing the search limit and trying again.

2023-06-09 07:36:57+0200: GPU 1 finishing, processed 456614 rows 74761 batches
2023-06-09 07:36:57+0200: nnRandSeed0 = 495893077473133403
2023-06-09 07:36:57+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:36:57+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:36:58+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:36:58+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:36:58+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:36:58+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:36:59+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false


Possible numbers of threads to test: 16, 20, 24, 32, 40, 48, 64, 80, 96,

numSearchThreads = 64: 10 / 10 positions, visits/s = 1824.11 nnEvals/s = 1165.36 nnBatches/s = 29.23 avgBatchSize = 39.87 (55.2 secs)
numSearchThreads = 40: 10 / 10 positions, visits/s = 1808.78 nnEvals/s = 1128.90 nnBatches/s = 55.08 avgBatchSize = 20.50 (55.5 secs)
numSearchThreads = 48: 10 / 10 positions, visits/s = 1784.88 nnEvals/s = 1149.96 nnBatches/s = 44.81 avgBatchSize = 25.66 (56.3 secs)


Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 839.35 nnEvals/s = 563.10 nnBatches/s = 225.34 avgBatchSize = 2.50 (119.2 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1251.00 nnEvals/s = 797.66 nnBatches/s = 159.74 avgBatchSize = 4.99 (80.0 secs) (EloDiff +137)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 nnEvals/s = 877.08 nnBatches/s = 146.43 avgBatchSize = 5.99 (75.2 secs) (EloDiff +157)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1458.56 nnEvals/s = 947.94 nnBatches/s = 118.78 avgBatchSize = 7.98 (68.7 secs) (EloDiff +184)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1582.83 nnEvals/s = 1014.95 nnBatches/s = 101.76 avgBatchSize = 9.97 (63.3 secs) (EloDiff +209)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1662.11 nnEvals/s = 1076.06 nnBatches/s = 89.90 avgBatchSize = 11.97 (60.3 secs) (EloDiff +221)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1683.90 nnEvals/s = 1099.53 nnBatches/s = 68.60 avgBatchSize = 16.03 (59.6 secs) (EloDiff +214)
numSearchThreads = 40: 10 / 10 positions, visits/s = 1808.78 nnEvals/s = 1128.90 nnBatches/s = 55.08 avgBatchSize = 20.50 (55.5 secs) (EloDiff +230)
numSearchThreads = 48: 10 / 10 positions, visits/s = 1784.88 nnEvals/s = 1149.96 nnBatches/s = 44.81 avgBatchSize = 25.66 (56.3 secs) (EloDiff +213)
numSearchThreads = 64: 10 / 10 positions, visits/s = 1824.11 nnEvals/s = 1165.36 nnBatches/s = 29.23 avgBatchSize = 39.87 (55.2 secs) (EloDiff +198)


Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:  +137 Elo
numSearchThreads = 12:  +157 Elo
numSearchThreads = 16:  +184 Elo
numSearchThreads = 20:  +209 Elo
numSearchThreads = 24:  +221 Elo
numSearchThreads = 32:  +214 Elo
numSearchThreads = 40:  +230 Elo (recommended)
numSearchThreads = 48:  +213 Elo
numSearchThreads = 64:  +198 Elo

Using 40 numSearchThreads!
2023-06-09 07:39:46+0200: GPU 1 finishing, processed 191655 rows 7196 batches

=========================================================================
DONE

Writing new config file to gtp_custom.cfg
You should now be able to run KataGo with this config via something like:
katago.exe gtp -model 'b18.bin.gz' -config 'gtp_custom.cfg'

Feel free to look at and edit the above config file further by hand in a txt editor.
For more detailed notes about performance and what options in the config do, see:
https://github.com/lightvector/KataGo/blob/master/cpp/configs/gtp_example.cfg
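
For later reference, the benchmark can be repeated against the generated config at any time; a minimal example using only flags that already appear above (paths assume the C:\katago directory):

Code:
cd /d C:\katago
katago.exe benchmark -model b18.bin.gz -config gtp_custom.cfg -visits 10000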


In KaTrain's general settings, I set Override to my model path and name:

C:\katago\katago.exe gtp -model 'C:\katago\b18.bin.gz' -config 'C:\katago\gtp_custom.cfg'

KataGo Engine Failed: exception: Could not open file 'C:\katago\gtp_custom.cfg' - does not exist or invalid permissions
KATAGO-INTERNAL-ERROR

The permissions are the same as in C:\baduk\katrain. The three files exist in C:\katago. Is the command syntax correct?
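
Both facts can be double-checked from a command prompt with standard Windows tools (dir for existence, icacls for the permissions; neither is KataGo-specific):

Code:
dir C:\katago\gtp_custom.cfg
icacls C:\katago\gtp_custom.cfg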

If yes, KaTrain might want all files in the same directory. Therefore, my next attempt was to merge all supposedly necessary files into a single directory, C:\baduk\test.

Code:
rem Start from a copy of the working Lizzie directory
xcopy C:\baduk\lizzie C:\baduk\test /E /I
cd /d C:\baduk\test

rem Delete everything that will be replaced
rmdir /s /q KataGoData
del analysis_example.cfg contribute_example.cfg default_gtp.cfg match_example.cfg
del katago.exe katago_cuda.exe README.txt *.gz

rem Copy all missing objects from C:\katago (xcopy asks before overwriting),
rem then drop the stock default config that comes with it
xcopy C:\katago C:\baduk\test /E
del default_gtp.cfg

rem Put the tuned files under KataGo's default names
ren b18.bin.gz default_model.bin.gz
ren gtp_custom.cfg default_gtp.cfg

Now, on the command line I run (with no flags, KataGo falls back to default_gtp.cfg and default_model.bin.gz in its own directory):

C:\baduk\test>katago gtp

Code:
KataGo v1.13.0
Using NewZealand rules initially, unless GTP/GUI overrides this
Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
Loaded tuning parameters from: C:\baduk\test/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
Initializing board with boardXSize 19 boardYSize 19
Loaded config C:\baduk\test/default_gtp.cfg
Loaded model C:\baduk\test/default_model.bin.gz
Model name: kata1-b18c384nbt-s6386600960-d3368371862
GTP ready, beginning main protocol loop
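
At this point the engine waits for GTP commands on stdin. A minimal smoke test, typed directly into the same window (standard GTP commands), is:

Code:
version
genmove b
quit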


Next, on the command line I run:

C:\baduk\test>katago benchmark

Code:
2023-06-09 09:41:15+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 40
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 09:41:15+0200: Loading model and initializing benchmark...
2023-06-09 09:41:15+0200: Testing with default positions for board size: 19
2023-06-09 09:41:15+0200: nnRandSeed0 = 17739763996423611530
2023-06-09 09:41:15+0200: After dedups: nnModelFile0 = C:\baduk\test/default_model.bin.gz useFP16 auto useNHWC auto
2023-06-09 09:41:15+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 09:41:15+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 09:41:15+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 09:41:15+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:15+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 09:41:15+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 09:41:15+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 09:41:15+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:15+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 09:41:15+0200: Loaded tuning parameters from: C:\baduk\test/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

2023-06-09 09:41:16+0200: Loaded config C:\baduk\test/default_gtp.cfg
2023-06-09 09:41:16+0200: Loaded model C:\baduk\test/default_model.bin.gz

Testing using 800 visits.
  If you have a good GPU, you might increase this using "-visits N" to get more accurate results.
  If you have a weak GPU and this is taking forever, you can decrease it instead to finish the benchmark faster.

You are currently using the OpenCL version of KataGo.
If you have a strong GPU capable of FP16 tensor cores (e.g. RTX2080), using the Cuda version of KataGo instead may give a mild performance boost.

Your GTP config is currently set to use numSearchThreads = 40
Automatically trying different numbers of threads to home in on the best (board size 19x19):

2023-06-09 09:41:16+0200: GPU 1 finishing, processed 5 rows 5 batches
2023-06-09 09:41:16+0200: nnRandSeed0 = 1537048183467396486
2023-06-09 09:41:16+0200: After dedups: nnModelFile0 = C:\baduk\test/default_model.bin.gz useFP16 auto useNHWC auto
2023-06-09 09:41:16+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 09:41:17+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 09:41:17+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 09:41:17+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:17+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 09:41:17+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 09:41:17+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 09:41:17+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:17+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 09:41:17+0200: Loaded tuning parameters from: C:\baduk\test/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 09:41:17+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 09:41:17+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 09:41:18+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false


Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads =  5: 10 / 10 positions, visits/s = 674.36 nnEvals/s = 567.57 nnBatches/s = 228.00 avgBatchSize = 2.49 (11.9 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1076.47 nnEvals/s = 878.78 nnBatches/s = 148.70 avgBatchSize = 5.91 (7.5 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 977.17 nnEvals/s = 811.45 nnBatches/s = 163.91 avgBatchSize = 4.95 (8.3 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1171.74 nnEvals/s = 1005.02 nnBatches/s = 102.87 avgBatchSize = 9.77 (7.0 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1131.03 nnEvals/s = 946.03 nnBatches/s = 120.74 avgBatchSize = 7.84 (7.2 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1193.19 nnEvals/s = 1045.82 nnBatches/s = 89.61 avgBatchSize = 11.67 (6.9 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1227.85 nnEvals/s = 1097.93 nnBatches/s = 71.11 avgBatchSize = 15.44 (6.7 secs)


Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 674.36 nnEvals/s = 567.57 nnBatches/s = 228.00 avgBatchSize = 2.49 (11.9 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 977.17 nnEvals/s = 811.45 nnBatches/s = 163.91 avgBatchSize = 4.95 (8.3 secs) (EloDiff +125)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1076.47 nnEvals/s = 878.78 nnBatches/s = 148.70 avgBatchSize = 5.91 (7.5 secs) (EloDiff +158)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1131.03 nnEvals/s = 946.03 nnBatches/s = 120.74 avgBatchSize = 7.84 (7.2 secs) (EloDiff +168)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1171.74 nnEvals/s = 1005.02 nnBatches/s = 102.87 avgBatchSize = 9.77 (7.0 secs) (EloDiff +173)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1193.19 nnEvals/s = 1045.82 nnBatches/s = 89.61 avgBatchSize = 11.67 (6.9 secs) (EloDiff +172)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1227.85 nnEvals/s = 1097.93 nnBatches/s = 71.11 avgBatchSize = 15.44 (6.7 secs) (EloDiff +167)


Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:  +125 Elo
numSearchThreads = 12:  +158 Elo
numSearchThreads = 16:  +168 Elo
numSearchThreads = 20:  +173 Elo (recommended)
numSearchThreads = 24:  +172 Elo
numSearchThreads = 32:  +167 Elo

If you care about performance, you may want to edit numSearchThreads in C:\baduk\test/default_gtp.cfg based on the above results!
If you intend to do much longer searches, configure the seconds per game move you expect with the '-time' flag and benchmark again.
If you intend to do short or fixed-visit searches, use lower numSearchThreads for better strength, high threads will weaken strength.
If interested see also other notes about performance and mem usage in the top of C:\baduk\test/default_gtp.cfg

2023-06-09 09:42:15+0200: GPU 1 finishing, processed 48514 rows 7881 batches
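
Based on the recommendation above, for searches of roughly 5 seconds per move the merged config could be edited down from the 40 threads chosen earlier; numSearchThreads is the same key that appears in the config dump at the top of this output:

Code:
numSearchThreads = 20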


So the KataGo OpenCL version in C:\baduk\test\katago.exe does run from the command line. However... now, in KaTrain, I use

C:\baduk\test\katago.exe gtp

The following processes, with their options, are running:

Code:
"C:\baduk\katrain\KaTrain.exe"
C:\Windows\system32\cmd.exe /c "C:\baduk\test\katago.exe gtp"
\??\C:\Windows\system32\conhost.exe 0x4
C:\baduk\test\katago.exe  gtp


Start KaTrain
ERROR: Unexpected exception Expecting value: line 1 column 1 (char 0) while processing KataGo output b'? unknown command'
Komi: 6.5
Rules: Japanese

I set Black to Human and White to AI, then click to play a black move.
ERROR: <remains as before>
Analyzing move...

The dGPU is now at 0% load. In KaTrain's general settings, what is the correct Override command for running a KataGo executable, network, and config that are not already installed in the Baduk AI Megapack directory and its subdirectories?
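
My current guess: the error "Expecting value: line 1 column 1 (char 0)" looks like a JSON parser choking on KataGo's GTP reply "? unknown command". If KaTrain talks to KataGo's JSON analysis engine rather than to GTP, the Override would have to start the analysis engine with an analysis config. An untested guess along those lines, using only files named earlier in this thread (analysis is a documented KataGo subcommand; whether KaTrain accepts such an Override line is exactly my question):

Code:
C:\katago\katago.exe analysis -model C:\katago\b18.bin.gz -config C:\baduk\lizzie\analysis_example.cfg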

Top
 Profile  
 
Offline
 Post subject: Re: KaTrain Questions
Post #36 Posted: Fri Jun 09, 2023 6:11 am 
Lives in sente
User avatar

Posts: 1308
Liked others: 14
Was liked: 153
Rank: German 1 Kyu
RobertJasiek wrote:
In KaTrain general settings, I set Override with my used model path and name:

C:\katago\katago.exe gtp -model 'C:\katago\b18.bin.gz' -config 'C:\katago\gtp_custom.cfg'

KataGo Engine Failed: exception: Could not open file 'C:\katago\gtp_custom.cfg' - does not exist or invalid permissions
KATAGO-INTERNAL-ERROR

The permissions are the same as in C:\baduk\katrain. The three files exist in C:\katago. Is the command syntax correct?

Why didn't you use ...

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

???

The inverted commas are superfluous anyway, as the directory and file names do NOT contain spaces. Moreover, Windows passes single quotes through to the program unchanged, so KataGo was looking for a file name that literally contains the quote characters; this would explain the "does not exist or invalid permissions" error. If you need quoting at all (e.g. for paths with spaces), you have to use double quotes, as far as I know:

C:\katago\katago.exe gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"
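
To summarise the three variants as plain commands (this is the behaviour of the Windows command line itself, independent of KaTrain):

Code:
rem works: no spaces in the paths, so no quotes are needed
C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg
rem also works: double quotes are stripped during argument parsing
C:\katago\katago.exe gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"
rem fails: the single quotes become part of the file names
C:\katago\katago.exe gtp -model 'C:\katago\b18.bin.gz' -config 'C:\katago\gtp_custom.cfg'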

_________________
The really most difficult Go problem ever: https://igohatsuyoron120.de/index.htm
Igo Hatsuyōron #120 (really solved by KataGo)

Top
 Profile  
 
Offline
 Post subject: Re: KaTrain Questions
Post #37 Posted: Fri Jun 09, 2023 6:21 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Thank you, I will try your syntax later today!

As to why: I have not yet found anything in the KaTrain manuals, only some sample syntax with inverted commas in the KataGo manuals. Therefore, I had to test various syntaxes, and there are many possible combinations. Apparently, I missed testing the one you have just suggested.

Top
 Profile  
 
Offline
 Post subject: Re: KaTrain Questions
Post #38 Posted: Fri Jun 09, 2023 11:47 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
I have made the following failed attempts to give KaTrain a working command. What is the correct syntax? Which mistakes are mine, and which are KaTrain's or KataGo's bugs?


ATTEMPT 1

This directory has just Katago 1_13_0 OpenCL.

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

Processes:
Code:
"C:\baduk\KaTrain\KaTrain.exe"
C:\Windows\system32\cmd.exe /c "C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg"
\??\C:\Windows\system32\conhost.exe 0x4
C:\katago\katago.exe  gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.


ATTEMPT 2

This is my test directory with Katago 1_13_0 OpenCL and all files merged.

C:\baduk\test\katago.exe gtp -model C:\baduk\test\b18.bin.gz -config C:\baduk\test\gtp_custom.cfg

Click on Update Settings: "KaTrain v.1.12.3 (Keine Rückmeldung)" meaning "KaTrain v.1.12.3 (no reply)" with the process KaTrain <0.01 CPU load

Restart KaTrain, General & Engine Settings, press ESC, then these processes are running:

Code:
"C:\baduk\KaTrain\KaTrain.exe"
C:\Windows\system32\cmd.exe /c "C:\baduk\test\katago.exe gtp -model C:\baduk\test\b18.bin.gz -config C:\baduk\test\gtp_custom.cfg"
\??\C:\Windows\system32\conhost.exe 0x4
C:\baduk\test\katago.exe  gtp -model C:\baduk\test\b18.bin.gz -config C:\baduk\test\gtp_custom.cfg

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.


ATTEMPT 3

From now on, I test KaTrain's Override with Baduk AI Megapack's lizzie directory, whose files are known to work, except when invoked via the Override command.

C:\baduk\lizzie\katago.exe gtp -model C:\baduk\lizzie\KataGo40b.gz

Processes:
Code:
"C:\baduk\KaTrain\KaTrain.exe"
C:\Windows\system32\cmd.exe /c "C:\baduk\lizzie\katago.exe gtp -model C:\baduk\lizzie\KataGo40b.gz"
\??\C:\Windows\system32\conhost.exe 0x4
C:\baduk\lizzie\katago.exe  gtp -model C:\baduk\lizzie\KataGo40b.gz

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.


ATTEMPT 4

C:\baduk\lizzie\katago.exe gtp -model C:\baduk\lizzie\KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings: "KaTrain v.1.12.3 (no reply)" with the process KaTrain <0.01 CPU load

Restart KaTrain: ERROR KataGo Engine Failed: exception: Could not find key 'logAllGTPCommunication' in config file C:\baduk\lizzie\analysis_config.cfg
KATAGO-INTERNAL-ERROR

Press ESC


ATTEMPT 5

C:\baduk\lizzie\katago.exe -model C:\baduk\lizzie\KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings, Restart KaTrain, ERROR line 1 column 1 (char 0).

Trying to play: The ERROR vanishes. Analyzing move... appears. GPU load 0%. Katago.exe is the only process.


ATTEMPT 6

If you wonder why I try (partially) forward slashes from now on: KaTrain's settings file writes
Path to KataGo model file = C:/baduk/lizzie/KataGo40b.gz

C:\baduk\lizzie\katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings, Start new game, ERROR vanishes, trying to play: Analyzing move... appears. GPU load 0%. Katago.exe is the only process.


ATTEMPT 7

C:\baduk\lizzie\katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:/baduk/lizzie/analysis_config.cfg

Click on Update Settings, ERROR line 1 column 1 (char 0), Start new game, ERROR vanishes, Analyzing move... appears. GPU load 0%. Katago.exe is the only process.


ATTEMPT 8

C:/baduk/lizzie/katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:/baduk/lizzie/analysis_config.cfg

Click on Update Settings, Start new game, ERROR vanishes, trying to play: Analyzing move... appears. GPU load 0%. Katago.exe is the only process.
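
A sanity check that applies to every attempt above: run the identical command in a plain cmd window first, where KataGo's own error messages are visible; for example, the command from Attempt 1:

Code:
C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg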

Top
 Profile  
 
Offline
 Post subject: Re: KaTrain Questions
Post #39 Posted: Fri Jun 09, 2023 3:47 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
KaTrain

I have made two more command-line tests in KaTrain, and both have failed.

KaTrain must be buggy!

Attempt 9

"C:\katago\katago.exe" gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"

Attempt 10

"C:\baduk\test\katago.exe gtp" -model "C:\baduk\test\b18.bin.gz" -config "C:\baduk\test\gtp_custom.cfg"


Lizzie

Next, I have tried Lizzie and got it to work within one minute with the following command line in the Lizzie Engine settings:

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

Playing works; GPU load is 96%. Note in the process list that Lizzie itself wraps the executable and model paths in double quotes. Processes:

Code:
C:\baduk\LizzieYZY\jre\java11\bin\javaw.exe   ..\LizzieYZY\jre\java11\bin\javaw.exe  -jar .\lizzie.jar
C:\katago\katago.exe   "C:\katago\katago.exe" gtp -model "C:\katago\b18.bin.gz" -config C:\katago\gtp_custom.cfg
C:\Windows\System32\conhost.exe   \??\C:\Windows\system32\conhost.exe 0x4

Top
 Profile  
 
Offline
 Post subject: Re: KaTrain Questions
Post #40 Posted: Fri Jun 09, 2023 7:00 pm 
Lives with ko

Posts: 128
Liked others: 148
Was liked: 29
Rank: British 3 kyu
KGS: thirdfogie
Robert,

Thanks for your detailed posts on these topics. My small comment is that Lizzie
may also be buggy, at least on my system. If you have time, please run the following
test.

1. Load a game into Lizzie for analysis by KataGo.
2. Select a small number of visits, for example 50, by typing a50.
3. Let the analysis run to the end.

On my system, the resulting evaluation graph displays a downward red line marking
every move as a mistake by Black, but no upward red lines for White. I suspect
this is a bug in Lizzie, not KataGo, but I don't know how to prove it.

As the number of visits increases, the discrepancy slowly disappears.
It is still noticeable with 1000 visits but not at 7000 visits per move.
I normally use 7000.

Top
 Profile  
 