New AI Computer

RobertJasiek · #1

Introduction

Due to the mining boom etc., I could not buy an AI computer with RTX 3000 and had to wait three years to buy one with RTX 4000. Meanwhile, I have learnt a lot, but proper installation of AI software is the next hurdle. I may not be the only newbie with an AI computer, so other newbies might encounter similar difficulties. In this thread, we can discuss possible solutions.

Hardware

One major aspect is hardware choice. Besides consoles (really?), eGPUs (why?), online services (restricted functionality) or AI servers (astronomical speeds and prices), one's own computer is the serious ordinary approach. The major options are mobile devices (from now on tagged "notebooks"), mini PCs and desktops.

Mini PCs

There are very few mini PC models with a suitable dGPU. Mini PCs with the latest chips used to appear with 22 months of delay, but this may change for the current chip generation, as mini PCs with the 780M iGPU are already available after only 4 ~ 5 months - a record. However, Intel NUCs with dGPUs are not attractive. The only noteworthy mini PCs with dGPUs are those from Minisforum with a 3070TI Laptop GPU for less than €1000: cheap and reasonably silent but only of mediocre speed (last generation's reasonable notebook speed, below the unreasonably priced 3080TI Laptop notebooks). However, whether Intel or Minisforum, the only available such mini PCs have huge skulls on their chassis. I have asked Minisforum: they will do the same for RTX 4000 Laptop. Buy it if you like such or don't if you don't! I don't. One must not cover the chassis with huge labels because the metal chassis is essential for cooling. This leaves notebooks or desktops.

Notebooks

Last generation's notebooks tended to be too loud. Some of this generation's notebooks have reasonable noise but their prices are inflated by ca. €1000 ~ €1900 (some high end notebooks even by several thousand euros). Notebook manufacturers have learnt from Nvidia's Ampere greed and now try to do the same until the market settles or sorts out all excessively priced models. Besides, manufacturers cheat: after currency conversion and tax, effective notebook prices outside the USA can be €100 ~ €1900 higher than the prices of essentially the same models in the USA. Manufacturers distribute old, excessively huge power bricks in some regions of the world, set Apple-like surcharges for RAM, SSD, Windows and whatnot, offer too short warranty periods or bad warranty conditions, and do whatever they can to rip off end consumers and insult their intelligence. Buy a last generation notebook with mediocre speed or wait another 6, 12 or 24 months until prices, conditions, service and treatment become reasonable again. Rare exceptions confirm the rule. Expect a well chosen notebook to cost at least €1000 ~ €1300 more than a similarly configured desktop without peripherals and with 1.5x the speed.

Besides these aspects, notebooks are compromises on quality and features. Regardless of the price, currently it is impossible to get a notebook with the same level of both as a desktop. The compromise of a desktop is, of course, immobility and possibly DIY assembly. Notebooks have some or all of these shortcomings: weak chassis, weak keyboard, bad keyboard layout (small arrow keys, missing page navigation keys etc.), a possibly mechanical keyboard is not mechanical for many keys, wrong keyboard locale, bad display ratio, mirroring display, flickering display (pulse width modulation or the like), dark display, faulty display pixels, only OLED offered, weak display hinge, missing or bad battery replacement, short battery life in ordinary use, under load or without power plug, missing or bad or extremely bad maintenance, loud, very loud or extremely loud under GPU load, not silent in idle, coil whine, wrong GPU manufacturer, slow GPU, badly chosen CPU, too little RAM installed or installable, slow RAM, slow SSD, without or insufficient operating system, bad CPU or GPU pasting, without vapor chamber, without liquid metal on CPU or GPU, too small fans, bad airflow, bad air boundaries, insufficient or buggy UEFI, insufficient or buggy drivers, insufficient or buggy system control software, crapware, crap labels, weak package, ugly chassis attributes, camera bump (at least notch is Apple-only for notebooks thus far), permanent annoying lights, permanent data theft and violation of data protection and privacy laws etc.

RTX 4070 Laptop has the speed of RTX 3070TI Laptop, that is, the mediocre speed of last generation. If this is sufficient for you, you might consider Asus Flow X13, X16 *, Z13 *, ProArt StudioBook, Strix G17 *, Vivobook Pro 16X, Dell Alienware x14 R2 (4060), Gigabyte Aero 16 OLED, MSI Creator Z16 HX, Z17 HX, Schenker Vision Pro 16, Stealth 14 Studio, Stealth 16 Studio *, where * denotes models for which fan settings exist that allow reasonable relative GPU speed at acceptable noise (at most 43 dB), but I rely on tests and cannot guarantee it.

If you want typing on a mechanical keyboard with an experience at least somewhat similar to a desktop keyboard, the candidates are Dell Alienware m16 (less so because smaller chassis allows less heat dissipation), m18 (but Alienwares require difficult and risky, almost complete disassembly for cleaning of the large fans so that eventually one can destroy the notebook while trying to maintain it, besides the system control software is buggy, the alien heads a matter of taste, non-US mechanical keyboards are delayed by months and non-US prices are extraordinarily excessive rip offs), Medion Erazer Beast X40 (too loud), Major X20 (too loud), MSI Titan GT77 HX (way too loud and very expensive), XMG Neo 16 E23 (too loud without external water cooling etc.), 17 E23 (might, or might not, be too loud without external water cooling but trustworthy noise tests have not been made yet etc.).

Among the remaining notebooks with possibly / hopefully acceptable noise, I considered especially Acer Predator Helios 18 (wobbly keyboard, like earlier, similar Acer notebooks, and far from desktop typing experience), Triton 17X (wobbly keyboard), Asus Strix G18 (coil whine, at most 32 GB RAM, wobbly keyboard, camera bump, permanent annoying lights etc.), Scar 16 (about as before, expensive), Scar 18 (about as before, expensive), Lenovo Legion Pro 7i Gen 8 (a bit too loud, suboptimal keyboard layout / typing, rip off prices unless on sale, only 32 GB RAM, features worse than last generation), ThinkPad P1 G6 (small arrow keys), PCSpecialist Recoil VII 17 (no independent noise tests yet, no German mechanical keyboard, GPU without liquid metal, suboptimal keyboard layout).

Hence, my conclusion has been: none of the notebooks offers, or will offer in the foreseeable future, an acceptable compromise. Your opinion may differ, especially if you can accept RTX 3070TI / 4070 Laptop speed and possibly a 16:9 display ratio for 1:1 go boards. I am going the desktop route.

Desktops

Last generation, I wanted to buy an RTX 3080 10 GB with 320W and three 8-pin connectors and, for silence, a 1200W PSU (Corsair HX 1200i has been discontinued despite contrary promises so the option was BeQuiet Dark Power Pro 12 1200W for roughly €340) to be operated semi-passively at around 50% load. This generation, I have bought an RTX 4070 with 200W and one 8-pin connector so that the passive Seasonic Prime TX-700 Fanless is possible, as I do not want to overclock. Besides a few other reasonably silent models, the most silent 4070 graphics cards appear to be Asus TUF 4070 12G (or O12G if operated at 200W-) with 200W and one 8-pin and MSI 4070 Gaming X Trio with, IIRC, 215W and one 12-pin to two 8-pins. Asus has accumulated a bad reputation (including trying to rip me off with RTX 3080 and faulty motherboards) while the MSI card uses 12-pin, which tends to melt if used on an RTX 4090. Although the 4070 is not as power-hungry, I do not want to take any risk and dislike split cables. The one 8-pin at just 200W of the Asus card is ideal, and the quality of Asus graphics cards has not been doubted seriously in recent times. The build quality is excellent. The included magnetic stand serves as an intermediate solution but I will get or build a more solid stand. I would avoid AMD or Intel GPUs for go AI, which profits from Nvidia CUDA and tensor cores, and Nvidia drivers are more stable.

Low noise is my primary concern. Theoretically, passive desktops are possible but very hard to build and you might have to preorder half a year in advance and pay €800 - €1000 for the chassis alone. Mini-ITX is another route but your choice of components is limited, the noise can be a bit higher and building is hard. Forget about watercooling graphics cards - all you achieve is moving the noise from the card to a radiator on top of the desktop or outside it; watercooling is for show-off YouTubers, rich overclockers or ultra-rich data scientists needing to fit 4+ graphics cards in the chassis. Unless you choose a 180+W CPU, AIO-watercooling it serves no purpose other than possibly a cleaner interior for more RGB space, if this is your preference, but beware that water can leak etc.

The reasonable approach is a 65W AMD CPU (Intel lies too much about thermals, AMD also lies but very cautiously) and a good CPU air cooler. I wanted to buy Scythe Fuma 2 Rev B but it went out of stock at the beginning of 2023; Scythe Fuma 3 is about to appear but will be available only from mid July in some parts of the world and its noise is still untested. I have chosen a BeQuiet Dark Rock Pro 4, whose assembly is somewhat advanced. Surely a very heavy overkill but good for low noise, especially for moderate CPU use, as is the case for GPU load of go AIs when my Ryzen 7700 (no X in the name, 65W, 8C, 16T) runs at around 16%. I guess a 6-core, 12-thread CPU would also do but 8 cores are more future-proof and a lower percentage allows lower noise. I would have taken a 15W CPU if such a reasonable option had been available for a desktop motherboard. Of course, there are also some good Noctua CPU coolers of large or intermediate size or others, which can operate at reasonably low noise, especially if used for modest CPU loads.

My case fans are Arctic P14 PWM, but I think that the best BeQuiet fans would perform similarly. Put 3 at the front and 1 at the rear. Connect each individually with 4 pins directly to your motherboard, which thus must have at least four system fan headers. After first tuning, my UEFI fan curves (choose a motherboard with suitable firmware and VRM [not a typo, this means voltage regulator modules, not to be confused with VRAM] coolers, of course!) are:

GPU: Default
CPU: 0°C/20% 45°C/35% 70°C/45% 80°C/100%
Each case fan: 40°C/50% 55°C/50% 75°C/60% 85°C/100%
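To see what such curves mean between the listed points, here is a minimal Python sketch. It assumes that the firmware interpolates linearly between neighbouring points and holds the first / last value outside the range; I have not verified this for any particular UEFI, and the function name fan_duty is only illustrative.

```python
from bisect import bisect_right

# Fan curves from the post as (temperature in °C, duty in %) points.
CPU_CURVE = [(0, 20), (45, 35), (70, 45), (80, 100)]
CASE_CURVE = [(40, 50), (55, 50), (75, 60), (85, 100)]


def fan_duty(curve, temp_c):
    """Return the duty in % at temp_c, ASSUMING linear interpolation."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])    # below the first point: hold its duty
    if temp_c >= temps[-1]:
        return float(curve[-1][1])   # above the last point: hold its duty
    i = bisect_right(temps, temp_c)  # index of the first point above temp_c
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for temp in (41, 67, 75):
        print(temp, fan_duty(CPU_CURVE, temp), fan_duty(CASE_CURVE, temp))
```

Under this assumption, the long flat or gently rising segments are what keeps the fans from ramping up during ordinary GPU load.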

Note that the case fans operate at slightly different RPMs despite equal fan curves so that there are no unnecessary interferences. Otherwise, set slightly different fan curves in PWM modes, of course. At idle and default, the values are GPU fans 0 RPM, CPU fans ~480 RPM and case fans 975+ RPM. GPU and CPU have ~41°C. Under GPU load (Furmark or KataGo), subjectively the PC is as silent as when idle. I guess it might be roughly 37dB, the frequency is low and it sounds like a remote room fan. The values are:

KataGo:
CPU ~16% 67-°C 620 RPM
Chassis fans 1080- RPM
dGPU 94% 64-°C hotspot 76.6-°C 1078- RPM
iGPU 0%

Furmark:
CPU mostly 3-% rarely 20-% 48~51.5°C 515~528 RPM
Chassis fans 966- RPM
dGPU 99~100% 68-°C hotspot 82-°C 1211- RPM
iGPU with load but thermally negligible

Only if I stress test my PC with 100% CPU load (such as the CPU-Z stress test) and 99~100% GPU load (Furmark) at the same time, the CPU fan reaches about 1200 RPM and the chassis fans about 1400 RPM, which is noticeably less silent and comes with a bit of chassis excitement but would still be acceptable, although it is not the very silent experience of KataGo or Furmark alone. In conclusion, the noise of a well configured desktop is as low as I hoped for. You might achieve such levels with a well chosen RTX 3070 TI / 4070 Laptop notebook if operating the GPU at 70% TDP using MSI Afterburner or the like. So far, I am running my GPU at 100% but I might try lower power targets later to see whether they impact noise or speed significantly. Less TDP might allow even slower chassis and CPU fans. They matter because I cannot hear the GPU fans even if they run at 1100 RPM. Do not forget to get a suitable, large airflow mesh chassis, which must allow the passive PSU to emit air upwards to free space or through holes. Mine is Fractal Meshify 2. Although testers found that the no longer available S2 had even better airflow, we may consider this nitpicking as surely the airflow here is good. Maybe fine differences matter when one uses high TDP OC components. Chassis is also a matter of taste but beware that even some hyped models, such as Torrent, have had their case burning issues. The more something burns, the more it is hyped or bundled.

SSD: take whatever you want, they are cheap. I prefer low latency and avoid model series with a bad firmware reputation, so do inform yourself. RAM is a difficult issue. Obviously it must be dual channel with two identical sticks. Most importantly, it must fit well under any overly large CPU cooler! So check the dimensions. DDR5 and 4 sticks still do not seem to work well. Take 2 sticks of the right size and call it a day. You absolutely must check motherboard compatibility (also of the CPU for the firmware version) at the motherboard or RAM manufacturers' webpages! Everybody and the specifications will tell you that RAM XMP / EXPO overclocking is the norm but I have had no luck and no energy to achieve at least partial RAM overclocking. The exact combination of RAM models, motherboard, firmware and CPU matters and no four RAM sticks are the same. You may as well save €20 ~ 40 and get your JEDEC sticks! Anyway, stability matters more than an alleged 2% of extra speed. CPU pasting: do not bathe your PC in paste and do not apply so little that heat will not spread the paste everywhere on the main surface.

You can spend unlimited time on informing yourself in advance on how to configure and build your desktop but, nevertheless, manuals always miss one essential aspect and inevitably you will still meet difficulties... Are all components compatible by size, standard and type? How to route cables so that they do not run over potentially hot components? Which side of a fan is its intake? Which cable to connect in which direction to which port? Is each cable connected sufficiently firmly? Can each cable still be connected after installing the CPU cooler or are some cables better connected early? When to insert the USB stick for the firmware update? Should I learn unattended Windows installation to enable an offline, local user account and settings without annoying data theft? Must the display be connected initially to the motherboard or graphics card? In which slots to put the RAM sticks? How slowly should I work so that no screws or heavy parts fall onto the motherboard? Should I get an electric screwdriver with lots of bits to avoid many hours of removing and installing case fans? Which is the right screwdriver for the right screw? Will I have enough cable ties? Does the CPU cooler come with a pad or paste or do I need to buy some? Do I have all needed USB sticks and cables? What too cheap Windows license is legal in my country? I have made one mistake: after four days of assembly, I pressed the On button but nothing happened because the mainboard plug was not inserted properly.

For the drivers, first let Windows do its job. Then get the graphics card driver from the dGPU manufacturer and the appropriate chipset driver from the CPU manufacturer, and install these two drivers. Finally, look in the Windows Device Manager for unknown devices or devices with exclamation marks, for which you find drivers on your new computer, at the mainboard manufacturer's webpages or elsewhere.

Currently, new mainboards are ca. €70 overpriced and some other components slightly overpriced so one might pay ca. 7% more than for the last generation in a hypothetical world of then available graphics cards at MSRPs. If, however, you wait, the next crisis might come. There is never a truly right moment for purchasing a computer, unless you can buy everything at the same time at a sale when you actually need it. At least I got my GPU for the price I wanted to spend 3 years ago, with the same speed but now with much better efficiency. For upgrading a GPU, current times are terrible. For buying a first GPU, current times are good if you can accept the price level for the desired speed.

SOFTWARE

KataGo needs a GUI, such as KaTrain or Lizzie. Use KataGo Eigen only if you have just a CPU and no GPU. If you just want to get some KataGo running on a GPU, start with its OpenCL version and the main GUIs by installing Baduk AI Megapack. Next, try CUDA. Installing CUDA or TensorRT can be difficult, see discussions elsewhere or later.

Instead of wasting time on various crapware, consider CPU-Z to stress test the CPU, Furmark to stress test the GPU, KataGo on the dGPU with long time settings and AI player to almost stress test the GPU, HWiNFO64 to monitor loads, temperatures and fan speeds, the mainboard UEFI to set fan speeds, Windows | memory diagnostics to test RAM, OCCT to test VRAM and Afterburner to tune the dGPU.

RobertJasiek · #2

DRIVERS, SOFTWARE, CORES and LIBRARIES

We have our new Windows computer with a dedicated graphics card. Firstly, install its driver. For an Nvidia graphics card, we can install the Gaming driver, which is updated more frequently and provides a few additional features for 3D games, or the Studio driver, which emphasises long-term stability and provides a few additional features for work software. Whichever kind of driver we prefer, it must fit the operating system, such as Windows 11 64-bit, and usually should be the newest stable version.

We need both an engine and a GUI (graphical user interface). An engine is a go AI (artificial intelligence) software, such as KataGo, and generates moves. A GUI, such as Lizzie or KaTrain, displays the go board. The GUI calls the engine so that both run simultaneously and interact with each other. We interact with the GUI.

Every graphics card supports OpenCL as an application interface between software and graphics card. AMD graphics cards only support OpenCL. Nvidia graphics cards support OpenCL and have CUDA cores, tensor cores and RT (raytracing) cores. KataGo supports OpenCL, CUDA cores and tensor cores. KataGo supports OpenCL easily: it needs little more than its OpenCL.dll library file. With OpenCL, it can also use tensor cores, but we need not tell it to do so. If KataGo is to use CUDA cores, it needs both Nvidia's CUDA libraries and Nvidia's CuDNN libraries, whose installation we discuss later. If KataGo is to make best use of tensor cores, it needs Nvidia's TensorRT libraries, whose installation we discuss later. Hence, there are different kinds of cores and different kinds of libraries. Some libraries enable software to use some of the cores at all, and some libraries enable more efficient use of particular cores.

GETTING STARTED

We have our new computer and want to quickly test whether its dedicated graphics card allows us to run a GUI and KataGo. We might not want to start by testing all - OpenCL, CUDA and tensor cores, CUDA, CuDNN and TensorRT libraries - at once. A convenient start is the Baduk Megapack at https://github.com/wonsiks/BadukMegapack , which comes as an installer, currently of version 4.18.0 (on 2023-06-12), for Windows 11 64-bit and installs a couple of GUIs and instances of KataGo.

For now and for the sake of simplicity, we keep the recommended installation directory C:\baduk and are logged in as a Windows administrator user. Some go software programmers are at home in the Linux world and do not respect the Windows security convention of running applications installed to the write-protected C:\Program Files or C:\Program Files (x86) directories as a Windows standard user. Later, such go programs want to write files in their installation directories. We postpone related management of Windows security but might disable internet connections when logged in as a Windows administrator user.

The installation process of Baduk Megapack comes with a surprise: a command line window logs various things and interacts with us so that it can initially tune KataGo and adjust it at least roughly to our dedicated graphics card. For now, most of these settings can be answered somehow. If we can leave some parameter empty at its default, we just do so. However, there is one absolutely essential question. One or a few graphics card devices are listed and each has a number 0, 1, etc. We write only the stated Device number of our dedicated graphics card. For example, the text Found OpenCL Device 1 = RTX 4070 indicates that we must write 1 . (If you have several dedicated graphics cards, list them. I would, however, not include the integrated graphics card so that your CPU remains cooler and the software has fewer reasons to exhibit any bugs. Overclockers may have a different opinion.) After answering the query, we are patient and watch the initial tuning progress.
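If the device list is long, picking the number can be sketched in Python as follows. This is only a sketch: the line format follows the abridged example quoted above (the real KataGo output may use a colon and additional details, so the pattern accepts both = and :), and the first sample line with an integrated GPU is hypothetical.

```python
import re

# Pattern modelled on the example line "Found OpenCL Device 1 = RTX 4070".
# ASSUMPTION: the real output may differ in details; adjust the pattern then.
DEVICE_LINE = re.compile(r"Found OpenCL Device (\d+)\s*[:=]\s*(.+)")


def find_device_numbers(log_text, name_part):
    """Return the numbers of all listed devices whose name contains name_part."""
    numbers = []
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match and name_part.lower() in match.group(2).lower():
            numbers.append(int(match.group(1)))
    return numbers


if __name__ == "__main__":
    sample = (
        "Found OpenCL Device 0 = AMD Radeon Graphics\n"  # hypothetical iGPU line
        "Found OpenCL Device 1 = RTX 4070\n"
    )
    print(find_device_numbers(sample, "RTX 4070"))
```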

After installation of Baduk Megapack, we can try KaTrain. The hamburger menu (click on the three horizontal bars) gives access to General & Engine Settings. Click Download KataGo version and select the OpenCL instance of KataGo, whose path is C:\baduk\lizzie\katago.exe . Click Download Models and choose one of the *.gz or *.bin.gz files. Afterwards, there should be entries in the three rows Path to KataGo executable, Path to KataGo config file and Path to KataGo model file. Override Engine Command remains empty for now. Adjust the Maximum time for analysis. (A small value lets us see an operating AI quickly while a large value lets us check GPU usage and running processes in suitable tools.) Click on Update Settings and possibly wait for a few minutes. Close this tab with ESC. If KaTrain freezes at this moment, kill its process and restart KaTrain. Then we should find that General & Engine Settings have the right entries. In the Player Setup, set, for example, Black Human and White AI; press ESC. Click on the board and the engine should reply with its moves. The Windows Task Manager (CTRL + SHIFT + ESC) notices some temperature increase of a dedicated graphics card or its load for Furmark but has trouble noticing the GPU load of advanced software. We can use a tool, such as HWiNFO64, to monitor GPU load. We know that the go engine works and uses our dedicated graphics card if HWiNFO64 indicates a load of 50% ~ 100%, typically 94% ~ 96%, while pondering.
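As a scriptable alternative to watching HWiNFO64, the load of an Nvidia card can also be read with the nvidia-smi command line tool. The following Python sketch assumes that nvidia-smi has been installed together with the Nvidia driver and is found on the PATH; the function names are only illustrative and I have not compared its readings with HWiNFO64.

```python
import subprocess

# nvidia-smi query for GPU load in % and GPU temperature in °C, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_status(output):
    """Parse lines such as '95, 64' into (load in %, temperature in °C) tuples."""
    status = []
    for line in output.strip().splitlines():
        load, temperature = (int(field) for field in line.split(","))
        status.append((load, temperature))
    return status


def read_gpu_status():
    """Run nvidia-smi once and return one tuple per Nvidia GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_status(result.stdout)


if __name__ == "__main__":
    print(read_gpu_status())
```

Calling read_gpu_status() while the engine ponders should then show a load in the range mentioned above if the dedicated graphics card is really used.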

KATAGO INSTALLATION

For a proper installation of KataGo, we go to its webpage https://github.com/lightvector/KataGo and download from https://github.com/lightvector/KataGo/releases its Windows versions for OpenCL and, if needed for an Nvidia graphics card, CUDA and TensorRT. Typically, the download files come as compressed ZIP archives. Here, we face a minor difficulty: do we need the files with or without bs29? bs29 is for board sizes up to 29x29. As starters, we avoid such extras and choose the files without bs29. The download files have names like these:

Code:

katago-v1.13.0-opencl-windows-x64.zip
katago-v1.13.0-cuda11.2-windows-x64.zip
katago-v1.13.1-trt8.5-cuda11.2-windows-x64.zip

These are meaningful file names but we must be able to decipher them. 1.13.0 or 1.13.1 is KataGo's version number. x64 denotes 64-bit Windows. opencl is the KataGo version for OpenCL. cuda11.2 is the KataGo version for the CUDA libraries in their version 11.2 and the matching CuDNN libraries (CuDNN has its own version numbering; the library file name cudnn64_8.dll used later suggests version 8). trt8.5-cuda11.2 is the KataGo version for the TensorRT library in its version 8.5, which relies on installed CUDA libraries in their version 11.2 and matching CuDNN libraries. Although a KataGo download file contains some DLLs, it does not contain the CUDA, CuDNN and TensorRT library files, which we must get separately from other sources.
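The deciphering can be sketched in Python. The pattern below is my own and only covers Windows x64 names of the three shapes listed above; other names, such as the bs29 variants, simply yield None.

```python
import re

# Pattern for names like katago-v1.13.1-trt8.5-cuda11.2-windows-x64.zip
RELEASE_NAME = re.compile(
    r"katago-v(?P<version>\d+\.\d+\.\d+)"
    r"-(?:trt(?P<tensorrt>[\d.]+)-)?"
    r"(?:(?P<opencl>opencl)|cuda(?P<cuda>[\d.]+))"
    r"-windows-x64\.zip"
)


def parse_release_name(file_name):
    """Return KataGo version, backend and library versions, or None if unknown."""
    match = RELEASE_NAME.fullmatch(file_name)
    if match is None:
        return None
    if match.group("opencl"):
        backend = "OpenCL"
    elif match.group("tensorrt"):
        backend = "TensorRT"
    else:
        backend = "CUDA"
    return {
        "version": match.group("version"),
        "backend": backend,
        "cuda": match.group("cuda"),          # None for OpenCL
        "tensorrt": match.group("tensorrt"),  # None unless TensorRT
    }


if __name__ == "__main__":
    for name in (
        "katago-v1.13.0-opencl-windows-x64.zip",
        "katago-v1.13.0-cuda11.2-windows-x64.zip",
        "katago-v1.13.1-trt8.5-cuda11.2-windows-x64.zip",
    ):
        print(parse_release_name(name))
```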

It can sometimes happen that a download page of KataGo does not contain all three versions. In this case, we must visit several subpages of KataGo's webpage to get them all.

Furthermore, on KataGo's webpage, we find and download from https://katagotraining.org/networks/ a model file, which is a pretrained neural net. Usually, newer model files are better than older model files. However, there is the additional aspect that models come in different block sizes. In the early days, larger block sizes indicated stronger models. Currently, this is not the case but the block size 18 is the strongest for typical usage. We recognise it by b18 early in its name, such as kata1-b18c384nbt-s6386600960-d3368371862.bin.gz . Model files are compressed as *.bin.gz or *.gz. With our use, we do not decompress them - instead, we simply use them. The tail of a long file name might be just random digits. However, for our convenience, we may rename the file to, say, b18.bin.gz .
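Recognising the block size and deriving the short name can be sketched in Python. The pattern is my own; that the number after c denotes the channels of the net is my assumption and not stated above.

```python
import re

# Matches the part "-b18c384" of kata1-b18c384nbt-s6386600960-d3368371862.bin.gz
MODEL_NAME = re.compile(r"-b(?P<blocks>\d+)c(?P<channels>\d+)")


def model_size(file_name):
    """Return (blocks, channels) of a model file name, or None if not found."""
    match = MODEL_NAME.search(file_name)
    if match is None:
        return None
    return int(match.group("blocks")), int(match.group("channels"))


def short_name(file_name):
    """Return a short name such as b18.bin.gz for renaming the model file."""
    size = model_size(file_name)
    if size is None:
        return file_name  # unknown naming scheme: keep the name
    return f"b{size[0]}.bin.gz"


if __name__ == "__main__":
    name = "kata1-b18c384nbt-s6386600960-d3368371862.bin.gz"
    print(model_size(name), short_name(name))
```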

Install the contents of each ZIP file to its own directory. That is, use, for example, the Windows Explorer to unpack a particular ZIP and then copy the contained files and any folders to its installation folder. For example, create the directories

Code:

C:\katago_OpenCL
C:\katago_CUDA
C:\katago_TensorRT

and install the appropriate files to their directory. Furthermore, copy the model file b18.bin.gz to each of the three directories. This wastes disk space but later eases calling the model file. Alternatively, we can store model files in their separate directory and write its different path when calling one of them.
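Creating the three directories and copying the model file into each can also be scripted. A minimal Python sketch, in which the function name and the idea of passing the root directory as a parameter are mine; unpacking the ZIP archives remains a manual step here.

```python
import shutil
from pathlib import Path

BACKENDS = ("OpenCL", "CUDA", "TensorRT")


def prepare_install_dirs(root, model_file):
    """Create root/katago_<backend> per backend and copy the model file there."""
    model_file = Path(model_file)
    created = []
    for backend in BACKENDS:
        target = Path(root) / f"katago_{backend}"
        target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        shutil.copy2(model_file, target / model_file.name)
        created.append(target)
    return created


if __name__ == "__main__":
    # Example call for the layout above; the model file is expected
    # in the current directory.
    for directory in prepare_install_dirs("C:\\", "b18.bin.gz"):
        print(directory)
```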

Before we can use any of these three versions of KataGo, we need three to five further preparation steps:

1) For KataGo TensorRT, get another software containing libraries.

2) For KataGo CUDA or KataGo TensorRT, copy the missing library files.

3) Initial benchmark of KataGo.

4) Initial tuning of KataGo.

5) In a GUI, set the command line for calling KataGo.

The following describes this procedure for each version of KataGo.

KATAGO OpenCL

The installation directory, say C:\katago_OpenCL, already contains a copy of the needed OpenCL.dll library file. Therefore, we continue with step 3 of the procedure.

Open the Windows command line, that is C:\Windows\System32\cmd.exe . Go to the right directory using the command

Code:

cd \katago_OpenCL

There, execute the following command (or adjust the file name if you have chosen a different one):

Code:

katago.exe benchmark -model b18.bin.gz

katago.exe calls the program katago.exe in the current directory. benchmark is a subcommand and therefore carries no minus sign. -model is an option and therefore carries a minus sign; it is followed by its value: our model file. Therefore, KataGo can benchmark for the model file that we will use later. Execution of the command takes a while. Eventually, KataGo creates the subdirectories and file \gtp_logs, \KataGoData and \KataGoData\opencltuning\tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt or a similar file name.

As step 4 of the procedure, we are still in the same directory and execute the following command:

Code:

katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

The subcommand genconfig calls the tuning function for our model file b18.bin.gz and will eventually create the config file gtp_custom.cfg in the same directory. First, the tuning interacts with us, as we already know. When asked, we must specify the right device. Now, however, for another question, we must also choose a useful number of visits. On a modern graphics card, this might be:

Code:

10000

When the tuning starts, we notice whether it proceeds smoothly or is way too slow. If necessary, we can interrupt execution with CTRL + C and execute the command afresh with a much smaller number of visits. Otherwise, we are patient and let the tuning do its job. It writes the appropriate values to the created config file. We close the command line window.
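Steps 3 and 4 can also be started from a small Python script instead of typing into cmd.exe. This is only a sketch: the helper names are mine, it uses nothing but the two commands shown above, and genconfig still asks its questions interactively in the console.

```python
import subprocess
from pathlib import PureWindowsPath


def katago_command(install_dir, subcommand, model="b18.bin.gz", output=None):
    """Build the argument list for 'benchmark' or 'genconfig' as used above."""
    exe = str(PureWindowsPath(install_dir) / "katago.exe")
    args = [exe, subcommand, "-model", model]
    if output is not None:
        args += ["-output", output]
    return args


def run_in_install_dir(install_dir, args):
    # cwd plays the role of the "cd" command, so that the relative model
    # and output file names refer to the installation directory.
    return subprocess.run(args, cwd=install_dir, check=True)


if __name__ == "__main__":
    directory = r"C:\katago_OpenCL"
    run_in_install_dir(directory, katago_command(directory, "benchmark"))
    run_in_install_dir(
        directory, katago_command(directory, "genconfig", output="gtp_custom.cfg")
    )
```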

As step 5 of the procedure, we start Lizzie or KaTrain to set the command line for calling KataGo. If we use Lizzie, we go to Settings | Engine | Engine 0, delete the earlier command line and enter this command line:

Code:

C:\katago_OpenCL\katago.exe gtp -model C:\katago_OpenCL\b18.bin.gz -config C:\katago_OpenCL\gtp_custom.cfg

For more variation on Lizzie's syntax of the command line, see https://www.lifein19x19.com/viewtopic.php?f=18&t=19196 . Lizzie is one of the GUIs communicating with KataGo in the gtp mode. Therefore the command has the gtp parameter and uses the gtp_custom.cfg config file. Optionally, alter Max Game Thinking Time. Click OK. Choose Game | NewGame(N) and so on. You should be able to play against the AI. If necessary, close and restart Lizzie.

If we use KaTrain, in General & Engine Settings, we set this Override command line:

Code:

C:\katago_OpenCL\katago.exe analysis -model C:\katago_OpenCL\b18.bin.gz -config C:\katago_OpenCL\analysis_config.cfg

For more variation on KaTrain's syntax of the command line, see https://www.lifein19x19.com/viewtopic.php?f=18&t=19195 . KaTrain is one of the GUIs communicating with KataGo in the analysis mode. Therefore the command has the analysis parameter and uses the analysis_config.cfg config file. Click Update Settings and ESC. Unless closing / killing the process and restarting KaTrain is necessary, you should be able to start a new game and play against the AI.
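Both command lines follow the same pattern and differ only in the mode and the config file. A Python sketch that composes them for any installation directory; the function name and the table are mine, and paths containing spaces would need additional quoting, which this sketch does not do.

```python
from pathlib import PureWindowsPath

# Mode and config file per GUI, as used in the two command lines above.
GUI_MODES = {
    "Lizzie": ("gtp", "gtp_custom.cfg"),
    "KaTrain": ("analysis", "analysis_config.cfg"),
}


def engine_command(install_dir, gui, model="b18.bin.gz"):
    """Compose the engine command line for the given GUI."""
    mode, config = GUI_MODES[gui]
    base = PureWindowsPath(install_dir)
    return (
        f"{base / 'katago.exe'} {mode} "
        f"-model {base / model} -config {base / config}"
    )


if __name__ == "__main__":
    print(engine_command(r"C:\katago_OpenCL", "Lizzie"))
    print(engine_command(r"C:\katago_OpenCL", "KaTrain"))
```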

KATAGO CUDA

Our installation directory is C:\katago_CUDA . We start at step 2 of the procedure. We copy from C:\baduk\lizzie to C:\katago_CUDA the following five files, of which KataGo CUDA uses all:

Code:

cublas64_11.dll
cublasLt64_11.dll
cudnn_cnn_infer64_8.dll
cudnn_ops_infer64_8.dll
cudnn64_8.dll
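Before running the benchmark, we can check that no library file has been forgotten. A minimal Python sketch with an illustrative function name; it only checks that the files exist, not that their versions fit.

```python
from pathlib import Path

# The five library files listed above for KataGo CUDA.
CUDA_DLLS = [
    "cublas64_11.dll",
    "cublasLt64_11.dll",
    "cudnn_cnn_infer64_8.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn64_8.dll",
]


def missing_files(directory, names):
    """Return those names for which no file exists in directory."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(r"C:\katago_CUDA", CUDA_DLLS)
    print("All library files present" if not missing else f"Missing: {missing}")
```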

In the benchmark step 3 of the procedure, we open the Windows command line, change directory to C:\katago_CUDA and execute this command:

Code:

katago.exe benchmark -model b18.bin.gz

KataGo creates the subdirectory \gtp_logs .

As step 4 of the procedure, we are still in the same directory and execute the following command, during whose dialog we write the right CUDA Device number and afterwards set the number of visits to, for example, 10000:

Code:

katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

KataGo creates the files C:\katago_CUDA\gtp_custom.cfg and, for example, C:\katago_CUDA\gtp_logs\20230609-072808-6A18D901.log .

In step 5 of the procedure, we tell Lizzie the Engine command line

Code:

C:\katago_CUDA\katago.exe gtp -model C:\katago_CUDA\b18.bin.gz -config C:\katago_CUDA\gtp_custom.cfg

or tell KaTrain the Override command line

Code:

C:\katago_CUDA\katago.exe analysis -model C:\katago_CUDA\b18.bin.gz -config C:\katago_CUDA\analysis_config.cfg

KATAGO TensorRT

Our installation directory is C:\katago_TensorRT . It also needs files that are not readily available yet. We begin with step 1 of the procedure and download LizzieYZY as a separate software available at https://github.com/yzyray/lizzieyzy . The downloaded file is a ZIP archive, which we unpack in the Windows Explorer.

As step 2 of the procedure, the unpacked archive contains the subfolder \katago_tensorRT , from which we copy the following files to C:\katago_TensorRT :

Code:

cublas64_11.dll *
cublasLt64_11.dll *
cudart64_110.dll
cudnn_cnn_infer64_8.dll
cudnn_ops_infer64_8.dll *
cudnn64_8.dll *
msvcr110.dll
nvinfer.dll *
nvinfer_builder_resource.dll
nvrtc64_112_0.dll
nvrtc-builtins64_114.dll

* means that I have seen in Process Explorer that KataGo uses these libraries. I do not know yet whether the other copied files are also needed. They do no harm, however, except for possibly occupying disk space unnecessarily.

As step 3 of the procedure, we open the Windows command line, change directory to C:\katago_TensorRT and execute this command:

Code:

katago.exe benchmark -model b18.bin.gz

As step 4 of the procedure, we are still in the same directory and execute the following command, during whose dialog we write the right GPU Device number and afterwards set the number of visits to, for example, 10000:

Code:

katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

In step 5 of the procedure, we tell Lizzie the Engine command line

Code:

C:\katago_TensorRT\katago.exe gtp -model C:\katago_TensorRT\b18.bin.gz -config C:\katago_TensorRT\gtp_custom.cfg

or tell KaTrain the Override command line

Code:

C:\katago_TensorRT\katago.exe analysis -model C:\katago_TensorRT\b18.bin.gz -config C:\katago_TensorRT\analysis_config.cfg

On the first start, KataGo TensorRT needs two minutes or more in KaTrain or 30 seconds or more in Lizzie. At later starts, the delay is a few seconds in KaTrain and 10 seconds in Lizzie on my computer. On recent computers, the delays may be worth it because KataGo TensorRT is usually by far the fastest version of KataGo during go move generation.
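The engine command lines above all follow one pattern, so they can be generated instead of typed. This is a minimal Python sketch, assuming the directory and file names used in this manual; the helper name katago_command is only illustrative and not part of KataGo or any GUI.

```python
from pathlib import PureWindowsPath

def katago_command(install_dir, mode, config_name, model_name="b18.bin.gz"):
    """Build a KataGo command line with absolute paths.

    mode is "gtp" (Lizzie) or "analysis" (KaTrain). The file names are the
    ones assumed in this thread and may differ on your computer.
    """
    base = PureWindowsPath(install_dir)
    return (f"{base / 'katago.exe'} {mode} "
            f"-model {base / model_name} -config {base / config_name}")

lizzie_cmd = katago_command(r"C:\katago_TensorRT", "gtp", "gtp_custom.cfg")
katrain_cmd = katago_command(r"C:\katago_TensorRT", "analysis", "analysis_config.cfg")
print(lizzie_cmd)
print(katrain_cmd)
```

Replacing C:\katago_TensorRT by C:\katago_CUDA gives the corresponding CUDA command lines.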

NVIDIA LIBRARIES

So far, we have created some duplicate files. Some of them are huge, so much disk space is wasted. Furthermore, at least on my computer, KataGo CUDA has been slow so far and one of the possible reasons is a too old library file. Instead of manually copying individual library files, the usual but even more complicated way gets them from Nvidia's webpage, where one must first register. I have downloaded huge installers and installation manuals from there but not installed any yet. We can get Nvidia CUDA libraries from https://developer.nvidia.com/cuda-zone , Nvidia CuDNN libraries from https://developer.nvidia.com/cudnn and Nvidia TensorRT libraries from https://developer.nvidia.com/tensorrt . We need local executables for Windows 11 or at least Windows 10 of the right versions. I do not know yet whether the newest versions are suitable or whether we need the exact versions declared in KataGo's download file names. If you see GA and EA variants of a version, GA (general availability) is the regular release while EA (early access) is a preview. For a version, Nvidia often offers several subversions, of which we might choose the latest. It is possible that Nvidia's installers also mess with drivers or install developer software, which we players do not need. We might find installed libraries and copy them or refer to them by a Windows PATH environment variable. If we look at the individual library files above, we notice some numbers in the file names, which might denote version numbers.
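To look at the numbers in the library file names systematically, one can split each name into a library part and a trailing number part. The following Python sketch does only that; the function name is hypothetical, and the interpretation of the digits as version numbers is an assumption, since Nvidia's naming is not uniform.

```python
import re

def library_version_hint(file_name):
    """Return (library, digits) for names like 'cublas64_11.dll', else None.

    Assumption: the trailing digits look like (parts of) version numbers.
    """
    match = re.fullmatch(r"(.+?)_(\d+(?:_\d+)*)\.dll", file_name, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1), match.group(2)

# File names taken from the list copied to C:\katago_TensorRT above.
for name in ("cublas64_11.dll", "cudart64_110.dll", "cudnn64_8.dll",
             "nvrtc64_112_0.dll", "nvrtc-builtins64_114.dll", "nvinfer.dll"):
    print(name, "->", library_version_hint(name))
```

Names without a trailing number, such as nvinfer.dll, yield None, so their version must be found by other means.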

MISCELLANEOUS

If this manual enables you to install some GUIs and the three versions of KataGo in one day, this would be about ten times faster than I needed without it. However, we are not done yet. Further tuning of each version and additional care for the analysis variant are needed. We can run the genconfig tuning several times with different numbers of visits, such as 5000, 10000, 20000 or 30000, save the created config files under different file names, and compare or modify the values in these config files. We might also let KataGo analyse board positions and compare numbers of visits to judge different config parameters.
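Comparing config files by eye is tedious, so a small script can list only the settings that differ. This is a rough Python sketch, assuming the config files consist of "key = value" lines with # comments; the two text excerpts are hypothetical and much shorter than real generated files.

```python
def parse_cfg(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

def diff_cfg(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Hypothetical excerpts of two configs generated with different visits.
cfg_10000 = "maxVisits = 10000\nnumSearchThreads = 40  # tuned\nlogDir = gtp_logs\n"
cfg_50000 = "maxVisits = 50000\nnumSearchThreads = 64\nlogDir = gtp_logs\n"
print(diff_cfg(parse_cfg(cfg_10000), parse_cfg(cfg_50000)))
```

For real files, read the text with open(path).read() and pass it to parse_cfg.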

RobertJasiek · #3

INSTALLATION OF NVIDIA LIBRARIES FOR WINDOWS

Download

From https://developer.nvidia.com/cuda-zone download cuda_11.6.2_511.65_windows.exe (CUDA 11.6.2).

From https://developer.nvidia.com/cudnn download cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip (CUDNN 8.9.1 for CUDA 11).

From https://developer.nvidia.com/tensorrt download TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip (TensorRT-8.5.2.2 for CUDA 11). As an alternative for the latter, from https://github.com/yzyray/lizzieyzy download 2023-06-15-windows64+katago.zip (LizzieYZY_2_5_3).

If necessary, locate links to archived download files.

These Nvidia download file versions work for KataGo CUDA 1_13_0 and KataGo TensorRT 1_13_1 on my computer. Another user has reported that TensorRT 8.5.3.1 works for him. The KataGo download file names give hints on Nvidia download file versions but, as of 2023-06-15, the only safe advice is to use files for the main version CUDA 11 (not 12) for Windows 10 or 11 (if 11 is not offered, choose Windows 10 files) as local EXE. For some downloads, you may need to register at Nvidia's webpage, answer a query (Why is every enduser an organisation?!) and receive confirmation emails.
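Whether all downloads target the same main CUDA version can be read off the file names. The Python sketch below is only a file-name heuristic based on the three names listed above; the function name cuda_major_of is made up for this illustration.

```python
import re

def cuda_major_of(file_name):
    """Guess the CUDA main version a download targets from its file name."""
    match = re.search(r"cuda[-_]?(\d+)", file_name, re.IGNORECASE)
    return int(match.group(1)) if match else None

downloads = [
    "cuda_11.6.2_511.65_windows.exe",
    "cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip",
    "TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip",
]
majors = {name: cuda_major_of(name) for name in downloads}
print(majors)
print("same main CUDA version:", len(set(majors.values())) == 1)
```

A matching main version is necessary according to the advice above but does not guarantee that the combination runs fast or at all.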

FATE

Installed download files might, or might not, work. This depends on hardware, the Windows and programs installation, the Nvidia graphics card driver version, the Nvidia CUDA library download file version, the Nvidia CUDNN library download file version, the Nvidia TensorRT library download file version, the KataGo CUDA download file version and the KataGo TensorRT download file version. Trial and error may be needed. If an installation of downloads fails, uninstall and try a different installation. The concept of libraries is modularity but, in practice, it is limited. Downloading files with close release dates has a greater chance of success. Choose a CUDNN version for a CUDA version. Choose a TensorRT version for a CUDA version and, at least in theory, for a CUDNN version. In particular, finding a working TensorRT version can be difficult. You might start with the newest subversion and, if necessary, try subsequent subversions one after another. If this fails, also try some sub-subversions. Nvidia provides version compatibility information but it is flawed. Keep your motivation because TensorRT can be significantly faster than OpenCL or CUDA!

Even if you establish some working installation, it can still be very wrong by resulting in slow speed (down to 1/6 of what it should be) of KataGo CUDA or KataGo TensorRT. Without reference to earlier speeds, you might not know whether it is slow or fast. However, CUDA libraries might (but need not) be faster than OpenCL, and TensorRT libraries should be the fastest. If the relative order is obviously wrong or some benchmark or genconfig runs last forever, you know that some KataGo library version must run too slowly. Most likely, it is not KataGo's or your graphics card's fault but the fault of an improper combination of Nvidia download files. In that case, trial and error continue. I have experienced it all. Installation is already very difficult but this trial and error process can make it much more difficult still. At least, now you know what to look for if you follow this manual and things go wrong nevertheless.
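The rule of thumb about the relative order can be written down as a small check. This is only a Python sketch of that rule; the function name and the example numbers are hypothetical and not measurements.

```python
def speed_order_problems(visits_per_s):
    """List suspicious findings for measured visits/s of the KataGo variants.

    Rule of thumb from this post: TensorRT should be the fastest, and CUDA
    is often, but not necessarily, faster than OpenCL.
    """
    problems = []
    opencl = visits_per_s.get("OpenCL")
    cuda = visits_per_s.get("CUDA")
    tensorrt = visits_per_s.get("TensorRT")
    if tensorrt is not None:
        for name, speed in (("OpenCL", opencl), ("CUDA", cuda)):
            if speed is not None and tensorrt < speed:
                problems.append(f"TensorRT slower than {name}")
    if cuda is not None and opencl is not None and cuda < opencl:
        problems.append("CUDA slower than OpenCL (possible, but worth checking)")
    return problems

# Hypothetical numbers for illustration only.
print(speed_order_problems({"OpenCL": 1800, "CUDA": 750, "TensorRT": 5000}))
```

An empty list does not prove a good installation; it only means that the order is not obviously wrong.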

Preparation and General

Do not have a) any other versions of Nvidia CUDA, CuDNN or TensorRT installed or b) any such additional files copied to C:\katago_CUDA or C:\katago_TensorRT.

Create: C:\Program Files\CUDA

Install CUDA and CUDNN before TensorRT. We also put all CUDNN and TensorRT binaries there so that we only need to reference one path in the Windows system's Path environment variables. Alternative, more complicated methods are possible.

*************************************************************************************************

CUDA and CUDNN INSTALLATION

CUDA Installer

Start cuda_11.6.2_511.65_windows.exe as administrator.

Confirm a temporary file path.

Choose Custom installation.

Deselect if not needed or already installed: Driver components | Nvidia Display Driver, Other components | Nvidia PhysX.

Only select CUDA | Runtime | Libraries <all>.

For at least two graphics cards or additionally desired software, selecting more can be necessary. Then, choosing other installation paths and paths in environment variables might also be necessary below.

Replace the installation path for CUDA Development, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\11.2 , with the more convenient path: C:\Program Files\CUDA

Paths

In Windows start menu, search environment variables (German: Umgebungsvariablen), go there and verify that the installer has created these Windows system-wide (not: for the current user) environment variables (or with a different version number):

Code:

CUDA_PATH           C:\Program Files\CUDA
CUDA_PATH_V11_6     C:\Program Files\CUDA
Path                C:\Program Files\CUDA\bin

If one of the first two is missing, use New to add the missing item, if necessary, fitting your CUDA version.

If you cannot see the third in Path, double-click on the Path row and check if it is missing.

Path contains other paths, such as %SystemRoot%\system32 . Do not accidentally delete prior entries.

If the third item is missing in Path, use New to add it.

Click OK thrice.

Restart Windows.
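After the restart, the variables can also be checked by a script. This is a minimal Python sketch, assuming the installation path C:\Program Files\CUDA chosen above; the function name is made up, and a Python process sees system and user variables merged, so the dialog remains the authoritative place to verify system-wide settings.

```python
import os

def check_cuda_environment(env, cuda_dir=r"C:\Program Files\CUDA"):
    """Return a list of problems with the CUDA-related environment variables."""
    # Windows variable names are case-insensitive, so normalise the keys.
    env = {key.upper(): value for key, value in env.items()}
    problems = []
    if env.get("CUDA_PATH", "").rstrip("\\").lower() != cuda_dir.lower():
        problems.append("CUDA_PATH is missing or points elsewhere")
    wanted = (cuda_dir + r"\bin").lower()
    entries = [p.strip().rstrip("\\").lower() for p in env.get("PATH", "").split(";")]
    if wanted not in entries:
        problems.append("Path does not contain the CUDA bin directory")
    return problems

# On the Windows computer itself:
# print(check_cuda_environment(dict(os.environ)))
example = {"CUDA_PATH": r"C:\Program Files\CUDA",
           "Path": r"C:\Windows\system32;C:\Program Files\CUDA\bin"}
print(check_cuda_environment(example))
```

The version-specific variable, such as CUDA_PATH_V11_6, is not checked here because its name depends on the installed CUDA version.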

CUDNN Installation

In a temporary directory, extract: cudnn-windows-x86_64-8.9.1.23_cuda11-archive.zip

Move all from \bin to C:\Program Files\CUDA\bin

Move all from \include to C:\Program Files\CUDA\include

Move all from the differing source directory \lib to C:\Program Files\CUDA\lib\x64
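The three steps can also be scripted. The Python sketch below copies instead of moving, so that the extracted archive stays intact; the function name and the mapping constant are hypothetical, and writing to C:\Program Files requires a command line started as administrator.

```python
import shutil
from pathlib import Path

# Source subfolder -> target subfolder, as listed above.
CUDNN_MAPPING = {"bin": "bin", "include": "include", "lib": "lib/x64"}

def copy_library_files(extracted_dir, cuda_dir, mapping=CUDNN_MAPPING):
    """Copy the files of each mapped subfolder into the CUDA directory.

    Only files directly inside each source subfolder are copied.
    Returns the list of copied target paths.
    """
    copied = []
    for source_sub, target_sub in mapping.items():
        source = Path(extracted_dir) / source_sub
        target = Path(cuda_dir) / target_sub
        if not source.is_dir():
            continue  # archive layout differs from the assumption
        target.mkdir(parents=True, exist_ok=True)
        for item in sorted(source.iterdir()):
            if item.is_file():
                copied.append(Path(shutil.copy2(item, target / item.name)))
    return copied

# Example call with an assumed temporary extraction folder:
# copy_library_files(r"C:\temp\cudnn-windows-x86_64-8.9.1.23_cuda11-archive",
#                    r"C:\Program Files\CUDA")
```

Check the returned list against the archive contents before deleting the temporary directory.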

Exceptionally Needed Installation

Nvidia's installer may have found some file, such as zlibwapi.dll, already installed on your computer and therefore not install it in the Path-referenced directory C:\Program Files\CUDA\bin. Locate and copy the file, for example, as follows:

Copy "C:\Program Files (x86)\ASUS\ArmouryDevice\dll\ArmourySocketServer\zlibwapi.dll" to C:\katago_CUDA

KataGo CUDA

Now use a GUI with KataGo CUDA.

*************************************************************************************************

TensorRT INSTALLATION

<Complete TensorRT Installation Variant>

In a temporary directory, extract: TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip

Move all from \bin to C:\Program Files\CUDA\bin

Move all from \include to C:\Program Files\CUDA\include

Move all DLL files from the differing source directory \lib to C:\Program Files\CUDA\bin

Move all LIB files from the differing source directory \lib to C:\Program Files\CUDA\lib\x64

Move the other directories to C:\Program Files\CUDA

<Short TensorRT Installation Variant>

In a temporary directory, extract: TensorRT-8.5.2.2.Windows10.x86_64.cuda-11.8.cudnn8.6.zip

Copy \lib\nvinfer.dll and \lib\nvinfer_builder_resource.dll to C:\Program Files\CUDA\bin

<Alternative TensorRT Installation Variant>

In a temporary directory, extract 2023-06-15-windows64+katago.zip (LizzieYZY_2_5_3). In a directory for TensorRT, locate these exactly same two files:

Copy nvinfer.dll and nvinfer_builder_resource.dll to C:\Program Files\CUDA\bin

KataGo TensorRT

Now use a GUI with KataGo TensorRT.

*************************************************************************************************

TYPICAL FILES IN C:\Program Files\CUDA\bin

CUDA Files (1.68 GB)

Code:

cublas64_11.dll   
cublasLt64_11.dll   
cudart32_110.dll   
cudart64_110.dll   
cufft64_10.dll   
cufftw64_10.dll   
curand64_10.dll   
cusolver64_11.dll   
cusolverMg64_11.dll   
cusparse64_11.dll   
nppc64_11.dll   
nppial64_11.dll   
nppicc64_11.dll   
nppidei64_11.dll   
nppif64_11.dll   
nppig64_11.dll   
nppim64_11.dll   
nppist64_11.dll   
nppisu64_11.dll   
nppitc64_11.dll   
npps64_11.dll   
nvblas64_11.dll   
nvjpeg64_11.dll   
nvrtc-builtins64_116.dll   
nvrtc64_112_0.dll   

CUDNN Files (1.08 GB)

Code:

cudnn64_8.dll   
cudnn_adv_infer64_8.dll   
cudnn_adv_train64_8.dll   
cudnn_cnn_infer64_8.dll   
cudnn_cnn_train64_8.dll   
cudnn_ops_infer64_8.dll   
cudnn_ops_train64_8.dll   

TensorRT Files (0.85 GB)

Code:

nvinfer.dll   
nvinfer_builder_resource.dll   
nvinfer_plugin.dll
nvonnxparser.dll   
nvparsers.dll   
trtexec.exe

EDIT: minor corrections.

NordicGoDojo · #4

If having Windows as the operating system is not a requirement, a Mac mini is easy to recommend. For 750€ you get a silent computer that is more than powerful enough for daily use and running go AI (note: you need to buy a screen and peripherals separately). It may not be quite as fast as an RTX 4000, but its power parity will be incomparable.

Then, for getting an AI program running, you'd install homebrew, write one line in the console (brew install katrain), and let the computer do the rest.

RobertJasiek · #5

Is it possible to assess visits/s on a Mac and what value does one get on what Mac model? (Just curious. I am a Windows "fan".)

RobertJasiek · #6

MAJOR TUNING

RTX 4070 (Asus TUF 12G, Quiet mode, 200W TDP, 100% power target) + Ryzen 7700 (8C, 16T) + 64 GB DDR5-RAM JEDEC

18-Block-Model = kata1-b18c384nbt-s6386600960-d3368371862

Values are from genconfig unless marked as benchmark; recommended settings are marked + or ++ if they are not the highest; - means the default for GB and time (s).

KataGo 1_13_0 OpenCL

Code:

visits  threads  visits/s    GB    s     remarks

   800     20    1184.02+     -    -     benchmark

 10000     24    1672.61++    -    -     benchmark

 10000     40    1793.91+     -    -     \System32\OpenCL.dll

 10000     48    1874.83     30    -

 10000     40    1808.78+     -    -

 50000     24    1964.08      -    -

100000     40    2203.24+ ### -    -

KataGo 1_13_0 CUDA

CUDA + CUDNN of Megapack

Code:

visits  threads  visits/s    GB    s     remarks

   800     40     450.17+     -    -

   800     48     450.66+     -    -     benchmark

  2000     32     497.30+     -    -

 10000     40     683.34+     -    -

 10000     48     743.29+     -    -

 10000     48     752.09+     -    -     benchmark

CUDA_11_6_2 + CUDNN_8_9_1_23

Code:

visits  threads  visits/s    GB    s     

 10000     80    3184.53      -    -     benchmark

 10000     80    3334.43      -    -

 50000     80    3832.44      -    -

100000     64    3983.82 ###  -    -

KataGo 1_13_1 TensorRT

CUDA + CUDNN of Megapack + TensorRT_8_5_2_2

Code:

visits  threads  visits/s    GB    s     

   800     40    2879.17+     -    -

 10000     80    5161.87+     -    -

 10000     64    4662.43      -    -

 10000     40    4322.18++   64    1

 20000     64    4905.87+     -    -

 30000     64    5119.17+     -    -

CUDA_11_6_2 + CUDNN_8_9_1_23 + TensorRT_8_5_2_2

Code:

 10000     40    4473.86+     -    -

 10000     80    4603.14      -    -

 30000     64    5077.34+     -    -

 40000     80    5431.83      -    -

 50000     64    5299.24      -    -

 60000     80    5496.85      -    -

 80000     64    5823.38      -    -

100000     96    6321.13      -    -

120000     96    6443.15      -    -

140000     80    6494.54 ###  -    -

160000     64    6244.78      -    -

Rather optimised visits/s for visits as only changed parameter

Code:

visits/s     KataGo

2203.24+     OpenCL

3983.82      CUDA

6494.54      TensorRT

Factors comparing different speeds as visits/s

Code:

14.43     Worst default installation versus best KataGo and Nvidia library installation with rather optimised visits

 5.82     RTX 4090 versus RTX 3050 (2560x1440 Time Spy Graphics)

 5.30     Different combinations of Nvidia file versions (worst case for rather optimised visits)

 2.95     TensorRT : OpenCL   (rather optimised visits)

 2.17     Default visits versus rather optimised visits as only changed parameter (worst case)

 2.07     RTX 4090 versus RTX 4070 (2560x1440 Time Spy Graphics)

 1.81     CUDA : OpenCL       (rather optimised visits)

 1.63     TensorRT : CUDA     (rather optimised visits)
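Those factors that depend only on the visits/s tables of this post can be recomputed from the rows marked ### and the slowest default row. The following Python sketch does this arithmetic; which rows to pair for the "worst default" and "library combinations" factors is my assumption, and the graphics card benchmark factors cannot be derived this way.

```python
# Best visits/s per KataGo variant (rows marked ### above).
best = {"OpenCL": 2203.24, "CUDA": 3983.82, "TensorRT": 6494.54}
# Assumed inputs: slowest default row (CUDA of Megapack, 800 visits) and
# best CUDA row with the Megapack libraries.
worst_default = 450.17
best_cuda_megapack = 752.09

def factor(fast, slow):
    """Speed factor rounded to two decimals."""
    return round(fast / slow, 2)

factors = {
    "worst default : best tuned": factor(best["TensorRT"], worst_default),
    "Nvidia file versions (CUDA)": factor(best["CUDA"], best_cuda_megapack),
    "TensorRT : OpenCL": factor(best["TensorRT"], best["OpenCL"]),
    "CUDA : OpenCL": factor(best["CUDA"], best["OpenCL"]),
    "TensorRT : CUDA": factor(best["TensorRT"], best["CUDA"]),
}
for name, value in factors.items():
    print(f"{value:6.2f}  {name}")
```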

RobertJasiek · #7

Tuning Revisited

Let me also describe major tuning in words. Choose one of the strongest model nets and a KataGo version, then tune for this combination. Each variant of KataGo (OpenCL, CUDA or TensorRT) needs its own tuning, or simply go for TensorRT as the fastest variant on modern graphics cards (except for GUI launches).

For KataGo benchmark, use the -v parameter to specify visits, such as -v 10000 for that many visits. For each execution of KataGo genconfig, also specify the visits when asked. Increase in large steps to locate the order of magnitude where visits/s (visits per second) are maximal. Save or write down the recommended number of threads, or eventually use the most appropriate CFG file for the currently tuned KataGo variant. (And fine tune it with other parameters.)
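Once several runs are written down, finding the order of magnitude with maximal visits/s is a simple comparison. This is a small Python sketch with some rows copied from the TensorRT table (CUDA_11_6_2 + CUDNN_8_9_1_23 + TensorRT_8_5_2_2) in post #6; the function names are only illustrative.

```python
# (visits, threads, visits/s) rows copied from post #6.
measurements = [
    (10000, 40, 4473.86), (30000, 64, 5077.34), (60000, 80, 5496.85),
    (100000, 96, 6321.13), (140000, 80, 6494.54), (160000, 64, 6244.78),
]

def best_measurement(rows):
    """Return the row with the highest visits/s."""
    return max(rows, key=lambda row: row[2])

def past_the_peak(rows):
    """True if the run with the most visits is slower than the best run."""
    most_visits = max(rows, key=lambda row: row[0])
    return most_visits[2] < best_measurement(rows)[2]

print(best_measurement(measurements))
print(past_the_peak(measurements))
```

As long as past_the_peak is False, it may be worth trying an even larger number of visits.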

Do not listen to naysayers denying the value of deviating from defaults, tuning, installing TensorRT and finding good Nvidia library versions! Good installation combined with good tuning can result in a speed improvement up to almost three times as large as the speed difference between RTX 3050 and 4090. The latter is comparable to replacing bad files distributed in a GUI installer with good files selected well from Nvidia's webpage. TensorRT might be thrice as fast as OpenCL. Just tuning the number of threads parameter amounts to a speed factor similar to replacing an RTX 4070 by a 4090.

In conclusion, tuning is much more relevant than replacing a comparatively slow graphics card by the fastest one! Spend a couple of days but do it!

Speeds of different hardwares

Except for too slow, old hardware, I have dug in the archives and found some numbers of visits/s or playouts in comparison to mine (all rounded):

Code:

Speed   Hardware

6500    RTX 4070 TensorRT

4000    RTX 4070 CUDA

3000    2 * RTX 2080TI [1]

2200    RTX 4070 OpenCL

 580    5700XT [2]

 300    iPad_Pro/M1 [3]

 200    iPhone 13 pro [4]

 170    iPad/A12X [5]

Where do RTX 1000, RTX 3000, RTX Laptop cards and Macs fit? How about 3080TI Laptop?

[1] https://www.lifein19x19.com/viewtopic.p ... 39#p263039
goame CUDA b40 2*2080TI 64GB 100000 visits 1s, 40 threads (recommended) = 2832.08 visits/s, 80 threads = 3019.80 visits/s

[2] https://www.lifein19x19.com/viewtopic.p ... 24#p259624
dojo_b ? b40 5700XT 12GB ? ?, 16 threads = 583.43 visits/s

[3] https://www.lifein19x19.com/viewtopic.p ... 26#p267026
Limeztone: For an arbitrary mid game position (b40s985 net):
iPad_Pro/A12X: 14.37 playouts/s
iPad_Pro/M1: 297.63 playouts/s

[4] https://www.lifein19x19.com/viewtopic.p ... 14#p267614
https://www.lifein19x19.com/viewtopic.p ... 28#p268628
wineandgolover b40 iPhone 13 pro, nearly 200 visits/s

[5] https://www.lifein19x19.com/viewtopic.p ... 21#p238921
https://www.lifein19x19.com/viewtopic.p ... 73#p249273
y_ich iPad/A12X, 170 playouts/s
y_ich: The AI needs at least a few hundreds playouts to read simple ladders.

EDIT: typo.

NordicGoDojo · #8

RobertJasiek wrote:

Is it possible to assess visits/s on a Mac and what value does one get on what Mac model? (Just curious. I am a Windows "fan".)

This is tricky, because KaTrain – as far as I know – prunes its search and therefore does 'more with less'. IIRC, the author said that this is why he doesn't have the visits/s visible in the program, since it would give lower numbers that would be incomparable with other AI programs.

To get some idea, I checked A Master of Go on my iPad, which has an M1 chip. The newest 18-block KataGo network gets roughly 200 nodes per second there. The Mac mini I mentioned has an M2 chip, which Apple says is roughly 40% faster than M1 when it comes to its integrated Neural Engine. Assuming that KaTrain is able to make full use of the Neural Engine – which it should, since my meters show 100% GPU usage when running KaTrain – its analysis 'might' be worth an equivalent of 280 visits/s.

These days it is probably also worth noting that the power consumption of both M1 and M2 when doing AI analysis is in the range of 40W.

RobertJasiek · #9

Interesting, thanks!

According to HWiNFO64, my RTX 4070 at 100% power target consumes between 150 and 210W when running KataGo. Typically close to 200W but some operations and KataGo CUDA are a bit more modest. 90% instead of 96% GPU load has a great impact on whether it is closer to 150W or 200W. I guess that some 70% power target via Afterburner would result in consistently around 150W use. Such may be more important on notebooks. Let me assume 200W as representative on my desktop and use TensorRT. Add 65W for the APU (even if the iGPU is idle, it consumes much, like 35W CPU and 30W iGPU; desktop Ryzens are not efficient but only roughly keep their TDP). I ignore peanuts for other mainboard components. Then we have roughly these efficiencies:

Code:

visits/s/W   Hardware

24.5         200W-RTX 4070 + 65W-APU

 5.0         40W-iPad M1

Hence, M1 consumes comparatively little even under full load but an RTX 4000 desktop with moderate APU is roughly 5 times as efficient while consuming 6.6 times as much power. I think RTX 4000 Laptop GPUs are, and especially can be set to be, even more power efficient. Modern dGPU-Chips are both power-hungry and, at that level, efficient. Of course, mobile devices have their good uses, too. It is just that one should not expect speed wonders from small form factors with necessarily limited TDPs.
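The arithmetic behind these efficiencies is short enough to spell out. The Python sketch below uses the rounded inputs of this discussion (6500 visits/s at 200W + 65W for the desktop, roughly 200 visits/s at 40W for the M1 iPad); the variable names are mine.

```python
def visits_per_watt(visits_per_s, watts):
    """Efficiency as visits per second per watt."""
    return visits_per_s / watts

desktop_watts = 200 + 65   # RTX 4070 plus APU
ipad_watts = 40

desktop = visits_per_watt(6500, desktop_watts)
ipad = visits_per_watt(200, ipad_watts)

print(f"desktop: {desktop:.1f} visits/s/W")
print(f"iPad M1: {ipad:.1f} visits/s/W")
print(f"efficiency factor: {desktop / ipad:.1f}")
print(f"power factor: {desktop_watts / ipad_watts:.1f}")
```

Other power assumptions, such as a 70% power target, can be tried by changing desktop_watts and the measured visits/s together.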

RobertJasiek · #10

KataGo OpenCL uses libraries in its directory or, if OpenCL.dll is missing, that file from the system directory.

I have used Process Monitor to watch used files. Note that multiple GPUs, server use etc. may require more files.

Typically and besides system files, KataGo CUDA uses these files (or similarly named model, CFG, LOG files):

Code:

C:\katago_CUDA\b18.bin.gz
C:\katago_CUDA\gtp_custom_Nvidia_11.6.2_50000.cfg
C:\katago_CUDA\libcrypto-1_1-x64.dll
C:\katago_CUDA\libssl-1_1-x64.dll
C:\katago_CUDA\libz.dll
C:\katago_CUDA\libzip.dll
C:\katago_CUDA\msvcp140.dll
C:\katago_CUDA\vcruntime140.dll
C:\katago_CUDA\zlibwapi.dll
C:\katago_CUDA\gtp_logs\20230617-171446-07EFA493.log
C:\Program Files\CUDA\bin\cublas64_11.dll
C:\Program Files\CUDA\bin\cublasLt64_11.dll
C:\Program Files\CUDA\bin\cudnn_cnn_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn_ops_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn64_8.dll

Typically, KataGo TensorRT uses:

Code:

C:\katago_TensorRT\b18.bin.gz
C:\katago_TensorRT\gtp_custom.cfg
C:\katago_TensorRT\KataGoData\trtcache\trt-8502_gpu-e00748cc_tune-e98f11832326_exact19x19_batch32_fp16
C:\katago_TensorRT\KataGoData\trtcache\trt-8502_gpu-e00748cc_tune-e98f11832326_exact19x19_batch96_fp16
C:\katago_TensorRT\libcrypto-1_1-x64.dll
C:\katago_TensorRT\libssl-1_1-x64.dll
C:\katago_TensorRT\libz.dll
C:\katago_TensorRT\libzip.dll
C:\katago_TensorRT\msvcp140.dll
C:\katago_TensorRT\vcruntime140.dll
C:\katago_TensorRT\gtp_logs\20230617-120124-AF10C399.log
C:\Program Files\CUDA\bin\cublas64_11.dll
C:\Program Files\CUDA\bin\cublasLt64_11.dll
C:\Program Files\CUDA\bin\cudnn_ops_infer64_8.dll
C:\Program Files\CUDA\bin\cudnn64_8.dll
C:\Program Files\CUDA\bin\nvinfer.dll
C:\Program Files\CUDA\bin\nvinfer_builder_resource.dll
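With the two lists above, one can check whether the Nvidia libraries that KataGo used on my computer are present in the Path-referenced directory. This is a minimal Python sketch; the names USED_FILES and missing_libraries are hypothetical, and other setups (multiple GPUs, other versions) may need different files.

```python
from pathlib import Path

# Nvidia libraries observed in use, taken from the two lists above.
USED_FILES = {
    "CUDA": ["cublas64_11.dll", "cublasLt64_11.dll", "cudnn_cnn_infer64_8.dll",
             "cudnn_ops_infer64_8.dll", "cudnn64_8.dll"],
    "TensorRT": ["cublas64_11.dll", "cublasLt64_11.dll", "cudnn_ops_infer64_8.dll",
                 "cudnn64_8.dll", "nvinfer.dll", "nvinfer_builder_resource.dll"],
}

def missing_libraries(variant, bin_dir):
    """Return the expected library files that are not in bin_dir."""
    folder = Path(bin_dir)
    present = set()
    if folder.is_dir():
        # Windows file names are case-insensitive, so compare in lower case.
        present = {item.name.lower() for item in folder.iterdir()}
    return [name for name in USED_FILES[variant] if name.lower() not in present]

# On the Windows computer itself:
# print(missing_libraries("TensorRT", r"C:\Program Files\CUDA\bin"))
```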

RobertJasiek · #11

Introduction

As mentioned earlier, I started using the programs as Windows administrator. Now that I know how to let them run, usage moves to a Windows standard user. This introduces a few extra hurdles but it is fairly easy to overcome them. I describe things presuming the earlier installation of Baduk Megapack. For individual installation of the GUIs, things should be similar.

KaTrain

KaTrain also wants write access as the Windows standard user to these folders:

Code:

C:\Users\<user_name>\.katrain
C:\Users\<user_name>\.kivy
C:\baduk

By default, such access rights are granted. Therefore, KaTrain can just be used.

Lizzie

Lizzie is a bit trickier. While Megapack created the desktop icon for the administrator and set some file's contents accordingly including update options, we must create a new desktop icon for the Windows standard user and there is no easy update option for him, which I do not need but your usage might differ.

In Explorer, go to the directory C:\baduk\lizzie, right-click on lizzie.ico and create the desktop icon. Right-click on this desktop icon. In Target write:

Code:

C:\baduk\LizzieYZY\jre\java11\bin\javaw.exe -jar C:\baduk\lizzie\lizzie.jar

(This is similar to setting up a desktop icon for CGoban, which also uses JavaRuntimeEnvironment.)

Now, you can use Lizzie as expected.

Security

For each of the folders, in which the GUIs or KataGo want write access (unless you always use different paths for logs and configuration files),

Code:

C:\baduk
C:\katago_CUDA
C:\katago_OpenCL
C:\katago_TensorRT

you might deny access to your possibly different Windows standard user that exists for online access. Furthermore, you might supervise these folders for software execution. Safety comes from denying write access to online users while granting execution rights in these go folders to other users.

Even without these additional steps, it is good practice to perform everyday usage (such as using go programs) as a Windows standard user to restrict the scope of any harm by attacks on the computer. Needless to say, detailed information on a possible Windows security concept is on my webpage but note that Windows 11 makes software restrictions harder than described there: https://home.snafu.de/jasiek/windows_se ... ncept.html

EDIT: write access in other folders.

RobertJasiek · #12

Now, let me describe how to get Sabaki or Sabaki52 running.

Sabaki

Sabaki uses a path: C:\Users\<user_name>\AppData\Roaming\Sabaki

Set your options:

Engines | Manage Engines... | General

Add engines:

Engines | Manage Engines... | Engines

Set a logging path, such as: C:\Users\<user_name>\AppData\Roaming\Sabaki\logs

Sabaki run with the installation Windows administrator account shows some preinstalled engines.

Sabaki run with a Windows standard user account initially shows an empty engines list.

Add

Code:

Name = katago_OpenCL

Path = C:\katago_OpenCL\katago.exe

Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg

Initial commands =

Add

Code:

Name = katago_CUDA

Path = C:\katago_CUDA\katago.exe

Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg

Initial commands =

Add

Code:

Name = katago_TensorRT

Path = C:\katago_TensorRT\katago.exe

Arguments = gtp -model b18.bin.gz -config gtp_custom.cfg

Initial commands =

Optionally set, for example, Initial commands = time_settings 0 10 1;

Prepare playing by attaching players or engines:

Engines | Attach...

Enter human player name or
Down-arrow left of black player name / Down-arrow right of white player name: select engine from drop-down list

Press OK.

Play:

F5 to start playing.

ESC to stop playing.

Players / engines can be changed during the game.

Sabaki52

Sabaki52 is for self-play of a black versus a white engine.

So far, I have tested Sabaki and Sabaki52 of Baduk Megapack.

Once engines are set in Sabaki, they can also be used in Sabaki52.

Engines | Show Engines Sidebar

Click on the circled arrow, select an engine for both players.

Optionally, click on the circled arrow again to set the white engine.

Click on the lightning symbol or press F5 to start / stop engine versus engine play.

Mark an engine, right-click, Detach to remove it from current play.

If the current list contains 1 engine, it is used for both players.

If the current list contains 2 engines, both are used for the two players.

If the current list contains 3 engines, only the first is used.

RobertJasiek · #13

q5go

Download q5go from: https://github.com/bernds/q5Go/releases

Installation: extract the ZIP archive and copy it to C:\Program Files\q5go as it is a 64-bit program.

q5go writes to C:\Users\<user_name>\AppData\Local\q5go\q5gorc

If q5go has been configured as below for one Windows user, setting it up for a different Windows user is simple: copy \q5go to the same C:\Users\<user_name>\AppData\Local subdirectory of the different <user_name>.

In the main window, select Settings | Preferences | Computer Go | New... for each KataGo version and set:

KataGo OpenCL

Code:

Name:   katago_OpenCL

Executable:   C:\katago_OpenCL\katago.exe

Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

KataGo CUDA

Code:

Name:   katago_CUDA

Executable:   C:\katago_CUDA\katago.exe

Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

KataGo TensorRT

Code:

Name:   katago_TensorRT

Executable:   C:\katago_TensorRT\katago.exe

Arguments:   gtp -model b18.bin.gz -config gtp_custom.cfg

Optionally activate: Use for analysis

Click OK as necessary

Analysis | Play against engine from current position...

Enter human player name and select engine, select engine colour etc., click OK.

RobertJasiek · #14

Ogatak

Ogatak is a 64-bit program with a simple, clear board GUI, an emphasis on analysis and the possibility to play against the AI. Download from https://github.com/rooklift/ogatak/releases , extract the ZIP and copy to: C:\Program Files\Ogatak

Display in portrait position and full size window do not work properly.

Ogatak uses the directory C:\Users\<user_name>\AppData\Roaming\Ogatak

Ogatak can only manage one KataGo engine at a time. So set one of OpenCL, CUDA or TensorRT as follows:

KataGo OpenCL:

Code:

Setup | Locate KataGo...     C:\katago_OpenCL\katago.exe

Setup | Locate KataGo analysis config...     C:\katago_OpenCL\analysis_config.cfg

Setup | Choose network...     C:\katago_OpenCL\b18.bin.gz

This lets Ogatak call:

Code:

C:\katago_OpenCL\katago.exe analysis -config C:\katago_OpenCL\analysis_config.cfg -model C:\katago_OpenCL\b18.bin.gz -quit-without-waiting

KataGo CUDA:

Code:

Setup | Locate KataGo...     C:\katago_CUDA\katago.exe

Setup | Locate KataGo analysis config...     C:\katago_CUDA\analysis_config.cfg

Setup | Choose network...     C:\katago_CUDA\b18.bin.gz

This lets Ogatak call:

Code:

C:\katago_CUDA\katago.exe analysis -config C:\katago_CUDA\analysis_config.cfg -model C:\katago_CUDA\b18.bin.gz -quit-without-waiting

KataGo TensorRT:

Code:

Setup | Locate KataGo...     C:\katago_TensorRT\katago.exe

Setup | Locate KataGo analysis config...     C:\katago_TensorRT\analysis_config.cfg

Setup | Choose network...     C:\katago_TensorRT\b18.bin.gz

This lets Ogatak call:

Code:

C:\katago_TensorRT\katago.exe analysis -config C:\katago_TensorRT\analysis_config.cfg -model C:\katago_TensorRT\b18.bin.gz -quit-without-waiting

Press Space to start / stop analysis.

F11 for engine self-play.

To play against the engine, set Misc | Engine plays Black or Misc | Engine plays White. Stop play by Space. Stop playing mode by Misc | Halt.

To start a new game, press CTRL N.

Of course, you might use various analysis tools and options.

RobertJasiek · #15

Having just played with KataGo a bit, there are quite a few moves I play for each of these two types:

a) KataGo's best move

b) hardly considered by KataGo at all

Both surprise me. I have neither expected to play a significant number of AI-correct moves (of which kyus and I guess some dans might miss quite a few) nor expected to play a significant number of AI-improper moves, which apparently drop AI-percentages by, say, 3%+.

(a) is good - my play is not outright hopeless:) (b) is also good - I can learn a lot from KataGo.

Is my early AI experience typical or does yours differ much?

pwaldron · #16

RobertJasiek wrote:

Is my early AI experience typical or does yours differ much?

That matches my experience, too.

My matches with AI were often 'clumpy'. That is, there would be a sequence of moves that matched the AI quite well, followed by a number of moves that the AI didn't like, followed again by another reasonable sequence. It was rather like stitching josekis together, punctuated by a mistake in direction. I've noticed pros have a similar effect, but their sequences of good moves are longer and the direction mistakes aren't so bad.

The other thing I found is playing the AI runs the risk of getting into a rut. The AI is impossibly strong and there's much to learn, but the lines of play it chooses are comparatively narrow and it beats you fairly quickly when playing even. Your games will see many 4-4 points with direct invasions, but not so many 5-3 or 5-4 openings. You also won't have any practice identifying and punishing unreasonable moves, nor are you likely to get much endgame practice since you'll lose faster than that.

I think experiencing the full breadth of go is still important at our level, and the AI only helps with part of it. Good luck, and enjoy.

kvasir · #17

RobertJasiek wrote:

Both surprise me. I have neither expected to play a significant number of AI-correct moves (of which kyus and I guess some dans might miss quite a few) nor expected to play a significant number of AI-improper moves, which apparently drop AI-percentages by, say, 3%+.

It shouldn't surprise you, most """good""" players play a lot of good moves.

In my experience I often play mostly within 0.2 points of whatever katago picks for the first 100 moves (that is my first 50 moves). Usually, I play more good moves compared to 10000 playouts than 1000 playouts or the policy. I think katago has really improved in this regard ( :lol: ), it seems less dogmatic with every update and less likely to go for something risky that actually might not work.

Personally, I'm not concerned about picking the move that KataGo would play. That is a move that maybe suits KataGo, not myself. The move KataGo picks also changes with the number of playouts, and even if you rerun it on the same position. Depending on playouts, repeating the analysis and other factors, KataGo will in most positions pick a move from a small set of moves. Most of the time there is a larger set of moves that is almost as good as anything that KataGo picks. Another thing to note is that KataGo will not only refute your moves but sometimes its own moves.

Occasionally there are some poor moves in the set of moves that KataGo would pick (depending on playouts and other factors); it is not at all uncommon that the "best" move changes while it searches.

It is not only a matter of skill but also of style whether you choose to play only high-quality moves or not. Even super strong Go-playing programs appear to consider some critical lines first and then end up refuting them; in my experience the earlier networks were more prone to do this. That is in a way a style choice. A different choice will suit different types of players.

lightvector · **#18**

For dan players, it's usually pretty doable depending on the game to get ~30-60% accuracy on guessing the exact move in strong pro games, just going one by one guessing every move on every turn. Obviously that would be even higher if discarding failed guesses due to cases where order of moves provably didn't matter or things like endgame moves where multiple choices are equivalent. And it would be even higher still if you could somehow probe the players' minds and consider yourself to have a successful match if you picked a move that they considered to be almost equally preferable to them but that they just happened not to choose.
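As a rough illustration of how such a guessing score could be computed, here is a minimal Python sketch. The function name `guess_accuracy`, the `ignorable` parameter and the sample moves are all assumptions made for this example; `ignorable` stands for turns that someone has judged by hand to be cases where the failed guess should not count (e.g. move order provably didn't matter).

```python
def guess_accuracy(guesses, actual, ignorable=frozenset()):
    """Fraction of counted turns on which the guess equals the move played.

    Failed guesses on turns listed in `ignorable` are discarded, i.e. they
    are removed from the denominator instead of counting as misses.
    """
    if len(guesses) != len(actual):
        raise ValueError("guesses and actual must have the same length")
    hits = 0
    counted = 0
    for turn, (guess, played) in enumerate(zip(guesses, actual)):
        if guess == played:
            hits += 1
            counted += 1
        elif turn not in ignorable:
            counted += 1
    return hits / counted if counted else 0.0


# Hypothetical five-move sample, not from a real game record.
guesses = ["Q16", "D4", "C3", "R4", "F3"]
actual = ["Q16", "D4", "D3", "Q4", "F3"]

strict = guess_accuracy(guesses, actual)                  # 3 hits out of 5 turns
lenient = guess_accuracy(guesses, actual, ignorable={3})  # 3 hits out of 4 turns
```

The lenient figure is never lower than the strict one, which is the "even higher" effect described above.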

The experience of playing a teaching game against pros, or of watching a pro review an amateur's game live, will often reveal that while amateur players play a lot of exact-pro moves (for the same reason they would achieve a decent fraction of exact guesses), they'll also play many moves that wouldn't be considered even momentarily by a pro, many obviously-bad-to-a-pro shapes.

You can also experience the same from the stronger side if you've taught a player, e.g. 3-7 stones weaker (i.e. substantially weaker but not so weak that every move is a mistake from your perspective).

So in retrospect, it's not a big surprise that this picture doesn't change much with AI. After all, AIs aren't actually that much better than top pros on an individual-move basis, and are sometimes worse on some moves. But it is still interesting, of course, how it turned out. I don't know that I would have predicted at all accurately all the little ways things turned out in terms of the actual specific strengths, weaknesses and styles.

John Fairbairn · **#19**

Very many years ago, GoGoD introduced the GoScorer program in a sort of coconut-shy sideshow at a London New Year Tournament. The idea was that a random game from the GoGoD database would be selected for you and you had to play through it to see how many pro moves you could guess. There was a prize for the winner who got the highest score. It was a huge hit, even though there was only one computer available and one particular individual who later (as I recall) became an insei hogged the machine to the intense irritation of many other people. We were naive. We didn't charge per go!

Again as I recall, the top score was about 82-83% (the above insei-to-be), but that was an outlier, and the usual range of dan players at the tournament was 40-60%. Scores were possibly distorted because the database was still small then and often comprised games that had been in magazines such as Go Review, and so the really dedicated players may have already seen some of them.

We later released the program with the database, and again it was a big success. Bill Spight was one of its biggest fans.

But... There's always a but, and in this case it seems to have been the same problem as with AI. You can be told which moves you failed on, but not why.

We tried adding some bells and whistles, such as clues telling you which quarter, side or line of the board to look at, but they had little real value. The ultimate problem was that no-one ever seemed to improve by using the program. It was just the equivalent of a video game, the most well-known of that era still being ping-pong, I think, though Atari games soon came along. That is one reason I remain rather uninterested in the current AI bling. I only tried GoScorer myself in a concentrated way while writing it.

There were, however, a couple of side results that were of some interest, especially to people who were writing programs to actually play go and needed a small but decent list of candidate moves. One is that not only were moves nearly always adjacent to the previous few moves (which you'd expect anyway from any game involving contact play) but also that the ancient proverb "good moves and bad moves are bedfellows" (i.e. are on adjacent intersections) really did stand up an awful lot. Does this still apply with AI? A variation on this was also that move order rather than the actual moves tended to differ.

For the nostalgic among us, GoScorer is still embedded in GoGoD95. Its only merit now might be that it can be used by those who wish to memorise pro games, but that's not something I'd specially recommend.

RobertJasiek · **#20**

Previously, I have used Cgoban with a jar file. Now, I have tried the MSI installer. Cgoban suggests a wild installation directory disrespecting C:\Program Files so, of course, I have chosen a subdirectory of the latter. The installer has failed without administrative rights. I sort of expected such but MSI installers cannot be called with them in the Explorer.

My successful second attempt starts the command line with administrative rights so the MSI installer inherits them and then can install in C:\Program Files, as it should.

This is more convenient than the previously necessary installation of a suitable Java Runtime Environment and then calling a jar file. However, some go programmers really need to learn to respect the Windows security design! The purpose of C:\Program Files (and C:\Program Files (x86) for 32-bit programs) is to prohibit writing there by Windows standard users so that malware cannot be easily installed where later execution is expected.
