Life In 19x19

Posted: **Tue Jun 06, 2023 10:42 pm**

* You don't need to do anything to make KataGo use tensor cores, and there's no great way to be absolutely sure whether it does, unless you use OpenCL. The OpenCL version tells you whether it's using tensor cores when it runs its tuning the first time you start it. Look at the tuning output, which reports various performance stats as it tunes each operation, and you'll see a section where it tunes (or fails to tune) for tensor cores and reports whether it decides to use them. The CUDA and TensorRT versions just call Nvidia's libraries, those libraries decide what to do by their own underlying magic, and you just have to trust whatever they are doing. Unlike with the OpenCL version, KataGo has little to no control over it. If the CUDA or TensorRT version runs and works without crashing, then that's probably what you're going to get and that's it. So I recommend not worrying about whether it uses tensor cores, or even trying to find that out, because *you* also have little to no control over such low-level details. Just benchmark each version and see which finally gives the best visits/s (./katago.exe benchmark).

* KataGo has *always* historically had separate TensorRT and CUDA versions, and the same is true now. It's just that v1.13.0 had a bug with TensorRT (*NOT* a bug with tensor cores; tensor cores and TensorRT are completely different things with no particular relationship to each other), so the TensorRT version got its own release v1.13.1 with a fix, whereas all other versions (OpenCL, Eigen, CUDA) did not, because nothing changed for them. Use TensorRT if you want to attempt Nvidia black magic, which goes beyond even the CUDA version by doing some secret Nvidia proprietary magic optimization of different layers of operations and such, which neither you nor I have control over, and which, if it works, might squeeze out a bit more performance at the cost of much longer startup and loading times. Otherwise, just use whatever version works for you. CUDA is fine if you've installed CUDA. OpenCL is fine too, and has a decent chance of working right out of the box, since it comes zipped with all the DLLs it should need.

* It's been a long time since I tried to install this stuff on Windows. I think that once you have the right drivers, "installation" consists of just having the right DLLs in your path, which are some CUDA and CUDNN DLLs for CUDA, or some DLL with "nvinfer" in the name for TensorRT ("nvinfer" is Nvidia's technical name for TensorRT that it uses for filenames and in some technical docs). So, don't trust me on this too much since it's been a while, but I think, for example, that one could even crudely find the appropriate DLLs by digging into the installation folders and just copy them into the katago executable directory (since Windows also normally considers the local directory of an exe to be a search location for DLLs for that exe).
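
For anyone who wants to check which of these DLLs are actually reachable, here is a rough Python sketch. It only approximates the idea described above (look in the executable's directory first, then in each PATH entry); the real Windows DLL search order has more steps than this. The helper name `find_dlls` is made up for illustration, and C:\katago is just the example directory used later in this thread.

```python
import os
from pathlib import Path


def find_dlls(name_fragment, exe_dir, path_env=None):
    """Return DLLs whose filename contains name_fragment.

    Searches exe_dir first, then every directory listed in path_env
    (defaults to the PATH environment variable). This is a simplification
    of how Windows really resolves DLLs.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    search_dirs = [Path(exe_dir)]
    search_dirs += [Path(p) for p in path_env.split(os.pathsep) if p]

    found, seen = [], set()
    fragment = name_fragment.lower()
    for directory in search_dirs:
        try:
            entries = sorted(directory.iterdir())
        except OSError:
            continue  # missing or unreadable directory: skip it
        for entry in entries:
            name = entry.name.lower()
            if name.endswith(".dll") and fragment in name:
                key = str(entry.resolve())
                if key not in seen:  # the same dir may appear twice on PATH
                    seen.add(key)
                    found.append(entry)
    return found


if __name__ == "__main__":
    # "nvinfer" is the filename fragment mentioned above for TensorRT.
    for dll in find_dlls("nvinfer", r"C:\katago"):
        print(dll)
```

If this prints nothing, copying the DLLs next to the katago executable or extending PATH would be the next thing to try.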

Posted: **Wed Jun 07, 2023 1:06 am**

In order to run katago-v1.13.1-trt8.5-cuda11.2-windows-x64, I downloaded Lizzieyzy and copied nvinfer.dll and nvinfer_builder_resource.dll from there. I didn't install anything else. I checked it in Sabaki with different networks.
It works about twice as fast as OpenCL for v1.12.4. I haven't compared for v1.13 yet.

Lizzieyzy https://github.com/yzyray/lizzieyzy/releases
2023-01-30-windows64+katago.zip ~1.8gb
https://drive.google.com/file/d/1fhad97 ... drive_link

Posted: **Wed Jun 07, 2023 2:38 am**

Thank you both! Now I have something to look for, try and benchmark. If I should install all the Nvidia stuff, I might then fetch the suitable DLLs and put them in KataGo's directory or see if setting PATH does the job.

Posted: **Wed Jun 07, 2023 4:14 pm**

v1.13: katago_tensorRT is ~2.2 times faster than katago_opencl (GeForce GTX 1650, b18)
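
A comparison like this can be scripted from saved benchmark output. The Python sketch below assumes the benchmark prints lines in the `numSearchThreads = N: ... visits/s = X ...` form that appears in the tuning output pasted later in this thread; the function names are made up for illustration, and the sample text is copied from that output rather than from a TensorRT run.

```python
import re

# Matches e.g. "numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 ..."
LINE_RE = re.compile(r"numSearchThreads\s*=\s*(\d+):.*?visits/s\s*=\s*([0-9.]+)")

SAMPLE = """\
numSearchThreads =  5: 10 / 10 positions, visits/s = 839.35 nnEvals/s = 563.10 nnBatches/s = 225.34 avgBatchSize = 2.50 (119.2 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 nnEvals/s = 877.08 nnBatches/s = 146.43 avgBatchSize = 5.99 (75.2 secs)
"""


def parse_benchmark(text):
    """Map numSearchThreads -> visits/s for every benchmark line in text."""
    return {int(m.group(1)): float(m.group(2)) for m in LINE_RE.finditer(text)}


def best_setting(results):
    """Return (threads, visits_per_second) for the fastest thread count."""
    threads = max(results, key=results.get)
    return threads, results[threads]


def speedup(results_a, results_b):
    """How many times faster the best of results_a is than the best of results_b."""
    return best_setting(results_a)[1] / best_setting(results_b)[1]


if __name__ == "__main__":
    print(best_setting(parse_benchmark(SAMPLE)))
```

To compare two backends, save each benchmark's output to a file, parse both, and pass the two result dicts to `speedup`.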

Posted: **Fri Jun 09, 2023 2:31 am**

Suppose C:\katago is my KataGo OpenCL directory, C:\baduk\katrain is my KaTrain directory and C:\baduk is my Baduk AI Megapack directory.

Now that I can run KataGo OpenCL, CUDA and TensorRT, each in its own directory, as

katago benchmark

on the command line, I want to start with KataGo OpenCL in KaTrain on Windows. In C:\katago I run

katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

and answer the questions (my answers are in the command line LOG further down).

KataGo creates gtp_custom.cfg:

Code:

# Config for KataGo C++ GTP engine, i.e. "./katago.exe gtp"

# In this config, when a parameter is given as a commented out value,
# that value also is the default value, unless described otherwise. You can
# uncomment it (remove the pound sign) and change it if you want.

# ===========================================================================
# Command-line usage
# ===========================================================================
# All of the below values may be set or overridden via command-line arguments:
#
# -override-config KEY=VALUE,KEY=VALUE,...

# ===========================================================================
# Logs and files
# ===========================================================================
# This section defines where and what logging information is produced.

# Each run of KataGo will log to a separate file in this dir.
# This is the default.
logDir = gtp_logs
# Uncomment and specify this instead of logDir to write separate dated subdirs
# logDirDated = gtp_logs
# Uncomment and specify this instead of logDir to log to only a single file
# logFile = gtp.log

# Logging options
logAllGTPCommunication = true
logSearchInfo = true
logToStderr = false

# KataGo will display some info to stderr on GTP startup
# Uncomment the next line and set it to false to suppress that and remain silent
# startupPrintMessageToStderr = true

# Write information to stderr, for use in things like malkovich chat to OGS.
# ogsChatToStderr = false

# Uncomment and set this to a directory to override where openCLTuner files
# and other cached data is written. By default it saves into a subdir of the
# current directory on windows, and a subdir of ~/.katago on Linux.
# homeDataDir = PATH_TO_DIRECTORY

# ===========================================================================
# Analysis
# ===========================================================================
# This section configures analysis settings.
#
# The maximum number of moves after the first move displayed in variations
# from analysis commands like kata-analyze or lz-analyze.
# analysisPVLen = 15

# Report winrates for chat and analysis as (BLACK|WHITE|SIDETOMOVE).
# Most GUIs and analysis tools will expect SIDETOMOVE.
# reportAnalysisWinratesAs = SIDETOMOVE

# Extra noise for wider exploration. Large values will force KataGo to
# analyze a greater variety of moves than it normally would.
# An extreme value like 1 distributes playouts across every move on the board,
# even very bad moves.
# Affects analysis only, does not affect play.

# analysisWideRootNoise = 0.04

# ===========================================================================
# Rules
# ===========================================================================
# This section configures the scoring and playing rules. Rules can also be
# changed mid-run by issuing custom GTP commands.
#
# See https://lightvector.github.io/KataGo/rules.html for rules details.
#
# See https://github.com/lightvector/KataGo/blob/master/docs/GTP_Extensions.md
# for GTP commands.

koRule = SITUATIONAL  # options: SIMPLE, POSITIONAL, SITUATIONAL

scoringRule = AREA  # options: AREA, TERRITORY

taxRule = NONE  # options: NONE, SEKI, ALL

multiStoneSuicideLegal = true

hasButton = false

whiteHandicapBonus = 0  # options: 0, N, N-1

friendlyPassOk = true

# ===========================================================================
# Bot behavior
# ===========================================================================

# ------------------------------
# Resignation
# ------------------------------

# Resignation occurs if for at least resignConsecTurns in a row, the
# winLossUtility (on a [-1,1] scale) is below resignThreshold.
allowResignation = true
resignThreshold = -0.90
resignConsecTurns = 3

# By default, KataGo may resign games that it is confidently losing even if they
# are very close in score. Uncomment and set this to avoid resigning games
# if the estimated difference is points is less than or equal to this.
# resignMinScoreDifference = 10

# ------------------------------
# Handicap
# ------------------------------
# Assume that if black makes many moves in a row right at the start of the
# game, then the game is a handicap game. This is necessary on some servers
# and for some GUIs and also when initializing from many SGF files, which may
# set up a handicap game using repeated GTP "play" commands for black rather
# than GTP "place_free_handicap" commands; however, it may also lead to
# incorrect understanding of komi if whiteHandicapBonus is used and a server
# does not have such a practice. Uncomment and set to false to disable.
# assumeMultipleStartingBlackMovesAreHandicap = true

# Makes katago dynamically adjust in handicap or altered-komi games to assume
# based on those game settings that it must be stronger or weaker than the
# opponent and to play accordingly. Greatly improves handicap strength by
# biasing winrates and scores to favor appropriate safe/aggressive play.
# Does not affect analysis (lz-analyze, kata-analyze, used by programs like
# Lizzie) so analysis remains unbiased. Uncomment and set this to 0 to disable
# this and make KataGo play the same always.
# dynamicPlayoutDoublingAdvantageCapPerOppLead = 0.045

# Instead of "dynamicPlayoutDoublingAdvantageCapPerOppLead", you can comment
# that out and uncomment and set "playoutDoublingAdvantage" to a value between
# from -3.0 to 3.0 to set KataGo's aggression to a FIXED level. This affects
# analysis tools (lz-analyze, kata-analyze, used by programs like Lizzie).
# Negative makes KataGo behave as if it is much weaker than the opponent,
# preferring to play defensively. Positive makes KataGo behave as if it is
# much stronger than the opponent, prefering to play aggressively or even
# overplay slightly.
#
# If this and "dynamicPlayoutDoublingAdvantageCapPerOppLead" are both set
# then dynamic will be used for all games and this fixed value will be used
# for analysis tools.
# playoutDoublingAdvantage = 0.0

# Uncomment one of these when using "playoutDoublingAdvantage" to enforce
# that it will only apply when KataGo plays as the specified color and will be
# negated when playing as the opposite color.
# playoutDoublingAdvantagePla = BLACK
# playoutDoublingAdvantagePla = WHITE

# ------------------------------
# Passing and cleanup
# ------------------------------
# Make the bot never assume that its pass will end the game, even if passing
# would end and "win" under Tromp-Taylor rules. Usually this is a good idea
# when using it for analysis or playing on servers where scoring may be
# implemented non-tromp-taylorly. Uncomment and set to false to disable.
# conservativePass = true

# When using territory scoring, self-play games continue beyond two passes
# with special cleanup rules that may be confusing for human players. This
# option prevents the special cleanup phases from being reachable when using
# the bot for GTP play. Uncomment and set to false to enable entering special
# cleanup. For example, if you are testing it against itself, or against
# another bot that has precisely implemented the rules documented at
# https://lightvector.github.io/KataGo/rules.html
# preventCleanupPhase = true

# ------------------------------
# Miscellaneous behavior
# ------------------------------
# If the board is symmetric, search only one copy of each equivalent move.
# Attempts to also account for ko/superko, will not theoretically perfect for
# superko. Uncomment and set to false to disable.
# rootSymmetryPruning = true

# Uncomment and set to true to avoid a particular joseki that some networks
# misevaluate, and also to improve opening diversity versus some particular
# other bots that like to play it all the time.
# avoidMYTDaggerHack = false

# Prefer to avoid playing the same joseki in every corner of the board.
# Uncomment to set to a specific value. See "Avoid SGF patterns" section.
# By default: 0 (even games), 0.005 (handicap games)
# avoidRepeatedPatternUtility = 0.0

# Experimental logic to fight against mirror Go even with unfavorable komi.
# Uncomment to set to a specific value to use for both playing and analysis.
# By default: true when playing via GTP, but false when analyzing.
# antiMirror = true

# Enable some hacks that mitigate rare instances when passing messes up deeper searches.
# enablePassingHacks = true


# ===========================================================================
# Search limits
# ===========================================================================

# Terminology:
# "Playouts" is the number of new playouts of search performed each turn.
# "Visits" is the same as "Playouts" but also counts search performed on
# previous turns that is still applicable to this turn.
# "Time" is the time in seconds.

# For example, if KataGo searched 200 nodes on the previous turn, and then
# after the opponent's reply, 50 nodes of its search tree was still valid,
# then a visit limit of 200 would allow KataGo to search 150 new nodes
# (for a final tree size of 200 nodes), whereas a playout limit of of 200
# would allow KataGo to search 200 nodes (for a final tree size of 250 nodes).

# Additionally, KataGo may also move before than the limit in order to
# obey time controls (e.g. byo-yomi, etc) if the GTP controller has
# told KataGo that the game has is being played with a given time control.

# Limits for search on the current turn.
# If commented out or unspecified, the default is to have no limit.
# maxVisits = 500
# maxPlayouts = 300
# maxTime = 10.0

# Ponder on the opponent's turn?
ponderingEnabled = true
# maxTimePondering = 60.0

# ------------------------------
# Other search limits and behavior
# ------------------------------

# Approx number of seconds to buffer for lag for GTP time controls - will
# move a bit faster assuming there is this much lag per move.
lagBuffer = 1.0

# Number of threads to use in search
numSearchThreads = 40

# Play a little faster if the opponent is passing, for human-friendliness.
# Comment these out to disable them, such as if running a controlled match
# where you are testing KataGo with fixed compute per move vs other bots.
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25

# Play a little faster if super-winning, for human-friendliness.
# Comment these out to disable them, such as if running a controlled match
# where you are testing KataGo with fixed compute per move vs other bots.
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

# ===========================================================================
# GPU settings
# ===========================================================================
# This section configures GPU settings.
#
# Maximum number of positions to send to a single GPU at once. The default
# value is roughly equal to numSearchThreads, but can be specified manually
# if running out of memory, or using multiple GPUs that expect to share work.
# nnMaxBatchSize = <integer>

# Controls the neural network cache size, which is the primary RAM/memory use.
# KataGo will cache up to (2 ** nnCacheSizePowerOfTwo) many neural net
# evaluations in case of transpositions in the tree.
# Increase this to improve performance for searches with tens of thousands
# of visits or more. Decrease this to limit memory usage.
# If you're happy to do some math - each neural net entry takes roughly
# 1.5KB, except when using whole-board ownership/territory
# visualizations, where each entry will take roughly 3KB. The number of
# entries is (2 ** nnCacheSizePowerOfTwo). (E.g. 2 ** 18 = 262144.)
# You can compute roughly how much memory the cache will use based on this.
nnCacheSizePowerOfTwo = 20

# Size of mutex pool for nnCache is (2 ** this).
nnMutexPoolSizePowerOfTwo = 16

numNNServerThreadsPerModel = 1
openclDeviceToUseThread0 = 1


# ===========================================================================
# Root move selection and biases
# ===========================================================================
# Uncomment and edit any of the below values to change them from their default.

# If provided, force usage of a specific seed for various random things in
# the search. The default is to use a random seed.
# searchRandSeed = hijklmn

# Temperature for the early game, randomize between chosen moves with
# this temperature
# chosenMoveTemperatureEarly = 0.5

# Decay temperature for the early game by 0.5 every this many moves,
# scaled with board size.
# chosenMoveTemperatureHalflife = 19

# At the end of search after the early game, randomize between chosen
# moves with this temperature
# chosenMoveTemperature = 0.10

# Subtract this many visits from each move prior to applying
# chosenMoveTemperature (unless all moves have too few visits) to downweight
# unlikely moves
# chosenMoveSubtract = 0

# The same as chosenMoveSubtract but only prunes moves that fall below
# the threshold. This setting does not affect chosenMoveSubtract.
# chosenMovePrune = 1

# Number of symmetries to sample (without replacement) and average at the root
# rootNumSymmetriesToSample = 1

# Using LCB for move selection?
# useLcbForSelection = true

# How many stdevs a move needs to be better than another for LCB selection
# lcbStdevs = 5.0

# Only use LCB override when a move has this proportion of visits as the
# top move.
# minVisitPropForLCB = 0.15

# ===========================================================================
# Internal params
# ===========================================================================
# Uncomment and edit any of the below values to change them from their default.

# Scales the utility of winning/losing
# winLossUtilityFactor = 1.0

# Scales the utility for trying to maximize score
# staticScoreUtilityFactor = 0.10
# dynamicScoreUtilityFactor = 0.30

# Adjust dynamic score center this proportion of the way towards zero,
# capped at a reasonable amount.
# dynamicScoreCenterZeroWeight = 0.20
# dynamicScoreCenterScale = 0.75

# The utility of getting a "no result" due to triple ko or other long cycle
# in non-superko rulesets (-1 to 1)
# noResultUtilityForWhite = 0.0

# The number of wins that a draw counts as, for white. (0 to 1)
# drawEquivalentWinsForWhite = 0.5

# Exploration constant for mcts
# cpuctExploration = 1.0
# cpuctExplorationLog = 0.45

# Parameters that control exploring more in volatile positions, exploring
# less in stable positions.
# cpuctUtilityStdevPrior = 0.40
# cpuctUtilityStdevPriorWeight = 2.0
# cpuctUtilityStdevScale = 0.85

# FPU reduction constant for mcts
# fpuReductionMax = 0.2
# rootFpuReductionMax = 0.1
# fpuParentWeightByVisitedPolicy = true

# Parameters that control weighting of evals based on the net's own
# self-reported uncertainty.
# useUncertainty = true
# uncertaintyExponent = 1.0
# uncertaintyCoeff = 0.25

# Explore using optimistic policy
# rootPolicyOptimism = 0.2
# policyOptimism = 1.0

# Amount to apply a downweighting of children with very bad values relative
# to good ones.
# valueWeightExponent = 0.25

# Slight incentive for the bot to behave human-like with regard to passing at
# the end, filling the dame, not wasting time playing in its own territory,
# etc., and not play moves that are equivalent in terms of points but a bit
# more unfriendly to humans.
# rootEndingBonusPoints = 0.5

# Make the bot prune useless moves that are just prolonging the game to
# avoid losing yet.
# rootPruneUselessMoves = true

# Apply bias correction based on local pattern keys
# subtreeValueBiasFactor = 0.45
# subtreeValueBiasWeightExponent = 0.85

# Use graph search rather than tree search - identify and share search for
# transpositions.
# useGraphSearch = true

# How much to shard the node table for search synchronization
# nodeTableShardsPowerOfTwo = 16

# How many virtual losses to add when a thread descends through a node
# numVirtualLossesPerThread = 1

# Improve the quality of evals under heavy multithreading
# useNoisePruning = true

# ===========================================================================
# Avoid SGF patterns
# ===========================================================================
# The parameters in this section provide a way to avoid moves that follow
# specific patterns based on a set of SGF files loaded upon startup.
# Uncomment them to use this feature. Additionally, if the SGF file
# contains the string %SKIP% in a comment on a move, that move will be
# ignored for this purpose.

# Load SGF files from this directory when the engine is started
# (only on startup, will not reload unless engine is restarted)
# avoidSgfPatternDirs = path/to/directory/with/sgfs/
# You can also surround the file path in double quotes if the file path contains trailing spaces or hash signs.
# Within double quotes, backslashes are escape characters.
# avoidSgfPatternDirs = "path/to/directory/with/sgfs/"

# Penalize this much utility per matching move.
# Set this negative if you instead want to favor SGF patterns instead of
# penalizing them. This number does not need to be large, even 0.001 will
# make a difference. Values that are too large may lead to bad play.
# avoidSgfPatternUtility = 0.001

# Optional - load only the newest this many files
# avoidSgfPatternMaxFiles = 20

# Optional - Penalty is multiplied by this per each older SGF file, so that
# old SGF files matter less than newer ones.
# avoidSgfPatternLambda = 0.90

# Optional - pay attention only to moves made by players with this name.
# For example, set it to the name that your bot's past games will show up
# as in the SGF, so that the bot will only avoid repeating moves that itself
# made in past games, not the moves that its opponents made.
# avoidSgfPatternAllowedNames = my-ogs-bot-name1,my-ogs-bot-name2

# Optional - Ignore moves in SGF files that occurred before this turn number.
# avoidSgfPatternMinTurnNumber = 0

# For more avoid patterns:
# You can also specify a second set of parameters, and a third, fourth,
# etc. by numbering 2,3,4,...
#
# avoidSgf2PatternDirs = ...
# avoidSgf2PatternUtility = ...
# avoidSgf2PatternMaxFiles = ...
# avoidSgf2PatternLambda = ...
# avoidSgf2PatternAllowedNames = ...
# avoidSgf2PatternMinTurnNumber = ...
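
As a quick sanity check on the nnCacheSizePowerOfTwo comment in the config above, here is a small Python sketch that reads the active `key = value` lines and estimates the cache memory. The parser is a deliberate simplification (it ignores the double-quote escaping the config describes), both function names are made up for illustration, and the 1.5 KB and 3 KB per-entry figures are the rough numbers from the config comments.

```python
def parse_katago_cfg(text):
    """Collect active `key = value` settings, skipping comments and blanks.

    Simplification: everything after the first '#' on a line is treated as
    a comment, so quoted values containing '#' are not handled.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def nn_cache_bytes(power_of_two, kb_per_entry=1.5):
    """Rough cache size: (2 ** power_of_two) entries at kb_per_entry KB each."""
    return (2 ** power_of_two) * kb_per_entry * 1024


SAMPLE_CFG = """\
# Size of mutex pool for nnCache is (2 ** this).
nnCacheSizePowerOfTwo = 20
koRule = SITUATIONAL  # options: SIMPLE, POSITIONAL, SITUATIONAL
# maxVisits = 500
"""

if __name__ == "__main__":
    cfg = parse_katago_cfg(SAMPLE_CFG)
    power = int(cfg["nnCacheSizePowerOfTwo"])
    for kb in (1.5, 3.0):
        gib = nn_cache_bytes(power, kb) / 2 ** 30
        print(f"{kb} KB per entry: about {gib:.1f} GiB")
```

With nnCacheSizePowerOfTwo = 20 this comes to roughly 1.5 GB at 1.5 KB per entry and roughly 3 GB at 3 KB per entry, which lines up with the "about 3GB" that genconfig quotes as its default cache limit.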

KataGo creates this LOG file:

Code:

2023-06-09 07:28:08+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 6
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 07:28:08+0200: Loading model and initializing benchmark...
2023-06-09 07:28:08+0200: nnRandSeed0 = 4385763048445920344
2023-06-09 07:28:08+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:28:08+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:28:08+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:28:08+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:28:08+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:09+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:28:09+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false
2023-06-09 07:36:57+0200: GPU 1 finishing, processed 456614 rows 74761 batches
2023-06-09 07:36:57+0200: nnRandSeed0 = 495893077473133403
2023-06-09 07:36:57+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:36:57+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:36:58+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:36:58+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:36:58+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:36:58+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:36:59+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false
2023-06-09 07:39:46+0200: GPU 1 finishing, processed 191655 rows 7196 batches
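
The "FP16TensorCores true" entry in the log above looks like the OpenCL backend reporting its tensor core decision. A rough Python sketch for pulling those flags out of a log file follows; it assumes the line keeps exactly the shape shown in this log, and the function name is made up for illustration.

```python
import re

# Matches pairs such as "FP16TensorCores true" or "FP16Compute false".
FLAG_RE = re.compile(r"\b(FP16\w+) (true|false)\b")

SAMPLE_LINE = (
    "2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 "
    "FP16Storage true FP16Compute false FP16TensorCores true "
    "FP16TensorCoresFor1x1 false"
)


def opencl_fp16_flags(log_text):
    """Return the FP16* flags from the first OpenCL backend line that has them.

    The result maps flag name to a bool; it is empty if no such line exists.
    """
    for line in log_text.splitlines():
        if "OpenCL backend thread" in line and "FP16TensorCores" in line:
            return {name: value == "true" for name, value in FLAG_RE.findall(line)}
    return {}


if __name__ == "__main__":
    flags = opencl_fp16_flags(SAMPLE_LINE)
    print("tensor cores in use:", flags.get("FP16TensorCores"))
```

This only applies to the OpenCL version; the CUDA and TensorRT logs do not necessarily contain such a line.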

The command line LOG is:

Code:

C:\katago>katago.exe genconfig -model b18.bin.gz -output gtp_custom.cfg

=========================================================================
RULES

What rules should KataGo use by default for play and analysis?
(chinese, japanese, korean, tromp-taylor, aga, chinese-ogs, new-zealand, bga, stone-scoring, aga-button):
new-zealand

=========================================================================
SEARCH LIMITS

When playing games, KataGo will always obey the time controls given by the GUI/tournament/match/online server.
But you can specify an additional limit to make KataGo move much faster. This does NOT affect analysis/review,
only affects playing games. Add a limit? (y/n) (default n):
n

NOTE: No limits configured for KataGo. KataGo will obey time controls provided by the GUI or server or match script
but if they don't specify any, when playing games KataGo may think forever without moving. (press enter to continue)


When playing games, KataGo can optionally ponder during the opponent's turn. This gives faster/stronger play
in real games but should NOT be enabled if you are running tests with fixed limits (pondering may exceed those
limits), or to avoid stealing the opponent's compute time when testing two bots on the same machine.
Enable pondering? (y/n, default n):y

Specify max num seconds KataGo should ponder during the opponent's turn. Leave blank for no limit:


=========================================================================
GPUS AND RAM

Finding available GPU-like devices...
Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)

Specify devices/GPUs to use (for example "0,1,2" to use devices 0, 1, and 2). Leave blank for a default SINGLE-GPU config:
1

By default, KataGo will cache up to about 3GB of positions in memory (RAM), in addition to
whatever the current search is using. Specify a different max in GB or leave blank for default:


=========================================================================
PERFORMANCE TUNING

Specify number of visits to use test/tune performance with, leave blank for default based on GPU speed.
Use large number for more accurate results, small if your GPU is old and this is taking forever:
10000

Specify number of seconds/move to optimize performance for (default 5), leave blank for default:

2023-06-09 07:28:08+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 6
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 07:28:08+0200: Loading model and initializing benchmark...

2023-06-09 07:28:08+0200: nnRandSeed0 = 4385763048445920344
2023-06-09 07:28:08+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:28:08+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:28:08+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:08+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:28:08+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:28:08+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:28:08+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:28:09+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:28:09+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:28:09+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

=========================================================================
TUNING NOW
Tuning using 10000 visits.
Automatically trying different numbers of threads to home in on the best (board size 19x19):


Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads =  5: 10 / 10 positions, visits/s = 839.35 nnEvals/s = 563.10 nnBatches/s = 225.34 avgBatchSize = 2.50 (119.2 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 nnEvals/s = 877.08 nnBatches/s = 146.43 avgBatchSize = 5.99 (75.2 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1251.00 nnEvals/s = 797.66 nnBatches/s = 159.74 avgBatchSize = 4.99 (80.0 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1582.83 nnEvals/s = 1014.95 nnBatches/s = 101.76 avgBatchSize = 9.97 (63.3 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1458.56 nnEvals/s = 947.94 nnBatches/s = 118.78 avgBatchSize = 7.98 (68.7 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1662.11 nnEvals/s = 1076.06 nnBatches/s = 89.90 avgBatchSize = 11.97 (60.3 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1683.90 nnEvals/s = 1099.53 nnBatches/s = 68.60 avgBatchSize = 16.03 (59.6 secs)


Optimal number of threads is fairly high, increasing the search limit and trying again.

2023-06-09 07:36:57+0200: GPU 1 finishing, processed 456614 rows 74761 batches
2023-06-09 07:36:57+0200: nnRandSeed0 = 495893077473133403
2023-06-09 07:36:57+0200: After dedups: nnModelFile0 = b18.bin.gz useFP16 auto useNHWC auto
2023-06-09 07:36:57+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 07:36:58+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 07:36:58+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 07:36:58+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 07:36:58+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 07:36:58+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 07:36:58+0200: Loaded tuning parameters from: C:\katago/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 07:36:58+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 07:36:59+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false


Possible numbers of threads to test: 16, 20, 24, 32, 40, 48, 64, 80, 96,

numSearchThreads = 64: 10 / 10 positions, visits/s = 1824.11 nnEvals/s = 1165.36 nnBatches/s = 29.23 avgBatchSize = 39.87 (55.2 secs)
numSearchThreads = 40: 10 / 10 positions, visits/s = 1808.78 nnEvals/s = 1128.90 nnBatches/s = 55.08 avgBatchSize = 20.50 (55.5 secs)
numSearchThreads = 48: 10 / 10 positions, visits/s = 1784.88 nnEvals/s = 1149.96 nnBatches/s = 44.81 avgBatchSize = 25.66 (56.3 secs)


Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 839.35 nnEvals/s = 563.10 nnBatches/s = 225.34 avgBatchSize = 2.50 (119.2 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1251.00 nnEvals/s = 797.66 nnBatches/s = 159.74 avgBatchSize = 4.99 (80.0 secs) (EloDiff +137)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1330.61 nnEvals/s = 877.08 nnBatches/s = 146.43 avgBatchSize = 5.99 (75.2 secs) (EloDiff +157)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1458.56 nnEvals/s = 947.94 nnBatches/s = 118.78 avgBatchSize = 7.98 (68.7 secs) (EloDiff +184)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1582.83 nnEvals/s = 1014.95 nnBatches/s = 101.76 avgBatchSize = 9.97 (63.3 secs) (EloDiff +209)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1662.11 nnEvals/s = 1076.06 nnBatches/s = 89.90 avgBatchSize = 11.97 (60.3 secs) (EloDiff +221)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1683.90 nnEvals/s = 1099.53 nnBatches/s = 68.60 avgBatchSize = 16.03 (59.6 secs) (EloDiff +214)
numSearchThreads = 40: 10 / 10 positions, visits/s = 1808.78 nnEvals/s = 1128.90 nnBatches/s = 55.08 avgBatchSize = 20.50 (55.5 secs) (EloDiff +230)
numSearchThreads = 48: 10 / 10 positions, visits/s = 1784.88 nnEvals/s = 1149.96 nnBatches/s = 44.81 avgBatchSize = 25.66 (56.3 secs) (EloDiff +213)
numSearchThreads = 64: 10 / 10 positions, visits/s = 1824.11 nnEvals/s = 1165.36 nnBatches/s = 29.23 avgBatchSize = 39.87 (55.2 secs) (EloDiff +198)


Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:  +137 Elo
numSearchThreads = 12:  +157 Elo
numSearchThreads = 16:  +184 Elo
numSearchThreads = 20:  +209 Elo
numSearchThreads = 24:  +221 Elo
numSearchThreads = 32:  +214 Elo
numSearchThreads = 40:  +230 Elo (recommended)
numSearchThreads = 48:  +213 Elo
numSearchThreads = 64:  +198 Elo

Using 40 numSearchThreads!
2023-06-09 07:39:46+0200: GPU 1 finishing, processed 191655 rows 7196 batches

=========================================================================
DONE

Writing new config file to gtp_custom.cfg
You should be now able to run KataGo with this config via something like:
katago.exe gtp -model 'b18.bin.gz' -config 'gtp_custom.cfg'

Feel free to look at and edit the above config file further by hand in a txt editor.
For more detailed notes about performance and what options in the config do, see:
https://github.com/lightvector/KataGo/blob/master/cpp/configs/gtp_example.cfg
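
As a rough cross-check of the Elo column in the summary above, the first rule of thumb that the benchmark prints (perhaps ~250 Elo per doubling of speed) can be applied by hand. This is only a sketch: the function name is made up for illustration, and it deliberately leaves out the per-thread penalty that the output also mentions, so it gives somewhat higher numbers than KataGo's own estimate.

```python
import math

# Figure quoted in the benchmark output ("perhaps ~250 Elo"), not an exact constant
ELO_PER_DOUBLING = 250.0

def elo_from_speedup(visits_per_sec, baseline_visits_per_sec):
    """Elo gained from raw search speed alone, before any per-thread penalty."""
    return ELO_PER_DOUBLING * math.log2(visits_per_sec / baseline_visits_per_sec)

# visits/s values copied from the 10000-visit tuning run above
baseline = 839.35  # numSearchThreads = 5
for threads, vps in [(10, 1251.00), (40, 1808.78), (64, 1824.11)]:
    print(threads, round(elo_from_speedup(vps, baseline)))
```

For 10 threads this gives roughly 144 Elo against the reported +137. The gap widens at higher thread counts, which fits the statement that every extra thread costs a little strength and explains why 40 threads is recommended over 64 despite 64 being marginally faster.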

In KaTrain general settings, I set Override with my used model path and name:

C:\katago\katago.exe gtp -model 'C:\katago\b18.bin.gz' -config 'C:\katago\gtp_custom.cfg'

KataGo Engine Failed: exception: Could not open file 'C:\katago\gtp_custom.cfg' - does not exist or invalid permissions
KATAGO-INTERNAL-ERROR

The permissions are the same as in C:\baduk\katrain. The three files exist in C:\katago. Is the command syntax correct?

If yes, KaTrain might want all files in the same directory. Therefore, my next attempt has been to merge all supposedly necessary files into the same directory C:\baduk\test.

Now, in command line I run:

C:\baduk\test>katago gtp

Now, in command line I run:

C:\baduk\test>katago benchmark

2023-06-09 09:41:15+0200: Running with following config:
allowResignation = true
friendlyPassOk = true
hasButton = false
koRule = SITUATIONAL
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logToStderr = false
multiStoneSuicideLegal = true
nnCacheSizePowerOfTwo = 20
nnMutexPoolSizePowerOfTwo = 16
numNNServerThreadsPerModel = 1
numSearchThreads = 40
openclDeviceToUseThread0 = 1
ponderingEnabled = true
resignConsecTurns = 3
resignThreshold = -0.90
scoringRule = AREA
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95
taxRule = NONE
whiteHandicapBonus = 0

2023-06-09 09:41:15+0200: Loading model and initializing benchmark...
2023-06-09 09:41:15+0200: Testing with default positions for board size: 19
2023-06-09 09:41:15+0200: nnRandSeed0 = 17739763996423611530
2023-06-09 09:41:15+0200: After dedups: nnModelFile0 = C:\baduk\test/default_model.bin.gz useFP16 auto useNHWC auto
2023-06-09 09:41:15+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 09:41:15+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 09:41:15+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 09:41:15+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:15+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 09:41:15+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 09:41:15+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 09:41:15+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:15+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 09:41:15+0200: Loaded tuning parameters from: C:\baduk\test/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 09:41:16+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false

2023-06-09 09:41:16+0200: Loaded config C:\baduk\test/default_gtp.cfg
2023-06-09 09:41:16+0200: Loaded model C:\baduk\test/default_model.bin.gz

Testing using 800 visits.
  If you have a good GPU, you might increase this using "-visits N" to get more accurate results.
  If you have a weak GPU and this is taking forever, you can decrease it instead to finish the benchmark faster.

You are currently using the OpenCL version of KataGo.
If you have a strong GPU capable of FP16 tensor cores (e.g. RTX2080), using the Cuda version of KataGo instead may give a mild performance boost.

Your GTP config is currently set to use numSearchThreads = 40
Automatically trying different numbers of threads to home in on the best (board size 19x19):

2023-06-09 09:41:16+0200: GPU 1 finishing, processed 5 rows 5 batches
2023-06-09 09:41:16+0200: nnRandSeed0 = 1537048183467396486
2023-06-09 09:41:16+0200: After dedups: nnModelFile0 = C:\baduk\test/default_model.bin.gz useFP16 auto useNHWC auto
2023-06-09 09:41:16+0200: Initializing neural net buffer to be size 19 * 19 exactly
2023-06-09 09:41:17+0200: Found OpenCL Platform 0: AMD Accelerated Parallel Processing (Advanced Micro Devices, Inc.) (OpenCL 2.1 AMD-APP (3516.0))
2023-06-09 09:41:17+0200: Found 1 device(s) on platform 0 with type CPU or GPU or Accelerator
2023-06-09 09:41:17+0200: Found OpenCL Platform 1: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:17+0200: Found 1 device(s) on platform 1 with type CPU or GPU or Accelerator
2023-06-09 09:41:17+0200: Found OpenCL Device 0: gfx1036 (Advanced Micro Devices, Inc.) (score 11000200)
2023-06-09 09:41:17+0200: Found OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) (score 11000300)
2023-06-09 09:41:17+0200: Creating context for OpenCL Platform: NVIDIA CUDA (NVIDIA Corporation) (OpenCL 3.0 CUDA 12.1.107)
2023-06-09 09:41:17+0200: Using OpenCL Device 1: NVIDIA GeForce RTX 4070 (NVIDIA Corporation) OpenCL 3.0 CUDA (Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32)
2023-06-09 09:41:17+0200: Loaded tuning parameters from: C:\baduk\test/KataGoData/opencltuning/tune11_gpuNVIDIAGeForceRTX4070_x19_y19_c384_mv11.txt
2023-06-09 09:41:17+0200: OpenCL backend thread 0: Device 1 Model version 11
2023-06-09 09:41:17+0200: OpenCL backend thread 0: Device 1 Model name: kata1-b18c384nbt-s6386600960-d3368371862
2023-06-09 09:41:18+0200: OpenCL backend thread 0: Device 1 FP16Storage true FP16Compute false FP16TensorCores true FP16TensorCoresFor1x1 false


Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,

numSearchThreads =  5: 10 / 10 positions, visits/s = 674.36 nnEvals/s = 567.57 nnBatches/s = 228.00 avgBatchSize = 2.49 (11.9 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1076.47 nnEvals/s = 878.78 nnBatches/s = 148.70 avgBatchSize = 5.91 (7.5 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 977.17 nnEvals/s = 811.45 nnBatches/s = 163.91 avgBatchSize = 4.95 (8.3 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1171.74 nnEvals/s = 1005.02 nnBatches/s = 102.87 avgBatchSize = 9.77 (7.0 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1131.03 nnEvals/s = 946.03 nnBatches/s = 120.74 avgBatchSize = 7.84 (7.2 secs)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1193.19 nnEvals/s = 1045.82 nnBatches/s = 89.61 avgBatchSize = 11.67 (6.9 secs)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1227.85 nnEvals/s = 1097.93 nnBatches/s = 71.11 avgBatchSize = 15.44 (6.7 secs)


Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 674.36 nnEvals/s = 567.57 nnBatches/s = 228.00 avgBatchSize = 2.49 (11.9 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 977.17 nnEvals/s = 811.45 nnBatches/s = 163.91 avgBatchSize = 4.95 (8.3 secs) (EloDiff +125)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1076.47 nnEvals/s = 878.78 nnBatches/s = 148.70 avgBatchSize = 5.91 (7.5 secs) (EloDiff +158)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1131.03 nnEvals/s = 946.03 nnBatches/s = 120.74 avgBatchSize = 7.84 (7.2 secs) (EloDiff +168)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1171.74 nnEvals/s = 1005.02 nnBatches/s = 102.87 avgBatchSize = 9.77 (7.0 secs) (EloDiff +173)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1193.19 nnEvals/s = 1045.82 nnBatches/s = 89.61 avgBatchSize = 11.67 (6.9 secs) (EloDiff +172)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1227.85 nnEvals/s = 1097.93 nnBatches/s = 71.11 avgBatchSize = 15.44 (6.7 secs) (EloDiff +167)


Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:  +125 Elo
numSearchThreads = 12:  +158 Elo
numSearchThreads = 16:  +168 Elo
numSearchThreads = 20:  +173 Elo (recommended)
numSearchThreads = 24:  +172 Elo
numSearchThreads = 32:  +167 Elo

If you care about performance, you may want to edit numSearchThreads in C:\baduk\test/default_gtp.cfg based on the above results!
If you intend to do much longer searches, configure the seconds per game move you expect with the '-time' flag and benchmark again.
If you intend to do short or fixed-visit searches, use lower numSearchThreads for better strength, high threads will weaken strength.
If interested see also other notes about performance and mem usage in the top of C:\baduk\test/default_gtp.cfg

2023-06-09 09:42:15+0200: GPU 1 finishing, processed 48514 rows 7881 batches

So the KataGo OpenCL version in C:\baduk\test\katago.exe does run on the command line. However... now, in KaTrain I use

C:\baduk\test\katago.exe gtp

The following processes with their options are running:

Start KaTrain
ERROR: Unexpected exception Expecting value: line 1 column 1 (char 0) while processing KataGo output b'? unknown command'
Komi: 6.5
Rules: Japanese

I set Black Human - White AI then click a black move.
ERROR: <remains as before>
Analyzing move...

The dGPU has 0% load now. In KaTrain general settings Override, what is the correct command for running a KataGo file, net and CFG that are not already installed in the Baduk AI Megapack directory and its subdirectories?

Posted: **Fri Jun 09, 2023 6:11 am**

RobertJasiek wrote: In KaTrain general settings, I set Override with my used model path and name:

C:\katago\katago.exe gtp -model 'C:\katago\b18.bin.gz' -config 'C:\katago\gtp_custom.cfg'

KataGo Engine Failed: exception: Could not open file 'C:\katago\gtp_custom.cfg' - does not exist or invalid permissions
KATAGO-INTERNAL-ERROR

The permissions are the same as in C:\baduk\katrain. The three files exist in C:\katago. Is the command syntax correct?

Why didn't you use ...

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

???

The use of inverted commas is superfluous, as the directory / file name does NOT contain spaces.
If ever, you have to use the following syntax, as far as I know:

C:\katago\katago.exe gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"
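
To illustrate the quoting rule, here is a small sketch using Python's subprocess.list2cmdline, a helper in the standard library that joins arguments following the Microsoft C runtime quoting rules. It is used here only to show the convention, not as something KaTrain is known to call. The "C:\Go Programs" path is a made-up example of a directory with a space in it.

```python
import subprocess

def windows_command_line(args):
    """Join arguments the way Python's subprocess does for Windows programs:
    double quotes are added only around arguments containing spaces or tabs."""
    return subprocess.list2cmdline(args)

# Paths from the post above: no spaces, so no quotes are needed at all
plain = windows_command_line([
    r"C:\katago\katago.exe", "gtp",
    "-model", r"C:\katago\b18.bin.gz",
    "-config", r"C:\katago\gtp_custom.cfg",
])
print(plain)

# Hypothetical install location containing a space, to show when quoting matters
spaced = windows_command_line([r"C:\Go Programs\katago.exe", "gtp"])
print(spaced)
```

The first command comes out without any quotes, and the second gets double quotes around the executable path only. Single quotes never appear, because on Windows they are not a quoting character under these rules.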

Posted: **Fri Jun 09, 2023 6:21 am**

Thank you, I will try your syntax later today!

As to why: I have not found anything in the KaTrain manuals yet, only some sample syntax with inverted commas in the KataGo manuals. Therefore, I had to test various syntaxes, and there are many possible combinations. Apparently, I missed testing the one you just suggested.

Posted: **Fri Jun 09, 2023 11:47 am**

I have made the following failing attempts to submit a working command to KaTrain. What is the correct syntax? What are my mistakes? What are KaTrain's or KataGo's bugs?

ATTEMPT 1

This directory has just Katago 1_13_0 OpenCL.

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

Processes:

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.

ATTEMPT 2

This is my test directory with Katago 1_13_0 OpenCL and all files merged.

C:\baduk\test\katago.exe gtp -model C:\baduk\test\b18.bin.gz -config C:\baduk\test\gtp_custom.cfg

Click on Update Settings: "KaTrain v.1.12.3 (Keine Rückmeldung)" meaning "KaTrain v.1.12.3 (no reply)" with the process KaTrain <0.01 CPU load

Restart KaTrain, General & Engine Settings, press ESC, then these processes are running:

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.

ATTEMPT 3

From now on, I test KaTrain's Override for Baduk AI Megapack's lizzie directory, whose files work unless called by the Override command.

C:\baduk\lizzie\katago.exe gtp -model C:\baduk\lizzie\KataGo40b.gz

Processes:

KaTrain: ERROR line 1 column 1 (char 0). When trying to play: GPU load 0%.

ATTEMPT 4

C:\baduk\lizzie\katago.exe gtp -model C:\baduk\lizzie\KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings: "KaTrain v.1.12.3 (no reply)" with the process KaTrain <0.01 CPU load

Restart KaTrain: ERROR KataGo Engine Failed: exception: Could not find key 'logAllGTPCommunication' in config file C:\baduk\lizzie\analysis_config.cfg
KATAGO-INTERNAL-ERROR

Press ESC

ATTEMPT 5

C:\baduk\lizzie\katago.exe -model C:\baduk\lizzie\KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings, Restart KaTrain, ERROR line 1 column 1 (char 0).

Trying to play: The ERROR vanishes. Analyzing move... appears. GPU load 0%. Katago.exe is the only process.

ATTEMPT 6

If you wonder why I try (partial) Linux slashes from now on: KaTrain's settings write
Path to KataGo model file = C:/baduk/lizzie/KataGo40b.gz

C:\baduk\lizzie\katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:\baduk\lizzie\analysis_config.cfg

Click on Update Settings, Start new game, ERROR vanishes, trying to play: Analyzing move... appears. GPU load 0%. Katago.exe is the only process.

ATTEMPT 7

C:\baduk\lizzie\katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:/baduk/lizzie/analysis_config.cfg

Click on Update Settings, ERROR line 1 column 1 (char 0), Start new game, ERROR vanishes, Analyzing move... appears. GPU load 0%. Katago.exe is the only process.

ATTEMPT 8

C:/baduk/lizzie/katago.exe -model C:/baduk/lizzie/KataGo40b.gz -config C:/baduk/lizzie/analysis_config.cfg

Click on Update Settings, Start new game, ERROR vanishes, trying to play: Analyzing move... appears. GPU load 0%. Katago.exe is the only process.

Posted: **Fri Jun 09, 2023 3:47 pm**

KaTrain

I have made two more command line tests in KaTrain and both have failed.

KaTrain must be buggy!

Attempt 9

"C:\katago\katago.exe" gtp -model "C:\katago\b18.bin.gz" -config "C:\katago\gtp_custom.cfg"

Attempt 10

"C:\baduk\test\katago.exe gtp" -model "C:\baduk\test\b18.bin.gz" -config "C:\baduk\test\gtp_custom.cfg"

Lizzie

Next, I have tried Lizzie and got it to work within one minute with the following command line in the Lizzie Engine settings:

C:\katago\katago.exe gtp -model C:\katago\b18.bin.gz -config C:\katago\gtp_custom.cfg

Playing works. GPU load 96%. Processes:

Posted: **Fri Jun 09, 2023 7:00 pm**

Robert,

Thanks for your detailed posts on these topics. My small comment is that Lizzie
may also be buggy, at least on my system. If you have time, please run the following
test.

1. Load a game into Lizzie for analysis by KataGo.
2. Select a small number of visits, for example 50, by typing a50.
3. Let the analysis run to the end.

On my system, the resulting evaluation graph displays a downward red line marking
every move as a mistake by Black, but no upward red lines for White. I suspect
this is a bug in Lizzie not KataGo, but I don't know how to prove it.

As the number of visits increases, the discrepancy slowly disappears.
It is still noticeable with 1000 visits but not at 7000 visits per move.
I normally use 7000.

Posted: **Sat Jun 10, 2023 2:37 am**

It works for me. If I wait long enough the message "GTP ready, beginning main protocol loop" is printed and I can type GTP commands and the response is quick. I'm sure I can set the shell to echo what I type to make it more convenient.

However, you had this error message: "ERROR: Unexpected exception Expecting value: line 1 column 1 (char 0) while processing KataGo output b'? unknown command'", which would indicate that you sent an unknown command. Maybe, at least in that one case, you typed something that was not a command? There are other possibilities, for example a character set mismatch, which seems less likely but is not impossible.

You can at least test if it works by typing something like this on the command line. It is more reliable than waiting for the prompt and typing when it presents itself.

echo name | .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\katago.exe gtp -model .\models\b18c384nbt-optimisticv13-s5971M.bin.gz -config .\kataecho name | .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\katago.exe gtp -model .\models\b18c384nbt-optimisticv13-s5971M.bin.gz -config .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\analysis_config.cfg

Which should eventually output as follows (not so long anymore, now that I have 13.1).

KataGo v1.13.1
Using Japanese rules initially, unless GTP/GUI overrides this
Initializing board with boardXSize 19 boardYSize 19
Loaded config .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\analysis_config.cfg
Loaded model .\models\b18c384nbt-optimisticv13-s5971M.bin.gz
Model name: kata1-b18c384nbt-softplusfixv13-s5971481344-d3261785976
GTP ready, beginning main protocol loop
= KataGo

If you get parsing errors then you need to make sure that your config file is correct (does it work with other commands) and that all paths are correct on the command line. If you get "command unknown" then KataGo and GTP are working but the command as transmitted was not found.

If you can use this "echo <gtp-command> | katago.exe gtp ..." pattern then I think that can tell you more about what is going on.
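
The same pipe pattern can be scripted. The following is a minimal sketch, assuming the engine is started with a command list of your own. The function names are invented for this example. It relies only on the GTP convention that success replies start with "=" and failure replies start with "?".

```python
import subprocess

def parse_gtp_reply(output):
    """Return (ok, text) for the first GTP reply line in the engine's output.

    GTP replies start with '=' on success and '?' on failure; any other
    lines (startup messages, logging) are skipped."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("="):
            return True, line[1:].strip()
        if line.startswith("?"):
            return False, line[1:].strip()
    return None, ""

def send_gtp(engine_cmd, gtp_command, timeout=300):
    """Pipe one GTP command into the engine, like `echo name | katago.exe gtp ...`."""
    result = subprocess.run(engine_cmd, input=gtp_command + "\n",
                            capture_output=True, text=True, timeout=timeout)
    return parse_gtp_reply(result.stdout)
```

Calling send_gtp with your own katago.exe command list (executable, "gtp", "-model", ..., "-config", ...) and the command "name" should give a success reply if the setup works. A result of (False, "unknown command") means that KataGo started fine and only the command text was wrong.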

Posted: **Sat Jun 10, 2023 4:06 am**

kvasir, exactly what works for you? At exactly which procedural step do you have to wait long enough?

You seem to be using KataGo 1-13-1 TensorRT while I have tried KataGo 1-13-0 OpenCL with KaTrain.

I may try the echo chamber if I understand its syntax correctly.

Posted: **Sat Jun 10, 2023 5:04 am**

Now, I am confused. I thought you had a problem using GTP with KataGo, that is what I was saying works and how you can test it. If you are having a problem using GTP with Katrain then that has a reason. Katrain doesn't use GTP.

KataGo and Katrain don't communicate using GTP.

You can configure the KataGo executable used by Katrain, including by giving a path to the executable file. You can also override the whole command used to start KataGo but this can easily fail if you don't know what Katrain wants and this is also not needed to change the executable used or the model file and config file. Since Katrain and KataGo don't communicate using GTP you wouldn't override the command with something like "katago.exe gtp ..." but with something else. I think something like: katago.exe analysis -model MMMM -config CCCC -analysis-threads XXXX -override-config "homeDataDir=DDDD" might be necessary since that is what Katrain uses if you don't try to override the command yourself. Basically, overriding the command is tricky and unnecessary.

Maybe I still didn't understand what the problem was. I hope it is useful information that GTP has nothing to do with Katrain.
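
The wording of the KaTrain error fits this explanation. KaTrain is a Python program, and "Expecting value: line 1 column 1 (char 0)" is exactly what Python's json module reports when it is handed text that is not JSON. Assuming KaTrain tries to parse each line of engine output as JSON (an assumption about its internals, not checked against its source), a GTP-style reply would fail like this:

```python
import json

# The engine output quoted in the KaTrain error message
gtp_reply = "? unknown command"

try:
    json.loads(gtp_reply)
    error_text = None
except json.JSONDecodeError as exc:
    error_text = str(exc)

print(error_text)  # Expecting value: line 1 column 1 (char 0)
```

So the "? unknown command" part suggests that KataGo was running in GTP mode and rejected what it received, and the "Expecting value" part suggests that the program reading the reply wanted JSON instead.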

Posted: **Sat Jun 10, 2023 5:52 am**

That KaTrain does not communicate via GTP with KataGo is new to me.

Does this mean for the Override command in KaTrain that it a) may never contain the GTP flag and b) may never contain -config referring to a gtp-version of a CFG file?

I will try to understand your additional remarks later. Meanwhile, let me report on the echo tests below. Maybe this gives you a better idea of the problem or what is not the problem so that your suggestions on what to write in the KaTrain Override command can become more specific and then easier for me to understand.

Below, I execute some commands in the Windows command line and state the outputs.

******************************************************

C:\echo is the KataGo OpenCL folder.

C:\echo>echo name | katago.exe gtp -model b18.bin.gz -config .\kataecho name | katago.exe gtp -model b18.bin.gz -config gtp_custom.cfg

C:\echo>katago.exe gtp -help

C:\echo>echo -h | katago.exe gtp -model b18.bin.gz -config .\kataecho -h | katago.exe gtp -model b18.bin.gz -config gtp_custom.cfg

C:\echo>echo -version | katago.exe gtp -model b18.bin.gz -config .\kataecho -version | katago.exe gtp -model b18.bin.gz -config gtp_custom.cfg

Posted: **Sat Jun 10, 2023 6:20 am**

Sorry. I seem to have copied the command onto itself, probably right after I checked that it was correct. I always have trouble spotting mistakes in the edit window. I think it is that the small, old and unusual font is hard for me to read. Also the cursor in the edit window often vanishes.

It was supposed to be as shown next, and it is only a way to make sure that the right GTP command is sent to the katago program.

echo name | .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\katago.exe gtp -model .\models\b18c384nbt-optimisticv13-s5971M.bin.gz -config .\katago-v1.13.1-trt8.5-cuda11.2-windows-x64\analysis_config.cfg

RobertJasiek wrote: Does this mean for the Override command in KaTrain that it a) may never contain the GTP flag and b) may never contain -config referring to a gtp-version of a CFG file?

The override should be something like this if used:

katago.exe analysis -model MMMM -config CCCC -analysis-threads XXXX -override-config "homeDataDir=DDDD"

a) If you override it then it has to use the "analysis" subcommand (not "gtp") and have the "-model" and "-config" flags but I don't know if "-analysis-threads" or the rest is necessary.

b) Katrain looks for some configurations in the CFG file and reports errors if they are not there, or did so in older versions. I think this is the only limitation on the CFG file but they have to be acceptable to KataGo of course.
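
To make the difference from GTP concrete: the "analysis" subcommand reads one JSON object per line instead of text commands. Below is a minimal sketch of building such a line. The field names are given as I recall them from KataGo's analysis engine documentation and should be checked there. The helper name, the query id and the moves are made up for this example.

```python
import json

def make_analysis_query(query_id, moves, komi=6.5, rules="japanese", max_visits=None):
    """Build one line of input for `katago analysis` (one JSON object per line)."""
    query = {
        "id": query_id,
        "moves": moves,                # e.g. [["B", "Q16"], ["W", "D4"]]
        "rules": rules,
        "komi": komi,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [len(moves)],  # analyze the position after the last move
    }
    if max_visits is not None:
        query["maxVisits"] = max_visits
    return json.dumps(query)

line = make_analysis_query("demo-1", [["B", "Q16"], ["W", "D4"]], max_visits=50)
print(line)
```

A GUI built on the analysis engine sends lines like this and reads JSON lines back, which is why a command that starts KataGo with "gtp" cannot serve it.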

KaTrain Questions
