Hardware: RTX 4070, Ryzen 7700
katago benchmark -model kata1-b18c384nbt-s6386600960-d3368371862.bin.gz
KataGo 1_13_0 OpenCL
numSearchThreads = 20
visits/s = 1184.02
nnEvals/s = 1011.44
nnBatches/s = 103.52
avgBatchSize = 9.77
6.9 secs
EloDiff +171 (recommended)
KataGo 1_13_0 CUDA (some files copied)
numSearchThreads = 40
visits/s = 450.17
nnEvals/s = 413.07
nnBatches/s = 21.03
avgBatchSize = 19.64
18.6 secs
EloDiff +464 (recommended)
KataGo 1_13_0 CUDA Megapack/Lizzie (most files copied)
numSearchThreads = 32
visits/s = 419.52
nnEvals/s = 374.08
nnBatches/s = 24.14
avgBatchSize = 15.50
19.8 secs
EloDiff +461 (recommended)
KataGo 1_13_1 TensorRT (all missing files copied from LizzieYZY)
numSearchThreads = 40
visits/s = 2879.17
nnEvals/s = 2627.19
nnBatches/s = 132.87
avgBatchSize = 19.77
2.9 secs
EloDiff +343 (recommended)
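One relationship does seem to fall out of the numbers above: in all four runs, avgBatchSize appears to be just nnEvals/s divided by nnBatches/s. The short Python sketch below checks this. The values are copied from the runs above; the dictionary layout and the function name are only my own illustration, not anything KataGo provides.

```python
# Values copied from the four benchmark runs above.
runs = {
    "OpenCL": {"nnEvals/s": 1011.44, "nnBatches/s": 103.52, "avgBatchSize": 9.77},
    "CUDA": {"nnEvals/s": 413.07, "nnBatches/s": 21.03, "avgBatchSize": 19.64},
    "CUDA Megapack/Lizzie": {"nnEvals/s": 374.08, "nnBatches/s": 24.14, "avgBatchSize": 15.50},
    "TensorRT": {"nnEvals/s": 2627.19, "nnBatches/s": 132.87, "avgBatchSize": 19.77},
}


def derived_batch_size(run):
    """Hypothetical helper: evaluations per second divided by batches per second."""
    return run["nnEvals/s"] / run["nnBatches/s"]


for name, run in runs.items():
    derived = derived_batch_size(run)
    print(f"{name}: derived {derived:.2f}, reported {run['avgBatchSize']:.2f}")
```

If that reading is right, the three nn* values carry only two independent pieces of information, and the CUDA runs stand out mainly by their low nnBatches/s.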
Engine                  OpenCL     CUDA  TensorRT
visits/s               1184.02   450.17   2879.17
Speed I (CUDA = 1)        2.63     1.00      6.40
Speed II (OpenCL = 1)     1.00     0.38      2.43
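For reference, the two speed rows are simply the visits/s values divided by a baseline: CUDA for Speed I and OpenCL for Speed II. A minimal Python sketch of that arithmetic follows; the function and variable names are my own, and only the visits/s figures come from the benchmark.

```python
# visits/s copied from the benchmark runs (first CUDA run, 40 threads).
visits_per_s = {"OpenCL": 1184.02, "CUDA": 450.17, "TensorRT": 2879.17}


def relative_speed(values, baseline):
    """Divide every entry by the baseline entry and round to two decimals."""
    base = values[baseline]
    return {name: round(v / base, 2) for name, v in values.items()}


speed_i = relative_speed(visits_per_s, "CUDA")     # CUDA = 1
speed_ii = relative_speed(visits_per_s, "OpenCL")  # OpenCL = 1

print("Speed I :", speed_i)
print("Speed II:", speed_ii)
```

Note that each backend ran with its own recommended numSearchThreads (20 for OpenCL, 40 for CUDA and TensorRT), so these ratios compare the best configuration found for each backend rather than identical settings.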
1) Is visits/s a good measure of KataGo speed for a given position and network, or is there another value I should compare instead?
2) Why is CUDA much slower than OpenCL?
3) Does "visits" mean the same thing as "playouts"?
4) How good or bad are these values compared to other desktop or laptop GPUs?
5) Am I right to conclude from these results that TensorRT is the fastest engine for me, apart from any launch delay?
6) What do the other measured values tell me?