The user wrote:
In the default_gtp.cfg file, if numSearchThreads = 6 or 8, it works (tuning does not occur, the message "gtp ready" appears, and genmove b is ok). If numSearchThreads = 12, 16, or 32, the test starts, tuning does not occur, the message "gtp ready" appears, but on genmove b KataGo stops working. That is, it is not the same error as before.
I deleted the dummy file tune6_gpuGeForceGT610_x19_y19_c320_mv8.txt and set numSearchThreads = 1; I get the same error as at the beginning.
With numSearchThreads = 32 and the g170e 20 block s3.35G network, tuning works fine.
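For reference, a minimal sketch of the setting under discussion as it would appear in default_gtp.cfg (the comments summarize the observations above; the exact layout of the stock config may differ):

```
# Number of threads the search uses. Larger values drive larger GPU batch
# sizes and therefore more GPU memory use during neural net evaluation.
numSearchThreads = 8   # 6 or 8 works on this GPU; 12, 16, or 32 fails on genmove
```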
And later wrote:
The 30 block network has been working without problems for more than 12 hours (engine matches). Auto-tuning still does not work. Can I edit the opencltuning file obtained for a 20 or 40 block network, and what would need to be replaced? Could this lead to errors or performance degradation?
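For context on what such an edit would involve: KataGo looks the tuning file up by a name that encodes the tuning setup, so a file produced for another net is not even picked up until the name matches. My reading of the filename fields below is inferred from the example earlier in this thread, not from documentation, and the KataGoData/opencltuning location is an assumption (check where your build actually wrote the file):

```
cd KataGoData/opencltuning   # assumed location of the tuning files
# tune6 = tuner data version, gpu... = GPU name, x19/y19 = board size,
# c256/c320 = trunk channels (the g170 20 and 40 block nets use 256,
# the 30 block net uses 320), mv8 = model version.
# Copying the 40 block net's file under the c320 name would make the
# 30 block net load it as-is, parameters unchanged:
cp tune6_gpuGeForceGT610_x19_y19_c256_mv8.txt \
   tune6_gpuGeForceGT610_x19_y19_c320_mv8.txt
```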
Given that you found 6 or 8 threads work, but 12 or more threads don't, when using the 30 block network (even if you skip the tuning), I'm going to guess that running the 30 block net is simply right at the borderline of what your GPU can handle.
I'm surprised that the tuning, of all things, would cause it to fail as well, since the tuning allocates an amount of memory a little smaller than actual usage: in theory it tunes using operations equivalent to a batch size of 2. But maybe there's something about the way the tuning is implemented that makes it more resource intensive, perhaps the fact that it also tries a lot of sub-optimal computational configurations in the process of trying to find the best one?
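If it helps to retry the tuning in isolation rather than as part of GTP startup, the OpenCL builds ship a standalone tuner subcommand; a minimal sketch (the model filename is a placeholder, and flag details can vary by version, so check the command's own help output first):

```
./katago tuner -model <30-block-model>.bin.gz
```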
Anyways, you just said that the 30 block network works fine for you at smaller numbers of threads if you just use the 20 or 40 block net's tuning file. It's probably not optimal, but I don't see the point in fiddling with it further if you can't run the tuning and the net is borderline almost more than your GPU can handle anyway. You can try the 40 block network instead if you like. The 40 block net should be *less* resource intensive than the 30 block net as far as resource limits go, despite having more blocks, since the convolutions it does are smaller (256 channels instead of 320).
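A quick way to check whether a given net actually fits before committing it to matches is the benchmark subcommand; a sketch, with the model filename as a placeholder:

```
./katago benchmark -model <40-block-model>.bin.gz -config default_gtp.cfg
```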
Other than that, I think there's nothing for you to do here. If you want to run the 30 block net with large numbers of threads (and therefore a large batch size), I guess you might simply need a better GPU.
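One related knob, offered as an untested suggestion rather than something verified on this GPU: the GTP config's nnMaxBatchSize caps the batch size used for neural net evaluation independently of numSearchThreads, so in principle it can bound GPU memory use even with many search threads, at the cost of threads waiting on the GPU:

```
# Sketch of a config combination, untested here:
numSearchThreads = 16   # search parallelism
nnMaxBatchSize = 8      # cap the GPU batch (and its memory) below the thread count
```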