KataGo V1.3

Vargo · Post by **Vargo** » Wed Apr 01, 2020 7:13 am

50 game test : KG 1.3.4 g170e-b20c256x2-s3354994176-d716845198 v. LZ#270
twogtp 1.5.1, 3200 visits for LZ and 1600 visits for KG, all games by resignation, no error, no duplicate game
KataGo wins 31-19 = 62%

details : (KG always appears as W, because of the command -alternate)

the games : (KG is W in the even-numbered games)

KG134v1600_LZ270v3200.rar: (44.41 KiB) Downloaded 508 times

And · Post by **And** » Wed Apr 01, 2020 8:00 am

Vargo, it is possible without “-alternate” in one bat file: first KataGo plays White, then Black, so there is no confusion with colors:

Code:

gogui-twogtp -black "C:\Users\jm\gogui151\LZ017\leelaz.exe ..." -white "C:\Users\jm\gogui151\kata134\katago.exe ..." -games 25 -sgffile C:\Users\jm\gogui151\kata134_white_LZ270b.dat -auto -komi 7.5
gogui-twogtp -black "C:\Users\jm\gogui151\kata134\katago.exe ..." -white "C:\Users\jm\gogui151\LZ017\leelaz.exe ..." -games 25 -sgffile C:\Users\jm\gogui151\kata134_black_LZ270b.dat -auto -komi 7.5

Vargo · Post by **Vargo** » Thu Apr 02, 2020 5:56 am

And wrote:it is possible without an “-alternate” in one bat file

You're right, with -alternate, results are a bit strange to read, but with -alternate there's only one stat file, with all results and details on the same page, and I like that. So... I don't know what's best.
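For anyone reading an -alternate result list by hand, the color bookkeeping can be sketched as below. This is only an illustration: it assumes KataGo is White in the even-numbered games (as noted for the game records above) and that colors swap in the odd-numbered ones. The function names and the four-game sample are hypothetical, not taken from the actual match files.

```python
def katago_color(game_index: int) -> str:
    """Color KataGo played in a given game.

    Assumption: KataGo is White in even-numbered games and the colors
    swap in odd-numbered games.
    """
    return "W" if game_index % 2 == 0 else "B"


def katago_wins(winners: dict) -> int:
    """Count KataGo wins from a {game_index: winning color} mapping."""
    return sum(
        1 for game, color in winners.items() if color == katago_color(game)
    )


# Hypothetical four-game sample, not taken from the real match.
sample = {0: "W", 1: "W", 2: "B", 3: "B"}
print(katago_wins(sample))
```

With a helper like this, a single stat file from -alternate can still be tallied per engine rather than per color.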

Another 50 game test :
KG 1.3.4 g170-b20-s335-d716 (1600 visits)
v.
LZ #270 (6400 visits)

LZ wins 29-21 (58%)
twogtp 1.5.1, all games by resignation, no error, no duplicate game

Maybe I'm wrong, but winning 84% at time parity could mean KG_1600visits is 1+ rank stronger than LZ270_1600visits. Winning 62% against LZ at 3200 visits seems consistent with 1+ rank, and winning 42% against LZ at 6400 visits seems consistent too (?), but these tests are only 50 games each...
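To give a rough feel for how much 50 games can show, here is a small sketch that computes a 95% Wilson score interval for the two results above (31/50 and 21/50 for KG). The choice of the Wilson interval and the function name are my own assumptions for illustration. It also treats games as independent, which may not hold exactly for bot matches.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a win rate (z = 1.96 is about 95%)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(
        p * (1 - p) / games + z * z / (4 * games * games)
    ) / denom
    return centre - half, centre + half


for label, wins in [("KG vs LZ at 3200 visits", 31),
                    ("KG vs LZ at 6400 visits", 21)]:
    low, high = wilson_interval(wins, 50)
    print(f"{label}: {wins}/50, 95% interval {low:.0%} to {high:.0%}")
```

Under these assumptions both intervals are wide enough to include 50%, which fits the caution that 50 games per test is not very many.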

details :

KG134_1600v_LZ270_6400v.rar: (46.72 KiB) Downloaded 499 times

lightvector · Post by **lightvector** » Thu Apr 02, 2020 6:06 am

Just a reminder again - it's generally good to report how many threads were used (and also of course any major options changed from defaults if any).

Setting the number of visits alone is insufficient to establish a fixed reproducible level of strength. The number of threads used to search that number of visits also influences the strength, with more threads slightly decreasing strength.

Vargo · Post by **Vargo** » Thu Apr 02, 2020 6:57 am

Config file generated by "genconfig", only change is : maxVisits= 1600
gpu : 1x GTX1080, numSearchThreads = 16

lightvector · Post by **lightvector** » Thu Apr 02, 2020 7:31 am

Cool, yep. I'm just saying that going forward, when you post big headline test results, you might also want to make it a habit to report things like the number of threads right in-line in the post too (which might also make it more likely for others to do so). The number of threads is definitely far less important than the number of visits, but it can still be somewhat influential, which many people might not have realized before.

lightvector · Post by **lightvector** » Thu Apr 02, 2020 10:04 pm

Also... I really should have said this earlier, but thanks for running all these tests! If I came off as critical earlier in previous posts, I apologize, I didn't intend that.

Between real life and other things, I have surprisingly little time left over after working on KataGo and maintaining the current run to run matches and test KataGo a lot myself, so it's fun to see people running matches and experiments like these.

Vargo · Post by **Vargo** » Fri Apr 03, 2020 3:02 am

lightvector wrote:came off as critical...

Not at all, you're right, I'll try to run longer tests, and give more details about the settings.
I'm very grateful for KG, and I hope you can manage KG and all the rest !

ez4u · Post by **ez4u** » Fri Apr 03, 2020 5:26 am

I think the remaining question in these results is whether randomization works the same for the two bots. Does the -r parameter for LZ work the same as randomization in the Katago config file? Alternatively, does the use of randomization make LZ systematically weaker than Katago in these matches? How do you construct a match without duplicate games but with no doubts about getting the best performance from both bots?

ez4u · Post by **ez4u** » Fri Apr 03, 2020 7:25 pm

My mistake!!! I misremembered that the "-r" parameter is the one for randomness, but it is not. That is "-m".

However, instead of the question on randomness, don't we have an issue with when they resign? Vargo's gogui2gtp command is setting "-r 20", which AFAIK tells LZ to resign if the winrate falls below 20%. Meanwhile in Katago's default configuration...

Code:

# Resignation occurs if for at least resignConsecTurns in a row,
# the winLossUtility (which is on a [-1,1] scale) is below resignThreshold.
allowResignation = true
resignThreshold = -0.90
resignConsecTurns = 3

Hence Katago will only resign if it is below 5% (because the scale is -1 to 1?, or is it 10%?) for three moves in a row?

In any case, this does not seem to be symmetrical between the engines. My naive interpretation is that this represents an advantage in match conditions for Katago. Am I reading this correctly?
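As a reading aid, the rule described in the config comment above can be sketched like this. It is only my interpretation of that comment, not KataGo's actual code; the function name and the sample utility values are hypothetical.

```python
def should_resign(win_loss_utilities, resign_threshold=-0.90,
                  resign_consec_turns=3):
    """Resign only if the most recent `resign_consec_turns` values of
    winLossUtility (on a [-1, 1] scale) are all below the threshold.

    Sketch of the rule as worded in the config comment; the real
    engine may differ in details.
    """
    if len(win_loss_utilities) < resign_consec_turns:
        return False
    recent = win_loss_utilities[-resign_consec_turns:]
    return all(u < resign_threshold for u in recent)


# Hypothetical sequences of per-turn utilities.
print(should_resign([-0.95, -0.92, -0.91]))  # three in a row below -0.90
print(should_resign([-0.95, -0.50, -0.95]))  # streak broken in the middle
```

The consecutive-turn requirement means a single bad evaluation does not trigger resignation, which is one more way the two engines' settings may differ.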

And · Post by **And** » Mon Apr 06, 2020 9:19 am

lightvector, please explain what the notation that KataGo displays means, or at least some of it (T W S c L N LCB P WF PSV)

lightvector · Post by **lightvector** » Mon Apr 06, 2020 7:53 pm

These are debug fields for development and testing and they're mostly not intended for casual users. I do not promise them to stay the same - they may change format or meaning in the future with no warning or explanation.

If you really care:
* T: Total utility
* W: Winloss utility
* S: Score utility
* L: Lead in points
* N: Nodes (i.e. visits)
* LCB: LCB value in utility
* P: Policy prior
* WF: Weighting factor
* PSV: Move selection value (similar to visits, but post-processed in a few ways).

But as I said, this particular output is for debugging and testing, not for users, and because of that I don't want the responsibility of having to document it officially or explain any updates to it if it changes. Please look directly at the source code (searchresults.cpp) if you want more details.

lightvector · Post by **lightvector** » Mon Apr 06, 2020 8:25 pm

ez4u wrote:My mistake!!! I misremembered that the "-r" parameter is the one for randomness, but it is not. That is "-m".
However, instead of the question on randomness, don't we have an issue with when they resign? Vargo's gogui2gtp command is setting "-r 20", which AFAIK tells LZ to resign if the winrate falls below 20%. Meanwhile in Katago's default configuration...
Code:
# Resignation occurs if for at least resignConsecTurns in a row,
# the winLossUtility (which is on a [-1,1] scale) is below resignThreshold.
allowResignation = true
resignThreshold = -0.90
resignConsecTurns = 3
Hence Katago will only resign if it is below 5% (because the scale is -1 to 1?, or is it 10%?) for three moves in a row?

In any case, this does not seem to be symmetrical between the engines. My naive interpretation is that this represents an advantage in match conditions for Katago. Am I reading this correctly?

Yep, the scale is -1 to 1, so it means 5%. It would have to be a weird -1 to 0 scale for it to mean 10%. And yeah, to conduct a careful comparison between different bots you'd want to take care with the resignation and temperature settings (and optimal threading and tuning, and playouts vs visits, etc). Generally, since winrates are trained under self-play conditions and are affected by the neural net's own Bayesian-like uncertainty, whereas match conditions are played with less noise and the search helps reduce uncertainty in a way that the raw net cannot do alone, it's very rare for the game to turn around after around 5%-10% winrate in match conditions (far less than 5% to 10% of the time). I agree 20% is a bit on the high side. I haven't studied the exact false-resign rate, so I'm not entirely sure about this, but although it should be less than 20% of resigned games, it might still be high enough to be noticeable.
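The arithmetic behind the 5% figure can be sketched as a linear map from the [-1, 1] utility scale to a 0 to 1 winrate. The function names are hypothetical, and reading LZ's "-r 20" as a 20% winrate follows the AFAIK description quoted above rather than anything checked against LZ itself.

```python
def utility_to_winrate(utility: float) -> float:
    """Map a win/loss utility on [-1, 1] linearly to a winrate on [0, 1]."""
    return (utility + 1.0) / 2.0


def winrate_to_utility(winrate: float) -> float:
    """Inverse map: winrate on [0, 1] to utility on [-1, 1]."""
    return 2.0 * winrate - 1.0


# resignThreshold = -0.90 expressed as a winrate.
print(f"{utility_to_winrate(-0.90):.0%}")

# A 20% resign winrate (assumed meaning of "-r 20") expressed as utility.
print(f"{winrate_to_utility(0.20):.2f}")
```

On this reading, -0.90 corresponds to 5% and a 20% threshold corresponds to a utility of -0.60, which makes the asymmetry between the two resignation settings easy to see.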

And · Post by **And** » Tue Apr 07, 2020 9:05 am

30 block networks do not work (GT610):
"...Testing 70 different configs
Uncaught exception: OpenCL error at C:\Data\Data\Coding\Python\KataGo\cpp\neuralnet\opencltuner.cpp, func err, line 555, error CL_OUT_OF_RESOURCES"
the screen goes blank for a couple of seconds and the inscription appears: "Display driver stopped responding and has recovered"
after that the tuning is interrupted.
I renamed the opencltuning file for 20 blocks (tune6_gpuGeForceGT610_x19_y19_c320_mv8.txt) and it started working, but I am not sure if this is a good solution.

lightvector · Post by **lightvector** » Tue Apr 07, 2020 6:54 pm

When you copied the config manually to make it work, is there a number of threads that reproduces the same error when you try to use it, or try to run the benchmark? Specifically, with a large enough number of threads, causing a large enough batch size, does your GPU run out of resources again?
