Question about KataGo

For discussing go computing, software announcements, etc.
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Question about KataGo

Post by lightvector »

So the main reason for going to 5x5 would be if you thought 5x5 would also be good for center fights in regular Go. Or if you thought that the kinds of shapes, tactics, and strategies that arise on the cyclic board would lend themselves to being better handled by large convolutions compared to small ones. This would have nothing to do with the mechanics of the net's ability to process the board, but simply with how it affects the net's broader ability to learn the strategy. If that is what you meant, then I agree, that's an interesting question.

But mind also one of the standard arguments for 3x3 instead of 5x5 if you merely want the net to be able to handle bigger "radius" effects: two 3x3s achieves the same radius as a single 5x5, but with fewer parameters needed (3*3 = 9 which is less than half of 5*5 = 25). So if you merely think that longer-distance interactions than normal are strategically important, then the first instinct might be to just add more blocks to make the net deeper, and the second instinct might be to do some sort of more specialized long-distance large convolutions (dilated, 1xN + Nx1 for much larger N, large low-rank) rather than 5x5. Which would be all pretty interesting experimental research.
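The parameter arithmetic in that argument can be sketched directly (the channel count C below is a made-up placeholder, not KataGo's actual width):

```python
# Weights in a single 2D conv layer (no bias term), per layer.
def conv_params(kernel_size, in_channels, out_channels):
    return kernel_size * kernel_size * in_channels * out_channels

C = 256  # hypothetical channel count, purely for illustration
one_5x5 = conv_params(5, C, C)       # radius-2 reach in one layer
two_3x3 = 2 * conv_params(3, C, C)   # the same radius-2 reach, stacked

print(one_5x5, two_3x3)  # two stacked 3x3s cost 18/25 of one 5x5
```

And the stacked version gets an extra nonlinearity in between for free.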

What do you think? :)
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Question about KataGo

Post by lightvector »

gcao wrote: Before looking into the CPP part, is there anything else in the python code that has to be touched? You mentioned train.py however I don't see any code that is relevant. I also updated board.py (see PR below) however you said it's not used for training.

https://github.com/gcao/KataGo/pull/1/f ... ded39c54d7
Perhaps there's nothing else in the python code that needs attention; maybe you don't need to touch train.py if the model has been updated. And yes, board.py is not used.

A big note of caution about the particular way you implemented the board.py change: I suspect this will not work in the C++ code, and it possibly wouldn't work even in the python code if board.py were actually used. The problem is that sometimes there is more than one level of nesting of function calls that *each* uses this adj array - or a function might be recursive. If you simply replace this adj array in place, deeper layers will overwrite the adj array that the shallow layer needed, and when you return to the shallow layers you will have incorrect offsets.

Edit: Nevermind, I think I see I'm being stupid - there are also places where you included it inside the loop, rather than outside. Which I think always works...? Okay cool. :)
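To illustrate the shared-array hazard in miniature (a toy sketch, not KataGo code): if nested calls each rewrite one global offset buffer in place, the inner call clobbers the offsets the outer call is still reading, whereas recomputing the offsets inside the loop right before each use avoids this.

```python
SHARED = []  # one global offset buffer, rewritten in place by every call

def fill_offsets(pos):
    SHARED[:] = [pos + 1, pos - 1]  # pretend the offsets depend on the position

def inner(pos):
    fill_offsets(pos)  # the nested call rewrites the shared buffer

def outer(pos):
    fill_offsets(pos)
    seen = []
    for i in range(2):
        inner(pos + 10)          # deeper layer overwrites SHARED...
        seen.append(SHARED[i])   # ...so the outer layer reads wrong offsets
    return seen

print(outer(0))  # -> [11, 9], not the intended [1, -1]
```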
gcao
Dies in gote
Posts: 25
Joined: Sat Feb 22, 2020 11:03 am
Rank: AGA 6D
GD Posts: 0
Been thanked: 2 times

Re: Question about KataGo

Post by gcao »

I googled around but couldn't find much information regarding cyclic boundaries in either CUDA or OpenCL. So I was going to make the changes you suggested in the OpenCL code. It looks like I should update https://github.com/lightvector/KataGo/b ... ernels.cpp but I don't know how. If you can give a little hint, that would be great.
lightvector wrote:Next, check the CUDA docs to see if they have cyclic convolution. If they do, it should be easy; otherwise, probably go with OpenCL and modify the winograd algorithms so that when they peek over the edge, they apply the appropriate modulo arithmetic lookup instead of filling in a 0.
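The modulo-lookup idea can be sketched as follows (NumPy here, not the actual winograd kernels): indexing past the edge with modulo arithmetic is equivalent to wrap-padding the input first and then doing an ordinary "valid" sliding-window convolution.

```python
import numpy as np

def conv3x3_cyclic(x, w):
    """3x3 correlation with cyclic boundary: wrap via modulo instead of reading 0."""
    H, W = x.shape
    out = np.zeros_like(x, dtype=float)
    for y in range(H):
        for c in range(W):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    out[y, c] += w[dy + 1, dx + 1] * x[(y + dy) % H, (c + dx) % W]
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
# Reference: wrap-pad by 1, then a plain "valid" sliding window
p = np.pad(x, 1, mode="wrap")
ref = np.array([[np.sum(p[y:y+3, c:c+3] * w) for c in range(4)] for y in range(4)])
assert np.allclose(conv3x3_cyclic(x, w), ref)
```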
gcao
Dies in gote
Posts: 25
Joined: Sat Feb 22, 2020 11:03 am
Rank: AGA 6D
GD Posts: 0
Been thanked: 2 times

Re: Question about KataGo

Post by gcao »

@lightvector

I'm looking into this again. I wonder whether the two places below are where I need to make updates in order to support Daoqi in OpenCL. Do you mind giving some hints, please? Thank you.

https://github.com/lightvector/KataGo/b ... s.cpp#L195
https://github.com/lightvector/KataGo/b ... s.cpp#L357
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Question about KataGo

Post by lightvector »

Yep, I think those are the locations you need to transform into a cyclic boundary condition, loading the other side of the board instead of loading 0 when going past the edge.

There should also be a place you need to update within conv2dNCHW, which is the non-winograd convolution. There should be a very closely analogous place where it loads an input tile, and uses 0 if it's past the boundary.
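The change being described might look something like this in miniature (my own hypothetical sketch, not conv2dNCHW itself): the zero-filling load of an input element becomes a modulo load.

```python
# Zero-padded load: off-board reads produce 0, as in the standard kernels.
def load_zero(x, y, c, H, W):
    return x[y][c] if 0 <= y < H and 0 <= c < W else 0.0

# Cyclic load: off-board reads wrap to the far side of the board instead.
def load_cyclic(x, y, c, H, W):
    return x[y % H][c % W]

x = [[1.0, 2.0], [3.0, 4.0]]
print(load_zero(x, -1, 0, 2, 2), load_cyclic(x, -1, 0, 2, 2))  # 0.0 3.0
```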
gcao
Dies in gote
Posts: 25
Joined: Sat Feb 22, 2020 11:03 am
Rank: AGA 6D
GD Posts: 0
Been thanked: 2 times

Re: Question about KataGo

Post by gcao »

Great! Thank you!
gcao
Dies in gote
Posts: 25
Joined: Sat Feb 22, 2020 11:03 am
Rank: AGA 6D
GD Posts: 0
Been thanked: 2 times

Re: Question about KataGo

Post by gcao »

I updated the OpenCL kernels and have created a PR. When you get a chance, I would appreciate it if you could take a little time to review the PR and see whether I missed anything. If not, I'll pull the latest code into my repo and start training.

Whole PR: https://github.com/gcao/KataGo/pull/3/files

OpenCL kernel changes: https://github.com/gcao/KataGo/pull/3/f ... 028e9db1cd

Not sure whether I need to update this place as well.
https://github.com/lightvector/KataGo/b ... s.cpp#L125


Here is a short summary of the design of supporting Daoqi in the board.h/cpp and related code.

I moved the diagonal offsets to a separate field called diag_offsets. Both adj_offsets and diag_offsets hold 8 values. The first 4 are for regular board positions, the last 4 are for edge positions.

In all places where adjacent points are computed, we use adj_offsets[0-3] to find an adjacent point and check its value. If it's C_WALL, we use adj_offsets[4-7] to get the alternative adjacent point.
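If I understand the scheme, a 1-D toy version might look like this (my own illustration, not your actual board.h code): a padded row with C_WALL sentinels, the normal offsets first and the wrap-around fallbacks after.

```python
C_WALL, C_EMPTY = -1, 0
W = 5
row = [C_WALL] + [C_EMPTY] * W + [C_WALL]  # indices 1..W are on-board

# First half: normal left/right steps. Second half: wrap-around fallbacks,
# used only when the normal step lands on a wall (i.e. at an edge position).
adj_offsets = [-1, +1, W - 1, 1 - W]

def neighbors(pos):
    out = []
    for i in range(2):
        adj = pos + adj_offsets[i]
        if row[adj] == C_WALL:
            adj = pos + adj_offsets[i + 2]  # take the far-side point instead
        out.append(adj)
    return out

print(neighbors(1), neighbors(W))  # leftmost wraps left to 5; rightmost wraps right to 1
```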

I hope this will work but if you find any issue with this, please do let me know. I don't want to start the training with any wrong design/implementation because I feel it's very hard to catch these issues during the training phase.
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Question about KataGo

Post by lightvector »

Cool!

I think it's very hard to catch bugs by reading the code for these kinds of changes - indexing code is easy to mess up in a way that a casual skim doesn't see, and many kinds of bugs come from simply forgetting to change a place that should be changed, which won't show up in a diff at all.

Your best bet is to test, test, test. Not by running the whole training loop, but by writing some code to interactively use the board in board.h to see if the rules are actually enforced the way they should be, create some sample positions to see if the ladder detection code works properly across the border, see if the pass-aliveness detection code correctly computes pass-alive groups across the border, etc. The existing tests for many of these things are in tests/testboardbasic.cpp, tests/testboardarea.cpp etc.
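As one concrete example of such a spot check (a hypothetical sketch, not one of the existing tests): on a cyclic board a corner stone should still have 4 distinct neighbors, where a standard board gives it only 2.

```python
def cyclic_neighbors(x, y, W, H):
    """Neighbor set on a torus: coordinates wrap with modulo."""
    return {((x + dx) % W, (y + dy) % H)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))}

def standard_neighbors(x, y, W, H):
    """Neighbor set on a normal board: off-board points are dropped."""
    return {(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < W and 0 <= y + dy < H}

assert len(cyclic_neighbors(0, 0, 9, 9)) == 4    # torus corner: still 4 liberties
assert len(standard_neighbors(0, 0, 9, 9)) == 2  # ordinary corner: only 2
```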

Similarly you can also test the neural net individual layers to see if they work as expected. You can look at cpp/tests/testnn.cpp for an example of this currently - it manually sets up some small input planes of simple floating point values, with some artificial convolution weights, applies the convolution, and compares them to the expected output (which I computed by hand when writing the test, pretty easy if all the numbers are simple).

Take a look at command/runtests.cpp to see the top-level code that calls down into these tests, many of which are probably broken now since they assume the original rules. Not that you need to worry about fixing them, but you can of course model after them in writing your own tests. Although sorry for the ad-hoc-ness and not using one of the more standardized C++ testing frameworks.

You can also do testing on the tensorflow side. Unfortunately, TF1.5 and the complicated estimator interface make it harder to test, but I think one way that should work is to either call out to or even just copy-paste the relevant functions that you modified, then switch into eager mode ("v1.enable_eager_execution()"), and then interactively in a python shell you can create some tensors with some values in them, call your convolution, and verify that the output tensor is the correct shape and eagerly computes the correct values.
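One easy hand-checkable case for that kind of interactive test (shown with NumPy for portability; the TF eager version would be analogous): feed a constant plane through an all-ones 3x3 kernel. If the wrap is working, the output is constant everywhere; with zero padding the edges are visibly dented.

```python
import numpy as np

def conv3x3(x, w, pad_mode):
    p = np.pad(x, 1, mode=pad_mode)  # "wrap" = cyclic, "constant" = zero padding
    H, W = x.shape
    return np.array([[np.sum(p[y:y+3, c:c+3] * w) for c in range(W)]
                     for y in range(H)])

x = np.ones((5, 5))
w = np.ones((3, 3))
assert np.all(conv3x3(x, w, "wrap") == 9.0)     # constant output: wrap works
assert conv3x3(x, w, "constant")[0, 0] == 4.0   # zero-padded corner sees only 4 ones
```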