
All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 23 posts ]  Go to page 1, 2  Next
 Post subject: Question about KataGo
Post #1 Posted: Sun Apr 05, 2020 6:45 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Not sure whether this is the right place to ask this question. Please let me know if there is a better channel. Thanks.

I would like to adapt KataGo to train a NN model to play Daoqi (also known as Toroidal Go, https://senseis.xmp.net/?ToroidalGo). I assume it will be pretty straightforward: I just need to change all the places that find adjacent points. KataGo is pretty large, with a lot of C++ and Python code. Can someone please give me some hints on which files I need to update? I am in the process of finding those places, but I don't want to miss any before I start training. Thanks in advance for your help.
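To make the "change all the places that find adjacent points" step concrete, here is a minimal Python sketch of toroidal adjacency. It assumes a board stored as a list of strings indexed as board[y][x] holding '.', 'B' or 'W'; the helper names are hypothetical, and KataGo's real board code is C++ and organized differently. The only change from ordinary Go is the modulo wrap, which gives every point, including former corners and edges, exactly four neighbors.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def torus_neighbors(x: int, y: int, n: int) -> List[Point]:
    """Four orthogonal neighbors of (x, y) on an n x n torus (hypothetical helper).

    Python's % is non-negative for positive n, so -1 % n == n - 1 and the
    left/top edges wrap to the right/bottom edges."""
    return [((x - 1) % n, y), ((x + 1) % n, y), (x, (y - 1) % n), (x, (y + 1) % n)]


def string_and_liberties(board: List[str], x: int, y: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the string containing (x, y); board[y][x] is '.', 'B' or 'W'."""
    n = len(board)
    color = board[y][x]
    stack: List[Point] = [(x, y)]
    stones: Set[Point] = {(x, y)}
    liberties: Set[Point] = set()
    while stack:
        cx, cy = stack.pop()
        # Neighbors are computed fresh for each point, so no shared array
        # can be overwritten by nested or recursive calls.
        for nx, ny in torus_neighbors(cx, cy, n):
            value = board[ny][nx]
            if value == '.':
                liberties.add((nx, ny))
            elif value == color and (nx, ny) not in stones:
                stones.add((nx, ny))
                stack.append((nx, ny))
    return stones, liberties


# Two black stones on opposite edges of a 5x5 torus form a single string.
board = [
    ".....",
    ".....",
    "B...B",
    ".....",
    ".....",
]
stones, liberties = string_and_liberties(board, 0, 2)
print(sorted(stones), len(liberties))  # [(0, 2), (4, 2)] 6
```

On a normal board those two stones would be separate strings with 3 liberties each; on the torus they are one string with 6.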


This post by gcao was liked by: Bill Spight
 Post subject: Re: Question about KataGo
Post #2 Posted: Sun Apr 05, 2020 9:33 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Do you have experience in OpenCL or CUDA? I think you will also need to be willing to dive into the low-level details of the GPU code and modify and rewrite some of the routines.

The problem is that even if you change all the normal C++ code to handle the rules correctly, the neural net is going to have serious issues handling the boundary where one side wraps to the other. Those two sides are separated by the entire width of the board, yet they need to be treated as connected. Even for a deep-enough net, this will be disastrous for the inductive bias of the net. To the degree that it's still eventually learnable, it will waste tons of internal learning capacity: the net will have to keep re-learning every different shape in the middle, on the boundary, and in the corner, and it will waste lots of channels in lots of blocks laboriously transferring information across the entire board between boundaries.

The way to fix this would be to give the convolution a cyclic boundary condition rather than a zero-padded one; this is also sometimes known as "circular" convolution. This is certainly a thing, so you might not be working from scratch, and it might already be doable with the right function calls and such. But you'll still have to figure it out in at least one of OpenCL or CUDA+CUDNN (for the C++ engine), as well as in Tensorflow (for the python GPU training).
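A minimal pure-Python sketch of the difference, assuming a single-channel square board stored as a list of lists (real nets operate on batched multi-channel tensors). `pad2d` either zero-fills or copies cells from the opposite side, and `conv2d_valid` is the unpadded cross-correlation that deep-learning "convolution" layers compute; both names are hypothetical. With zero padding, a 3x3 filter at one corner cannot see a stone on the diagonally opposite corner; with wrap padding it can, because on a torus those points are adjacent.

```python
from typing import List

Grid = List[List[int]]


def pad2d(board: Grid, r: int, mode: str) -> Grid:
    """Pad an n x n board by r cells on every side.

    mode="zero": cells off the board read as 0 (the usual zero padding).
    mode="wrap": cells off the board read from the opposite side (cyclic)."""
    n = len(board)
    padded = []
    for y in range(-r, n + r):
        row = []
        for x in range(-r, n + r):
            if mode == "wrap":
                row.append(board[y % n][x % n])
            elif 0 <= y < n and 0 <= x < n:
                row.append(board[y][x])
            else:
                row.append(0)
        padded.append(row)
    return padded


def conv2d_valid(padded: Grid, kernel: Grid) -> Grid:
    """Unpadded ('VALID') cross-correlation with a square k x k kernel."""
    k = len(kernel)
    m = len(padded) - k + 1
    return [
        [sum(kernel[i][j] * padded[y + i][x + j] for i in range(k) for j in range(k))
         for x in range(m)]
        for y in range(m)
    ]


board = [[0] * 5 for _ in range(5)]
board[4][4] = 1                      # a stone in the bottom-right corner
ones = [[1] * 3 for _ in range(3)]   # 3x3 "count my neighborhood" filter

zero_out = conv2d_valid(pad2d(board, 1, "zero"), ones)
wrap_out = conv2d_valid(pad2d(board, 1, "wrap"), ones)
print(zero_out[0][0], wrap_out[0][0])  # 0 1
```

In both modes the output is 5x5 again, because padding by the kernel radius exactly cancels the shrinkage of the unpadded convolution.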

For the plain C++ stuff, you're going to want to look at the high-level overview here. Scroll down on this page to see a description of the major directories and files:
https://github.com/lightvector/KataGo/tree/master/cpp

I have not reviewed the code myself, but you will probably need changes in the game, neuralnet, and search directories.

The following may also help to find most of the key spots (but I'm not sure this will catch all of them):
grep 'adj_' . -r --include=*.{cpp,h}

On the python side, you'll mainly need to fix up train.py and model.py; those are the two critical files for training. (board.py shouldn't actually be used in the self-play training loop, since no actual gameplay happens in python, and python doesn't even compute any features or properties of the board; it purely reads in data from the C++ and trains with it.) The main way you'll need to fix them is to figure out how to do circular convolution in Tensorflow.


This post by lightvector was liked by: Bill Spight
 Post subject: Re: Question about KataGo
Post #3 Posted: Sun Apr 05, 2020 11:40 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
The convolutions will be different, right?

I think that a toroidal board can be represented on a larger, finite board. For instance, I think that a square 3x3 board can be represented as a "Manhattan circle" of radius 3, i.e., the center point and all points within a Manhattan distance of 3 from it. This forum (L19) does not allow 7x7 SGF boards, so here is a 3x3 game on a 9x9 board.



Each play is duplicated on the corresponding points within the circle. :)

Using GoWrite I could not obviously capture the stones without deleting previous plays. Edit2: I have added a variation where the final capture is not so obvious.

Edit: I see that 11x11 toroidal go is played on a 19x19 board. For training purposes, a Manhattan circle of radius 11 has only 265 points, which may be more efficient than a 19x19 square with 361 points. The main playing area is the same, but the convolutions may be easier to train on the smaller area. I don't know, myself. ;)
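The point counts in this post can be checked with a little arithmetic. Counting ring by ring, a Manhattan circle of radius r has 1 + 4 + 8 + ... + 4r = 2r^2 + 2r + 1 points and spans 2r + 1 rows and columns, which is why the radius-3 example needs a 7x7 board. The helper names below are only for illustration.

```python
def manhattan_circle_size(r: int) -> int:
    """Closed form: the center plus rings of 4, 8, ..., 4r points."""
    return 2 * r * r + 2 * r + 1


def manhattan_circle_size_bruteforce(r: int) -> int:
    """Count lattice points (x, y) with |x| + |y| <= r directly."""
    return sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1) if abs(x) + abs(y) <= r)


for r in (3, 11):
    # r=3: 25 points, 7 wide; r=11: 265 points, 23 wide.
    print(r, manhattan_circle_size(r), manhattan_circle_size_bruteforce(r), 2 * r + 1)
print(19 * 19)  # 361 points on a 19x19 board
```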

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: Question about KataGo
Post #4 Posted: Mon Apr 06, 2020 6:45 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
@lightvector, @Bill Spight
Thank you!

It shouldn't be surprising, but I am surprised. :lol: I'll dig into the GPU and TF code to see whether I can do something. I have no experience with either of them, so it'll take some time.

 Post subject: Re: Question about KataGo
Post #5 Posted: Mon Apr 06, 2020 7:38 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
I am not happy with the board where you have to deduce the capture. Maybe adding 12 points to the "circle" will make it possible to show every capture by surrounding with opposing stones. For the 11x11 board, that means adding 44 points, which would bring the total up to 309.

I'll add a new SGF file here soon.



No, if you want to see all captures, you need a bigger board. :( Edit: It's worse than that. Or better, as the case may be, because it is impossible to fill all the liberties of some strings with stones of the opposite color. So we have to rely on there being a liberty beyond the edge of the board, at least for some strings. Maybe the Manhattan circle is good enough. :scratch:

I'll exit gracefully. :lol:



Last edited by Bill Spight on Mon Apr 06, 2020 2:34 pm, edited 2 times in total.
 Post subject: Re: Question about KataGo
Post #6 Posted: Mon Apr 06, 2020 12:31 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
@lightvector I wonder: if for every self-play game I generate one or more supplemental games by shifting the board center and rotating the board, would that help without modifying any of the GPU/TF code?

 Post subject: Re: Question about KataGo
Post #7 Posted: Mon Apr 06, 2020 1:43 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
No, I doubt that will fix the problem. It doesn't address most of the specific points I mentioned: the receptive field of the convolutions will still not treat opposite edges as adjacent, learning capacity will still be wasted, etc. This is the sort of thing that is really desirable to just do properly once and for all, rather than taking some shortcut that will perpetually cause problems down the line.

And once you actually implement convolution with cyclic boundary conditions, augmentation by translating the data will even be entirely unnecessary. The neural net literally will be unable to tell the difference between any possible translation of the board, giving you a truly translation-invariant evaluation.

(augmentations by rotations and reflections will still help though, just like in regular Go).
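The translation-invariance claim is easy to check numerically. This sketch assumes single-channel integer boards and a size-preserving cross-correlation; it computes each output with either a modulo (cyclic) lookup or a zero-fill lookup, then compares "shift, then convolve" against "convolve, then shift". With the cyclic lookup the two always agree, so shifted copies of a position carry no new information for the net; with zero padding they generally do not. The function names are hypothetical.

```python
import random
from typing import List

Grid = List[List[int]]


def roll2d(board: Grid, dy: int, dx: int) -> Grid:
    """Cyclically shift a square board by (dy, dx)."""
    n = len(board)
    return [[board[(y - dy) % n][(x - dx) % n] for x in range(n)] for y in range(n)]


def conv2d_same(board: Grid, kernel: Grid, cyclic: bool) -> Grid:
    """Size-preserving cross-correlation with an odd k x k kernel."""
    n, k = len(board), len(kernel)
    r = k // 2

    def at(y: int, x: int) -> int:
        if cyclic:
            return board[y % n][x % n]                            # torus: wrap around
        return board[y][x] if 0 <= y < n and 0 <= x < n else 0    # zero padding

    return [
        [sum(kernel[i][j] * at(y + i - r, x + j - r) for i in range(k) for j in range(k))
         for x in range(n)]
        for y in range(n)
    ]


rng = random.Random(0)
board = [[rng.randint(0, 2) for _ in range(7)] for _ in range(7)]
kernel = [[rng.randint(-2, 2) for _ in range(3)] for _ in range(3)]

for cyclic in (True, False):
    shift_then_conv = conv2d_same(roll2d(board, 2, 3), kernel, cyclic)
    conv_then_shift = roll2d(conv2d_same(board, kernel, cyclic), 2, 3)
    print("cyclic" if cyclic else "zero-padded", shift_then_conv == conv_then_shift)
```

The cyclic line always prints True. Stacking such layers preserves the property, which is why translation augmentation adds nothing once every convolution in the net is cyclic.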

 Post subject: Re: Question about KataGo
Post #8 Posted: Mon Apr 06, 2020 1:52 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
For example, start with the Tensorflow code on the python side. If Tensorflow doesn't have built-in support (google around, search the docs thoroughly), then switch the convolution call from SAME to VALID padding to disable Tensorflow's automatic zero-padding, and do a little bit of python slicing and concatenating to cyclically attach convolution_radius-many rows and columns from the opposite side onto the tensor. For example, on a 19x19 board with a 5x5 convolution (radius 2), you'd slice and concatenate to construct a 23x23 board whose middle 19x19 is the same but which has a 2-thick border copied from the other side; after the 5x5 convolution, it would be 19x19 again. Put this in the place where 2d convolution is implemented in model.py, so that it happens on *every* convolution everywhere in the net.
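Here is a minimal sketch of that slicing-and-concatenating step, written in plain Python on a single-channel board so it runs anywhere; `cyclic_pad` is a hypothetical name, not something in KataGo's model.py. In Tensorflow the same slices would presumably apply to an NHWC tensor along axes 1 and 2, e.g. `tf.concat([x[:, -r:], x, x[:, :r]], axis=1)`, then the same along axis 2, followed by `tf.nn.conv2d(..., strides=[1, 1, 1, 1], padding="VALID")`; treat that translation as untested. One pitfall: `x[-0:]` is the whole sequence rather than an empty slice, so 1x1 convolutions (radius 0) must skip the padding.

```python
from typing import List

Grid = List[List[int]]


def cyclic_pad(board: Grid, r: int) -> Grid:
    """Attach r rows/columns from the opposite side on every edge."""
    if r == 0:
        # board[-0:] would be the entire board, not an empty slice.
        return [row[:] for row in board]
    rows = board[-r:] + board + board[:r]              # wrap the height axis
    return [row[-r:] + row + row[:r] for row in rows]  # wrap the width axis


n, diam = 19, 5
radius = (diam - 1) // 2                              # 2 for a 5x5 convolution
board = [[y * n + x for x in range(n)] for y in range(n)]
padded = cyclic_pad(board, radius)

print(len(padded), len(padded[0]))    # 23 23
print(len(padded) - diam + 1)         # 19: a VALID 5x5 conv restores the board size
print(padded[0][0] == board[n - 2][n - 2], padded[radius][radius] == board[0][0])  # True True
```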

Next, check the CUDA docs to see if they have cyclic convolution. If they do, it should be easy. Otherwise, probably go with OpenCL and modify the winograd algorithms so that when they peek over the edge, they apply the appropriate modulo arithmetic lookup instead of filling in a 0.

 Post subject: Re: Question about KataGo
Post #9 Posted: Mon Apr 06, 2020 2:27 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Good to know! So I don't have to waste time on that.

 Post subject: Re: Question about KataGo
Post #10 Posted: Sun Apr 26, 2020 9:13 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
Sorry to bother you again with basic questions, and thank you very much for creating this great project and for the thorough answers you have given.

I tried to do the cyclic operation in the conv2d function but ran into an error later. I'm not sure whether I did something wrong in my change or whether there are other places I need to modify as well.

Here is the PR https://github.com/gcao/KataGo/pull/2/files

The error I got:

Code:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 23 and 19 for 'swa_model/mul_3' (op: 'Mul') with input shapes: [?,23,23,96], [?,19,19,1].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/gcao/.vscode-insiders/extensions/ms-python.python-2020.4.74986/pythonFiles/lib/python/debugpy/wheels/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/gcao/.vscode-insiders/extensions/ms-python.python-2020.4.74986/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/Users/gcao/.vscode-insiders/extensions/ms-python.python-2020.4.74986/pythonFiles/lib/python/debugpy/wheels/debugpy/../debugpy/server/cli.py", line 267, in run_file
    runpy.run_path(options.target, run_name=compat.force_str("__main__"))
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/gcao/daoqi/KataGo/python/train.py", line 133, in <module>
    swa_model = Model(model_config,pos_len,placeholders={})
  File "/Users/gcao/daoqi/KataGo/python/model.py", line 116, in __init__
    self.build_model(config,placeholders)
  File "/Users/gcao/daoqi/KataGo/python/model.py", line 1002, in build_model
   
  File "/Users/gcao/daoqi/KataGo/python/model.py", line 725, in res_conv_block
    scale_initial_weights=1.0, emphasize_center_weight=None, emphasize_center_lr=None, reg=True
  File "/Users/gcao/daoqi/KataGo/python/model.py", line 517, in batchnorm_and_mask
    return (tensor + beta) * mask
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 899, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/math_ops.py", line 1206, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6701, in mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1770, in __init__
    control_input_ops)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 23 and 19 for 'swa_model/mul_3' (op: 'Mul') with input shapes: [?,23,23,96], [?,19,19,1].

 Post subject: Re: Question about KataGo
Post #11 Posted: Sun Apr 26, 2020 9:31 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Tensorflow is reporting exactly the mismatch in tensor dimensions in your exception: you have somehow ended up with a tensor that is 23x23 in the width and height dimensions where Tensorflow expects to be able to multiply it pointwise with a tensor whose dimensions are 19x19.

Since 19x19 is the board size, that sounds like the right size, so it seems your change incorrectly produces a 23x23 tensor somewhere. Follow the backtrace to the lines where the error occurs in each function. Add debug statements to print out the shapes of tensors on each line, and you should be able to find the mistake.

Keep in mind not all the convolutions are the same size. Some are 1x1, some are 3x3 and there is also one 5x5 convolution.
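The shape bookkeeping behind that error can be written down explicitly. This sketch assumes each convolution borrows exactly its radius from the other side and then runs unpadded ("VALID"). If the padding is added but the convolution still runs with SAME padding, the output stays 23x23 and the later pointwise multiply by the 19x19 mask fails, which is one way to get the shapes in the traceback. The dilated case assumes a dilation factor d spreads a diam x diam kernel over d*(diam-1)+1 cells; the function names are hypothetical.

```python
def cyclic_pad_radius(diam: int, dilation: int = 1) -> int:
    """Rows/columns to borrow from each side for an odd diam x diam kernel."""
    if diam % 2 != 1:
        raise ValueError("expected an odd kernel diameter")
    return dilation * (diam - 1) // 2


def output_size(in_size: int, diam: int, dilation: int = 1, padding: str = "VALID") -> int:
    """Spatial output size of a stride-1 convolution."""
    if padding == "SAME":
        return in_size                       # SAME never shrinks the board
    effective = dilation * (diam - 1) + 1    # cells spanned by the kernel
    return in_size - effective + 1


board = 19
for diam, dilation in [(1, 1), (3, 1), (5, 1), (3, 2)]:
    padded = board + 2 * cyclic_pad_radius(diam, dilation)
    print(diam, dilation, padded, output_size(padded, diam, dilation))  # last column: 19

# The bug pattern: pad to 23x23 but keep SAME padding -> 23x23, not 19x19.
print(output_size(board + 2 * cyclic_pad_radius(5), 5, padding="SAME"))  # 23
```

Note that 1x1 convolutions need no padding at all, so they can keep their current code path.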

 Post subject: Re: Question about KataGo
Post #12 Posted: Sun Apr 26, 2020 3:06 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
I've updated conv2d and dilated_conv2d to take a diam parameter and add the cyclic boundary when necessary. I also updated diam to 5 wherever it was set to 3. However, I noticed diam is set to 1 in many other places (e.g. the p1/intermediate_conv, g1, and p2 layers). Should I change the diameter in all places, or does it not apply to some?

PR: https://github.com/gcao/KataGo/pull/2/files

Top
 Profile  
 
Offline
 Post subject: Re: Question about KataGo
Post #13 Posted: Sun Apr 26, 2020 10:19 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Ummm, why are you changing the diameter anywhere? This will have a massive impact on the number of parameters in the neural net as well as its computational cost and capacity. :o :)

If you're going to experiment with major changes like this to the architecture of the net, you should probably plan to do multiple runs, comparing such changes against a baseline (leaving everything at the diameter it was). There are specific reasons for the diameters being what they are now; changing them is not going to be a good idea without careful testing to see the effect on learning efficiency and computational speed.

Edit: In particular having the vast majority of convolutions be 3x3 is true for AlphaZero and all major neural nets in board game AIs based on it since then, and 3x3 is more parameter-efficient and in some ways more computationally expressive per radius. Of course, not many people have tested 5x5, so it could be better, but if you're going to deviate from standard, you absolutely want to do it in a way you can compare to see if your change is any good.

 Post subject: Re: Question about KataGo
Post #14 Posted: Mon Apr 27, 2020 6:47 am 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
I guess diam=5 makes it a 5x5 convolution, as mentioned in one of your posts. That's why I changed it to 5. I'm not sure 3x3 is good enough for the net to handle the cyclic boundaries properly, even though it's good for Go.

Anyway, you are totally right: I should compare diam=3 and diam=5 to see the impact on computational cost, training speed, etc.

As a starting point, I'll change diam back to 3 so I don't have to worry about changing any other hyperparameters.

Before looking into the CPP part, is there anything else in the python code that has to be touched? You mentioned train.py, but I don't see any relevant code there. I also updated board.py (see PR below), but you said it's not used for training.

https://github.com/gcao/KataGo/pull/1/f ... ded39c54d7

Top
 Profile  
 
Offline
 Post subject: Re: Question about KataGo
Post #15 Posted: Mon Apr 27, 2020 2:11 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Quote:
I'm not sure 3x3 is good enough for the net to handle the cyclic boundaries properly, even though it's good for Go.


Ah. But once you have cyclic convolution in place, the neural net can't "tell" where the boundary is. Mathematically it is as if the boundaries don't exist at all. Every point on the board is 100% symmetrically treated by the convolution compared to every other, no matter how close or far to the boundary. :)

For example:

In one dimension, if we apply the diameter 3 convolution with weights [0 1 1], namely "each point becomes the sum of itself and its right neighbor" to the length 6 vector [1,0,2,3,0,0], we have:
[1,0,2,3,0,0] -> (your new cyclic padding) -> [0,1,0,2,3,0,0,1] -> convolve -> [1,2,5,3,0,1]

If we rotated our perspective around this cyclic size-6 board by 2 spaces rightward first and then tried the same convolution:
[2,3,0,0,1,0] -> (your new cyclic padding) -> [0,2,3,0,0,1,0,2] -> convolve -> [5,3,0,1,1,2]

Voila: [5,3,0,1,1,2] is the same as [1,2,5,3,0,1] if we rotate our perspective 2 spaces to the right.

Therefore, if 3x3 is enough to handle the center of the board in normal Go, then it should be enough to handle the boundary as well in cyclic Go, because now there is no mathematical distinction as to where the boundary is. For the convolution operation, points on the boundary behave exactly like center points do in normal Go.
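The one-dimensional example in this post can be reproduced in a few lines of Python. The weights [0 1 1] are applied as a cross-correlation (each output is w[0]*left + w[1]*self + w[2]*right), which matches the post's "sum of itself and its right neighbor" and is what deep-learning "convolution" layers actually compute. `view_from` is a hypothetical helper that re-reads the ring starting s cells to the right, matching "rotate our perspective".

```python
from typing import List


def cyclic_pad_1d(v: List[int], r: int) -> List[int]:
    """Attach r cells from the opposite end on each side."""
    return v[-r:] + v + v[:r] if r > 0 else v[:]


def correlate_valid_1d(v: List[int], w: List[int]) -> List[int]:
    """Unpadded cross-correlation: out[i] = sum_j w[j] * v[i + j]."""
    k = len(w)
    return [sum(w[j] * v[i + j] for j in range(k)) for i in range(len(v) - k + 1)]


def view_from(v: List[int], s: int) -> List[int]:
    """Read the cyclic board starting s cells to the right."""
    s %= len(v)
    return v[s:] + v[:s]


w = [0, 1, 1]
x = [1, 0, 2, 3, 0, 0]

print(cyclic_pad_1d(x, 1))                    # [0, 1, 0, 2, 3, 0, 0, 1]
y = correlate_valid_1d(cyclic_pad_1d(x, 1), w)
print(y)                                      # [1, 2, 5, 3, 0, 1]

y_shifted = correlate_valid_1d(cyclic_pad_1d(view_from(x, 2), 1), w)
print(y_shifted)                              # [5, 3, 0, 1, 1, 2]
print(y_shifted == view_from(y, 2))           # True
```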

 Post subject: Re: Question about KataGo
Post #16 Posted: Mon Apr 27, 2020 2:14 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
So the main reason for going to 5x5 would be if you thought 5x5 would also be good for center fights in regular Go, or if you thought that the kinds of shapes, tactics, and strategies that arise on the cyclic board would lend themselves to being better handled by large convolutions than by small ones. That would have nothing to do with the mechanics of the net's ability to process the board, but simply with how it affects the net's broader ability to learn the strategy. If that is what you meant, then I agree, that's an interesting question.

But also keep in mind one of the standard arguments for 3x3 over 5x5 if you merely want the net to be able to handle bigger "radius" effects: two 3x3s achieve the same radius as a single 5x5, but with fewer parameters (3*3 = 9, which is less than half of 5*5 = 25). So if you merely think that longer-distance interactions than normal are strategically important, the first instinct might be to just add more blocks to make the net deeper, and the second might be to try some more specialized long-distance large convolutions (dilated, 1xN + Nx1 for much larger N, large low-rank) rather than 5x5. All of which would be pretty interesting experimental research.
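The parameter and radius arithmetic can be made explicit for a whole layer. This sketch assumes square kernels mapping C channels to C channels and ignores biases; C = 96 is used only because 96 channels appear in the traceback earlier in the thread, and the ratio does not depend on C.

```python
def conv_weights(diam: int, channels: int) -> int:
    """Weights in one diam x diam convolution from `channels` to `channels` (no bias)."""
    return diam * diam * channels * channels


def stacked_radius(diam: int, layers: int) -> int:
    """How far (in points) information travels through `layers` stacked convolutions."""
    return layers * (diam - 1) // 2


C = 96
two_3x3 = 2 * conv_weights(3, C)    # 18 * C^2
one_5x5 = conv_weights(5, C)        # 25 * C^2

print(stacked_radius(3, 2), stacked_radius(5, 1))  # 2 2: same reach
print(two_3x3, one_5x5)                             # 165888 230400
print(two_3x3 / one_5x5)                            # 0.72
```

The two 3x3 layers also get an extra nonlinearity in between, which is part of why they are usually considered more expressive per radius.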

What do you think? :)

 Post subject: Re: Question about KataGo
Post #17 Posted: Mon Apr 27, 2020 3:20 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
gcao wrote:
Before looking into the CPP part, is there anything else in the python code that has to be touched? You mentioned train.py, but I don't see any relevant code there. I also updated board.py (see PR below), but you said it's not used for training.

https://github.com/gcao/KataGo/pull/1/f ... ded39c54d7


Perhaps there's nothing else in the python code that needs attention; maybe you don't need to touch train.py if the model has been updated. And yes, board.py is not used.

A big note of caution about the particular way you implemented the board.py change: I suspect this will not work in the C++ code, and it possibly already doesn't work in the python code if board.py were actually used. The problem is that sometimes more than one level of nested function calls *each* uses this adj array, or a function might be recursive. If you simply replace this adj array, deeper layers will overwrite the adj array that the shallower layer needed, and when you return to the shallower layers you will have incorrect offsets.

Edit: Never mind, I think I see I'm being stupid: there are also places where you included it inside the loop rather than outside, which I think always works...? Okay cool. :)

 Post subject: Re: Question about KataGo
Post #18 Posted: Thu Apr 30, 2020 12:50 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
I googled around but couldn't find much information regarding cyclic boundaries in either CUDA or OpenCL, so I was going to make the changes you suggested in the OpenCL code. It looks like I should update https://github.com/lightvector/KataGo/b ... ernels.cpp but I don't know how. If you can give me a little hint, that would be great.

lightvector wrote:
Next, check the CUDA docs to see if they have cyclic convolution. If they do, it should be easy. Otherwise, probably go with OpenCL and modify the winograd algorithms so that when they peek over the edge, they apply the appropriate modulo arithmetic lookup instead of filling in a 0.

 Post subject: Re: Question about KataGo
Post #19 Posted: Sun May 31, 2020 8:04 pm 
Dies in gote

Posts: 25
Liked others: 0
Was liked: 2
Rank: AGA 6D
@lightvector

I'm looking into this again. I wonder whether the two places below are where I need to make changes in order to support Daoqi in OpenCL. Would you mind giving me some hints, please? Thank you.

https://github.com/lightvector/KataGo/b ... s.cpp#L195
https://github.com/lightvector/KataGo/b ... s.cpp#L357

 Post subject: Re: Question about KataGo
Post #20 Posted: Sun May 31, 2020 8:56 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
Yep, I think those are the locations you need to transform into a cyclic boundary condition, loading the other side of the board instead of loading 0 when going past the edge.

There should also be a place you need to update within conv2dNCHW, which is the non-winograd convolution. There should be a very closely analogous place where it loads an input tile, and uses 0 if it's past the boundary.
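To illustrate the change being described, here is a Python model of loading an input tile. It is not KataGo's OpenCL code (which lives at the truncated links above and is organized differently), and `load_tile` and `wrap_index` are hypothetical names. The cyclic branch replaces "use 0 if past the boundary" with a modulo lookup. Because `%` in C and OpenCL C truncates toward zero (so -1 % 19 is -1 there, unlike in Python), `wrap_index` is written with a C-style remainder, giving the `((i % n) + n) % n` form that can be transcribed into a kernel.

```python
import math
from typing import List

Grid = List[List[float]]


def c_remainder(a: int, n: int) -> int:
    """Remainder with C semantics: the sign follows the dividend (-1 -> -1)."""
    return int(math.fmod(a, n))


def wrap_index(i: int, n: int) -> int:
    """((i % n) + n) % n in C terms: maps any coordinate onto 0..n-1."""
    return c_remainder(c_remainder(i, n) + n, n)


def load_tile(board: Grid, y0: int, x0: int, tile: int, cyclic: bool) -> Grid:
    """Load a tile x tile window whose top-left corner (y0, x0) may be off the board."""
    n = len(board)
    out = []
    for dy in range(tile):
        row = []
        for dx in range(tile):
            y, x = y0 + dy, x0 + dx
            if cyclic:
                row.append(board[wrap_index(y, n)][wrap_index(x, n)])  # other side
            elif 0 <= y < n and 0 <= x < n:
                row.append(board[y][x])
            else:
                row.append(0.0)                                         # current behavior
        out.append(row)
    return out


n = 19
board = [[float(y * n + x) for x in range(n)] for y in range(n)]
print(c_remainder(-1, n), wrap_index(-1, n))            # -1 18
print(load_tile(board, -1, -1, 4, cyclic=False)[0][0])  # 0.0
print(load_tile(board, -1, -1, 4, cyclic=True)[0][0])   # 360.0 (board[18][18])
```

The same substitution would apply in each of the places mentioned: the two winograd input-transform loads and the tile load in conv2dNCHW.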
