Accelerating Self-Play Learning in Go

dfan · Post by **dfan** » Thu Feb 28, 2019 8:51 am

Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo

Very nice paper by lightvector detailing a lot of his experiments. In particular I'm very happy to see a lot of effort being put into novel methods of maximizing efficient learning rather than primarily duplicating DeepMind's research. Great work!

Bill Spight · Post by **Bill Spight** » Thu Feb 28, 2019 10:37 am

Hear, hear!

lightvector · Post by **lightvector** » Thu Feb 28, 2019 9:25 pm

By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players...

There's not currently a mechanism by which it reports that value over GTP (only dumping it into a log file), but it would be easy for me to add one if I knew what way to output it for some GUI that wanted to be able to display it.

In high-handicap games like this one or this one, the utility from trying to improve the score is, for a long time, the sole force driving the search and the selection of moves beyond the policy prior. The win-probability estimate stays solidly below 1% and doesn't distinguish between any moves until the game actually starts to become close. I have some doubts about whether invading 3-3 so much in high-handicap games is really such a good choice, but otherwise it does seem to play strong moves generally, even when "objectively" dead lost.
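To illustrate why a score term can drive move choice when the winrate is flat, here is a minimal sketch. It is not KataGo's actual formula: the function name `move_utility`, the `tanh` saturation, the weights, and the candidate numbers are all assumptions made up for the example.

```python
import math

def move_utility(winrate, score_lead, score_weight=0.5, score_scale=20.0):
    """Hypothetical utility: a win/loss term plus a bounded score term.

    winrate is in [0, 1]; score_lead is the expected score difference in
    points from the mover's perspective. The weights are illustrative only.
    """
    win_term = 2.0 * winrate - 1.0                     # maps [0, 1] to [-1, 1]
    score_term = math.tanh(score_lead / score_scale)   # saturates for huge margins
    return win_term + score_weight * score_term

# Two made-up candidate moves in a "dead lost" handicap position:
# identical winrate, different expected score.
candidates = {
    "A": (0.004, -45.0),
    "B": (0.004, -38.0),
}

# With the winrate term tied, the score term alone decides the ranking.
best = max(candidates, key=lambda m: move_utility(*candidates[m]))
```

Because both moves share the same winrate, the one that loses by fewer points gets the higher utility, which matches the behaviour described above.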

bernds · Post by **bernds** » Thu Feb 28, 2019 10:01 pm

lightvector wrote: By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players...

There's not currently a mechanism by which it reports that value over GTP (only dumping it into a log file), but it would be easy for me to add one if I knew what way to output it for some GUI that wanted to be able to display it.

For q5go, it would be nice to have a variant of the lz-analyze command which produces the same kind of information as Leela Zero does, plus one extra field with the expected score. I could use "known_command kata-analyze" first to determine which of the two variants to use.
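On the GUI side, that detection could look roughly like the sketch below. It assumes the engine answers `known_command kata-analyze` with a standard GTP reply (`= true` or `= false`, optionally with a command id after the `=`). The helper names are hypothetical, and `kata-analyze` is only the command name proposed here, not something the engine is confirmed to support.

```python
def parse_known_command(response: str) -> bool:
    """Interpret a GTP reply to `known_command <name>`.

    A successful GTP reply starts with '=' (optionally followed by a
    numeric id) and carries 'true' or 'false' as its payload. A reply
    starting with '?' is an error and is treated as "not supported".
    """
    text = response.strip()
    if not text.startswith("="):
        return False
    payload = text[1:].split()  # drops the '=' and splits off any id
    return bool(payload) and payload[-1].lower() == "true"

def choose_analyze_command(response: str) -> str:
    """Pick the analysis command based on the engine's reply."""
    if parse_known_command(response):
        return "kata-analyze"   # proposed variant with an expected-score field
    return "lz-analyze"         # fall back to the Leela Zero command

# Example replies as they might arrive from the engine's stdout.
with_score = choose_analyze_command("= true\n\n")
without_score = choose_analyze_command("= false\n\n")
```

Falling back to `lz-analyze` on any error reply keeps the GUI working with engines that do not know the new command.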

(edit) Come to think of it, you could annotate the self-play games with the standard SGF V[] property, which is defined as the estimated score.
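A small sketch of what such an annotation could look like when writing a move node. The function name and the one-decimal formatting are assumptions for the example. In SGF, positive V[] values are good for Black and negative values are good for White, so the score is passed in from Black's perspective.

```python
def sgf_move_node(color: str, coord: str, black_score_estimate: float) -> str:
    """Build one SGF move node annotated with a V[] score estimate.

    color is 'B' or 'W', coord is an SGF coordinate such as 'pd', and
    black_score_estimate is the estimated final score from Black's
    perspective (positive means Black is ahead).
    """
    if color not in ("B", "W"):
        raise ValueError("color must be 'B' or 'W'")
    return f";{color}[{coord}]V[{black_score_estimate:.1f}]"

# Two consecutive moves with made-up score estimates.
nodes = sgf_move_node("B", "pd", 3.5) + sgf_move_node("W", "dp", -12.0)
```

A viewer that understands V[] could then plot the estimate over the course of the game without any engine-specific extension.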

I managed to build it here, and it seems to work fine. Your CUDA requirements seem too high: I have CUDA 9.0.176 and cudnn-7.1. There were some ptx warnings about an experimental feature, but self-play produces reasonable results so I assume it's working.

I might send you some patches later that I needed to make the cmake setup work for me.

Awesome project! Now we just need to crowdsource a run with a few million games.

Elom · Post by **Elom** » Sat Mar 09, 2019 7:40 am
