Accelerating Self-Play Learning in Go

dfan

Accelerating Self-Play Learning in Go

Post by dfan »

Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo

Very nice paper by lightvector detailing a lot of his experiments. In particular, I'm very happy to see a lot of effort being put into novel methods for making learning more efficient, rather than mainly duplicating DeepMind's research. Great work!
Bill Spight

Re: Accelerating Self-Play Learning in Go

Post by Bill Spight »

Hear, hear! :clap: :salute: :bow: :bow: :bow:
lightvector

Re: Accelerating Self-Play Learning in Go

Post by lightvector »

By the way, I have not done any GUI work, but if any devs on the GUI side are interested: KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players... :)

There is currently no mechanism for reporting that value over GTP (it is only dumped into a log file), but it would be easy for me to add one if I knew what output format a GUI would need in order to display it.

In high-handicap games like this one or this one, the utility for improving the score is, for a long time, the sole force driving the search and the selection of moves beyond the policy prior: the estimated winning chance stays solidly below 1% and does not distinguish between moves until the game actually starts to become close. I have some doubts about whether invading at 3-3 so often in high-handicap games is really a good choice, but otherwise it does seem to play strong moves in general, even when "objectively" dead lost.
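
To illustrate the point above, here is a minimal sketch of how a bounded score term can separate moves when the win probability is flat. The function name, the arctan shape, the weight of 0.5, the scale of 19 points, and the candidate numbers are all assumptions made for illustration; this is not taken from KataGo's code.

```python
import math


def combined_utility(win_prob, expected_score, score_weight=0.5, score_scale=19.0):
    """Hypothetical utility: a win/loss term in [-1, 1] plus a bounded score term.

    The arctan shape, score_weight and score_scale are illustrative assumptions.
    """
    winloss = 2.0 * win_prob - 1.0
    # arctan keeps the score term bounded, so huge margins cannot swamp win/loss.
    score_term = score_weight * (2.0 / math.pi) * math.atan(expected_score / score_scale)
    return winloss + score_term


# Made-up candidates in a "dead lost" position: (win probability, expected score).
candidates = {
    "move_a": (0.004, -45.0),
    "move_b": (0.004, -30.0),
}

# The win/loss term is identical for both moves, so only the score term decides.
best = max(candidates, key=lambda move: combined_utility(*candidates[move]))
```

With these numbers the search would prefer move_b, which loses by less, even though both moves have the same estimated winning chance.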
bernds

Re: Accelerating Self-Play Learning in Go

Post by bernds »

lightvector wrote:
By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players... :) .

There's not currently a mechanism by which it reports that value over GTP (only dumping it into a log file), but it would be easy for me to add one if I knew what way to output it for some GUI that wanted to be able to display it.

For q5go, it would be nice to have a variant of the lz-analyze command which produces the same kind of information as Leela Zero does, plus one extra field with the expected score. I could use "known_command kata-analyze" first to determine which of the two variants to use.
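
The probing step could look roughly like the sketch below, on the GUI side. It assumes the engine answers known_command with a standard GTP success reply of "= true" or "= false". The helper names and the `send` callable are hypothetical, and kata-analyze is only the command name proposed in this post, not something confirmed to exist.

```python
def parse_gtp_bool(response: str) -> bool:
    """Parse a GTP reply such as '= true\\n\\n'; failure replies ('? ...') count as False."""
    text = response.strip()
    if not text.startswith("="):
        return False
    # Drop '=' and an optional numeric command id, e.g. '=12 true'.
    body = text[1:].lstrip("0123456789").strip()
    return body == "true"


def choose_analyze_command(send) -> str:
    """Pick the analysis command to use.

    `send` is a hypothetical callable that writes one GTP command to the
    engine and returns the raw text of its reply.
    """
    if parse_gtp_bool(send("known_command kata-analyze")):
        return "kata-analyze"
    # Fall back to the Leela Zero style command.
    return "lz-analyze"
```

A GUI would call choose_analyze_command once after starting the engine and remember the result for the rest of the session.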

(edit) Come to think of it, you could annotate the self-play games with the standard SGF V[] property, which is defined as the estimated score.
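
A small sketch of what such annotated records might look like follows. In SGF FF[4], positive V values favour Black, so the score is passed here as Black's lead. The function names and the sample moves and scores are made up for illustration, and how the self-play writer would actually produce these values is an assumption.

```python
def sgf_move_node(color: str, point: str, black_score_lead: float) -> str:
    """Format one move node with a V[] annotation (positive values favour Black)."""
    if color not in ("B", "W"):
        raise ValueError("color must be 'B' or 'W'")
    return f";{color}[{point}]V[{black_score_lead:.1f}]"


def sgf_game(moves, board_size: int = 19) -> str:
    """Build a minimal SGF record from (color, point, black_score_lead) tuples."""
    header = f"(;GM[1]FF[4]SZ[{board_size}]"
    body = "".join(sgf_move_node(c, p, s) for c, p, s in moves)
    return header + body + ")"


# Made-up opening moves with made-up score estimates.
record = sgf_game([("B", "pd", 0.4), ("W", "dp", -0.3)])
```

If the engine reports the score from the perspective of the side to move, the value would need its sign flipped on White's turns before being written.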

I managed to build it here, and it seems to work fine. Your stated CUDA requirements seem higher than necessary: I have CUDA 9.0.176 and cuDNN 7.1. There were some PTX warnings about an experimental feature, but self-play produces reasonable results, so I assume it's working.

I might later send you some patches that I needed to make the CMake setup work for me.

Awesome project! Now we just need to crowdsource a run with a few million games.
Elom

Re: Accelerating Self-Play Learning in Go

Post by Elom »

Wow.