Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo
Very nice paper by lightvector detailing a lot of his experiments. In particular I'm very happy to see a lot of effort being put into novel methods of maximizing efficient learning rather than primarily duplicating DeepMind's research. Great work!
Accelerating Self-Play Learning in Go
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Accelerating Self-Play Learning in Go
Hear, hear!

The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
lightvector
- Lives in sente
- Posts: 759
- Joined: Sat Jun 19, 2010 10:11 pm
- Rank: maybe 2d
- GD Posts: 0
- Has thanked: 114 times
- Been thanked: 916 times
Re: Accelerating Self-Play Learning in Go
By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players...
There's not currently a mechanism by which it reports that value over GTP (only dumping it into a log file), but it would be easy for me to add one if I knew what way to output it for some GUI that wanted to be able to display it.
In high handicap games like this one or this one, the utility for attempting to improve the score is, for a long time, the sole force driving the search and the selection of moves beyond merely the policy prior: the winning-chance estimate remains solidly < 1% and doesn't distinguish between any moves until the game actually starts to become close. I have some doubts about whether invading 3-3 so much in high-handicap games is really such a good choice, but otherwise it does seem to play strong moves generally, even when "objectively" dead lost.
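The dynamic described above (a score term carrying the search while winrate is pinned near zero) can be illustrated with a toy formula. This is a sketch of the general idea only; the weight, scale, and tanh squashing here are my assumptions for illustration, not KataGo's actual utility:

```python
import math

# Toy sketch: blend winrate utility with a bounded score-based utility so
# that, when winrate is stuck near 0 (or 1), the score term still
# differentiates candidate moves during search.

def move_utility(winrate, score_lead, score_weight=0.25, score_scale=20.0):
    """Combined utility: win/loss term in [-1, 1] plus a bounded score bonus."""
    win_utility = 2.0 * winrate - 1.0                         # map [0, 1] -> [-1, 1]
    score_utility = score_weight * math.tanh(score_lead / score_scale)
    return win_utility + score_utility

# Two hopeless positions (winrate ~0.5%) that differ only in expected score:
# the less-bad score still yields the higher utility, so the search can
# prefer it even though winrate alone cannot tell them apart.
print(move_utility(0.005, -20.0) > move_utility(0.005, -40.0))  # True
```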
-
bernds
- Lives with ko
- Posts: 259
- Joined: Sun Apr 30, 2017 11:18 pm
- Rank: 2d
- GD Posts: 0
- Has thanked: 46 times
- Been thanked: 116 times
Re: Accelerating Self-Play Learning in Go
For q5go, it would be nice to have a variant of the lz-analyze command which produces the same kind of information as Leela Zero does, plus one extra field with the expected score. I could use "known_command kata-analyze" first to determine which of the two variants to use.

lightvector wrote:
By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players...
There's not currently a mechanism by which it reports that value over GTP (only dumping it into a log file), but it would be easy for me to add one if I knew what way to output it for some GUI that wanted to be able to display it.
(edit) Come to think of it, you could annotate the self-play games with the standard SGF V[] property, which is defined as the estimated score.
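For reference, the V[] route might look like this. A minimal sketch: the property name comes from the SGF FF[4] specification, but the helper function and the single-node string format are my own simplifications:

```python
# Minimal sketch: tagging an SGF node with the engine's estimated score
# using the standard V[] (value) property defined in the SGF FF[4] spec.
# annotate_with_value and the bare-node format are simplified for illustration.

def annotate_with_value(sgf_node, score_estimate):
    """Append a V[] property to one SGF node string like ';B[pd]'."""
    return f"{sgf_node}V[{score_estimate:.1f}]"

print(annotate_with_value(";B[pd]", 3.5))  # ;B[pd]V[3.5]
```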
I managed to build it here, and it seems to work fine. Your stated CUDA requirements seem higher than necessary: it works for me with CUDA 9.0.176 and cuDNN 7.1. There were some ptx warnings about an experimental feature, but self-play produces reasonable results, so I assume it's working.
I might send you some patches later that I needed to make the cmake setup work for me.
Awesome project! Now we just need to crowdsource a run with a few million games.
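On the GUI side, a client could parse such a variant roughly like this. A sketch under stated assumptions: the field name "scoreMean" and the exact line layout are hypothetical, modeled on lz-analyze's repeated key/value format rather than any documented KataGo output:

```python
# Hypothetical sketch of parsing an lz-analyze-style info line that carries
# one extra score field. The "scoreMean" key and line layout are assumptions,
# not a documented KataGo format.

def parse_info_line(line):
    """Split a GTP analyze line into one dict of fields per candidate move."""
    moves = []
    for chunk in line.split("info")[1:]:        # one chunk per candidate move
        tokens = chunk.split()
        fields, i = {}, 0
        while i + 1 < len(tokens):
            key = tokens[i]
            if key == "pv":                     # pv consumes the rest of the chunk
                fields["pv"] = tokens[i + 1:]
                break
            fields[key] = tokens[i + 1]
            i += 2
        moves.append(fields)
    return moves

line = "info move Q16 visits 500 winrate 5123 scoreMean 2.4 pv Q16 D4"
parsed = parse_info_line(line)
print(parsed[0]["move"], parsed[0]["scoreMean"])  # Q16 2.4
```

A GUI could then display the extra field only when the feature check ("known_command kata-analyze") succeeds, falling back to plain lz-analyze otherwise.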
-
Elom
- Lives in sente
- Posts: 827
- Joined: Mon Aug 11, 2014 1:18 am
- Rank: OGS 9kyu
- GD Posts: 0
- Universal go server handle: WindnWater, Elom
- Location: UK
- Has thanked: 568 times
- Been thanked: 84 times
Re: Accelerating Self-Play Learning in Go
Wow.
On Go proverbs:
"A fine Gotation is a diamond in the hand of a dan of wit and a pebble in the hand of a kyu" —Joseph Raux misquoted.