All times are UTC - 8 hours [ DST ]




 Post subject: Accelerating Self-Play Learning in Go
Post #1 Posted: Thu Feb 28, 2019 8:51 am 
Gosei

Posts: 1590
Liked others: 886
Was liked: 527
Rank: AGA 3k Fox 3d
GD Posts: 61
KGS: dfan
Paper: https://arxiv.org/abs/1902.10565
Code: https://github.com/lightvector/KataGo

Very nice paper by lightvector detailing a lot of his experiments. In particular, I'm very happy to see so much effort going into novel methods for making learning more efficient rather than primarily duplicating DeepMind's research. Great work!


This post by dfan was liked by 7 people: apetresc, Bill Spight, Elom, ez4u, hyperpape, lightvector, Rémi
 Post subject: Re: Accelerating Self-Play Learning in Go
Post #2 Posted: Thu Feb 28, 2019 10:37 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Hear, hear! :clap: :salute: :bow: :bow: :bow:

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: lightvector
 Post subject: Re: Accelerating Self-Play Learning in Go
Post #3 Posted: Thu Feb 28, 2019 9:25 pm 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players... :)

There's currently no mechanism for reporting that value over GTP (it only gets dumped into a log file), but it would be easy for me to add one if I knew what output format a GUI would want in order to display it.

In high-handicap games like this one or this one, the utility for improving the score is for a long time the sole force driving the search and the selection of moves beyond the policy prior, because the estimated winning chance stays solidly below 1% and doesn't distinguish between moves until the game actually starts to become close. I have some doubts about whether invading at 3-3 so often in high-handicap games is really a good choice, but otherwise it does seem to play strong moves in general, even when "objectively" dead lost.
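To illustrate the point, here is a minimal sketch of how a score term can separate moves when the winrate is flat. It assumes a saturating arctan-shaped score bonus added to a win/loss utility; the names score_utility and total_utility, the scale of 19 points, the weight of 0.5, and the candidate values are all illustrative assumptions, not necessarily what KataGo actually uses.

```python
import math

def score_utility(expected_score, scale=19.0, weight=0.5):
    """Saturating bonus for the expected score lead, in points.

    scale and weight are assumed values chosen for illustration only.
    """
    return weight * (2.0 / math.pi) * math.atan(expected_score / scale)

def total_utility(win_prob, expected_score):
    """Win/loss utility in [-1, 1] plus the bounded score bonus."""
    win_utility = 2.0 * win_prob - 1.0
    return win_utility + score_utility(expected_score)

# Two hypothetical candidate moves in a lost high-handicap game:
# identical winning chance, different expected score.
candidates = {
    "move_a": (0.005, -60.0),
    "move_b": (0.005, -45.0),
}

# The win term is the same for both, so only the score term decides.
best = max(candidates, key=lambda m: total_utility(*candidates[m]))
print(best)
```

Because the bonus saturates, it cannot outweigh a real difference in winning chance once the game becomes close; it only breaks ties while the win term is uninformative.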

 Post subject: Re: Accelerating Self-Play Learning in Go
Post #4 Posted: Thu Feb 28, 2019 10:01 pm 
Lives with ko

Posts: 259
Liked others: 46
Was liked: 116
Rank: 2d
lightvector wrote:
By the way, I have not done any GUI work or anything, but if any devs on the GUI side are interested, KataGo is a GTP engine that tracks its belief about the expected score difference rather than only winrate, which I hear is a pretty popular feature request among Go players... :)

There's currently no mechanism for reporting that value over GTP (it only gets dumped into a log file), but it would be easy for me to add one if I knew what output format a GUI would want in order to display it.

For q5go, it would be nice to have a variant of the lz-analyze command which produces the same kind of information as Leela Zero does, plus one extra field with the expected score. I could use "known_command kata-analyze" first to determine which of the two variants to use.
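A rough sketch of that probing step on the GUI side might look like the following. It assumes a hypothetical send callable that writes one GTP command to the engine and returns the raw response text; kata-analyze is only the command name proposed above, not something the engine is confirmed to support.

```python
def parse_gtp_response(raw):
    """Split a GTP response such as '= true' into (ok, payload)."""
    text = raw.strip()
    if not text or text[0] not in "=?":
        raise ValueError(f"not a GTP response: {raw!r}")
    ok = text[0] == "="
    # Drop the optional numeric command id that directly follows '=' or '?'.
    payload = text[1:].lstrip("0123456789").strip()
    return ok, payload

def pick_analyze_command(send):
    """Choose the analysis command by asking the engine what it knows.

    send is a hypothetical callable: it takes a GTP command string and
    returns the engine's raw response.
    """
    ok, payload = parse_gtp_response(send("known_command kata-analyze"))
    if ok and payload == "true":
        return "kata-analyze"
    # Fall back to the Leela Zero style command otherwise.
    return "lz-analyze"

# Usage with stand-in engines that only answer the probe.
print(pick_analyze_command(lambda cmd: "= true\n\n"))
print(pick_analyze_command(lambda cmd: "= false\n\n"))
```

Falling back on anything other than a clean "true" keeps the GUI working with engines that reject or misreport the probe.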

(edit) Come to think of it, you could annotate the self-play games with the standard SGF V[] property, which is defined as the estimated score.
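As a small sketch of what such an annotation could look like, the helper below builds a single SGF move node carrying a V[] value. The function names, the one-decimal formatting, and the example move are assumptions for illustration; in SGF, positive V values are good for Black and negative ones good for White.

```python
def sgf_coord(col, row):
    """Convert 0-based (col, row), counted from the top-left, to SGF letters."""
    return chr(ord("a") + col) + chr(ord("a") + row)

def sgf_move_node(color, col, row, estimated_score):
    """Build one SGF node with a move and a V[] estimated score."""
    if color not in ("B", "W"):
        raise ValueError("color must be 'B' or 'W'")
    return f";{color}[{sgf_coord(col, row)}]V[{estimated_score:.1f}]"

# A Black move at the upper-right star point, with Black estimated 2.5 ahead.
print(sgf_move_node("B", 15, 3, 2.5))  # prints ;B[pd]V[2.5]
```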

I managed to build it here, and it seems to work fine. Your stated CUDA requirements seem higher than necessary: I have CUDA 9.0.176 and cuDNN 7.1. There were some PTX warnings about an experimental feature, but self-play produces reasonable results, so I assume it's working.

I might send you some patches later that I needed to make the cmake setup work for me.

Awesome project! Now we just need to crowdsource a run with a few million games.

 Post subject: Re: Accelerating Self-Play Learning in Go
Post #5 Posted: Sat Mar 09, 2019 7:40 am 
Lives in sente

Posts: 827
Location: UK
Liked others: 568
Was liked: 84
Rank: OGS 9kyu
Universal go server handle: WindnWater, Elom
Wow.

_________________
On Go proverbs:
"A fine Gotation is a diamond in the hand of a dan of wit and a pebble in the hand of a kyu" —Joseph Raux misquoted.
