
All times are UTC - 8 hours [ DST ]




 Post subject: Widening Leela Zero's search (nb: contains many images)
Post #1 Posted: Sun Oct 13, 2019 3:26 am 
Lives with ko

Posts: 287
Location: Adelaide, South Australia
Liked others: 109
Was liked: 145
Rank: Australian 2 dan
GD Posts: 200
Starting a new thread so as not to completely hijack the other conversation...

The challenge is to broaden Leela Zero's search to evaluate not just the best move, but also the near misses and plausible alternatives. A broader search will slightly weaken the playing strength but might make it easier to use as an analysis tool. Of course you can get those evaluations by clicking around in the interface (that was the point of the other conversation), but it could be easier.

My suggestion was to give ten visits (or 20, or 100) to every legal move before starting the usual Monte Carlo tree search. (An alternative approach has been tried, focussing on the top four moves: see Uberdude's post here.) It's not too hard to change the LZ code to do that: it takes about 30 extra lines. My implementation is here: experimental software, use at your own risk!
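For readers who don't want to dig through C++, here is a minimal Python sketch of the idea. It assumes a simplified AlphaGo Zero-style root with a PUCT selection rule, and it is not the actual LZ patch. The names `Child`, `select_root_child`, `min_visits`, `c_puct`, `fpu` and the least-visited-first order are all hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    move: str
    prior: float            # policy network probability P(s, a)
    visits: int = 0         # visit count N(s, a)
    value_sum: float = 0.0  # sum of backed-up values W(s, a)

    def q(self, fpu: float) -> float:
        # Mean value Q(s, a); an unvisited move falls back to a first-play value.
        return self.value_sum / self.visits if self.visits else fpu


def puct_score(child: Child, parent_visits: int, c_puct: float, fpu: float) -> float:
    bonus = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q(fpu) + bonus


def select_root_child(children, min_visits=10, c_puct=1.5, fpu=0.0):
    """Pick the root move to search next.

    Hypothetical "min visits" phase: while any legal move has fewer than
    min_visits visits, visit the least-visited one (ties broken by higher prior).
    Afterwards fall back to ordinary PUCT selection.
    """
    starved = [c for c in children if c.visits < min_visits]
    if starved:
        return min(starved, key=lambda c: (c.visits, -c.prior))
    parent_visits = sum(c.visits for c in children)
    return max(children, key=lambda c: puct_score(c, parent_visits, c_puct, fpu))
```

Each chosen child would then be searched below the root as usual. The cost of the forced phase is min_visits times the number of legal moves: the problem position further down has 12 stones on the board, so roughly 350 legal moves, and ten visits each comes to roughly 3,500 playouts before the ordinary search takes over.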

So first of all, it's quite pretty to watch in action! In Lizzie, I've changed the "min playout ratio for stats" parameter from 0.1 to 0.01. Here's a 20-second animation of the first 6,000 or so playouts.
Attachment: animation_small.gif

(Apologies for the small image size. I'm waiting for advice on how to post a better picture here.)
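Why does lowering that ratio from 0.1 to 0.01 show so many more candidates? A rough sketch, assuming (this is not taken from Lizzie's source) that Lizzie hides any move whose playout count is below the ratio times the top move's count. The function name `visible_candidates` and the playout counts are hypothetical.

```python
def visible_candidates(playouts, min_ratio):
    """Moves whose playout count is at least min_ratio times the top move's count.

    This is an assumed model of Lizzie's "min playout ratio for stats" filter,
    not code taken from Lizzie.
    """
    if not playouts:
        return []
    top = max(playouts.values())
    ranked = sorted(playouts.items(), key=lambda kv: kv[1], reverse=True)
    return [move for move, n in ranked if n >= min_ratio * top]


# Hypothetical playout counts for four candidate moves.
counts = {"R16": 4000, "Q14": 900, "C3": 250, "K10": 60}
print(visible_candidates(counts, 0.1))   # ['R16', 'Q14']
print(visible_candidates(counts, 0.01))  # ['R16', 'Q14', 'C3', 'K10']
```

Under this model a ratio of 0.01 keeps anything with at least 1% of the top move's playouts, which is why the ten forced visits per move show up on screen early in the search.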

Trying it out on some real positions, the results aren't all that dramatic. Often it will explore two or three other moves that would otherwise have been ignored, but when you let it run a bit longer, the "extra" moves disappear, and you end up with the same answers you would have got anyway. I guess this shows that LZ's method of filtering out suboptimal moves is actually doing a pretty good job! So far I've got the most interesting results by watching Lizzie in real time and pausing just as the dust clears, so to speak (i.e. when the animation above changes from hundreds of candidates to just a handful). And slightly older networks seem to be more "open-minded" than the newer, stronger ones.

Some examples on a position I've been spending a bit of time with lately:

[go]$$c 501 opening problems, problem 1: Black to play
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . e . O . . X . . . . . . . . . |
$$ | . . . O . . . . . , . . . . X , X . . |
$$ | . . O . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . b a . . |
$$ | . . . . . . . . . . . . . . . d c . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . X . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , O . . |
$$ | . . X . . . . . X . . f . O . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]


Last time I looked at this, most engines would pick two or three of 'a' through 'f' for analysis and would ignore everything else. With my new "LZ-minvisits", do we get any more variety?

  • GX-47 unmodified explores a, b, c and e
  • GX-47 + 10 visits for all moves explores all of a-f and one other option
  • LZ-242 explores 7 different options
  • LZ-242 + 10 explores 9 different options after 5,000 playouts. Using +30 visits instead of +10 actually narrows the results slightly (one option disappears)
  • LZ-157 unmodified is already exploring 13 different moves!
  • LZ-157 + 10 looks at 22 moves (and again drops a few options given more playouts)

More screenshots from Lizzie:
GX-47 with 33k playouts
Attachment: 501OP-1-GX47-33691po_cropped.jpg

GX-47 +10 with 33k playouts
Attachment: 501OP-1-GX47+10-32841po_cropped.jpg



Hmm, limited to three images per post by the looks of it. More to come soon.


This post by xela was liked by: Bill Spight
 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #2 Posted: Sun Oct 13, 2019 3:29 am 
Lives with ko

Posts: 287
Location: Adelaide, South Australia
Liked others: 109
Was liked: 145
Rank: Australian 2 dan
GD Posts: 200
LZ-242 in action with the extra visits: note that 30 visits gives "worse" results than 10.
Unmodified, 5k playouts
Attachment: 501OP-1-LZ242-5224po_cropped.jpg

+10 with 5k playouts
Attachment: 501OP-1-LZ242+10-5089po_cropped.jpg

+30 with 14k playouts
Attachment: 501OP-1-LZ242+30-14428po_cropped.jpg

 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #3 Posted: Sun Oct 13, 2019 3:32 am 
Lives with ko

Posts: 287
Location: Adelaide, South Australia
Liked others: 109
Was liked: 145
Rank: Australian 2 dan
GD Posts: 200
And LZ-157 with the extra visits. Here I've stayed with 10 visits, and shown how some of the options disappear between 5k and 30k playouts.
LZ-157 unmodified, 20k playouts
Attachment: 501OP-1-LZ157-20494po_cropped.jpg

LZ-157 +10 with 5k playouts
Attachment: 501OP-1-LZ242+10-5089po_cropped.jpg

LZ-157 +10 with 30k playouts
Attachment: 501OP-1-LZ157+10-30653po_cropped.jpg


This post by xela was liked by: Bill Spight
 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #4 Posted: Sun Oct 13, 2019 12:28 pm 
Dies with sente

Posts: 74
Liked others: 0
Was liked: 23
xela wrote:
My suggestion was to give ten visits (or 20, or 100) to every legal move before starting the usual Monte Carlo tree search.
...
Often it will explore two or three other moves that would otherwise have been ignored -- but when you let it run a bit longer, the "extra" moves disappear, and you end up with the same answers you would have got anyway. I guess this shows that LZ's method of filtering out suboptimal moves is actually doing a pretty good job!

This may also be a sign of a potential problem with your approach. LZ relies heavily on its policy to direct further search, even when (from a human viewpoint) it already has sufficient data about a move's value from that move's first visits, data that is much more reliable than the policy (since the original policy is just a single net lookup).

Weakening the policy as visits accumulate is currently being experimented with, but AFAIK plain old LZ will not necessarily make enough use of the values established by those first 10 visits (for low-policy moves), since it tends to visit high-policy moves rather than high-value moves (until late in the search).
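To put numbers on this, here is a sketch using the AlphaGo Zero-style PUCT score (mean value plus an exploration bonus weighted by the policy prior). The constant `c_puct = 1.5`, the move names and all the statistics are made up, and LZ's real formula and constants differ in detail.

```python
import math


def puct_score(q, prior, visits, parent_visits, c_puct=1.5):
    """AlphaGo Zero-style PUCT score: mean value plus prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)


# Hypothetical root statistics after 1,000 playouts (other moves hold the rest).
# Values are win rates in [0, 1].
candidates = {
    # move: (prior P, visits N, mean value Q)
    "A": (0.5, 300, 0.55),   # high-policy move
    "B": (0.001, 10, 0.57),  # low-policy move that looked better in its 10 forced visits
}
parent_visits = 1000

scores = {m: puct_score(q, p, n, parent_visits) for m, (p, n, q) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)  # A scores about 0.629, B about 0.574, so A gets the next visit
```

Even though B looked two percentage points better in its forced visits, its tiny prior leaves its bonus negligible, so the next playouts keep going to A. In this toy model B only gets revisited once A's bonus has shrunk below the value gap, i.e. late in the search.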

 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #5 Posted: Mon Oct 14, 2019 5:24 am 
Gosei

Posts: 1503
Location: Ghent, Belgium
Liked others: 217
Was liked: 672
Rank: KGS 1d OGS 1d Fox 4d
KGS: Artevelde
OGS: Knotwilg
jann wrote:
xela wrote:
My suggestion was to give ten visits (or 20, or 100) to every legal move before starting the usual Monte Carlo tree search.
...
Often it will explore two or three other moves that would otherwise have been ignored -- but when you let it run a bit longer, the "extra" moves disappear, and you end up with the same answers you would have got anyway. I guess this shows that LZ's method of filtering out suboptimal moves is actually doing a pretty good job!

This may also be a sign of a potential problem with your approach. LZ relies heavily on its policy to direct further search, even when (from a human viewpoint) it already has sufficient data about a move's value from that move's first visits, data that is much more reliable than the policy (since the original policy is just a single net lookup).

Weakening the policy as visits accumulate is currently being experimented with, but AFAIK plain old LZ will not necessarily make enough use of the values established by those first 10 visits (for low-policy moves), since it tends to visit high-policy moves rather than high-value moves (until late in the search).


OK, I'm a slow learner on the whole AI revolution, so let me get this straight by rewording the short history as I know it.

1st gen AI (AlphaGo) was trained on human expert games, and therefore its choice of candidate moves carried a human expert bias
2nd gen AI (AlphaZero) was trained on the rules only and got rid of that bias. It relied exclusively on move value (win percentage) and on reinforcement by exploring high-value moves more (number of visits)
3rd gen AI (LZ et al.) no longer relies solely on the semi-brute-force technique known as Monte Carlo Tree Search, having integrated lessons learnt from stage 2. I guess that's what you refer to as a policy. It's the return of bias, but no longer human expert bias.

That undermines a thought I had about AI: that it only speaks in sequences and there is no other way to articulate its language than replicating the sequences, adorned with human language to link it to human concepts.

If there's such a "policy", or a "bias", it must have a form, one that can be articulated by something other than sequences. Like "if there is a single stone on 4-4 in a quadrant with few other stones, explore the 3-3 invasion".

Where is that policy hiding? Is it not possible to articulate it?
Is my rendition of AI history correct?

 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #6 Posted: Mon Oct 14, 2019 5:37 am 
Lives with ko

Posts: 287
Location: Adelaide, South Australia
Liked others: 109
Was liked: 145
Rank: Australian 2 dan
GD Posts: 200
No, the policy network and the value network have been around since the first AlphaGo; the changes have mostly been in the method of training those two networks. But yes, "it speaks only in sequences" isn't quite true: there are ways to peer inside the brain.

For neural networks trained on images, there's been some interesting work on how to visualise the different layers of the network, so we can say that these neurons are recognising straight lines, those are recognising curves or shadows, these other ones are recognising arms and legs, and so on.

In go, I don't think we've figured out yet which parts of the network are recognising "corner enclosure" or "invasion" or "influence", but I think it's possible to go further in that direction. Lizzie can already display the policy network numbers for those moves that have been searched, so you can see the difference between the "at a glance" evaluation and the result of looking more deeply into a move.

 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #7 Posted: Mon Oct 14, 2019 6:51 am 
Honinbo

Posts: 9328
Liked others: 2894
Was liked: 3135
xela wrote:
For neural networks trained on images, there's been some interesting work on how to visualise the different layers of the network, so we can say that these neurons are recognising straight lines, those are recognising curves or shadows, these other ones are recognising arms and legs, and so on.


Going back a decade or so, go researchers were talking about common destiny graphs (if I remember the terminology correctly) that distinguished rookwise connected stones of the same color and other groups that stood or fell together. If certain neurons trained on images can recognize straight lines and so on, surely certain neurons trained on go games can recognize rookwise connected stones, etc. :)
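"Rookwise connected" stones are stones of one colour joined along the lines, not diagonally; programmers usually call such a unit a block or string. A graph in which stones that stand or fall together share a node would start by collapsing each block into one node. As a sketch, assuming a board given as rows of 'X', 'O' and '.' characters, a flood fill collects those blocks. `rookwise_blocks` is a hypothetical helper, not taken from any program mentioned in this thread.

```python
from collections import deque


def rookwise_blocks(board):
    """Return (color, frozenset of (row, col)) for each orthogonally connected block."""
    rows, cols = len(board), len(board[0])
    seen = set()
    blocks = []
    for r in range(rows):
        for c in range(cols):
            color = board[r][c]
            if color not in "XO" or (r, c) in seen:
                continue
            # Breadth-first flood fill over same-coloured orthogonal neighbours.
            block = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                block.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and board[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blocks.append((color, frozenset(block)))
    return blocks


board = ["XX.O",
         ".X.O",
         "O..X"]
for color, stones in rookwise_blocks(board):
    print(color, sorted(stones))  # prints four blocks: two X and two O
```

Diagonal contacts such as (1, 1) and (2, 0) never merge, which is exactly the "rookwise" restriction.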

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

 Post subject: Re: Widening Leela Zero's search (nb: contains many images)
Post #8 Posted: Mon Oct 14, 2019 10:31 am 
Dies with sente

Posts: 74
Liked others: 0
Was liked: 23
Knotwilg wrote:
Where is that policy hiding? Is it not possible to articulate it?
Is my rendition of AI history correct?

Most of today's bots ask their NNs two questions when facing a position: which moves are most likely best here (policy), and which side is ahead (value). On top of these NN answers a search method is built, which does a non-exhaustive search, mostly looking at moves that currently appear promising throughout the tree (and choosing the most-visited top move at the end).

What's promising is, however, not easy to decide. When a position is encountered for the first time during the search, this depends entirely on the NN policy. Once a move has received some visits, the average value from those earlier visits is ALSO taken into account.
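This description matches the PUCT selection rule from the AlphaGo Zero paper, which LZ follows in outline. LZ's exact constants and its handling of unvisited moves differ in detail, so treat this as a sketch. Here P is the policy prior, N the visit count, W the summed backed-up value, and Q_fpu a stand-in value shared by unvisited moves (our notation).

```latex
\begin{align*}
a^{\ast} &= \arg\max_{a}\left[\, Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\,
            \frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \right],
\qquad Q(s,a) = \frac{W(s,a)}{N(s,a)} \quad (N(s,a) > 0). \\
\intertext{For an unvisited move, $N(s,a) = 0$ and $Q(s,a)$ is replaced by one shared first-play value $Q_{\mathrm{fpu}}(s)$, so}
\operatorname{score}(s,a) &= Q_{\mathrm{fpu}}(s) + c_{\mathrm{puct}}\, P(s,a)\, \sqrt{\textstyle\sum_{b} N(s,b)}.
\end{align*}
```

Every unvisited move gets the same first term, so among them the ranking is by P(s,a) alone; only after a move's first visits does its own average value Q(s,a) enter the comparison.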

As for bot history, the theoretical differences between generations are not that big, particularly for the search method (e.g. policy and value were originally two different networks, which later became two separate outputs of a single network: a great practical gain, but theoretically still the same). The significance of the "bias" from starting on human games is debatable at best: even the original AG did a lot of selfplay training, not to mention the Master version. AGZ's significance was to prove that it is possible to learn from selfplay ONLY, without starting from human knowledge. And there is not much theoretical difference between LZ and AGZ: LZ basically aims to be a minimalistic rewrite of AGZ.
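To make the "one network, two outputs" point concrete, here is a toy sketch in pure Python: a shared trunk feeds a policy head (a probability for each of the 361 points plus pass) and a value head (one number in (-1, 1) via tanh). The weights are random and the layers are tiny fully connected ones; LZ's real network is a trained residual convolutional network with many input planes, so only the shape of the idea carries over. `TwoHeadedNet` and its helpers are hypothetical names.

```python
import math
import random

BOARD_POINTS = 19 * 19
POLICY_SIZE = BOARD_POINTS + 1  # every point plus "pass"


def dense(inputs, weights, biases):
    # weights[j] holds the incoming weights of output unit j
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TwoHeadedNet:
    def __init__(self, hidden=32, seed=0):
        rng = random.Random(seed)
        self.trunk = random_layer(BOARD_POINTS, hidden, rng)
        self.policy_head = random_layer(hidden, POLICY_SIZE, rng)
        self.value_head = random_layer(hidden, 1, rng)

    def __call__(self, board):
        # Shared trunk (ReLU): the same features answer both questions.
        features = [max(0.0, z) for z in dense(board, *self.trunk)]
        policy = softmax(dense(features, *self.policy_head))  # "which moves look best?"
        value = math.tanh(dense(features, *self.value_head)[0])  # "who is ahead?"
        return policy, value


board = [0.0] * BOARD_POINTS
board[3 * 19 + 3] = 1.0  # one stone for the side to move
policy, value = TwoHeadedNet()(board)
```

Training such a network would push both heads' errors back through the shared trunk, which is where the practical gain of a single network comes from; the search on top of it is unchanged.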


This post by jann was liked by: jptavan
