LZ's progression

Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: LZ's progression

Post by Tryss »

Has LZ also built up a model of the game for itself? Has AlphaGo? I'm confused as to the AI aspect. I understand how it uses MCTS and a NN to solve the computation problem, but there's no AI in there, is there?
If you want to simplify how AlphaGo/LZ works, it's kinda like this:

There is the intuitive part of LZ's brain: the neural network. LZ sees a position, and her intuition (the neural network) gives her a list of candidate moves and a feeling of who's ahead.

And there is the reading process: the Monte Carlo search (even if it's not really Monte Carlo anymore, because there are no rollouts). LZ reads the most promising moves and uses her intuition to evaluate the resulting positions.

Her intuition (the neural network) is trained by feeding her millions of self-play games played by previous versions of herself. She's told the result of each position, and her intuition learns what's good (= what works) and what's bad (what doesn't). That's how her intuition gets better over time.


Now, what's inside the neural network is quite mysterious, but that's not specific to go. It's a "problem" with all deep neural networks. You can train a network to tell if there's a dog in the picture with really high accuracy, but how exactly the neural network recognises the dog is not well understood.
Knotwilg
Oza
Posts: 2432
Joined: Fri Jan 14, 2011 6:53 am
Rank: KGS 2d OGS 1d Fox 4d
GD Posts: 0
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
Location: Ghent, Belgium
Has thanked: 360 times
Been thanked: 1021 times

Post by Knotwilg »

Tryss wrote: And there is the reading process: the Monte Carlo search (even if it's not really Monte Carlo anymore, because there are no rollouts).
OK, now that's confusing: I thought I had to interpret the lower number in Lizzie as "plies", and these plies represent complete rollouts. So now I guess they are not full but partial rollouts, and there's a higher level evaluation than the score.
Tryss wrote:
Her intuition (the neural network) is trained by feeding her millions of self-play games played by previous versions of herself. She's told the result of each position, and her intuition learns what's good (= what works) and what's bad (what doesn't). That's how her intuition gets better over time.

Now, what's inside the neural network is quite mysterious, but that's not specific to go. It's a "problem" with all deep neural networks. You can train a network to tell if there's a dog in the picture with really high accuracy, but how exactly the neural network recognises the dog is not well understood.
OK. So it's AI after all, not merely an inventive way to speed up reading. Only we get no insight into the "model" used for deciding on either the candidates (explore) or the evaluation of the ply (exploit).
yakcyll
Dies with sente
Posts: 77
Joined: Thu Apr 19, 2018 6:40 am
Rank: EGF 3k
GD Posts: 0
Universal go server handle: yakcyll
Location: Warsaw, PL
Has thanked: 165 times
Been thanked: 18 times

Post by yakcyll »

Knotwilg wrote: OK. So it's AI after all, not merely an inventive way to speed up reading. Only we get no insight into the "model" used for deciding on either the candidates (explore) or the evaluation of the ply (exploit).
It's not a way to speed up reading, but rather a way to use prior reading experience to its advantage. Another way, I think a more precise one, to think about the 'intelligence' or the 'intuition' part of a bot is that it does not select moves based on however 'feeling' could be applied to a program, but rather based on that experience (one could argue that's what intelligence is, but let's avoid that for now). Outside of training, in order to skip the MC search, it employs what's called a value network, which is a neural network used to evaluate positions.
The game tree is searched in simulations composed of 4 phases:
  • Selection — simulation traverses tree by selecting edges with maximum action value Q (how good this move is).
  • Expansion — if any node is expanded, it is processed once by SL (Supervised Learning) policy network to get prior probabilities for each legal action.
  • Evaluation — each node is evaluated by value network and by FR (Fast Rollout) policy.
  • Backup — action values Q are updated by values collected during evaluation step.
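To make the four phases concrete, here is a minimal sketch on a toy game (Nim: take 1 to 3 stones, whoever takes the last stone wins) rather than go. Everything in it is an assumption for illustration: the `stub_network` function stands in for the policy and value networks (uniform priors, neutral value), there is no fast rollout, and selection uses an AlphaGo Zero style Q + U rule with a made-up `c_puct` constant. It is not how AG or LZ are actually implemented.

```python
import math


class Node:
    """Tree node; statistics are from the viewpoint of the player who moved into it."""

    def __init__(self, prior):
        self.prior = prior      # P: prior probability from the policy stub
        self.visits = 0         # N: number of simulations through this node
        self.value_sum = 0.0    # W: sum of backed-up values
        self.children = {}      # move -> Node

    def q(self):
        """Mean action value Q = W / N (0 for an unvisited node)."""
        return self.value_sum / self.visits if self.visits else 0.0


def stub_network(stones):
    """Hypothetical stand-in for the networks: uniform priors, neutral value."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return {m: 1.0 / len(moves) for m in moves}, 0.0


def simulate(root, stones, c_puct=1.5):
    node, path = root, [root]
    # 1. Selection: walk down the tree, maximising Q plus an exploration bonus U.
    while node.children:
        parent_visits = node.visits
        move, node = max(
            node.children.items(),
            key=lambda kv: kv[1].q()
            + c_puct * kv[1].prior * math.sqrt(parent_visits) / (1 + kv[1].visits),
        )
        stones -= move
        path.append(node)
    if stones == 0:
        # Terminal leaf: the player to move has lost, the last stone is gone.
        value = -1.0
    else:
        # 2. Expansion: create children with priors from the policy stub.
        # 3. Evaluation: take the leaf value from the value stub (no rollout).
        priors, value = stub_network(stones)
        node.children = {m: Node(p) for m, p in priors.items()}
    # 4. Backup: propagate the value up, flipping the sign at every ply.
    for visited in reversed(path):
        value = -value
        visited.visits += 1
        visited.value_sum += value


def best_move(stones, simulations=3000):
    """Run the search and return the most visited move at the root."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, stones)
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


print(best_move(5))  # the winning idea in this Nim variant is to leave a multiple of 4
```

Even with a "network" that knows nothing, the search finds good moves here because the toy tree is tiny. In go the tree is far too large for that, which is why the quality of the priors and the value estimate matters so much.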
I recommend this article, it describes how AG works pretty well. Basically, there's no set of rules or knowledge it applies, directly or indirectly; that's our thing. The bot merely collects the data about board positions during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic weights.
Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

Post by Vargo »

moha wrote:Thanks, this will be interesting. I never saw a statistically significant 40b vs 15b match at more realistic time controls
Here it is :
20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
no pondering, twogtp V1.4.10, 2x1080Ti
Average game length : 256 moves

#176 wins 13:7
(65% , all games by resignation, 8 wins as W, 5 as B)

Even if 20 games is not enough, it seems you were right for the longer time settings ;-)
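To put a rough number on "20 games is not enough", here is a small sketch of a one-sided binomial test. It assumes the games are independent and asks how likely a result of at least 13 wins out of 20 would be if both networks were exactly equal (win probability 0.5); the function name is made up for this example.

```python
from math import comb


def one_sided_p_value(wins, games, p=0.5):
    """Probability of scoring at least `wins` out of `games` if each game is won with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# 13 wins out of 20 games, under the assumption of two equally strong players
print(round(one_sided_p_value(13, 20), 3))  # 0.132
```

So two equal networks would produce a 13:7 (or better) score about 13% of the time, which is suggestive but well short of the usual significance thresholds.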
157isW.zip
(9.82 KiB) Downloaded 652 times
157isB.zip
(9.62 KiB) Downloaded 633 times
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Post by Gomoto »

I recommend this article, it describes how AG works pretty well. Basically, there's no set of rules or knowledge it applies, directly or indirectly; that's our thing. The bot merely collects the data about board positions during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic weights.
And now please explain how we humans use rules or knowledge to recognize, for example, an image.

Indeed, there is no difference from the bots: our brain merely collects the data during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic ...

It is not that easy to define the difference.
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Post by Gomoto »

And while there are no explicit rules in a neural network, we can check the data like sorin and find "rules" the AI adheres to. For example the josekis and moves it prefers in specific configurations.
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Post by moha »

Vargo wrote:Even if 20 games is not enough, it seems you were right for the longer time settings ;-)
Thanks, nice to see 40b win at last. This may also answer your earlier question (why official/elo tests are not at "time parity" - no consistent meaning):
Vargo wrote:40 games between #157 and #176.
Time parity, 5 min per game, GPU: 1x1080, komi 7.5, no pondering.
#157 wins 29:11 (17 wins as W, 12 wins as B)
Vargo wrote:20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
#176 wins 13:7 (65% , all games by resignation, 8 wins as W, 5 as B)
nbc44
Dies in gote
Posts: 50
Joined: Sat Sep 15, 2018 2:34 am
GD Posts: 0
Been thanked: 3 times

Post by nbc44 »

Vargo wrote: Here it is :
20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
no pondering, twogtp V1.4.10, 2x1080Ti
Average game length : 256 moves

#176 wins 13:7
(65% , all games by resignation, 8 wins as W, 5 as B)

Even if 20 games is not enough, it seems you were right for the longer time settings ;-)
My test (l0 v15 #157 vs #176, still in progress) :

C:\APPS\l0gpu\validation.exe -k 157-176 -b C:\APPS\l0gpu\leelaz -n C:\APPS\net\d351f06e.gz -o "-g -v 12801 --gpu 0 --gpu 1 --noponder -t 12 -q -d -r 5 --timemanage off -w" -b C:\APPS\l0gpu\leelaz -n C:\APPS\net\dabff367.gz -o "-g -v 3201 --gpu 0 --gpu 1 --noponder -t 12 -q -d -r 5 --timemanage off -w"

Stopping engine.
25 wins, 15 losses
40 games played.
Status: 0 LLR 0.821218 Lower Bound -2.94444 Upper Bound 2.94444
P.S. If someone wants the games, I can upload them (after the end of the test).
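For readers wondering about the "Lower Bound" and "Upper Bound" in that status line: they match the stopping thresholds of a sequential probability ratio test (SPRT) with 5% error rates, since ln(0.95/0.05) = ln 19 ≈ 2.94444. The sketch below recomputes those bounds and adds a simplified textbook log-likelihood ratio. The error rates are inferred from the printed bounds, the Elo hypotheses `elo0` and `elo1` are purely illustrative assumptions, and the LLR formula is not claimed to be the one the validation tool uses, so it will not reproduce the printed LLR exactly.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping thresholds for the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


def expected_score(elo_diff):
    """Standard logistic Elo model: expected score for a given rating difference."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


def simple_llr(wins, losses, elo0=0.0, elo1=35.0):
    """Simplified LLR (no draws). elo0 and elo1 are illustrative assumptions."""
    p0, p1 = expected_score(elo0), expected_score(elo1)
    return wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))


lower, upper = sprt_bounds()
print(round(lower, 5), round(upper, 5))  # -2.94444 2.94444

# The test keeps running while the LLR stays strictly between the two bounds.
llr = simple_llr(25, 15)
print(lower < llr < upper)  # True
```

The test only stops once the LLR crosses one of the bounds, which is why 25 wins out of 40 is still reported as undecided.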
explo
Dies with sente
Posts: 108
Joined: Wed Apr 21, 2010 8:07 am
Rank: FFG 1d
GD Posts: 0
Location: France
Has thanked: 14 times
Been thanked: 18 times

Post by explo »

Who won the 25 games?
Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

Post by Vargo »

There's a new 256x40b network (#177), a good occasion to see if the result of the last match (157 v 176) still holds.

20 game match between LZ0.15 #157 and #177
--visits=3201 for #177
--visits=12801 for #157
approximately time parity (#157 takes a little more time)
no pondering, twogtp V1.4.10, 2x1080Ti

It's a draw 10:10
(all games by resignation)
So, not as good a result as the last match, but a confirmation that the new networks have caught up with the old 20b (given enough time)
177isW.zip
(9.56 KiB) Downloaded 637 times
177isB.zip
(9.32 KiB) Downloaded 647 times
nbc44 wrote:My test (l0 v15 #157 vs #176, still in progress) :
Happy to see someone else run matches, thanks ! Looking forward to the final result :)
nbc44
Dies in gote
Posts: 50
Joined: Sat Sep 15, 2018 2:34 am
GD Posts: 0
Been thanked: 3 times

Post by nbc44 »

explo wrote:Who won the 25 games?
#157
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Post by moha »

Vargo wrote:It's a draw 10:10 (all games by resignation)
So, not as good a result as the last match, but a confirmation that the new networks have caught up with the old 20b (given enough time)
Depends on what time is "enough" time. :) (I guess you meant old 15b.) Allowing 6 sec instead of 3, for example (6400 visits instead of 3200), would likely shift the score by a few percent in 40b's favor (random variance aside), and so on with even more time.

These scaling effects are the heart of the problem. A more practical question is how many visits a user would get in daily use (on which hardware?) when analysing his games.
nbc44
Dies in gote
Posts: 50
Joined: Sat Sep 15, 2018 2:34 am
GD Posts: 0
Been thanked: 3 times

Post by nbc44 »

Vargo wrote:Looking forward to the final result :)
Nothing interesting right now :D :

68 wins, 46 losses
114 games played.
Status: 0 LLR 1.64871 Lower Bound -2.94444 Upper Bound 2.94444
P.S. I think 12801 visits is too big for this test.
Vargo
Lives in gote
Posts: 337
Joined: Sat Aug 17, 2013 5:28 am
GD Posts: 0
Has thanked: 22 times
Been thanked: 97 times

Post by Vargo »

moha wrote:I guess you meant old 15b
Yes, 20b networks are for Lc0. Leela Chess Zero is similar to LZ (description HERE)
The Computer Chess Championship is going on these days HERE, and Lc0 is doing particularly well.
moha wrote: how many visits a user would get in daily use (on which hardware?) when analysing his games
Getting 3200 visits (#177) or 12800 visits (#157) with 2x1080Ti takes around 3 sec/move. For one dedicated GPU, maybe from 5-6 sec for one 1080Ti to 15-20 sec (?)
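As a rough back-of-the-envelope check, the visit counts and average move times from the #157 vs #176 match earlier in the thread can be turned into throughput figures. This is only a sketch: it treats the reported average time per move as if every move used the full visit budget, which is an assumption.

```python
def visits_per_second(visits, seconds_per_move):
    """Average search throughput for a fixed visit budget per move."""
    return visits / seconds_per_move


# Figures from the 2x1080Ti match reported earlier in the thread
rate_40b = visits_per_second(3201, 3.03)   # #176 (40b)
rate_15b = visits_per_second(12801, 3.4)   # #157 (15b)

print(f"40b: {rate_40b:.0f} visits/s")
print(f"15b: {rate_15b:.0f} visits/s")
print(f"15b searches about {rate_15b / rate_40b:.1f}x faster")
```

On that hardware the 15b network gets through roughly 3.5 times as many visits per second as the 40b one, which is the gap the 40b network has to make up with better evaluations.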
explo
Dies with sente
Posts: 108
Joined: Wed Apr 21, 2010 8:07 am
Rank: FFG 1d
GD Posts: 0
Location: France
Has thanked: 14 times
Been thanked: 18 times

Post by explo »

Based on using Lizzie, I need around a minute to get 3200 visits on a 40b network. I have a GTX 1050, which I guess is better than what most go players have. Right now, most people would do better to use #157 if they want to briefly review a game and identify mistakes.