LZ's progression

Tryss · Post by **Tryss** » Thu Sep 13, 2018 5:47 am

Has LZ also built up a model of the game for itself? Has AlphaGo? I'm confused as to the AI aspect. I understand how it uses MCTS and NN to solve the computation problem, but there's no AI in there, is there?

If you want to simplify how AlphaGo/LZ works, it's kinda like this:

There is the intuitive part of LZ's brain: the neural network. LZ sees a position, and her intuition (the neural network) gives her a list of candidate moves and a feeling of who's ahead.

And there is the reading process: the Monte Carlo search (even if it's not really Monte Carlo anymore, because there are no rollouts). LZ reads the most promising moves and uses her intuition to evaluate the resulting positions.

Her intuition (the neural network) is trained by feeding her millions of self-play games played by previous versions of herself. She's told the result of each of these positions, and her intuition learns what's good (= what works) and what's bad (= what doesn't). That's how her intuition gets better over time.

Now, what's inside the neural network is quite mysterious, but that's not specific to go. It's a "problem" with all deep neural networks. You can train a network to tell whether there's a dog in a picture with really high accuracy, but exactly how the neural network recognises the dog is not well understood.

Knotwilg · Post by **Knotwilg** » Thu Sep 13, 2018 7:40 am

Tryss wrote: And there is the reading process: the Monte Carlo search (even if it's not really Monte Carlo anymore, because there are no rollouts).

OK, now that's confusing: I thought I had to interpret the lower number in Lizzie as "plies", and these plies represent complete rollouts. So now I guess they are not full but partial rollouts, and there's a higher level evaluation than the score.

Tryss wrote:
Her intuition (the neural network) is trained by feeding her millions of self-play games played by previous versions of herself. She's told the result of each of these positions, and her intuition learns what's good (= what works) and what's bad (= what doesn't). That's how her intuition gets better over time.

Now, what's inside the neural network is quite mysterious, but that's not specific to go. It's a "problem" with all deep neural networks. You can train a network to tell whether there's a dog in a picture with really high accuracy, but exactly how the neural network recognises the dog is not well understood.

OK. So it's AI after all, not merely an inventive way to speed up reading. Only we get no insight into the "model" used for deciding on either the candidates (explore) or the evaluation of the ply (exploit).

yakcyll · Post by **yakcyll** » Thu Sep 13, 2018 8:01 am

Knotwilg wrote: OK. So it's AI after all, not merely an inventive way to speed up reading. Only we get no insight into the "model" used for deciding on either the candidates (explore) or the evaluation of the ply (exploit).

It's not a way to speed up reading, but rather a way to use prior reading experience to its advantage. Another way, I think a more precise one, to think about the 'intelligence' or the 'intuition' part of a bot is that it does not select moves based on 'feeling', however that could be applied to a program, but rather based on that experience (one could argue that's what intelligence is, but let's avoid that for now). Outside of training, in order to skip the MC search, it employs what's called a value network, which is a neural network used to evaluate positions.

The game tree is searched in simulations composed of 4 phases:

Selection — simulation traverses tree by selecting edges with maximum action value Q (how good this move is).

Expansion — if any node is expanded, it is processed once by SL (Supervised Learning) policy network to get prior probabilities for each legal action.

Evaluation — each node is evaluated by value network and by FR (Fast Rollout) policy.

Backup — action values Q are updated by values collected during evaluation step.
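
The four phases above can be sketched on a toy game. This is only an illustration under loud assumptions, not AlphaGo's or LZ's code: the game is a tiny Nim variant (take 1 or 2 stones, whoever takes the last stone wins), `toy_policy` and `toy_value` are hypothetical stand-ins for the policy and value networks, there are no rollouts, and selection adds a prior-weighted exploration bonus to Q (a PUCT-style choice, with `c_puct` picked arbitrarily) rather than using Q alone.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior      # prior probability from the policy stand-in
        self.visits = 0         # simulations that passed through this node
        self.value_sum = 0.0    # backed-up values, seen by the player to move here
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def toy_policy(stones):
    """Stand-in for a policy network: uniform priors over legal moves."""
    moves = [m for m in (1, 2) if m <= stones]
    return {m: 1.0 / len(moves) for m in moves}


def toy_value(stones):
    """Stand-in for a value network, from the view of the player to move.

    With no stones left, the player to move has just lost; otherwise the
    stand-in knows nothing and returns 0.
    """
    return -1.0 if stones == 0 else 0.0


def simulate(root, stones, c_puct=1.5):
    node, path = root, [root]

    # Selection: walk down the tree, maximising Q plus an exploration bonus.
    while node.children:
        parent = node

        def score(item):
            child = item[1]
            bonus = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
            # child.q() is from the opponent's point of view, hence the minus sign.
            return -child.q() + bonus

        move, node = max(parent.children.items(), key=score)
        stones -= move
        path.append(node)

    # Expansion: create children with priors, unless the game is over.
    if stones > 0:
        node.children = {m: Node(p) for m, p in toy_policy(stones).items()}

    # Evaluation: ask the value stand-in (no rollout).
    value = toy_value(stones)

    # Backup: update statistics along the path, flipping sign at every ply.
    for visited in reversed(path):
        visited.visits += 1
        visited.value_sum += value
        value = -value


def search(stones, simulations=400):
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, stones)
    # Play the most visited move, as is common in this family of programs.
    return max(root.children, key=lambda m: root.children[m].visits)
```

Calling `search(4)` should settle on taking 1 stone (leaving the opponent a multiple of 3), even though the value stand-in only knows about finished games. The "experience" lives entirely in the visit counts and Q values built up by the simulations.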

I recommend this article, it describes how AG works pretty well. Basically, there's no set of rules or knowledge it applies, directly or indirectly; that's our thing. The bot merely collects the data about board positions during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic weights.

Vargo · Post by **Vargo** » Thu Sep 13, 2018 8:57 am

moha wrote:Thanks, this will be interesting. I never saw a statistically significant 40b vs 15b match at more realistic time controls

Here it is :
20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
no pondering, twogtp V1.4.10, 2x1080Ti
Average game length : 256 moves

#176 wins 13:7 (65% , all games by resignation, 8 wins as W, 5 as B)

Even if 20 games is not enough, it seems you were right for the longer time settings
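
To put a number on "20 games is not enough", here is a small sketch using only the 13:7 score above. It assumes independent games and, as the null hypothesis, two equally strong engines; the function names are made up for this example, and the Elo conversion is the usual logistic formula.

```python
from math import comb, log10


def one_sided_p_value(wins, games):
    """Chance of at least `wins` wins in `games` if both sides win 50% of the time."""
    return sum(comb(games, k) for k in range(wins, games + 1)) / 2 ** games


def elo_diff(win_rate):
    """Elo difference implied by a win rate under the logistic Elo model."""
    return 400 * log10(win_rate / (1 - win_rate))


p = one_sided_p_value(13, 20)   # roughly 0.13
elo = elo_diff(13 / 20)         # roughly +107 Elo for the winner

print(f"p-value: {p:.3f}, implied Elo difference: {elo:.0f}")
```

A p-value around 0.13 means equally strong engines would produce a 13:7 or better score about one time in eight, so the result points in 40b's favour without settling the question.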

157isW.zip (9.82 KiB)

157isB.zip (9.62 KiB)

Gomoto · Post by **Gomoto** » Thu Sep 13, 2018 9:06 am

yakcyll wrote: I recommend this article, it describes how AG works pretty well. Basically, there's no set of rules or knowledge it applies, directly or indirectly; that's our thing. The bot merely collects the data about board positions during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic weights.

And now please explain how we humans use rules or knowledge to recognize, for example, an image.

Indeed, there is no difference from the bots: our brain merely collects the data during learning and formats it so that it can utilize the experience quickly, on the fly - in the form of synaptic ...

It is not that easy to define the difference.

Gomoto · Post by **Gomoto** » Thu Sep 13, 2018 9:08 am

And while there are no explicit rules in a neural network, we can check the data, as sorin did, and find "rules" the AI adheres to, for example the josekis and moves it prefers in specific configurations.

moha · Post by **moha** » Thu Sep 13, 2018 9:24 am

Vargo wrote:Even if 20 games is not enough, it seems you were right for the longer time settings

Thanks, nice to see 40b win at last. This may also answer your earlier question (why official/Elo tests are not run at "time parity"; it has no consistent meaning):

Vargo wrote:40 games between #157 and #176.
Time parity, 5 min per game, GPU: 1x1080, komi 7.5, no pondering.
#157 wins 29:11 (17 wins as W, 12 wins as B)

Vargo wrote:20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
#176 wins 13:7 (65% , all games by resignation, 8 wins as W, 5 as B)

nbc44 · Post by **nbc44** » Sat Sep 15, 2018 3:02 am

Vargo wrote: Here it is :
20 game match between LZ0.15 #157 and #176
--visits=3201 for #176
--visits=12801 for #157
which amounts to approximately time parity (average of 3.03s/move for #176 and 3.4s/move for #157)
no pondering, twogtp V1.4.10, 2x1080Ti
Average game length : 256 moves

#176 wins 13:7 (65% , all games by resignation, 8 wins as W, 5 as B)

Even if 20 games is not enough, it seems you were right for the longer time settings

My test (l0 v15 #157 vs #176, still in progress) :

C:\APPS\l0gpu\validation.exe -k 157-176 -b C:\APPS\l0gpu\leelaz -n C:\APPS\net\d351f06e.gz -o "-g -v 12801 --gpu 0 --gpu 1 --noponder -t 12 -q -d -r 5 --timemanage off -w" -b C:\APPS\l0gpu\leelaz -n C:\APPS\net\dabff367.gz -o "-g -v 3201 --gpu 0 --gpu 1 --noponder -t 12 -q -d -r 5 --timemanage off -w"

Stopping engine.
25 wins, 15 losses
40 games played.
Status: 0 LLR 0.821218 Lower Bound -2.94444 Upper Bound 2.94444

P.S. If someone wants the games, I can upload them (after the end of the test).
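
For readers wondering where the bounds in the status line come from: ±2.94444 equals ln(19), which is what Wald's sequential probability ratio test (SPRT) gives for error rates alpha = beta = 0.05. That the tool uses exactly those rates is an inference from the printed numbers. The sketch below also computes a plain binomial log-likelihood ratio; the hypotheses `elo0=0` and `elo1=35` are assumptions made for this example, and the tool's own LLR formula may differ, so the value need not match the 0.821218 reported above.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's stopping bounds for the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


def elo_to_win_prob(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1 / (1 + 10 ** (-elo / 400))


def binomial_llr(wins, losses, elo0=0.0, elo1=35.0):
    """LLR of H1 (advantage elo1) against H0 (advantage elo0).

    elo0 and elo1 are hypothetical values chosen for this sketch.
    """
    p0, p1 = elo_to_win_prob(elo0), elo_to_win_prob(elo1)
    return wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))


lower, upper = sprt_bounds()
llr = binomial_llr(25, 15)

# The test keeps running while the LLR stays strictly between the bounds.
print(f"bounds: [{lower:.5f}, {upper:.5f}], LLR: {llr:.3f}, "
      f"continue: {lower < llr < upper}")
```

The point of the sequential test is that the match stops as soon as the LLR crosses either bound, so a 25:15 score after 40 games is still "undecided" by this criterion.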

explo · Post by **explo** » Sat Sep 15, 2018 11:00 am

Who won the 25 games?

Vargo · Post by **Vargo** » Sat Sep 15, 2018 12:19 pm

There's a new 256x40b network (#177), a good occasion to see whether the result of the last match still holds, this time with 157 v 177.

20 game match between LZ0.15 #157 and #177
--visits=3201 for #177
--visits=12801 for #157
approximately time parity (#157 takes a little more time)
no pondering, twogtp V1.4.10, 2x1080Ti

It's a draw 10:10 (all games by resignation)
So, not as good a result as the last match, but a confirmation that the new networks have caught up with the old 20b (given enough time)

177isW.zip (9.56 KiB)

177isB.zip (9.32 KiB)

nbc44 wrote:My test (l0 v15 #157 vs #176, still in progress) :

Happy to see someone else run matches, thanks ! Looking forward to the final result

nbc44 · Post by **nbc44** » Sat Sep 15, 2018 12:51 pm

explo wrote:Who won the 25 games?

#157

moha · Post by **moha** » Sat Sep 15, 2018 2:33 pm

Vargo wrote:It's a draw 10:10 (all games by resignation)
So, not as good a result as the last match, but a confirmation that the new networks have caught up with the old 20b (given enough time)

Depends on what time is "enough" time.

(I guess you meant old 15b.) Allowing 6 sec instead of 3, for example, i.e. 6400 visits instead of 3200, would likely shift the score by a few percent in 40b's favor (random variance aside), and so on with even more time.

These scaling effects are the heart of the problem. A more practical question is how many visits a user would get in daily use (on which hardware?) when analysing his games.

nbc44 · Post by **nbc44** » Sun Sep 16, 2018 1:55 am

Vargo wrote:Looking forward to the final result

Nothing interesting right now:

68 wins, 46 losses
114 games played.
Status: 0 LLR 1.64871 Lower Bound -2.94444 Upper Bound 2.94444

P.S. I think 12801 visits is too big for this test.

Vargo · Post by **Vargo** » Sun Sep 16, 2018 2:25 am

moha wrote:I guess you meant old 15b

Yes, 20b networks are for Lc0. Leela Chess Zero is similar to LZ (description HERE)
The Computer Chess Championship is going on these days HERE, and Lc0 is doing particularly well.

moha wrote: how many visits would a user get in daily use (on which hardware?) when analysing his games

To get 3200 visits (#177) or 12800 visits (#157), with 2x1080Ti, it's around 3 sec/move. For one dedicated GPU, maybe from 5-6 sec for one 1080Ti to 15-20sec (?)

explo · Post by **explo** » Sun Sep 16, 2018 3:45 am

Based on using lizzie, I need around a minute to get 3200 visits on a 40b network. I have a GTX 1050 which I guess is better than what most go players have. Right now most people should rather use #157 if they want to briefly review a game and identify mistakes.
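
The figures reported in this thread can be turned into rough visits-per-second numbers. This is just arithmetic on what posters measured or estimated (explo's "around a minute" is taken as 60 s, which is an assumption), so treat the results as ballpark values.

```python
def visits_per_second(visits, seconds):
    return visits / seconds


# (visits per move, seconds per move) as reported in the thread
reports = {
    "#176 (40b), 2x1080Ti": (3201, 3.03),
    "#157 (15b), 2x1080Ti": (12801, 3.4),
    "40b, GTX 1050": (3200, 60.0),   # "around a minute", assumed to be 60 s
}

rates = {name: visits_per_second(v, s) for name, (v, s) in reports.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} visits/s")
```

On these numbers the 15b network searches roughly three and a half times as many visits per second as the 40b one on the same hardware, and a single GTX 1050 runs the 40b network about twenty times slower than two 1080Ti cards.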

Life In 19x19
