LZ's progression


Re: LZ's progression

Post by nbc44 »

Yes, you are right. All these tests are just rubbish in terms of mathematical statistics. And poor Lee Sedol still has chances to defeat AlphaGo :).
And you are right again:
hoa803 wrote:It's still fun to do the matches, though. :)

Post by hoa803 »

I don't know what they did in the new version of Leela but my Gflops pretty much doubled. I'm looking at an RTX 2060 for gaming and deep learning. Anybody try one and have a benchmark?

Post by Vargo »

hoa803 wrote:The thing nobody seems to be talking about in this thread is the confidence interval.
nbc44 wrote:All these tests are just rubbish
All these 20, 30 ... game matches aren't gospel, obviously. Match parameters vary wildly (different GPUs, different time per game, different number of visits, usage of -m, -r, etc.).
No one thinks #XXX is definitively stronger than elfv2 just because XXX won a single 20 game match by 11-9 (for example).

But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there would be only about a 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger (particularly if there are several such matches going the same way).
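
For anyone who wants to check that 2% figure, here is a minimal Python sketch of the calculation. It assumes each game is an independent coin flip between equally strong engines (no draws, no colour advantage); the function name `tail_probability` is only illustrative.

```python
from math import comb

def tail_probability(games: int, wins: int, p: float = 0.5) -> float:
    """P(at least `wins` wins in `games` games) if each game is won with probability p."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

print(f"{tail_probability(20, 15):.4f}")  # 15-5 or better between equal engines
print(f"{tail_probability(20, 11):.4f}")  # 11-9 or better between equal engines
```

The first value comes out at about 0.021 and the second at about 0.41, which is why a 15-5 result is suggestive while an 11-9 result on its own says very little.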

Post by hoa803 »

Vargo wrote:But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).
Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.

Seems like whatever blip caused some of the LZ nets to be significantly weaker than Elf has gone away. I wish I had better hardware to run more visits and/or more games; on my current card, more visits per move would mean fewer games and so less statistical significance. Still, the 0.17 version of Leela gets ~2600 Gflops on my GPU, which results in a decent number of visits per move at that time control.

I will say that one thing I disagree with is adding any of the self-play randomness params to matches that ostensibly compare engine strength. I feel like the main value there is that the games are perhaps more interesting to watch. However, I think any of the programmers on GitHub would agree it shouldn't be used in a "match" situation. Although I haven't seen anybody discuss such a thing outside of training, so who knows.

Post by splee99 »

hoa803 wrote:
Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.
I'm just curious. In those 52 games, was ELFv2 running on LZ 0.16 or 0.17?

Post by hoa803 »

Should have been 0.17. I would have more info but I screwed up my command line and didn't save any of the games, which is very frustrating. Validation prints out the XX-XX win/loss count after each game, so I'm basing it on that alone. I need to rerun to confirm at some point. I'd probably just use the latest network.

Right now I'm just helping train the AI rather than running matches.

Edit: I messed with it after work today. Turns out the -k option to save games must be placed towards the beginning of the command line with 0.17. Or at least, it started saving the games when I moved it from the end to directly after the validation.exe statement.

Edit 2: I'm currently running lz219 vs elfv2 at 30 seconds a move. I think I like that better than absolute time for a comparison, because both engines will be much stronger reading the full 30 seconds each move (and subsequent moves).

Post by Vargo »

New network #220 :bow:
Even if regular LZ(017) is not really designed for handicap games, it can play nice H games.

H3 game with komi 7.5 : Crazy Stone DL (5 Dan) v. LZ017#220
(4s/move for LZ, and -r 1 to avoid resigning too soon, laptop with gtx 965)
CS and LZ (Sabaki) don't agree on the final score (W+7.5 and W+4.5). I suppose the 3 point difference comes from the 3 handicap stones. Maybe my settings are wrong somehow? If someone knows, thx...

Settings: [image: settings.jpg]
Counting: [image: territory.jpg]
--> the game

Post by And »

Interesting. Zen shows W+4.5

Post by jlt »

LeelaZero counted the score as territory + prisoners (+komi for White)

Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory).
My guess is that the 189.5 points for White correspond to (white living stones)+(white territory)+(komi)+(number of handicap stones).
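
That guess can be checked with a little arithmetic: on a 19x19 board, if every point ends up counted as someone's stone or territory, the two totals minus komi and the handicap bonus should add up to 361. A small Python sketch (the variable names are illustrative, and it assumes no neutral points were left uncounted):

```python
BOARD_POINTS = 19 * 19      # 361 intersections
KOMI = 7.5
HANDICAP_STONES = 3

black_cs = 182.0            # Crazy Stone's total for Black
white_cs = 189.5            # Crazy Stone's total for White

# Strip komi and the assumed handicap compensation from White's total.
white_area = white_cs - KOMI - HANDICAP_STONES

# If the guess is right, Black's and White's areas cover the whole board.
assert black_cs + white_area == BOARD_POINTS

margin_cs = white_cs - black_cs                      # Crazy Stone's margin
margin_without_bonus = margin_cs - HANDICAP_STONES   # margin without the handicap bonus
print(margin_cs, margin_without_bonus)
```

The check passes (182 + 179 = 361), and removing the 3 handicap points from Crazy Stone's W+7.5 gives W+4.5, the same margin LZ (Sabaki) reported.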

Post by And »

H3 game with komi 7.5: Crazy Stone DL (5 Dan) v. Zen 7 (5 sec). Zen won; CS scores it W+35.5, Zen and Sabaki W+34.5. Crazy Stone possibly mistaken?
PS: I figured it out: CS shows area + handicap, while Sabaki and Zen show territory.

Post by Vargo »

And wrote:CS shows area + handicap, and sabaki and zen - territory
jlt wrote:LeelaZero counted the score as territory + prisoners (+komi for White)
Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory)
Thx :tmbup:


49 games at H2…H9 played by LZ017#10…#190 v. LZ017#220

time parity, 2 sec/move, 1x 1080, official LZ0.17#220 v. #xxx, komi 7.5, -r 1 for W (to avoid resigning too soon), -r 30 for B (to avoid very long games)
[image: 1.jpg]
It gives an idea of the handicap skills of #220.

In the interesting zone (bold frames), mini 3 game matches:
[image: 2.jpg]
According to THIS SITE, and in KGS rankings
#40 is very approximately around 4K
#70 is around 3D
#100 is around 6D
#130 is around 9D
(it seems a lot, and these rankings weren't based on 2sec/move)

You can play handicap go at this excellent site

Post by hoa803 »

If anybody hasn't used their free $300 from Google Cloud and feels like doing some deep learning, I recently set it up and it's quite easy to do.

See this github thread for an updated guide. Note that the Microsoft Azure guide currently doesn't work with LeelaZero 0.17, but I'm trying to figure out the solution.

On a single Tesla V100 GPU I am finishing a game every 108 seconds, averaged over 900 games! That means I can expect to make something like 12,000 games (selfplay and matches) before the $300 credit runs out.
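
A rough back-of-the-envelope check of those numbers, as a Python sketch. The hourly price of the instance is not given above, so the code only backs out the rate that the 12,000 game estimate implies; treat that rate as an assumption, not a quoted Google Cloud price. The helper `games_for_budget` is hypothetical.

```python
SECONDS_PER_GAME = 108      # measured average over 900 games
CREDIT_USD = 300.0
EXPECTED_GAMES = 12_000     # the estimate above

gpu_hours = EXPECTED_GAMES * SECONDS_PER_GAME / 3600
implied_rate = CREDIT_USD / gpu_hours   # USD per hour implied by the estimate

def games_for_budget(budget_usd: float, usd_per_hour: float) -> int:
    """Games that fit in a budget at a given (assumed) hourly cost."""
    return int(budget_usd / usd_per_hour * 3600 / SECONDS_PER_GAME)

print(f"{gpu_hours:.0f} GPU-hours, about ${implied_rate:.2f}/hour implied")
```

So 12,000 games at 108 seconds each is 360 GPU-hours, which fits the credit only if the instance costs roughly $0.83 per hour. At a hypothetical $1.00 per hour, `games_for_budget(300, 1.0)` gives 10,000 games instead.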

Post by nbc44 »

Time parity match with statistically significant result :salute: (part I).
LZ0v17 #219 vs Elfv2
2x1080ti, 30s per move.
C:\APPS\l0gpu17\validation.exe -k 219elfv2-30s -s "0:10" -g 6 -n C:\APPS\net\00ff08eb.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -n C:\APPS\net\05dbca15.gz -o "-g --gpu 0 --gpu 1 --noponder -t 24 -q -d --precision single -w" -- C:\APPS\l0gpu17\leelaz --gtp-command "time_settings 0 30 1" -- C:\APPS\l0gpu17\leelaz --gtp-command "time_settings 0 30 1"

Code:

#219 v elfv2 (400 games)
           wins        black       white
#219   175 43.75%   65 41.67%  110 45.08%
elfv2  225 56.25%   91 58.33%  134 54.92%
total  400         156 39.00%  244 61.00%
Attachment: 219elfv2-30s.zip

Post by Amtiskaw »

Cool. I adjusted the PB and PW properties in the SGF files to make it a bit clearer who was who.
Attachment: 219elfv2-30s.zip

Post by hoa803 »

It might provoke an interesting discussion: the folks on GitHub feel that matches at equal visits, rather than time parity, are the better measure of engine strength. I don't claim to totally understand the reasoning but it might be worth looking into.