Life In 19x19

Posted: **Fri Apr 05, 2019 4:11 pm**

Yes, you are right. All these tests are just rubbish in terms of mathematical statistics. And poor Lee Sedol still has a chance to defeat AlphaGo.
And you are right again:

hoa803 wrote:It's still fun to do the matches, though.

Posted: **Fri Apr 05, 2019 4:55 pm**

I don't know what they did in the new version of Leela but my Gflops pretty much doubled. I'm looking at an RTX 2060 for gaming and deep learning. Anybody try one and have a benchmark?

Posted: **Sat Apr 06, 2019 1:04 am**

hoa803 wrote:The thing nobody seems to be talking about in this thread is the confidence interval.

nbc44 wrote:All these tests are just rubbish

All these 20, 30 ... game matches aren't gospel, obviously. Match parameters vary wildly (different GPUs, different time per game, different number of visits, usage of -m, -r, etc.).
No one thinks #XXX is definitively stronger than elfv2 just because XXX won a single 20-game match 11-9 (for example).

But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).
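For anyone who wants to check the 2% figure, here is a minimal sketch. It assumes each game is an independent 50/50 coin flip between equal engines (no draws, no colour advantage); the function name `tail_probability` is just chosen for this example.

```python
from math import comb


def tail_probability(wins: int, games: int, p: float = 0.5) -> float:
    """Return P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# Chance that one of two equally strong engines wins at least 15 of 20 games.
p_15_of_20 = tail_probability(15, 20)
print(f"P(at least 15 wins out of 20) = {p_15_of_20:.4f}")
```

The same function can be called with other scores, e.g. `tail_probability(11, 20)`, to see how little an 11-9 result says on its own.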

Posted: **Sat Apr 06, 2019 7:25 pm**

Vargo wrote:But all these matches are really fun to run, and I think, taken as a whole, they can give an idea of the strength of the different networks.

They're not gospel, but they're not rubbish either, even with so few as 20 games.

For example
20 game match XXX v. YYY , result 15-5
If XXX and YYY were the same strength, there's only 2% chance for XXX to get at least 15 wins. It's not unreasonable to think that XXX is stronger, (particularly if there are several such matches going the same way).

Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.
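To put a rough confidence interval on a 26-26 result, one option is the Wilson score interval. This is only a sketch: it assumes independent games with a fixed win probability, and the helper name `wilson_interval` and the 95% level (z = 1.96) are choices made for this example.

```python
from math import sqrt


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    p_hat = wins / games
    denom = 1 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half_width = (
        z * sqrt(p_hat * (1 - p_hat) / games + z * z / (4 * games * games)) / denom
    )
    return centre - half_width, centre + half_width


# 26 wins out of 52 games, as in the tied match described above.
low, high = wilson_interval(26, 52)
print(f"95% interval for the true win rate: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval is still wide after 52 games, so a tie is compatible with either engine being somewhat stronger.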

Seems like whatever blip caused some of the LZ nets to be significantly weaker than Elf has gone away. I wish I had better hardware to run more visits and/or more games; on my current setup, giving each move more visits would mean fewer games and therefore less statistical significance. Still, the 0.17 version of Leela gets ~2600 Gflops on my GPU, which results in a decent number of visits per move at that time control.

One thing I disagree with is adding any of the self-play randomness parameters to matches that ostensibly compare engine strength. I feel the main value there is that the games are perhaps more interesting to watch. However, I think any of the programmers on GitHub would agree they shouldn't be used in a "match" situation. Then again, I haven't seen anybody discuss such a thing outside of training, so who knows.

Posted: **Sun Apr 07, 2019 11:34 am**

hoa803 wrote:
Yeah, I definitely agree with that. I ran something a few days ago, I believe it was LZ216 vs Elfv2, using Leela release 0.17 on my GTX 1060 6GB at CGOS rules, 15 minutes a side absolute. After 52 games the match was tied 26 apiece.

I'm just curious. In those 52 games, was ELFv2 running on LZ 0.16 or 0.17?

Posted: **Mon Apr 08, 2019 9:20 am**

Should have been 0.17. I would have more info, but I screwed up my command line and didn't save any of the games, which is very frustrating. Validate prints out the XX-XX win/loss count after each game, so I'm basing it on that alone. I need to rerun to confirm at some point. I'd probably just use the latest network.

Right now I'm just helping train the AI rather than running matches.

Edit: I messed with it after work today. Turns out the -k option to save games must be placed towards the beginning of the command line with 0.17. Or at least, it started saving the games when I moved it from the end to directly after validation.exe.

Edit 2: I'm currently running lz219 vs elfv2 at 30 seconds a move. I think I like that better than absolute time for a comparison, because both engines will be much stronger reading the full 30 seconds each move (and subsequent moves).

Posted: **Tue Apr 09, 2019 5:00 am**

New network #220

Even if regular LZ(017) is not really designed for handicap games, it can play nice H games.

H3 game with komi 7.5 : Crazy Stone DL (5 Dan) v. LZ017#220
(4s/move for LZ, and -r 1 to avoid resigning too soon, laptop with GTX 965)
CS and LZ (Sabaki) don't agree on the final score (W+7.5 and W+4.5). I suppose the 3-point difference comes from the 3 handicap stones. Maybe my settings are wrong somehow? If someone knows, thx...

Settings :

counting...

--> the game
______________________________________________________

Posted: **Tue Apr 09, 2019 5:43 am**

Interesting. Zen shows W+4.5.

Posted: **Tue Apr 09, 2019 5:44 am**

LeelaZero counted the score as territory + prisoners (+komi for White)

Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory).
My guess is that the 189.5 points for White correspond to (white living stones)+(white territory)+(komi)+(number of handicap stones).
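The guess can be checked with simple arithmetic. The sketch below assumes every one of the 361 points was counted for one side or the other (no neutral points left over); the variable names are made up for this example.

```python
BOARD_POINTS = 19 * 19   # 361 intersections
KOMI = 7.5
HANDICAP_STONES = 3

black_area = 182.0            # Black's total as reported by Crazy Stone
white_total_reported = 189.5  # White's total as reported by Crazy Stone

# Assumption: all points not counted for Black belong to White.
white_area_on_board = BOARD_POINTS - black_area                      # 179.0
white_total_expected = white_area_on_board + KOMI + HANDICAP_STONES  # 189.5

margin_with_handicap = white_total_reported - black_area        # Crazy Stone: W+7.5
margin_without_handicap = margin_with_handicap - HANDICAP_STONES  # others: W+4.5

print(white_total_expected, margin_with_handicap, margin_without_handicap)
```

Under that assumption the numbers add up exactly: 179 + 7.5 + 3 = 189.5, and removing the 3 handicap points turns W+7.5 into the W+4.5 shown by the other programs.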

Posted: **Tue Apr 09, 2019 10:09 am**

H3 game with komi 7.5: Crazy Stone DL (5 Dan) v. Zen 7 (5 sec). Zen won. CS scores it W+35.5, Zen and Sabaki W+34.5. Is Crazy Stone possibly mistaken?
PS: I figured it out. CS shows area + handicap, while Sabaki and Zen show territory.

Posted: **Thu Apr 11, 2019 9:01 am**

And wrote:CS shows area + handicap, and sabaki and zen - territory

jlt wrote:LeelaZero counted the score as territory + prisoners (+komi for White)
Crazystone counted 182 points for Black, which corresponds to (black living stones)+(black territory)

Thx

49 games at H2…H9 played by LZ017#10…#190 v. LZ017#220

Time parity, 2 sec/move, 1x 1080, official LZ0.17, #220 v. #xxx, komi 7.5, -r 1 for W (to avoid resigning too soon), -r 30 for B (to avoid very long games).

It gives an idea of the handicap skills of #220.

In the interesting zone (bold frames), mini 3-game matches were played:

According to THIS SITE, and in KGS rankings
#40 is very approximately around 4K
#70 is around 3D
#100 is around 6D
#130 is around 9D
(it seems a lot, and these rankings weren't based on 2sec/move)

You can play handicap go at this excellent site

Posted: **Thu Apr 11, 2019 7:04 pm**

If anybody hasn't used their free $300 from Google Cloud and feels like doing some deep learning, I recently set it up and it's quite easy to do.

See this github thread for an updated guide. Note that the Microsoft Azure guide currently doesn't work with LeelaZero 0.17, but I'm trying to figure out the solution.

On a single Tesla v100 gpu I am finishing a game every 108 seconds, averaged over 900 games! That means I can expect to make something like 12,000 games (selfplay and matches) before the $300 credit runs out.
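As a sanity check on the 12,000-game estimate, the sketch below works backwards from the figures in the post. The hourly cost is not stated above; the value computed here is only what those figures imply, not a quoted Google Cloud price, and `games_for_budget` is a hypothetical helper.

```python
SECONDS_PER_GAME = 108   # average reported over 900 games
CREDIT_USD = 300         # free credit
EXPECTED_GAMES = 12_000  # estimate from the post

# Total compute time needed for the estimated number of games.
total_hours = EXPECTED_GAMES * SECONDS_PER_GAME / 3600  # 360.0 hours

# Hourly cost implied by spending the whole credit over that time.
implied_hourly_cost = CREDIT_USD / total_hours  # about 0.83 USD/hour


def games_for_budget(credit_usd: float, hourly_cost_usd: float,
                     seconds_per_game: float) -> float:
    """Games that fit in a budget at a given hourly cost and game duration."""
    return credit_usd / hourly_cost_usd * 3600 / seconds_per_game


print(f"{total_hours:.0f} hours, about {implied_hourly_cost:.2f} USD per hour")
```

Plugging a different hourly price into `games_for_budget` shows how sensitive the game count is to the instance cost.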

Posted: **Fri Apr 12, 2019 12:46 am**

Time parity match with statistically significant result (part I).
LZ0v17 #219 vs Elfv2
2x1080ti, 30s per move.

#219 v elfv2 ( 400 games)
           wins        black       white
#219   175 43.75%   65 41.67%  110 45.08%
elfv2  225 56.25%   91 58.33%  134 54.92%
                   156 39.00%  244 61.00%
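One way to quantify "statistically significant" for the 225-175 result is an exact one-sided binomial test, plus the Elo gap implied by the score. This is a sketch under simplifying assumptions: games are treated as independent, the colour imbalance visible in the table is ignored, and the usual logistic Elo formula is used. The function names are chosen for this example.

```python
from math import comb, log10


def one_sided_p_value(wins: int, games: int) -> float:
    """P(X >= wins) for X ~ Binomial(games, 0.5), computed with exact integers."""
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    return favourable / 2**games


def elo_difference(win_rate: float) -> float:
    """Elo gap implied by a win rate under the standard logistic model."""
    return 400 * log10(win_rate / (1 - win_rate))


elf_wins, total_games = 225, 400
p_value = one_sided_p_value(elf_wins, total_games)
elo_gap = elo_difference(elf_wins / total_games)

print(f"one-sided p-value: {p_value:.4f}")
print(f"implied Elo gap in favour of elfv2: {elo_gap:.0f}")
```

With these assumptions the p-value comes out below 1%, and the implied gap is on the order of 40 to 45 Elo at this time control and hardware.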

Posted: **Fri Apr 12, 2019 3:13 am**

Cool. I adjusted the PB and PW properties in the SGF files to make it a bit clearer who was who.

Posted: **Sat Apr 13, 2019 6:10 pm**

It might provoke an interesting discussion: the folks on GitHub don't feel that time parity matches are a good measure of engine strength and prefer to compare at equal visits instead. I don't claim to totally understand the reasoning, but it might be worth looking into.

LZ's progression
