LZ's progression

splee99 · Post by **splee99** » Sat Nov 03, 2018 9:08 am

GTX1050 with 2 GB of memory. In the middle game, 20 s gives LZ about 1500 playouts, and more in the endgame.

nbc44 · Post by **nbc44** » Sat Nov 03, 2018 12:56 pm

"True" time parity test - #186 (white) vs #157a (3s per move, 2x1080ti).

New hope, part2:

gogui-twogtp -white "C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\f50dc27c.gz -t 12 --gpu 0 --gpu 1 --noponder --precision half" -black "C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\fc5e0a50.gz -t 12 --gpu 0 --gpu 1 --noponder --precision half" -games 50 -sgffile 157a-186 -auto -time 1s+4s/1 -komi 7.5 -verbose

29:21

Finally: #186 vs #157a : +48-52=0

P.S. I cannot explain the results of my test. What is wrong with the black colour? V16, the "--precision half" option, too few games...?
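On the "too few games" question, one rough check is an exact two-sided binomial test of the reported +48-52 score against an even 50% win rate. This is only a sketch: it assumes the games are independent with a fixed win probability, which is an assumption added here and not something established in the thread, and the function name is made up for illustration.

```python
from math import comb

def binomial_two_sided_p(wins: int, games: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    that is no more likely than the observed one under win probability p0."""
    pmf = [comb(games, k) * p0**k * (1 - p0)**(games - k)
           for k in range(games + 1)]
    observed = pmf[wins]
    # Small tolerance so outcomes tied with the observed one are counted.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

# Score reported above: #186 vs #157a, +48-52 over 100 games.
p_value = binomial_two_sided_p(48, 100)
print(f"48 wins in 100 games vs. an even match: p = {p_value:.2f}")
```

A large p-value here means that 100 games cannot separate the two networks from an even match under this simple model.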

splee99 · Post by **splee99** » Sat Nov 03, 2018 1:18 pm

There is a well-known saying among professional go players. When you have to kill a huge dragon to win, you are already far behind. It may still take more time for LZ to understand this. When LZ186 plays black, it almost always tries to kill a huge dragon. LZ185 is more modest and plays more defensive moves.

nbc44 · Post by **nbc44** » Sat Nov 03, 2018 1:42 pm

splee99 wrote:... When LZ186 plays black, it almost always tries to kill a huge dragon. LZ185 is more modest and plays more defensive moves.

And what about #157a? It has the same results, so I disagree with you.

splee99 · Post by **splee99** » Sat Nov 03, 2018 3:27 pm

LZ157 is using 15 blocks, so it is unable to generate any huge dragon (at least not one as large as those generated by the 40-block networks). So there are fewer unexpected outcomes regarding the life and death of a large dragon.

Uberdude · Post by **Uberdude** » Sat Nov 03, 2018 3:40 pm

Yakago wrote:
luigi wrote:I'm following Leela's progression with delight. Does anyone know what its current rating against human pros on "normal" hardware (let's say one GTX 1080) would be? Is it better than the #1 human on those conditions? If so, by how much?
Consider that even a few months ago LeelaZero was beating human professionals with 2-3 handicap stones, which it was not even built for, on consumer hardware. (Since then, there have been various improvements to handicap play.)

I would think that LeelaZero is well above any human on a GTX 1080. It is possible that you can find a blind spot with a ladder or similar though, depending on the time settings.

I agree LZ is now super-human on a GTX 1080 with, say, 30s a move, but I'm not actually aware of much direct evidence. Yakago, are you referring to Haylee? She is a lot weaker than a top pro. But yes, there's lots of transitive evidence, e.g. a while ago LZ bijxo got some wins in the match vs Golaxy, and Golaxy spanked all the humans it played (except that one loss). Reviewing pro games with LZ (and comparing with Elf), it appears to me that LZ is stronger and much more consistent. It is of course possible that the pro had a good response LZ didn't see that invalidates LZ's judgement, but I expect that's a minority of cases (and I can find some, particularly ladders). And on wbaduk I see LZ beating up non-top Japanese pros, often winning by resignation in under 100 moves.

I did make an account on Fox to play as LZ against some of the 9ds there, and hopefully eventually top pros, to gather some direct evidence. But you start at 3d and need to win 20 games in a row for a double rank promotion, so that's 60 games of grinding to 9d. I also felt a bit bad about beating up 3ds, even though my username says LeelaZero and I say I'm a bot so they can quit before the game counts if they don't want to play (but most are Chinese, so there's a language barrier). I did actually play a few games this morning. In the one below I liked move 47, an armpit-hit kosumi approach to the 4-4 (usually a terrible beginner mistake), as a leaning attack to kill the group above, though of course my opponent helped with that unnecessary c2 defence.

jlt · Post by **jlt** » Sun Nov 04, 2018 1:25 am

Some games of LZ#131 (network ecab83bb, 192x15) against Haylee last May were on relatively powerful hardware. The comments in https://online-go.com/game/12760703 indicate that at each move, the number of visits was typically 200000. On the other hand, https://online-go.com/game/12665694 indicates "1x1080Ti", and a number of visits less than 100000 in general, which is more reasonable.

Vargo · Post by **Vargo** » Sun Nov 04, 2018 7:02 am

20 game match between LZ0.16 #187 and Elf v1
(1x1080, 5 min per side and game, no pondering, twogtp 1.4.10)

Elf v1 wins 13:7 (7 wins as B, 6 wins as W)

#187 won 35% of its games, not so bad

(Reminder: #184 had won only 30% of its 20 games against Elf v1.
Maybe not significant, but it seems to go in the right direction.)

Average length : 220 moves , average time used per side and game: 215" (about the same for both sides)

elf2_187_ELFisW.zip (8.27 KiB)

elf2_187-ELFisB.zip (9.1 KiB)
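To put a number on "maybe not significant", here is a minimal sketch of a 95% Wilson score interval for each 20-game result (7 wins for #187, and 6 wins for #184 as implied by the 30% figure). The choice of the Wilson interval and the function name are assumptions made for illustration; no one in the thread used this method.

```python
from math import sqrt

def wilson_interval(wins: int, games: int, z: float = 1.96):
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

lo_187, hi_187 = wilson_interval(7, 20)   # #187: 35% of 20 games
lo_184, hi_184 = wilson_interval(6, 20)   # #184: 30% of 20 games
print(f"#187: {lo_187:.0%} to {hi_187:.0%}")
print(f"#184: {lo_184:.0%} to {hi_184:.0%}")
```

The two intervals overlap almost completely, which fits the caution that a one-game difference over 20 games says little on its own.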

sorin · Post by **sorin** » Sun Nov 04, 2018 1:56 pm

splee99 wrote:There is a well-known saying among professional go players. When you have to kill a huge dragon to win, you are already far behind. It may still take more time for LZ to understand this. When LZ186 plays black, it almost always tries to kill a huge dragon. LZ185 is more modest and plays more defensive moves.

Recent Go AIs are known for not paying much attention to human pros' sayings, but rather for proving humans wrong.

splee99 · Post by **splee99** » Sun Nov 04, 2018 4:55 pm

Well, overplay is always wrong. As we can see, LZ186 is well punished by LZ187, which is quite often in defensive mode.

nbc44 · Post by **nbc44** » Sun Nov 04, 2018 10:39 pm

Vargo wrote:20 game match between LZ0.16 #187 and Elf v1
#187 won 35% of its games, not so bad

It looks like you are an optimist:

Code:

# Black: Leela Zero
# BlackCommand: C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\6c7c1c83.gz -t 12 --gpu 0 --gpu 1 --noponder --precision single
# BlackLabel: Leela Zero:0.16
# BlackVersion: 0.16
# Date: November 5, 2018 5:10:21 AM VLAT
# Host: DESKTOP-SP9U1O5
# Komi: 7.5
# Referee: -
# Size: 19
# White: Leela Zero
# WhiteCommand: C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\d13c4099.gz -t 12 --gpu 0 --gpu 1 --noponder --precision single
# WhiteLabel: Leela Zero:0.16
# WhiteVersion: 0.16
# Xml: 0
#
#GAME	RES_B	RES_W	RES_R	ALT	DUP	LEN	TIME_B	TIME_W	CPU_B	CPU_W	ERR	ERR_MSG
0	W+R	W+R	W+R	0	-	228	352	344.5	0	0	0	
1	W+R	W+R	W+R	0	-	244	378.8	369.4	0	0	0	
2	B+R	B+R	B+R	0	-	221	340.8	336	0	0	0	
3	W+R	W+R	W+R	0	-	188	293	284.4	0	0	0	
4	B+R	B+R	B+R	0	-	261	402	396.4	0	0	0	
5	W+R	W+R	W+R	0	-	240	373	363.4	0	0	0	
6	W+R	W+R	W+R	0	-	238	370.5	360.4	0	0	0	
7	B+R	B+R	B+R	0	-	219	338.5	332.7	0	0	0	
8	W+R	W+R	W+R	0	-	258	400.4	390.6	0	0	0	
9	W+R	W+R	W+R	0	-	292	452.3	442.7	0	0	0	
10	W+R	W+R	W+R	0	-	278	430	420.3	0	0	0	
11	W+R	W+R	W+R	0	-	292	451.9	442.4	0	0	0	
12	W+R	W+R	W+R	0	-	226	347.8	342	0	0	0	
13	W+R	W+R	W+R	0	-	298	459.6	451	0	0	0	
14	W+R	W+R	W+R	0	-	250	386.4	379.2	0	0	0	
15	W+R	W+R	W+R	0	-	226	350.1	341.4	0	0	0	
16	W+R	W+R	W+R	0	-	248	385.8	375.3	0	0	0	
17	B+R	B+R	B+R	0	-	113	175	172	0	0	0	
18	W+R	W+R	W+R	0	-	314	485.5	475.3	0	0	0	
19	B+R	B+R	B+R	0	-	103	159.1	156.4	0	0	0	
20	W+R	W+R	W+R	0	-	210	326.3	317.6	0	0	0	
21	W+R	W+R	W+R	0	-	266	413.1	403.2	0	0	0	
22	W+R	W+R	W+R	0	-	240	371.1	363.3	0	0	0	
23	W+R	W+R	W+R	0	-	202	312.2	305.3	0	0	0	
24	W+R	W+R	W+R	0	-	246	382.1	371.9	0	0	0	
25	B+R	B+R	B+R	0	-	121	186.8	184	0	0	0	
26	W+R	W+R	W+R	0	-	264	407	398.5	0	0	0	
27	W+R	W+R	W+R	0	-	312	481.4	472.5	0	0	0	
28	W+R	W+R	W+R	0	-	210	326.6	317.7	0	0	0	
29	W+R	W+R	W+R	0	-	292	450.8	442.4	0	0	0	
30	B+R	B+R	B+R	0	-	91	141.2	138.7	0	0	0	
31	B+R	B+R	B+R	0	-	91	141.1	138.8	0	0	0	
32	W+R	W+R	W+R	0	-	288	448.3	435.7	0	0	0	
33	W+R	W+R	W+R	0	-	250	388.4	377.2	0	0	0	
34	W+R	W+R	W+R	0	-	220	341.3	332.3	0	0	0	
35	W+R	W+R	W+R	0	-	190	293.6	287.6	0	0	0	
36	B+R	B+R	B+R	0	-	189	290.7	287.1	0	0	0	
37	B+R	B+R	B+R	0	-	227	350.1	344.3	0	0	0	
38	W+R	W+R	W+R	0	-	362	558.6	548.2	0	0	0	
39	W+R	W+R	W+R	0	-	184	285.7	277.6	0	0	0	
40	W+R	W+R	W+R	0	-	174	270.8	263	0	0	0	
41	W+R	W+R	W+R	0	-	272	421.2	413	0	0	0	
42	W+R	W+R	W+R	0	-	298	458.1	452.1	0	0	0	
43	W+R	W+R	W+R	0	-	192	298.3	290.1	0	0	0	
44	W+R	W+R	W+R	0	-	196	304	296.1	0	0	0	
45	W+R	W+R	W+R	0	-	364	560.9	551.9	0	0	0	
46	W+R	W+R	W+R	0	-	280	431.2	424.2	0	0	0	
47	W+R	W+R	W+R	0	-	340	524.3	514.9	0	0	0	
48	B+R	B+R	B+R	0	-	271	417	411.4	0	0	0	
49	W+R	W+R	W+R	0	-	128	199.9	192.7	0	0	0

#187(black) vs Elf v1. : +11-39

P.S. 3s per move, 2x1080ti
P.P.S. Games #30-31 were lost by Elf through ladder blindness, so the result is even worse than it looks.

EDIT.

#187(white) vs Elf v1. : +17-33

Code:

# Black: Leela Zero
# BlackCommand: C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\d13c4099.gz -t 12 --gpu 0 --gpu 1 --noponder --precision single
# BlackLabel: Leela Zero:0.16
# BlackVersion: 0.16
# Date: November 5, 2018 3:14:25 PM VLAT
# Host: DESKTOP-SP9U1O5
# Komi: 7.5
# Referee: -
# Size: 19
# White: Leela Zero
# WhiteCommand: C:\APPS\l0gpu16\leelaz.exe --gtp --weights=C:\APPS\net\6c7c1c83.gz -t 12 --gpu 0 --gpu 1 --noponder --precision single
# WhiteLabel: Leela Zero:0.16
# WhiteVersion: 0.16
# Xml: 0
#
#GAME	RES_B	RES_W	RES_R	ALT	DUP	LEN	TIME_B	TIME_W	CPU_B	CPU_W	ERR	ERR_MSG
0	B+R	B+R	B+R	0	-	277	421.7	425.3	0	0	0	
1	B+R	B+R	B+R	0	-	231	352.1	354.7	0	0	0	
2	W+R	W+R	W+R	0	-	272	417	415.7	0	0	0	
3	B+R	B+R	B+R	0	-	217	331	334.1	0	0	0	
4	W+R	W+R	W+R	0	-	312	476.2	480	0	0	0	
5	W+R	W+R	W+R	0	-	128	197.3	195.7	0	0	0	
6	W+R	W+R	W+R	0	-	224	343.6	344.8	0	0	0	
7	B+R	B+R	B+R	0	-	285	434.5	439.2	0	0	0	
8	B+R	B+R	B+R	0	-	167	255.1	256.5	0	0	0	
9	B+R	B+R	B+R	0	-	229	349.6	351.5	0	0	0	
10	B+R	B+R	B+R	0	-	193	294.9	296	0	0	0	
11	B+R	B+R	B+R	0	-	213	324.6	327	0	0	0	
12	B+R	B+R	B+R	0	-	213	325.3	326.7	0	0	0	
13	B+R	B+R	B+R	0	-	303	462.2	464.1	0	0	0	
14	W+R	W+R	W+R	0	-	232	355.1	353.2	0	0	0	
15	B+R	B+R	B+R	0	-	181	275.8	277.6	0	0	0	
16	W+R	W+R	W+R	0	-	120	184.9	183.5	0	0	0	
17	W+R	W+R	W+R	0	-	212	324.7	324.3	0	0	0	
18	B+R	B+R	B+R	0	-	167	255.2	257.3	0	0	0	
19	W+R	W+R	W+R	0	-	172	264.4	263.9	0	0	0	
20	B+R	B+R	B+R	0	-	337	513.1	516.3	0	0	0	
21	B+R	B+R	B+R	0	-	319	486.2	488.6	0	0	0	
22	B+R	B+R	B+R	0	-	175	267.4	269.2	0	0	0	
23	B+R	B+R	B+R	0	-	209	319	322.4	0	0	0	
24	B+R	B+R	B+R	0	-	289	440.3	445.6	0	0	0	
25	W+R	W+R	W+R	0	-	142	218.8	216.2	0	0	0	
26	B+R	B+R	B+R	0	-	313	475.8	480.9	0	0	0	
27	B+R	B+R	B+R	0	-	337	513.3	517.5	0	0	0	
28	B+R	B+R	B+R	0	-	305	465.2	466	0	0	0	
29	W+R	W+R	W+R	0	-	146	225.3	222.9	0	0	0	
30	W+R	W+R	W+R	0	-	164	251.9	249.3	0	0	0	
31	W+R	W+R	W+R	0	-	158	242.7	240.5	0	0	0	
32	W+R	W+R	W+R	0	-	122	187.8	185.7	0	0	0	
33	B+R	B+R	B+R	0	-	245	373.5	375.7	0	0	0	
34	B+R	B+R	B+R	0	-	285	433.9	436.6	0	0	0	
35	B+R	B+R	B+R	0	-	161	245.9	247.4	0	0	0	
36	W+R	W+R	W+R	0	-	244	373	373	0	0	0	
37	W+R	W+R	W+R	0	-	192	295.1	293.9	0	0	0	
38	B+R	B+R	B+R	0	-	291	443.2	447.4	0	0	0	
39	B+R	B+R	B+R	0	-	153	233.1	234.7	0	0	0	
40	W+R	W+R	W+R	0	-	172	264	263.9	0	0	0	
41	B+R	B+R	B+R	0	-	163	249.3	250.4	0	0	0	
42	B+R	B+R	B+R	0	-	247	376.8	378.8	0	0	0	
43	B+R	B+R	B+R	0	-	221	336.3	339.7	0	0	0	
44	W+R	W+R	W+R	0	-	100	155.3	151	0	0	0	
45	B+R	B+R	B+R	0	-	301	457.4	462.6	0	0	0	
46	B+R	B+R	B+R	0	-	345	524.7	533.3	0	0	0	
47	B+R	B+R	B+R	0	-	359	546.1	550.6	0	0	0	
48	B+R	B+R	B+R	0	-	299	454.5	458.5	0	0	0	
49	B+R	B+R	B+R	0	-	237	360.9	363.9	0	0	0

In total:

#187 vs Elf v1. : +28-72

P.S. If someone wants the games...
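The per-colour totals quoted above can be recounted from result tables like the two in this post. The sketch below assumes the table is tab-separated, with a `#GAME` header row and a `RES_R` column, as it appears in the listings; the function name is invented for illustration, and the sample holds only the first three games of the first table.

```python
from collections import Counter

def tally_results(dat_text: str) -> Counter:
    """Count wins per colour from the RES_R column of a twogtp-style table."""
    counts = Counter()
    columns = None
    for line in dat_text.splitlines():
        if line.startswith("#GAME"):
            # Header row: drop the leading '#' and remember column positions.
            columns = line[1:].split("\t")
            continue
        if columns is None or not line.strip() or line.startswith("#"):
            continue  # metadata comments and blank lines
        fields = line.split("\t")
        result = fields[columns.index("RES_R")]
        if result[:1] in ("B", "W"):
            counts[result[0]] += 1
    return counts

# First three games of the first table above (tab-separated).
sample = "\n".join([
    "# Komi: 7.5",
    "#GAME\tRES_B\tRES_W\tRES_R\tALT\tDUP\tLEN\tTIME_B\tTIME_W\tCPU_B\tCPU_W\tERR\tERR_MSG",
    "0\tW+R\tW+R\tW+R\t0\t-\t228\t352\t344.5\t0\t0\t0\t",
    "1\tW+R\tW+R\tW+R\t0\t-\t244\t378.8\t369.4\t0\t0\t0\t",
    "2\tB+R\tB+R\tB+R\t0\t-\t221\t340.8\t336\t0\t0\t0\t",
])
tally = tally_results(sample)
print(dict(tally))
```

Run on the full first table, the same count should reproduce the +11-39 score for Black (#187).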

nbc44 · Post by **nbc44** » Mon Nov 05, 2018 7:17 pm

On the other hand ("visit" parity test (-v 1601 -r 5)):

Code:

The first net is worse than the second
Elfv1 v #187 ( 74 games)
        wins        black       white
Elfv1   24 32.43%   15 34.88%    9 29.03%
#187    50 67.57%   28 65.12%   22 70.97%
                    43 58.11%   31 41.89%

Vargo · Post by **Vargo** » Fri Nov 30, 2018 6:19 am

Nine 10-game matches at time parity:

157 v 158
157 v 160
157 v 165
157 v 170
157 v 175
157 v 180
157 v 185
157 v 190
157 v 193

All games and results :

157vxxx_2m.zip (83.8 KiB)

Below, a graph of the win percentages of #157, with a dashed linear fit, showing the "average progression" of the networks.
For example, second black dot from left means : #157 has a win rate of 90% against #160

(1x1080, 2 min per side per game, no pondering, komi 7.5, twogtp 1.5.0, LZ016, #157 is always W in the even numbered games)
The networks get better indeed !

I'm running the exact same experiment a second time...Results this evening or tomorrow.

I hope the linear fits will look similar.
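A dashed linear fit like the one described above can be computed with ordinary least squares. This is a minimal sketch and not the method actually used for the graph. The network numbers come from the list above, but the win rates are PLACEHOLDER values lying on an exact straight line, not the measured results, which are in the attached archive.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Opponent network numbers from the match list above.
networks = [158, 160, 165, 170, 175, 180, 185, 190, 193]
# PLACEHOLDER win rates for #157 (synthetic, NOT the thread's data).
placeholder_rates = [0.80 - 0.01 * (n - 158) for n in networks]

slope, intercept = linear_fit(networks, placeholder_rates)
print(f"change in #157 win rate per network: {slope:+.3f}")
```

With only 10 games per match, each point moves in steps of 10%, so the fitted slope is a rough trend indicator at best.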

Vargo · Post by **Vargo** » Fri Nov 30, 2018 11:25 am

The second series of matches shows progress too, but not as much as the first one.

(Same commands, same hardware)
Games and results of the second series:

157_xxxxx.zip (82.84 KiB)

The combined graph (nine 20-game matches):
#157 average win rate goes down from ~75% to ~45%

Vargo · Post by **Vargo** » Sat Dec 01, 2018 3:05 am

Third and last experiment, with another nine 10-game matches (I should have done the three experiments in one go, sorry...)

157vxxxxx.zip (79.43 KiB)

After these 270 games at short time parity (~1 sec/move with 1x1080), the combined graph shows real progress for the 40x256 networks. I think they are now better than #157 (the last 15x192), even for relatively fast games.

I've not plotted the linear fit, because I'm sure someone will tell me the long time progression model is not linear (true, but it can be a good approximation, locally)
For example : rightmost blue point means #157 wins 40% against #193
