so basically leela zero has made 0 progress in the last 4mo


Re: so basically leela zero has made 0 progress in the last

Post by bernds »

Charlie wrote:Perhaps your opening book is biased to favour black.
This is possible, but since both sides got to play each position with both colors, it should not make a difference for testing relative strengths. I did not pick the openings completely at random. They were from pro games, picked with an eye towards looking reasonably even, but if one side made one or two moves that an AI perhaps wouldn't, like 3-3 point openings, it was White. The idea being that LZ evaluates an empty board as favourable for White, and if one could construct more even positions it wouldn't be a bad thing for this sort of test.
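The colour-swapped pairing described above can be sketched roughly as follows. This is only an illustration, not the script actually used for the test: the `paired_schedule` helper and the opening file names are hypothetical.

```python
from itertools import product


def paired_schedule(openings, engine_a, engine_b):
    """Return (opening, black, white) tuples. Every opening is played twice,
    once with each engine as Black, so a biased opening affects both equally."""
    games = []
    for opening, a_is_black in product(openings, (True, False)):
        black, white = (engine_a, engine_b) if a_is_black else (engine_b, engine_a)
        games.append((opening, black, white))
    return games


# Hypothetical opening files; the real test used positions taken from pro games.
schedule = paired_schedule(["opening01.sgf", "opening02.sgf"], "LZ#157", "LZ#191")
for game in schedule:
    print(game)
```

With this layout, an opening that happens to favour one colour adds one "easy" and one "hard" game to each engine's tally.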
Here are the first evaluations in each file where LZ#191 was black:


NN eval=0.445787
NN eval=0.457268
NN eval=0.466394
NN eval=0.461104
NN eval=0.510580
NN eval=0.487815
NN eval=0.460059
NN eval=0.464494
NN eval=0.452350
NN eval=0.453560
NN eval=0.489030
NN eval=0.461185
NN eval=0.462972
NN eval=0.503706
NN eval=0.444272
NN eval=0.513425
NN eval=0.464524
NN eval=0.433751
NN eval=0.482730
NN eval=0.449405
NN eval=0.470484
NN eval=0.518139
NN eval=0.450151
NN eval=0.474539
NN eval=0.460860
NN eval=0.407501
NN eval=0.511947
NN eval=0.468585
NN eval=0.454147
NN eval=0.459028
NN eval=0.474770
That suggests most of the evaluations were around the 0.46 point that the programs think is the evaluation for an empty board. Now, it's possible that these evaluations don't reflect actual winning percentages, and that would be an interesting find if it could be shown.
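As a quick check of the "around 0.46" reading, the listed values can be summarised with a few lines of Python. The numbers are copied from the list above; nothing else is assumed.

```python
from statistics import mean, median

# First NN evaluations from the files where LZ#191 was Black (copied from the list above).
evals = [
    0.445787, 0.457268, 0.466394, 0.461104, 0.510580, 0.487815, 0.460059,
    0.464494, 0.452350, 0.453560, 0.489030, 0.461185, 0.462972, 0.503706,
    0.444272, 0.513425, 0.464524, 0.433751, 0.482730, 0.449405, 0.470484,
    0.518139, 0.450151, 0.474539, 0.460860, 0.407501, 0.511947, 0.468585,
    0.454147, 0.459028, 0.474770,
]

print(f"n={len(evals)} mean={mean(evals):.3f} median={median(evals):.3f}")
print(f"min={min(evals):.3f} max={max(evals):.3f}")
```

The mean comes out at roughly 0.47 and the median at roughly 0.46, so the openings do cluster near the empty-board figure, with a handful of outliers on either side.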

Post by Uberdude »

Something I picked up from my Google Translate reading of Chinese kibitz whilst playing as LeelaZero on Fox is that they use different bots to play as black and white, as some are better at one colour than the other. Elfv1 is unusual in that it thinks black is winning on the empty board, so maybe it's better playing as black? Regarding bernds's test, here are some possibly relevant musings:

#157 likes to answer the approach with the knight's move as white in the parallel 4-4 opening:
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 2 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
#188 and other recent 40b networks like to counter-approach (as Elfv1 does too; four 4-4 corners followed by four approaches is a very common Elf fuseki, and we have seen pros playing it recently too). But if white plays the knight's-move answer, there is not much loss of winrate.
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . 2 . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
Elfv1 thinks the knight's-move answer is a -7% mistake (in a recent Facebook thread, Nikola Mitic reported that some pros reckon 10% on Elf's scale in the opening is about 1 point) and that black's 3-3 invasion punishes it. LZ #188 thinks white is still good here.
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 3 . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 2 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
So it appears that LZ, from #157 to the recent 40b networks, is becoming more like Elfv1. If we assume Elf is "correct" (at least when it plays) that this 3-3 invasion after the knight's-move response is good for black, then #157 will willingly play white in this position whilst 40b won't. 40b is probably better at making black win from here, but maybe both win >50% as black.

Post by Vargo »

Another 10-game match between #157 and #191: same PC, same time parity, but with twogtp v1.5.0 and LZ0.16 (thanks to baduk1 for the workaround!).
So, much better benchmarks (845 n/s for #191 and 2677 n/s for #157).
Same commands as last match.

Average length: 250 moves
Average time per game: 678" for #191 and 662" for #157

All games ended by resignation, with no duplicate games.
At move 60, the 10 games look different from one another:
191.jpg
And the result is... hum....
5-5 Go figure... :scratch:
191_is_W_on_odd_.zip
(I used -alternate, so #191 is W only in the odd numbered games)

Post by splee99 »

Observing the success of Leela-master, I have my doubts about the real potential of the "zero" approach. I have no doubt that the zero approach can produce bots playing at a superhuman level, but obviously the bot is mainly good at the micro-strategy level. For macro strategy, such as how to choose the opening moves, especially in handicap games, it may take a great many self-play games for the bot to learn. With this in mind, even Leela Zero has to use ELF games in its training to make reasonable progress.

Post by Vargo »

In another thread, @splee99 said :
If you have two GPUs, why don't you try assigning GPU0 to 181 and GPU1 to 157? I know this would make the speed slower, but it will make a fair match because some data may be cached in a GPU during the game.
Good idea.
10 game match #191 v #157 at 5min per game per side, but with pondering enabled for both,
result 6:4 in favor of #191
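For a sense of how much a 6:4 score over 10 games can tell us, here is a rough sanity check. It assumes independent games with a fixed win probability and ignores colour and opening effects, which is a simplification; the helper names are made up for this sketch.

```python
from math import comb, log10


def p_at_least(wins, games, p=0.5):
    """Probability of scoring at least `wins` out of `games` when each game
    is won independently with probability p."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


def elo_diff(score):
    """Elo difference implied by an expected score under the usual logistic model."""
    return 400 * log10(score / (1 - score))


print(f"P(>=6 wins in 10 | equal strength) = {p_at_least(6, 10):.3f}")
print(f"Elo difference implied by a 60% score = {elo_diff(0.6):.0f}")
```

Two equally strong engines would produce a 6:4 or better score for a given side more than a third of the time, so a 10-game match cannot separate "slightly stronger" from "equal".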


All the stats and commands :
stats.jpg
The games (191 is B only in even numbered games)
191v157_5m_ponder_191isBfor_even_num.zip

Post by bernds »


Post by ez4u »

bernds wrote:Relevant discussion elsewhere:
https://www.reddit.com/r/cbaduk/comment ... e_overfit/
This is a very interesting report! The follow up posting on the LZ site is here: https://github.com/gcp/leela-zero/issues/2044
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

Post by sorin »

ez4u wrote:The follow up posting on the LZ site is here: https://github.com/gcp/leela-zero/issues/2044
To summarize the github thread: the claim that starting with pre-set openings somehow makes 40 block LZ weaker than 15 block is wrong.

The error was due to using incorrect setup params, which somehow seem to affect pre-set games but not regular games.

Phew, the world is sane again :-)

Post by Vargo »

The networks do get better, even at short time parity (~1 sec/move) , see HERE

Post by roy7 »

bernds wrote:I was curious, so I was trying to run something similar yesterday. While I got a similar result, the games all looked identical up to move 40 or so.
I think I saw a comment on GitHub by someone else who ran comparisons, saying that one of the tools people sometimes use to run these head-to-head matches (twogtp?) doesn't restart the engine between games; it just reuses the engine already in memory, sets up the new board, and begins playing the new game. The issue with this at the time (perhaps a recent pull request addressed it) is that the NN eval cache isn't flushed just because you cleared the board. Thus you get exactly the same result for every early NN eval in each game you play. (Normally symmetry causes some small random noise, but once cached the value never changes.)
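The effect described above can be reproduced with a toy model. This is not Leela Zero's code: `ToyEngine`, its noise model, and the position labels are all invented for illustration, and the 0.46 baseline is just borrowed from the empty-board figure mentioned earlier in the thread.

```python
import random


class ToyEngine:
    """Toy stand-in for an engine with an NN eval cache (not Leela Zero's code)."""

    def __init__(self, seed, flush_on_clear):
        self.rng = random.Random(seed)
        self.cache = {}
        self.flush_on_clear = flush_on_clear

    def clear_board(self):
        # flush_on_clear=False models the behaviour described above:
        # the board is reset but cached evaluations survive.
        if self.flush_on_clear:
            self.cache.clear()

    def evaluate(self, position):
        if position not in self.cache:
            # The random symmetry choice is modelled as small noise around 0.46.
            self.cache[position] = 0.46 + self.rng.uniform(-0.01, 0.01)
        return self.cache[position]


def play_games(engine, n_games, positions):
    """Evaluate the same early positions in several consecutive games."""
    results = []
    for _ in range(n_games):
        engine.clear_board()
        results.append([engine.evaluate(p) for p in positions])
    return results


early_positions = ["empty", "B Q16", "B Q16 W D4"]
sticky = play_games(ToyEngine(seed=1, flush_on_clear=False), 3, early_positions)
fresh = play_games(ToyEngine(seed=1, flush_on_clear=True), 3, early_positions)
print("cache kept:   ", sticky[0] == sticky[1] == sticky[2])
print("cache flushed:", fresh[0] == fresh[1])
```

With the cache kept across games, every game sees identical early evaluations, which would explain games looking the same for the first few dozen moves.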

Post by hydrogenpi7 »

So basically I was wrong,

make that half a year zero net effective progress ...