
Re: so basically leela zero has made 0 progress in the last

Posted: Fri Nov 23, 2018 6:05 am
by bernds
Charlie wrote:Perhaps your opening book is biased to favour black.
This is possible, but since both sides got to play each position with both colors, it should not make a difference for testing relative strengths. I did not pick the openings completely at random. They were from pro games, picked with an eye towards looking reasonably even, but if one side made one or two moves that an AI perhaps wouldn't, like 3-3 point openings, it was White. The idea being that LZ evaluates an empty board as favourable for White, and if one could construct more even positions it wouldn't be a bad thing for this sort of test.
Here are the first evaluations in each file where LZ#191 was black:

Code:

NN eval=0.445787
NN eval=0.457268
NN eval=0.466394
NN eval=0.461104
NN eval=0.510580
NN eval=0.487815
NN eval=0.460059
NN eval=0.464494
NN eval=0.452350
NN eval=0.453560
NN eval=0.489030
NN eval=0.461185
NN eval=0.462972
NN eval=0.503706
NN eval=0.444272
NN eval=0.513425
NN eval=0.464524
NN eval=0.433751
NN eval=0.482730
NN eval=0.449405
NN eval=0.470484
NN eval=0.518139
NN eval=0.450151
NN eval=0.474539
NN eval=0.460860
NN eval=0.407501
NN eval=0.511947
NN eval=0.468585
NN eval=0.454147
NN eval=0.459028
NN eval=0.474770
That suggests most of the evaluations were around the 0.46 point that the programs think is the evaluation for an empty board. Now, it's possible that these evaluations don't reflect actual winning percentages, and that would be an interesting find if it could be shown.
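As a quick check of the "around 0.46" reading, the listed values can be summarised with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the numbers are copied straight from the list above.

```python
from statistics import mean, median

# First NN evals from the games where LZ#191 was black (copied from the list above).
evals = [
    0.445787, 0.457268, 0.466394, 0.461104, 0.510580, 0.487815, 0.460059,
    0.464494, 0.452350, 0.453560, 0.489030, 0.461185, 0.462972, 0.503706,
    0.444272, 0.513425, 0.464524, 0.433751, 0.482730, 0.449405, 0.470484,
    0.518139, 0.450151, 0.474539, 0.460860, 0.407501, 0.511947, 0.468585,
    0.454147, 0.459028, 0.474770,
]

avg = mean(evals)
mid = median(evals)
# How many openings does the net already rate above an even game?
above_half = sum(1 for e in evals if e > 0.5)

print(f"n={len(evals)} mean={avg:.3f} median={mid:.3f} above 0.5: {above_half}")
```

This only summarises the logged numbers. It says nothing about whether the evaluations match real winning percentages, which is the open question.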

Re: so basically leela zero has made 0 progress in the last

Posted: Fri Nov 23, 2018 6:08 am
by Uberdude
Something I picked up from my Google Translate reading of Chinese kibitz whilst playing as LeelaZero on Fox is that they use different bots to play as black and white, as some are better at one colour than the other. Elfv1 is unusual in that it thinks black is winning on the empty board, so maybe it's better playing as black? Regarding bernds's test, here are some possibly relevant musings:

#157 likes to back off with a knight's move as white after the approach in a parallel 4-4 opening:
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 2 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
#188 and other recent 40b networks like to counter-approach (as Elfv1 does too; four 4-4 corners followed by four approaches is a very common Elf fuseki, and we have seen pros playing it recently too). But if white plays the knight answer, there is not much loss of winrate.
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . 2 . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
Elfv1 thinks the knight answer is a -7% mistake (in a Facebook thread recently, Nikola Mitic reported that some pros reckon 10% in Elf's opening evaluation is about 1 point) and that black's 3-3 invasion punishes it. LZ 188 thinks white is still good here.
[go]$$B
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 3 . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 2 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . 1 . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
So it appears that LZ, from 157 to the recent 40b networks, is becoming more like Elfv1. If we assume Elf is "correct" (at least when it plays) that this 3-3 after the knight response is good for black, then 157 will willingly play white in this position whilst 40b won't, and 40b is probably better at making black win from here, though maybe both win more than 50% as black.

Re: so basically leela zero has made 0 progress in the last

Posted: Fri Nov 23, 2018 11:25 am
by Vargo
Another 10 game match between #157 and #191, same PC, same time parity, but with twogtp v1.5.0 and LZ0.16 (thanks to baduk1 for the workaround !)
So, much better benchmarks (845 n/s for #191 and 2677 n/s for #157)
Same commands as last match.

Average length : 250 moves
Average time per game : 678" for #191 and 662" for #157

All games by resignation, no duplicate game.
At move 60, the 10 games look different from one another :
Attachment: 191.jpg (185.37 KiB)
And the result is... hum....
5-5 Go figure... :scratch:
Attachment: 191_is_W_on_odd_.zip (9.63 KiB)
(I used -alternate, so #191 is W only in the odd numbered games)

Re: so basically leela zero has made 0 progress in the last

Posted: Fri Nov 23, 2018 5:38 pm
by splee99
Observing the success of Leela-master, I have my doubts about the real potential of the "zero" approach. I have no doubt that the zero approach can produce bots playing at a superhuman level, but evidently the bots are mainly good at the micro-strategy level. For macro strategy, such as how to choose the opening moves, especially in handicap games, it may take a great many self-play games for the bot to learn. Consider that even Leela Zero has to use ELF games in training to make reasonable progress.

Re: so basically leela zero has made 0 progress in the last

Posted: Sat Nov 24, 2018 1:51 am
by Vargo
In another thread, @splee99 said :
If you have two GPUs, why don't you try assigning GPU0 to 181 and GPU1 to 157? I know this would make the speed slower, but it will make a fair match because some data may be cached in a GPU during the game.
Good idea.
10 game match #191 v #157 at 5min per game per side, but with pondering enabled for both,
result 6:4 in favor of #191
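For a rough idea of how much a 10-game match can tell us, here is a small sketch that puts an approximate 95% interval on the win rate and converts it to an Elo range. The Wilson score interval and the logistic Elo formula are my choice for illustration, not something used to produce the results here, and the function names are made up.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

def elo_diff(p):
    """Elo difference implied by an expected score p (logistic model)."""
    return 400 * math.log10(p / (1 - p))

# The two 10-game results for #191 reported in this thread: 5-5 and 6:4.
for wins in (5, 6):
    lo, hi = wilson_interval(wins, 10)
    print(f"{wins}/10: win rate {lo:.2f}..{hi:.2f}, "
          f"Elo {elo_diff(lo):+.0f}..{elo_diff(hi):+.0f}")
```

Both intervals include a 50% win rate, so on these assumptions neither match separates the two networks on its own.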


All the stats and commands :
Attachment: stats.jpg (123.01 KiB)
The games (191 is B only in even numbered games)
Attachment: 191v157_5m_ponder_191isBfor_even_num.zip (9.92 KiB)

Re: so basically leela zero has made 0 progress in the last

Posted: Sat Nov 24, 2018 9:31 am
by bernds

Re: so basically leela zero has made 0 progress in the last

Posted: Sat Nov 24, 2018 10:58 pm
by ez4u
bernds wrote:Relevant discussion elsewhere:
https://www.reddit.com/r/cbaduk/comment ... e_overfit/
This is a very interesting report! The follow up posting on the LZ site is here: https://github.com/gcp/leela-zero/issues/2044

Re: so basically leela zero has made 0 progress in the last

Posted: Sun Nov 25, 2018 1:53 pm
by sorin
ez4u wrote:The follow up posting on the LZ site is here: https://github.com/gcp/leela-zero/issues/2044
To summarize the github thread: the claim that starting with pre-set openings somehow makes 40 block LZ weaker than 15 block is wrong.

The error was due to using incorrect setup params, which somehow seem to affect pre-set games but not regular games.

Phew, the world is sane again :-)

Re: so basically leela zero has made 0 progress in the last

Posted: Fri Nov 30, 2018 6:25 am
by Vargo
The networks do get better, even at short time parity (~1 sec/move), see HERE

Re: so basically leela zero has made 0 progress in the last

Posted: Mon Dec 24, 2018 8:23 am
by roy7
bernds wrote:I was curious, so I was trying to run something similar yesterday. While I got a similar result, the games all looked identical up to move 40 or so.
I think I saw a comment on GitHub by someone else who ran comparisons, saying that one of the tools people sometimes use to run these head-to-head matches (twogtp?) doesn't restart the engine between games. It just keeps the existing engine in memory, sets up the new board etc., and begins playing the new game. The issue with this at the time (perhaps a recent pull request addressed it) is that the NN eval cache doesn't get flushed just because you cleared the board. Thus you'll get exactly the same result for every early NN eval in each game you play. (Normally symmetry causes some small random noise, but once cached the value never changes.)
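To illustrate the effect I mean, here is a toy model. It is not Leela Zero's actual code: `ToyEngine`, `clear_board` and the noise range are all made up, with a small random offset standing in for the random symmetry.

```python
import random

class ToyEngine:
    """Toy stand-in for an engine with an NN eval cache (hypothetical)."""

    def __init__(self, seed, flush_on_clear):
        self.rng = random.Random(seed)
        self.cache = {}
        self.flush_on_clear = flush_on_clear

    def clear_board(self):
        # Only a "fixed" engine drops cached evals when a new game starts.
        if self.flush_on_clear:
            self.cache.clear()

    def evaluate(self, position):
        if position not in self.cache:
            # Random symmetry modelled as small noise around a base eval.
            self.cache[position] = 0.46 + self.rng.uniform(-0.01, 0.01)
        return self.cache[position]

def play_games(engine, n_games, positions):
    """Evaluate the same early positions in several games on one engine process."""
    games = []
    for _ in range(n_games):
        engine.clear_board()
        games.append([engine.evaluate(p) for p in positions])
    return games

opening = ["empty", "B q16", "W d4"]
sticky = play_games(ToyEngine(1, flush_on_clear=False), 3, opening)
fresh = play_games(ToyEngine(1, flush_on_clear=True), 3, opening)
print("cache kept:   ", sticky)
print("cache flushed:", fresh)
```

With the cache kept, every game sees identical early evals, which would fit games that look the same for the first 40 moves or so. With a flush (or an engine restart) the noise comes back.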

Re: so basically leela zero has made 0 progress in the last

Posted: Sat Jan 19, 2019 12:04 pm
by hydrogenpi7
So basically I was wrong,

make that half a year zero net effective progress ...