
Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 9:15 am
by Uberdude
baduk1 continues his interesting bot vs bot experiments by forcing Elf to play the black hole opening vs LeelaZero. Elf gave itself a 4% winrate to start:

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 3:39 pm
by oren
What was the end result?

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 4:21 pm
by Tryss
If I'm right, Black wins by 1.5 points.

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 4:26 pm
by Baywa
Black wins by 1.5 points if I count correctly.

Edit: Tryss was quicker.

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 4:27 pm
by Bill Spight
#me too. :D

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 15, 2018 8:28 pm
by Uberdude
Me too, but I didn't add it to the sgf so as not to spoil the ending. :)

Something I'd like to check here: did LZ fail to see Elf's good moves, or did it see them but misevaluate them?

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Sat Jun 16, 2018 6:00 am
by ez4u
The question is what happens when you switch colors? Or else give LeelaZero 16k visits? :rambo:

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Tue Jun 19, 2018 5:07 am
by Uberdude
Uberdude wrote: Something I'd like to check here: did LZ fail to see Elf's good moves, or did it see them but misevaluate them?
So the first move of Elf's that stood out to me as a strong move was q7. Leela #145 doesn't consider it at all until 1k nodes (the two extends are its main choices); around 2k it notices the move is strong and examines it more, and by 3k it is #1 with a 27.4% winrate. It had given the inside hane a few moves earlier 25.2%. I then updated to the latest #149 network, which found q7 within 300 playouts and had it at #1 in under 1k.

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Thu Jun 21, 2018 2:16 pm
by Uberdude
The next nice move of Elf: cutting across the knight's move in response to white j16. It's the natural shape weakness, so I think plenty of strong humans could find it, but the timing is smart. Letting LeelaZero #150 analyse for ages, she considers k15 practically the only move, with 120k playouts and 28.4%; e17 gets 5 (not 5k). Once it is played, LZ quickly realises it's a good move: Black's winrate goes up to 35.3%, a 7% swing, which is pretty huge for one move. A severe blind spot; it looks like LZ needs to be more exploratory. The continuation is as expected.
[go]$$Wcm54
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . X O . . . |
$$ | . . . O 2 . O . X . . . . . X . . . . |
$$ | . . O , . O . O 1 X . . . . . , O . . |
$$ | . . X X . X X . . . . . . . . X O . . |
$$ | . . . . . . . . . . . . . . . X . . . |
$$ | . . . . . . . . . . . . . . X . O . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . X . . |
$$ | . . O , . . . . . , . . . . . , O . . |
$$ | . . . . . . . . . . . . . . . . . O . |
$$ | . . . . . . . . . . . . . . . . O . O |
$$ | . . O . X . . . . . . . . . . X X O . |
$$ | . . . . . . . . . . . . . . . X . X . |
$$ | . . . . . . . . . . . . X . . X . X . |
$$ | . . O , . . . . . , . . . . X O X . . |
$$ | . . . . X . . . O . O . . O O O X . . |
$$ | . . . . . . . . . . . . . . O X X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Thu Jun 21, 2018 2:58 pm
by Bill Spight
Uberdude wrote: The next nice move of Elf: cutting across the knight's move in response to white j16. It's the natural shape weakness, so I think plenty of strong humans could find it, but the timing is smart. Letting LeelaZero #150 analyse for ages, she considers k15 practically the only move, with 120k playouts and 28.4%; e17 gets 5 (not 5k). Once it is played, LZ quickly realises it's a good move: Black's winrate goes up to 35.3%, a 7% swing, which is pretty huge for one move. A severe blind spot; it looks like LZ needs to be more exploratory.
(Emphasis mine.)

Is that the result of self-play? Or is it just one of those things that we can expect and have to put up with? OC, no player is going to be perfect, but it seems like the current Zero bots have deep, if contained, weaknesses that they might not have if they had trained against a variety of opponents.

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Thu Jun 21, 2018 4:05 pm
by moha
I think such blind spots are a tricky problem also involving the search code of the bot. Even in training it will only learn moves that its search can find within the low selfplay visit limit (in each position it is trained towards search results). So if there is a blind spot in the net, that can only get fixed if the bot doesn't rely too much on its policy during selfplay search - otherwise it will just keep reinforcing the oversight.
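
To illustrate the "trained towards search results" point, here is a minimal sketch. It assumes an AlphaZero-style policy loss (cross-entropy between the search's visit distribution and the net's softmax policy); the three-move position, the logits, and the function names are made up for illustration and are not taken from LZ or ELF code.

```python
import math

def softmax(logits):
    """Turn raw policy logits into move probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(logits, search_target):
    """Gradient of cross-entropy(search_target, softmax(logits)) w.r.t. the logits.

    For a target that sums to 1 this is simply policy - target.
    """
    policy = softmax(logits)
    return [p - t for p, t in zip(policy, search_target)]

# Hypothetical 3-move position; the last move is the blind spot (tiny prior).
logits = [4.0, 1.0, -4.0]
policy = softmax(logits)

# Case 1: the low-visit search merely echoes the policy, so the training
# target equals the current output and every gradient component is zero.
echo_gradient = policy_gradient(logits, policy)

# Case 2 (assumed numbers): the search did find the blind-spot move and put
# most visits on it. The gradient on that logit is negative, so a gradient
# descent step raises the move's prior.
found_target = [0.2, 0.0, 0.8]
found_gradient = policy_gradient(logits, found_target)
```

Under these assumptions the net only changes its mind about a move when the search distribution differs from the raw policy, which is why the amount of exploration in the selfplay search matters.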

It seems this problem affects bots to different extents though, with LZ search being one of the most rigid / least exploratory. But it also uses the ELF net for some selfplay games now, so some blind spots should start to close (ELF based search can see different moves).

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jun 22, 2018 2:31 am
by Uberdude
A relevant game I came across today: Alexander Dinerstein 3p shows us how not to use the black hole opening against the much stronger Mi Yuting 9p:


Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Thu Jul 05, 2018 4:09 pm
by jokkebk
Alexander is a great teacher to us all. I hope we'll see many more educational games in the future!

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Thu Jul 05, 2018 5:28 pm
by Bill Spight
moha wrote:I think such blind spots are a tricky problem also involving the search code of the bot. Even in training it will only learn moves that its search can find within the low selfplay visit limit (in each position it is trained towards search results). So if there is a blind spot in the net, that can only get fixed if the bot doesn't rely too much on its policy during selfplay search - otherwise it will just keep reinforcing the oversight.
IIUC, Monte Carlo bots are guaranteed to find the correct play in infinite time. I would be surprised if the Zero bots don't meet the same criterion. OC, that does not mean that they cannot develop blind spots that will last for millennia. :lol:

Re: Elf teaches us how to play the black hole vs LeelaZero

Posted: Fri Jul 06, 2018 7:26 am
by moha
Bill Spight wrote:
moha wrote:I think such blind spots are a tricky problem also involving the search code of the bot. Even in training it will only learn moves that its search can find within the low selfplay visit limit (in each position it is trained towards search results). So if there is a blind spot in the net, that can only get fixed if the bot doesn't rely too much on its policy during selfplay search - otherwise it will just keep reinforcing the oversight.
IIUC, Monte Carlo bots are guaranteed to find the correct play in infinite time. I would be surprised if the Zero bots don't meet the same criterion. OC, that does not mean that they cannot develop blind spots that will last for millennia. :lol:
This convergence you mention means the bot will find the best move if it is run in a given position for infinite time (though even this only holds if all moves are guaranteed to get infinitely many further visits, which may not always be the case with NN-based pruning). But there is no infinite search during selfplay, and the training problem I described above may not show the same convergence.

In a given position 2-3k visits may give the same results as the current raw policy net (which can be an oversight, and, say, 100k visits could find a better move - but this will never happen). In this case no learning takes place - the network is just trained towards its current output, reinforcing it. So the blind spot could only be fixed if the bot comes across a position where it manifests AND the visit limit is enough for the search to find the correct move (or at least slightly more correct evaluations) EVEN if started from the wrong policy distribution. This latter part, the level of exploration, is where various bots differ: some only look at moves that have decent policy weights, some spend a few visits on less promising moves as well.
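
A toy sketch of the visit-limit effect, under loud assumptions: a single-level PUCT-style selection rule with noise-free values, unvisited moves valued at 0 (`fpu`), and "exploration" modelled by mixing the prior with a uniform distribution. The priors and the constant `c_puct` are invented; the two values loosely echo the 28.4% and 35.3% figures from earlier in the thread. None of this is the actual LZ or ELF search code.

```python
import math

def puct_visits(priors, values, budget, c_puct=1.5, fpu=0.0):
    """Spend `budget` visits among root moves with a PUCT-style rule.

    priors: policy prior per move.
    values: fixed (noise-free) value that each visit to a move returns.
    fpu:    value assumed for a move that has no visits yet (an assumption).
    """
    visits = [0] * len(priors)
    value_sum = [0.0] * len(priors)
    for t in range(budget):
        sqrt_total = math.sqrt(t + 1)
        best, best_score = 0, -math.inf
        for i, prior in enumerate(priors):
            q = value_sum[i] / visits[i] if visits[i] else fpu
            u = c_puct * prior * sqrt_total / (1 + visits[i])
            if q + u > best_score:
                best, best_score = i, q + u
        visits[best] += 1
        value_sum[best] += values[best]
    return visits

def mix_with_uniform(priors, weight):
    """Flatten the priors: a crude, hypothetical stand-in for more exploration."""
    uniform = 1.0 / len(priors)
    return [(1 - weight) * p + weight * uniform for p in priors]

# Move 0: the "only move" according to the policy. Move 1: the blind spot,
# which is actually better but has an almost-zero prior.
priors = [0.9995, 0.0005]
values = [0.284, 0.353]

rigid = puct_visits(priors, values, budget=3000)
exploratory = puct_visits(mix_with_uniform(priors, 0.25), values, budget=3000)
```

In this toy setup the rigid search spends all 3000 visits on move 0, so its visit distribution just repeats the policy, while the flattened priors let the search try move 1 early and end up giving it most of the visits. With the rigid priors the exploration term alone would need well over 100k visits before move 1 got its first look.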