20-game match LZ15#160 v LZ15_ELF2 at time parity.
5 minutes per game (1xGTX1080), noponder, komi 7.5
ELF2 wins 18:2 (90%, 10 wins as W, 8 wins as B)
I've seen a bunch more, and some by PhoenixGo too. Mostly against young low dan pros. So my guess is it is similar to the online training series against DeepZen from last year. I've not seen a human win. Also I believe the Japan server on WBaduk is a mirror/clone/relay/something of the Japanese Yugen No Ma server. Currently playing is username "aikiller" with 1p Japanese flag, let's see if he or she can live up to that!
sorin wrote: I saw today two games on WBaduk server, between Ichiriki Ryo 8p (one of Japan's top pros) and LeelaZero. Both were played on even.
LeelaZero won both (one playing black, one white).
Does anyone know what was the event?
Actually, if a program A is stronger than B (the win probability of A against B is >= 0.5), then there is under a 0.13% chance that program A gets 3 wins or fewer in 20 games. So the result is statistically significant.
Ouch...! 20 games is not enough to be really significant, but still...
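For anyone who wants to check that figure, here is a minimal sketch of the calculation, assuming each game is an independent trial with a fixed win probability (a simplification, since colour and opening variety are ignored). The function name `binomial_tail_at_most` is just made up for this illustration.

```python
from math import comb


def binomial_tail_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    Assumes independent games with a constant win probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Chance that an equal-strength program (p = 0.5) wins 3 or fewer of 20 games.
    prob = binomial_tail_at_most(3, 20)
    print(f"P(at most 3 wins in 20 games at p=0.5) = {prob:.4%}")
```

With p = 0.5 the sum is 1351 / 2^20, which is a little under 0.13%. For any p above 0.5 the probability is smaller still, so 0.13% is the upper bound.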
But is statistically significant really significant?
Tryss wrote: Actually, if a program A is stronger than B (the win probability of A against B is >= 0.5), then there is under a 0.13% chance that program A gets 3 wins or fewer in 20 games. So the result is statistically significant.
Ouch...! 20 games is not enough to be really significant, but still...
How come, indeed?
Vargo wrote: In the 50 games played in these matches, #161 won only 11, that's 22%, and #161 has a better Elo score than #157, mmmmm... How come?
Accumulation of errors without calibration, either to older versions of self or an external benchmark. (On a related note, dead-reckoning / inertial guidance systems (as for Apollo or ICBMs) are amazing feats of engineering; I was recently reading about them.)
Vargo wrote: In the 50 games played in these matches, #161 won only 11, that's 22%, and #161 has a better Elo score than #157, mmmmm... How come?
I think I've managed to get twogtp working and am running a 147 vs 157 match now at 3200 visits.
Uberdude wrote: In fact during normal 15-block training I think doing some occasionally e.g. #157 vs #147 would be a good idea to see how inflated the incremental self-improvement Elo is (based on the incremental Elo differences from the promotions 147->148->149 etc it went from 11401 -> 11806 which predicts #157 would beat #147 91% of the time, but I would bet it would be quite a lot lower than that in reality).
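The 91% prediction quoted above can be reproduced with the usual logistic Elo formula. This is a rough sketch that assumes the ratings follow the standard 400-point scale, which the 11401 -> 11806 numbers are consistent with. The function names are hypothetical, chosen only for this example.

```python
import math


def elo_expected_score(diff: float) -> float:
    """Expected score for the higher-rated side given a rating difference.

    Uses the standard logistic Elo model on a 400-point scale (assumed here).
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of elo_expected_score: rating difference implied by a score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    predicted = elo_expected_score(11806 - 11401)
    print(f"Predicted win rate for #157 vs #147: {predicted:.1%}")
```

Going the other way, `elo_diff_from_score` turns the observed win rate from a direct match into an implied rating gap, which can be compared with the 405 points accumulated over the promotion chain.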
Could you try #165? It is noticeably more aggressive than before.
Vargo wrote:
Here, #163 wins only 20% of its games against #157... It's only 20 games, but I doubt that #163 is stronger than #157.