Life In 19x19 :: Home-made Elo ratings for some engines

Life In 19x19 http://lifein19x19.com/

Home-made Elo ratings for some engines http://lifein19x19.com/viewtopic.php?f=18&t=16086	Page 2 of 2

Author:	Kris Storm [ Sun Oct 14, 2018 3:30 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
Thanks for your explanation. That is a clever method. I have a lot of .dat files from GoGui tournaments and have always wanted to make an Elo list like this. Maybe you can share your Python code. I'm sure it would be useful for others.

Author:	xela [ Mon Oct 15, 2018 12:43 am ]
Post subject:	Re: Home-made Elo ratings for some engines
Kris Storm wrote: Maybe you can share your Python code. I'm sure it would be useful for others.

It needs a bit of a rewrite before I can share it. At the moment it wouldn't work on someone else's computer because of all the hard-coded path names (and I'd be embarrassed to let it out in this shape). I'll add it to my to-do list.

Author:	xela [ Fri Oct 19, 2018 6:16 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
Going slow this week, because my system keeps crashing! It doesn't like the combination of strong bots and slow games. I think I need to upgrade an nvidia driver, but it's hard to find information on which drivers are more stable. At the moment, I can queue up 8 games to be played overnight, but I'll wake up to a black screen and an unresponsive system, and when I reboot it looks like only two or three games got played.

Anyway, new this week: AQ has entered the 1-minute and 5-minute ratings. It's at a disadvantage because it was trained for Japanese rules and 6.5 komi, and I'm playing all the games with Chinese rules and 7.5 komi (this works for the majority of bots). My guess is that AQ's rating will therefore be 50-100 points below its true strength, but I can't think of a good way to measure how much difference it actually makes.

Looking at the 5-minute games: AQ played 52 games, of which 50 were won or lost by resignation.

One was lost by AQ by 47.5 points. I can't figure out why AQ didn't resign. The game was 449 moves long. A large group died at move 234, and analysis with LZ_157 says the winrate was below 5% for the rest of the game. The position could have been scored at move 346, but AQ kept playing inside its own territory and trying to live inside black's territory for another 100 moves.

One game was lost by AQ by 2.5 points. Ray actually gave away 2 points with a slack endgame move, so AQ was previously behind by 4.5 points, which is hard to explain as a 6.5 versus 7.5 komi issue.

Of course we don't know how many of the resigned games were due to an overplay that wouldn't have happened with the correct komi.

A few more bots added to the 20-minute ratings. My mathematical model (post number 16 above) is looking about as bad as expected :-) At the slower time limit, it looks like LZ now gets stronger with more threads, unlike in the fast games.
Results so far at 1 minute time limit, based on 1350 games with 62 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_157          4164  97    91    60     72%    3998
LM_GX47         4142  102   95    62     73%    3900
LZ_ELF          4100  101   98    48     58%    4034
LZ_ELF_6t       4020  96    101   48     42%    4076
LZ_174          3973  87    88    64     47%    3995
ray_ELF_12t     3926  93    95    58     45%    3962
LZ_173          3903  115   112   48     60%    3794
LZ_141          3854  115   114   44     59%    3741
LM_E8           3833  116   115   50     64%    3630
LZ_116          3753  99    98    58     55%    3701
LZ_174_6t       3708  112   113   42     50%    3699
ray_173_6t      3694  112   110   36     53%    3676
LM_Z2           3691  98    94    60     67%    3518
ray_173_12t     3672  115   113   34     53%    3652
LM_W11          3642  111   115   44     50%    3611
LM_B5           3561  102   101   56     59%    3457
LZ_phoenix      3539  119   123   36     39%    3648
ray_W11_12t     3533  114   118   36     42%    3595
ray_173_2t      3486  129   129   28     50%    3487
LZ_zed          3393  123   126   34     41%    3471
leela           3374  115   114   56     55%    3294
LZ_91           3318  99    105   76     32%    3516
ray_ELF         3279  128   124   34     50%    3305
ray_173         3271  131   125   34     59%    3206
ray_W11         3129  104   103   48     50%    3138
dream_ponder    3129  119   123   40     53%    3050
AQ              3054  146   149   24     46%    3086
oakfoam_nn      2993  117   119   84     62%    2784
dream           2992  116   119   36     44%    3033
LM_GX47_c       2969  122   117   34     59%    2909
LZ_116_c2t      2866  109   112   60     30%    3144
LM_E8_c         2851  141   141   22     50%    2851
LM_B5_c         2849  134   134   24     50%    2849
LZ_116_c6t      2828  137   139   24     50%    2824
LM_Z2_c         2746  121   114   36     61%    2667
LZ_57           2745  114   116   52     50%    2725
LM_W11_c        2683  125   134   28     39%    2755
leela_c1t       2576  108   108   42     52%    2551
leela_c2t       2508  128   136   30     37%    2622
LZ_91_c2t       2506  137   140   26     46%    2533
leela_c         2499  103   101   88     59%    2377
pachi_nn        2400  110   107   76     64%    2228
pachi           2190  127   123   68     54%    2179
leela_nonet     2156  105   102   88     58%    2094
gnugo           1872  88    83    84     64%    1774
gnugo_l7        1871  120   122   52     38%    2005
LZ_57_c2t       1864  246   218   8      63%    1791
gnugo_M         1842  140   133   34     53%    1844
gnugo_l1        1824  91    89    84     48%    1882
gnugo_l4        1807  141   139   32     47%    1862
leela_nonet_1t  1758  241   333   10     10%    2160
oakfoam1        1735  126   122   32     56%    1692
pachi_pat       1711  385   328   2      50%    1711
fuego           1711  90    90    78     37%    1945
oakfoam_book    1628  113   119   40     38%    1731
pachi_1t        1585  199   231   14     14%    1894
oakfoam         1567  92    101   72     25%    1806
oakfoam2        1524  130   150   30     23%    1725
pachi_monte     1523  357   203   2      0%     1711
pachi_plain     1523  357   203   2      0%     1711
michi           1506  312   189   4      0%     1791
matilda         1437  141   120   44     9%     1877

Attachment: 1_min_crosstable-2018-10-20.csv [11.36 KiB]
Downloaded 437 times

Results so far at 5 minute time limit, based on 1448 games with 53 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_ELF          4546  2     114   46     67%    4416
LM_GX47         4505  40    113   44     66%    4371
LZ_ELF_6t       4497  47    105   48     65%    4382
LZ_157          4458  82    124   32     59%    4386
LZ_173          4308  94    93    66     55%    4243
LZ_174          4292  103   98    64     66%    4137
LZ_141          4282  90    90    78     58%    4190
ray_ELF_12t     4146  102   105   46     46%    4174
LZ_phoenix      4134  121   122   36     47%    4155
LZ_174_6t       4121  84    82    92     57%    4067
ray_173_12t     3970  103   108   46     39%    4055
LM_Z2           3934  95    97    72     50%    3885
LM_B5           3932  112   111   44     57%    3861
LZ_116          3882  87    88    94     51%    3845
ray_173_6t      3876  99    101   48     44%    3925
LM_E8           3793  115   113   46     54%    3759
LM_W11          3791  109   105   60     63%    3663
ray_173_2t      3707  114   114   50     54%    3664
ray_W11_12t     3627  113   104   52     67%    3493
ray_ELF         3490  102   102   62     39%    3646
AQ              3457  110   112   52     54%    3385
ray_173         3406  102   107   54     33%    3568
leela           3400  93    96    94     44%    3438
LZ_zed          3360  109   108   52     48%    3397
LZ_91           3271  94    98    74     35%    3431
dream_ponder    3208  118   117   40     58%    3094
ray_W11         3144  104   105   52     44%    3207
dream           3095  125   127   32     47%    3117
LM_E8_c         3006  108   104   54     56%    2970
LZ_116_c2t      2959  106   105   64     56%    2903
LM_W11_c        2845  116   114   36     53%    2828
LM_GX47_c       2814  105   105   46     46%    2863
oakfoam_nn      2811  92    89    72     51%    2824
leela_c         2732  91    92    74     50%    2732
leela_c2t       2711  82    83    78     49%    2717
LZ_91_c2t       2620  105   101   56     59%    2539
LM_Z2_c         2573  109   110   38     47%    2596
LM_B5_c         2570  114   121   34     38%    2655
LZ_57           2569  113   114   38     45%    2616
leela_c1t       2507  95    100   68     41%    2596
pachi_nn        2400  106   113   64     39%    2515
pachi           2109  112   108   80     58%    2005
LZ_57_c2t       2064  132   121   40     70%    1872
leela_nonet     2058  137   150   42     36%    2157
fuego           1837  107   105   72     65%    1662
pachi_1t        1829  119   115   54     65%    1662
leela_nonet_1t  1827  124   117   52     69%    1624
gnugo           1472  142   -77   106    20%    1763
michi           1438  230   -112  40     55%    1403
oakfoam1        1258  377   -291  28     43%    1402
oakfoam         1039  562   -509  26     27%    1357
oakfoam_book    970   615   -578  32     13%    1406
matilda         947   645   -601  26     15%    1379

Attachment: 5_min_crosstable-2018-10-20.csv [8.8 KiB]
Downloaded 473 times

Results so far at 20 minute time limit, based on 188 games with 19 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_174_6t       4337  276   194   16     94%    3948
LZ_174          4183  228   202   8      63%    4116
ray_ELF_12t     4116  118   120   32     50%    4098
LM_Z2_6t        4090  186   167   16     69%    3948
LM_Z2           3781  143   146   40     40%    3895
ray_173_6t      3504  184   166   24     67%    3368
LM_B5           3427  238   520   8      0%     3781
ray_173_2t      3212  172   181   16     38%    3308
ray_ELF         3112  114   114   38     47%    3140
ray_173         3087  145   135   22     64%    2991
LM_W11          3053  170   182   12     42%    3100
leela           2823  121   127   32     38%    2922
LZ_zed          2796  160   152   16     56%    2758
LM_W11_c        2692  111   111   32     50%    2697
leela_c2t       2618  153   153   16     50%    2620
dream           2553  201   261   8      25%    2692
LZ_91_c2t       2547  159   153   16     56%    2509
pachi_nn        2400  149   154   16     44%    2442
oakfoam_nn      2336  197   221   8      38%    2400

Attachment: 20_min_crosstable-2018-10-20.csv [1.61 KiB]
Downloaded 490 times

Author:	xela [ Fri Oct 19, 2018 6:22 pm ]
Post subject:	Re: Home-made Elo ratings for some engines

Here are the two games where I thought AQ should have resigned.

Attachments:

File comment: AQ loses by 2.5 points; ray gives away points at move 282

ray_ELF-AQ-2018-10-11_21-15.sgf [2.85 KiB]
Downloaded 1361 times

File comment: AQ loses by 47.5 points; resignable from move 242; game could have been scored at move 346

AQ-ray_W11_12t-2018-10-11_20-15.sgf [3.07 KiB]
Downloaded 1391 times

Author:	xela [ Sat Nov 10, 2018 4:17 am ]
Post subject:	Re: Home-made Elo ratings for some engines
Sorry for the long gap between updates! I spent a lot of time figuring out how to update my graphics drivers, but I still haven't solved the crashing problem. It looks like I can't reliably run LZ with 6 or more threads in long games. But that's OK, I've found out what I originally wanted to, which is that LZ (even with 2 threads) seems to achieve superhuman performance on a fairly ordinary computer. I'm a little surprised to see ELF still at the top of the list, as I thought recent LZ networks had overtaken ELF at time parity.

Over the next couple of weeks I'll add some more games to reduce some of the error margins, maybe throw LZ_157 into the mix, and maybe do some benchmarking to see how many visits per second I'm getting for various different networks.

Oh, and for anyone who's observant: in previous posts, the Elo+ and Elo- columns were the wrong way round. I've gone back and edited the earlier posts so they're now correct.

Results so far at 20 minute time limit, based on 228 games with 22 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_ELF          4657  160   136   24     79%    4451
LM_GX47         4567  143   132   24     63%    4481
LZ_188          4440  132   143   24     38%    4523
LZ_phoenix      4346  107   112   40     38%    4439
LZ_174          4318  148   133   24     71%    4180
ray_ELF_12t     4214  147   157   24     54%    4144
LZ_141          3980  233   482   8      0%     4318
LM_Z2           3769  199   165   24     63%    3716
ray_173_6t      3504  182   166   24     67%    3364
LM_B5           3431  233   482   8      0%     3769
ray_173_2t      3212  173   182   16     38%    3308
ray_ELF         3112  115   115   38     47%    3140
ray_173         3087  147   136   22     64%    2990
LM_W11          3052  171   184   12     42%    3099
leela           2823  122   128   32     38%    2921
LZ_zed          2795  161   153   16     56%    2757
LM_W11_c        2691  112   112   32     50%    2697
leela_c2t       2617  155   155   16     50%    2619
dream           2552  203   263   8      25%    2691
LZ_91_c2t       2547  160   154   16     56%    2508
pachi_nn        2400  150   155   16     44%    2441
oakfoam_nn      2336  199   222   8      38%    2400

Attachment: 20_min_crosstable-2018-11-09.csv [1.99 KiB]
Downloaded 448 times

Author:	pangafu [ Tue Dec 04, 2018 8:31 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
@xela I am the author of LeelaMaster Weight. I saw that you did some Elo tests with LM, so could I add this post to the readme of LeelaMaster Weight? https://github.com/pangafu/LeelaMasterWeight/

About LeelaMaster strength (Elo) ..... Home-made Elo ratings for some engines (by xela@lifein19x19.com) https://lifein19x19.com/viewtopic.php?f=18&t=16086 ....

Thanks for your great work~

Author:	pangafu [ Tue Dec 04, 2018 8:37 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
Hello @xela I am the author of Leela Master weight, and glad to see you do some test with lm weight. So could I add this post to the readme of Leela Master weight? About LeelaMaster strength(elo) .... Home-made Elo ratings for some engines (by xela@lifein19x19.com) viewtopic.php?f=18&t=16086 .... Please enjoy the human style of go game~

Author:	xela [ Sat Dec 08, 2018 4:37 am ]
Post subject:	Re: Home-made Elo ratings for some engines
pangafu wrote: Hello @xela I am the author of Leela Master weight, and glad to see you do some test with lm weight. So could I add this post to the readme of Leela Master weight?

Yes. Thanks for asking!

Author:	xela [ Sat Dec 08, 2018 4:49 am ]
Post subject:	Re: Home-made Elo ratings for some engines
Here are the final results (unless I get inspired to do more). Looking at the error bounds, we can't say for sure which of the top 6 is actually the strongest, but they all seem to be definitely in the "superhuman" range (considering that the bottom of this list is already amateur dan level).

Just for interest, on my hardware LZ_174 and LZ_188 get about 300 visits per second, ELF about 700, GX47 around 1200, LZ_157 around 1500 (numbers are approximate because they vary from one game to another, possibly depending on the board position and how much of the tree is reused from previous moves).

Results at 20 minute time limit, based on 426 games with 25 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_ELF          4481  132   115   32     72%    4330
LM_GX47         4394  106   102   40     58%    4343
LZ_157          4378  98    94    48     58%    4320
LZ_188          4324  102   105   40     45%    4357
LZ_174          4308  97    93    56     63%    4201
LZ_phoenix      4225  90    95    56     39%    4299
LZ_173          4191  118   123   32     44%    4233
ray_ELF_12t     4020  115   119   40     43%    4079
LZ_141          3873  134   134   32     44%    3942
LM_Z2           3801  132   130   32     56%    3741
ray_173_6t      3640  119   112   48     65%    3509
LM_B5           3433  112   118   40     35%    3555
AQ              3348  144   144   20     50%    3346
ray_173_2t      3348  121   124   32     44%    3398
ray_ELF         3169  114   115   42     43%    3242
ray_173         3121  138   132   24     58%    3062
LM_W11          3118  144   138   22     59%    3053
leela           2898  109   116   40     35%    3009
LZ_zed          2870  162   155   16     56%    2832
LM_W11_c        2765  116   116   32     50%    2764
leela_c2t       2695  158   158   16     50%    2695
LZ_91_c2t       2624  114   109   36     61%    2544
dream           2594  113   112   36     53%    2573
pachi_nn        2400  134   148   24     33%    2517
oakfoam_nn      2335  160   194   16     25%    2504

Attachment: 20_min_crosstable-2018-12-08.csv [2.47 KiB]
Downloaded 451 times
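One rough way to read the error bounds in the table above is to turn each row into an interval and check whether two intervals overlap. The sketch below assumes that Elo- and Elo+ are plain offsets below and above the listed rating, which the post does not spell out; the names `TOP_SIX`, `interval` and `overlaps` are made up for this illustration.

```python
# Top six rows of the 20-minute table above: name -> (Elo, Elo+, Elo-).
TOP_SIX = {
    "LZ_ELF": (4481, 132, 115),
    "LM_GX47": (4394, 106, 102),
    "LZ_157": (4378, 98, 94),
    "LZ_188": (4324, 102, 105),
    "LZ_174": (4308, 97, 93),
    "LZ_phoenix": (4225, 90, 95),
}


def interval(name):
    """Return (low, high), assuming Elo- and Elo+ are offsets from Elo."""
    elo, plus, minus = TOP_SIX[name]
    return elo - minus, elo + plus


def overlaps(a, b):
    """True if the two engines' intervals share at least one rating value."""
    low_a, high_a = interval(a)
    low_b, high_b = interval(b)
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    for name in TOP_SIX:
        low, high = interval(name)
        print(f"{name:12s} {low}..{high}")
    print("LZ_ELF vs LM_GX47 overlap:", overlaps("LZ_ELF", "LM_GX47"))
```

Overlapping intervals are only a crude signal that two ratings can't be separated; a proper comparison would need the rating tool's own pairwise statistics.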

Author:	xela [ Mon Sep 16, 2019 5:41 am ]
Post subject:	Re: Home-made Elo ratings for some engines

Updated with KataGo, OpenCL version (and also throwing in some recent LZ weights for comparison). Just fast games for this one, didn't get around to updating the 20 minute results.

kata_6b is the 6-block network, and you can probably guess the names for 10, 15, 20 blocks. In the 1 minute games I also tried different numbers of threads but didn't see much potential for significant improvement. The suggestion in the config file of trying more threads than you have cores wasn't a success on my hardware.

Results at 1 minute time limit, based on 1520 games with 72 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
kata_15b        4224  184   166   16     75%    4041
LZ_242          4194  215   218   8      63%    4117
LZ_157          4186  94    85    74     74%    3993
LZ_188          4172  174   166   14     57%    4128
LM_GX47         4160  101   94    64     72%    3921
kata_20b        4142  179   167   16     63%    4047
LZ_ELF          4130  85    82    72     61%    4039
kata_10b        4037  118   110   44     68%    3873
LZ_ELF_6t       4024  92    97    54     39%    4100
LZ_174          3993  83    85    72     47%    4010
LZ_173          3941  107   106   54     59%    3836
ray_ELF_12t     3920  88    92    66     39%    3997
LZ_141          3907  98    97    60     58%    3805
kata_10_12t     3895  141   139   24     50%    3902
kata_10_6t      3856  130   134   28     43%    3912
LM_E8           3827  107   109   56     59%    3673
kata_10_2t      3820  140   145   24     42%    3886
LZ_116          3752  89    89    74     49%    3757
LZ_174_6t       3733  108   109   46     50%    3723
ray_173_6t      3698  107   106   40     53%    3681
kata_10_24t     3689  159   186   20     25%    3888
LM_Z2           3679  96    92    62     65%    3525
ray_173_12t     3672  112   110   36     53%    3653
LM_W11          3649  113   116   44     50%    3619
LZ_phoenix      3554  116   118   38     42%    3641
LM_B5           3545  99    99    58     57%    3460
kata_6b         3540  185   188   14     43%    3608
ray_W11_12t     3518  111   116   38     39%    3599
ray_173_2t      3489  124   124   30     50%    3489
LZ_zed          3402  119   122   36     42%    3476
leela           3378  116   115   56     55%    3298
LZ_91           3319  99    105   80     30%    3548
ray_ELF         3280  128   124   34     50%    3308
ray_173         3272  132   126   34     59%    3206
ray_W11         3130  104   103   48     50%    3139
dream_ponder    3129  119   123   40     53%    3051
AQ              3054  146   149   24     46%    3087
oakfoam_nn      2993  117   119   84     62%    2785
dream           2992  116   120   36     44%    3034
LM_GX47_c       2969  123   117   34     59%    2909
LZ_116_c2t      2865  109   112   60     30%    3142
LM_E8_c         2851  141   141   22     50%    2851
LM_B5_c         2849  135   135   24     50%    2849
LZ_116_c6t      2828  138   139   24     50%    2824
LM_Z2_c         2746  121   115   36     61%    2667
LZ_57           2744  114   116   52     50%    2725
LM_W11_c        2683  126   134   28     39%    2754
leela_c1t       2576  109   108   42     52%    2551
leela_c2t       2508  129   136   30     37%    2621
LZ_91_c2t       2506  137   140   26     46%    2533
leela_c         2499  103   101   88     59%    2377
pachi_nn        2400  111   107   76     64%    2228
pachi           2190  127   123   68     54%    2179
leela_nonet     2156  105   102   88     58%    2094
gnugo           1872  89    83    84     64%    1774
gnugo_l7        1871  120   122   52     38%    2005
LZ_57_c2t       1864  246   218   8      63%    1791
gnugo_M         1842  140   133   34     53%    1844
gnugo_l1        1823  91    89    84     48%    1882
gnugo_l4        1807  141   139   32     47%    1862
leela_nonet_1t  1758  244   255   10     10%    2160
oakfoam1        1735  126   122   32     56%    1692
pachi_pat       1711  394   220   2      50%    1711
fuego           1711  90    90    78     37%    1945
oakfoam_book    1628  113   115   40     38%    1731
pachi_1t        1585  207   110   14     14%    1894
oakfoam         1567  93    85    72     25%    1806
oakfoam2        1524  137   53    30     23%    1725
pachi_monte     1523  387   49    2      0%     1711
pachi_plain     1523  387   49    2      0%     1711
michi           1506  339   34    4      0%     1791
matilda         1437  170   -31   44     9%     1877

Results at 5 minute time limit, based on 1680 games with 59 engines:

Code:

Name            Elo   Elo+  Elo-  games  score  avg_opp
LZ_242          4662  -15   144   34     79%    4442
LZ_ELF          4506  90    87    66     64%    4389
LM_GX47         4499  97    94    58     64%    4380
kata_20b        4465  106   106   42     52%    4446
LZ_188          4454  112   114   36     47%    4473
LZ_ELF_6t       4444  92    90    62     56%    4390
LZ_157          4390  106   105   44     55%    4354
LZ_174          4259  91    89    76     61%    4149
LZ_173          4257  86    86    80     54%    4189
LZ_141          4243  87    86    82     57%    4160
kata_15b        4156  103   104   46     48%    4172
LZ_phoenix      4144  102   102   54     52%    4109
ray_ELF_12t     4134  97    99    50     48%    4144
LZ_174_6t       4093  80    78    100    57%    4034
ray_173_12t     3944  98    101   54     41%    4017
LM_Z2           3912  92    95    76     49%    3878
LM_B5           3905  112   111   46     59%    3808
ray_173_6t      3853  99    101   48     44%    3902
LZ_116          3844  84    85    98     50%    3827
LM_W11          3802  105   99    64     66%    3657
LM_E8           3787  109   106   50     56%    3740
kata_10b        3697  114   115   48     44%    3777
ray_173_2t      3682  106   108   54     52%    3659
ray_W11_12t     3636  107   99    58     67%    3497
ray_ELF         3491  98    99    66     38%    3641
AQ              3445  102   106   60     50%    3411
ray_173         3422  96    99    62     37%    3548
leela           3386  90    93    98     44%    3425
LZ_zed          3367  105   103   56     50%    3388
LZ_91           3282  92    95    76     37%    3419
kata_6b         3192  109   112   48     38%    3354
ray_W11         3190  94    93    62     50%    3202
dream_ponder    3181  110   112   44     52%    3113
dream           3091  117   121   36     44%    3130
LM_E8_c         3011  104   102   58     53%    2988
LZ_116_c2t      2959  105   104   66     55%    2916
LM_W11_c        2846  116   114   36     53%    2829
LM_GX47_c       2817  106   105   46     46%    2867
oakfoam_nn      2811  92    89    72     51%    2823
leela_c         2733  92    92    74     50%    2731
leela_c2t       2712  83    84    78     49%    2718
LZ_91_c2t       2620  106   101   56     59%    2540
LM_Z2_c         2573  109   110   38     47%    2596
LM_B5_c         2571  114   121   34     38%    2655
LZ_57           2569  113   114   38     45%    2616
leela_c1t       2507  95    100   68     41%    2596
pachi_nn        2400  107   113   64     39%    2514
pachi           2108  112   108   80     58%    2005
LZ_57_c2t       2064  132   122   40     70%    1872
leela_nonet     2058  137   150   42     36%    2157
fuego           1836  108   105   72     65%    1662
pachi_1t        1829  119   114   54     65%    1662
leela_nonet_1t  1827  125   116   52     69%    1624
gnugo           1472  214   -177  106    20%    1763
michi           1438  298   -211  40     55%    1403
oakfoam1        1258  462   -391  28     43%    1401
oakfoam         1039  657   -609  26     27%    1357
oakfoam_book    970   710   -678  32     13%    1406
matilda         947   741   -701  26     15%    1379

Attachments:

1_min_crosstable-2019-09-16.csv [14.68 KiB]
Downloaded 391 times

5_min_crosstable-2019-09-16.csv [10.58 KiB]
Downloaded 375 times
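For anyone who wants to sanity-check rows in tables like these, the score and avg_opp columns give a quick estimate of the rating. This is a sketch under the assumption of the standard logistic Elo curve with a 400-point scale, treating all opponents as one opponent at the average rating; it is not the method used to produce the tables, and the function name `performance_rating` is made up for this illustration.

```python
import math


def performance_rating(avg_opp, score):
    """Rating at which the logistic Elo model predicts `score` against `avg_opp`.

    `score` is a fraction strictly between 0 and 1; 0% and 100% have no
    finite answer under this formula.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opp + 400.0 * math.log10(score / (1.0 - score))


# (name, listed Elo, score, avg_opp) copied from the 1-minute table above.
ROWS = [
    ("kata_15b", 4224, 0.75, 4041),
    ("LZ_157", 4186, 0.74, 3993),
    ("LZ_174", 3993, 0.47, 4010),
]

if __name__ == "__main__":
    for name, listed, score, avg_opp in ROWS:
        estimate = performance_rating(avg_opp, score)
        print(f"{name:10s} listed {listed}  estimate {estimate:.0f}")
```

For these three rows the estimate lands within about 15 points of the listed rating. Rows with a 0% score (such as pachi_monte) can't be checked this way, and the score column is rounded to whole percent, so small differences are expected.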

Author:	And [ Sun Sep 22, 2019 10:29 am ]
Post subject:	Re: Home-made Elo ratings for some engines
xela, thank you very much for your great work! Can you explain why, of all the LM_GX networks, you chose LM_GX47? And where can I download LM_B5 and LM_Z2?

Author:	xela [ Tue Sep 24, 2019 5:25 am ]
Post subject:	Re: Home-made Elo ratings for some engines
Thanks, glad you like it! I think GX47 was the strongest in the GX series when I started doing this (I can't remember exactly, it was a while ago). There are a few newer Leela Master networks now. Download from https://github.com/pangafu/LeelaMasterWeight

For more information about how I downloaded and set up the various engines, see the other thread at https://lifein19x19.com/viewtopic.php?p=236178

Author:	And [ Sat Oct 05, 2019 1:12 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
xela, I looked through everything several times, but I could not find where to download LM_B5 and LM_Z2.

Author:	xela [ Sat Oct 05, 2019 3:56 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
Ah, it looks like some of the older networks have been removed from the Google Drive folders. You'd have to raise an issue on github and ask pangafu there if they're still available.

Author:	hydrogenpi7 [ Sun Oct 06, 2019 2:04 am ]
Post subject:	Re: Home-made Elo ratings for some engines
xela wrote: Updated with KataGo, OpenCL version (and also throwing in some recent LZ weights for comparison). Just fast games for this one, didn't get around to updating the 20 minute results. kata_6b is the 6-block network, and you can probably guess the names for 10, 15, 20 blocks. In the 1 minute games I also tried different numbers of threads but didn't see much potential for significant improvement. The suggestion in the config file of trying more threads than you have cores wasn't a success on my hardware. Results at 1 minute time limit, based on 1520 games with 72 engines: [...] Results at 5 minute time limit, based on 1680 games with 59 engines: [...]

So based on this chart anyone with a half way decent GPU at any reasonable time intervals running latest LZ net can already play against AI opponent that is essentially stronger than AlphaGoLee and catching up to AlphaGoMaster?

Author:	xela [ Sun Oct 06, 2019 4:45 am ]
Post subject:	Re: Home-made Elo ratings for some engines
hydrogenpi7 wrote: So based on this chart anyone with a half way decent GPU at any reasonable time intervals running latest LZ net can already play against AI opponent that is essentially stronger than AlphaGoLee and catching up to AlphaGoMaster?

It depends on a bunch of assumptions about how the Elo rating system works. I wouldn't dare to be that precise, but it looks to me like AIs can play at a superhuman level on ordinary PCs with a mid-range GPU.
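To make one of those assumptions concrete: under the standard logistic Elo model with a 400-point scale, a rating gap translates into an expected score as sketched below. Whether the ratings in this thread follow exactly that model is an assumption, and the function name `expected_score` is made up for this illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings taken from the 5-minute table earlier in the thread.
    lz_242, lz_elf = 4662, 4506
    print(f"LZ_242 vs LZ_ELF: {expected_score(lz_242, lz_elf):.2f}")
```

The model only sees the rating gap, so comparisons between engines that never played in the same pool (for example against AlphaGo versions) rest entirely on the assumption that the two rating scales line up.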

Author:	xela [ Wed Jan 22, 2020 4:10 pm ]
Post subject:	Re: Home-made Elo ratings for some engines
Looks like someone else has done something a bit more comprehensive, although they're a bit short on details of the methodology.

All times are UTC - 8 hours [ DST ]