Life In 19x19 http://lifein19x19.com/ |
|
Home-made Elo ratings for some engines http://lifein19x19.com/viewtopic.php?f=18&t=16086 |
Page 2 of 2 |
Author: | Kris Storm [ Sun Oct 14, 2018 3:30 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Thanks for your explanation. That is a clever method. I have a lot of .dat files from GoGui tournaments and have always wanted to make an Elo list like this. Maybe you can share your Python code. I'm sure it would be useful for others. |
Author: | xela [ Mon Oct 15, 2018 12:43 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Kris Storm wrote: Maybe you can share your Python code. I'm sure it would be useful for others. It needs a bit of a rewrite before I can share it. At the moment it wouldn't work on someone else's computer because of all the hard-coded path names (and I'd be embarrassed to let it out in this shape). I'll add it to my to-do list. |
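For anyone who wants to experiment in the meantime, here is a minimal sketch of one common way to turn win/loss records into Elo-style ratings. It is not the script or the method discussed in this thread: it is a plain Bradley-Terry fit (the minorization-maximization update), and the function name `fit_elo`, the `(winner, loser)` input format and the engine names are all made up for illustration. It assumes every engine has at least one win and one loss, ignores draws, and gives no error margins.

```python
import math
from collections import defaultdict


def fit_elo(games, iterations=200):
    """Fit Bradley-Terry strengths and report them on the Elo scale.

    games: list of (winner, loser) name pairs (hypothetical input format).
    Returns {name: rating}, shifted so the average rating is 0.
    """
    wins = defaultdict(int)
    pair_games = defaultdict(int)  # (player, opponent) -> games played
    players = set()
    for winner, loser in games:
        wins[winner] += 1
        pair_games[(winner, loser)] += 1
        pair_games[(loser, winner)] += 1
        players.update((winner, loser))

    # The simple update below cannot rate an engine that never won.
    if any(wins[p] == 0 for p in players):
        raise ValueError("every engine needs at least one win for this fit")

    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        updated = {}
        for p in players:
            denom = sum(
                n / (strength[p] + strength[q])
                for (a, q), n in pair_games.items()
                if a == p
            )
            updated[p] = wins[p] / denom
        # Rescale so the geometric mean strength stays at 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        strength = {p: v / math.exp(log_mean) for p, v in updated.items()}

    # A strength ratio of 10 corresponds to a 400-point rating gap.
    return {p: 400.0 * math.log10(s) for p, s in strength.items()}


if __name__ == "__main__":
    # Made-up results: engine_a beats engine_b in 3 games out of 4.
    sample = [("engine_a", "engine_b")] * 3 + [("engine_b", "engine_a")]
    for name, rating in sorted(fit_elo(sample).items()):
        print(name, round(rating, 1))
```

Only the differences between ratings mean anything here; the zero point is arbitrary. An engine that never lost would drift upwards without limit, which is why the one-win-one-loss assumption matters.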
Author: | xela [ Fri Oct 19, 2018 6:16 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Going slow this week, because my system keeps crashing! It doesn't like the combination of strong bots and slow games. I think I need to upgrade an NVIDIA driver, but it's hard to find information on which drivers are more stable. At the moment, I can queue up 8 games to be played overnight, but I'll wake up to a black screen and an unresponsive system, and when I reboot it looks like only two or three games got played. Anyway, new this week: AQ has entered the 1-minute and 5-minute ratings. It's at a disadvantage because it was trained for Japanese rules and 6.5 komi, and I'm playing all the games with Chinese rules and 7.5 komi (this works for the majority of bots). My guess is that AQ's rating will therefore be 50-100 points below its true strength, but I can't think of a good way to measure how much difference it actually makes. Looking at the 5-minute games:
A few more bots added to the 20-minute ratings. My mathematical model (post number 16 above) is looking about as bad as expected :-) At the slower time limit, it looks like LZ now gets stronger with more threads, unlike in the fast games. Results so far at 1 minute time limit, based on 1350 games with 62 engines: Results so far at 5 minute time limit, based on 1448 games with 53 engines: Results so far at 20 minute time limit, based on 188 games with 19 engines: |
Author: | xela [ Fri Oct 19, 2018 6:22 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Here are the two games where I thought AQ should have resigned.
|
Author: | xela [ Sat Nov 10, 2018 4:17 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Sorry for the long gap between updates! I spent a lot of time figuring out how to update my graphics drivers, but I still haven't solved the crashing problem. It looks like I can't reliably run LZ with 6 or more threads in long games. But that's OK, I've found out what I originally wanted to, which is that LZ (even with 2 threads) seems to achieve superhuman performance on a fairly ordinary computer. I'm a little surprised to see ELF still at the top of the list, as I thought recent LZ networks had overtaken ELF at time parity. Over the next couple of weeks I'll add some more games to reduce some of the error margins, maybe throw LZ_157 into the mix, and maybe do some benchmarking to see how many visits per second I'm getting for various different networks. Oh, and for anyone who's observant: in previous posts, the Elo+ and Elo- columns were the wrong way round. I've gone back and edited the earlier posts so they're now correct. Results so far at 20 minute time limit, based on 228 games with 22 engines: |
Author: | pangafu [ Tue Dec 04, 2018 8:31 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
@xela I am the author of LeelaMaster Weight. I saw that you did some Elo tests with LM, so could I add this post to the readme of LeelaMaster Weight? https://github.com/pangafu/LeelaMasterWeight/ About LeelaMaster strength (Elo) ..... Home-made Elo ratings for some engines (by xela@lifein19x19.com) https://lifein19x19.com/viewtopic.php?f=18&t=16086 .... Thanks for your great work~ |
Author: | pangafu [ Tue Dec 04, 2018 8:37 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Hello @xela I am the author of Leela Master weight, and glad to see you do some test with lm weight. So could I add this post to the readme of Leela Master weight? About LeelaMaster strength(elo) .... Home-made Elo ratings for some engines (by xela@lifein19x19.com) viewtopic.php?f=18&t=16086 .... Please enjoy the human style of go game~ |
Author: | xela [ Sat Dec 08, 2018 4:37 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
pangafu wrote: Hello @xela I am the author of Leela Master weight, and glad to see you do some test with lm weight. So could I add this post to the readme of Leela Master weight? Yes. Thanks for asking! |
Author: | xela [ Sat Dec 08, 2018 4:49 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Here are the final results (unless I get inspired to do more). Looking at the error bounds, we can't say for sure which of the top 6 is actually the strongest, but they all seem to be definitely in the "superhuman" range (considering that the bottom of this list is already amateur dan level). Just for interest, on my hardware LZ_174 and LZ_188 get about 300 visits per second, ELF about 700, GX47 around 1200, LZ_157 around 1500 (numbers are approximate because they vary from one game to another, possibly depending on the board position and how much of the tree is reused from previous moves). Results at 20 minute time limit, based on 426 games with 25 engines: |
Author: | xela [ Mon Sep 16, 2019 5:41 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Updated with KataGo, OpenCL version (and also throwing in some recent LZ weights for comparison). Just fast games for this one, didn't get around to updating the 20 minute results. kata_6b is the 6-block network, and you can probably guess the names for 10, 15, 20 blocks. In the 1 minute games I also tried different numbers of threads but didn't see much potential for significant improvement. The suggestion in the config file of trying more threads than you have cores wasn't a success on my hardware. Results at 1 minute time limit, based on 1520 games with 72 engines: Results at 5 minute time limit, based on 1680 games with 59 engines:
|
Author: | And [ Sun Sep 22, 2019 10:29 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
xela, thank you very much for your great work! Can you explain why, of all the LM_GX networks, you chose LM_GX47? And where can I download LM_B5 and LM_Z2? |
Author: | xela [ Tue Sep 24, 2019 5:25 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Thanks, glad you like it! I think GX47 was the strongest in the GX series when I started doing this (I can't remember exactly, it was a while ago). There are a few newer Leela Master networks now. Download from https://github.com/pangafu/LeelaMasterWeight For more information about how I downloaded and set up the various engines, see the other thread at https://lifein19x19.com/viewtopic.php?p=236178 |
Author: | And [ Sat Oct 05, 2019 1:12 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
xela, I looked through everything several times, but I could not find where to download LM_B5 and LM_Z2. |
Author: | xela [ Sat Oct 05, 2019 3:56 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Ah, it looks like some of the older networks have been removed from the Google Drive folders. You'd have to raise an issue on github and ask pangafu there if they're still available. |
Author: | hydrogenpi7 [ Sun Oct 06, 2019 2:04 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
xela wrote: Updated with KataGo, OpenCL version (and also throwing in some recent LZ weights for comparison). Just fast games for this one, didn't get around to updating the 20 minute results. kata_6b is the 6-block network, and you can probably guess the names for 10, 15, 20 blocks. In the 1 minute games I also tried different numbers of threads but didn't see much potential for significant improvement. The suggestion in the config file of trying more threads than you have cores wasn't a success on my hardware. Results at 1 minute time limit, based on 1520 games with 72 engines: Results at 5 minute time limit, based on 1680 games with 59 engines: So based on this chart anyone with a half way decent GPU at any reasonable time intervals running latest LZ net can already play against AI opponent that is essentially stronger than AlphaGoLee and catching up to AlphaGoMaster? |
Author: | xela [ Sun Oct 06, 2019 4:45 am ] |
Post subject: | Re: Home-made Elo ratings for some engines |
hydrogenpi7 wrote: So based on this chart anyone with a half way decent GPU at any reasonable time intervals running latest LZ net can already play against AI opponent that is essentially stronger than AlphaGoLee and catching up to AlphaGoMaster? It depends on a bunch of assumptions about how the Elo rating system works. I wouldn't dare to be that precise, but it looks to me like AIs can play at a superhuman level on ordinary PCs with a mid-range GPU. |
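To make the main assumption concrete: the standard Elo model says the expected score depends only on the rating difference, through a logistic curve on a 400-point scale. The sketch below shows that formula only; the function name `expected_score` is chosen here for illustration, and nothing in it is specific to the ratings in this thread.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model.

    A 400-point advantage corresponds to 10:1 odds in A's favour.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Win expectancy for a few rating gaps in favour of the first player.
    for gap in (0, 50, 100, 200, 400):
        print(gap, round(expected_score(gap, 0), 3))
```

Comparing ratings between lists that share no opponents (for example, a home-made list and published AlphaGo figures) relies on this curve holding across the whole range, which is one reason to stay cautious about precise claims.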
Author: | xela [ Wed Jan 22, 2020 4:10 pm ] |
Post subject: | Re: Home-made Elo ratings for some engines |
Looks like someone else has done something a bit more comprehensive, although they're a bit short on details of the methodology. |
All times are UTC - 8 hours [ DST ] |