Life In 19x19 http://lifein19x19.com/ |
|
Engine Tournament http://lifein19x19.com/viewtopic.php?f=18&t=13322 |
Page 8 of 20 |
Author: | as0770 [ Sat Jan 20, 2018 2:16 pm ] |
Post subject: | Re: Engine Tournament |
q30 wrote: Try to use a time control equivalent to 2 min per move. In this case fluctuations will be much smaller and the difference between pondering and not pondering will be measurable... You have no idea what you are talking about. The standard deviation doesn't change with the time control. |
Author: | as0770 [ Sat Jan 20, 2018 2:24 pm ] |
Post subject: | Re: Engine Tournament |
New entry in League B is Dream Go; update in League C: Matilda 1.25.

Leela vs. AQ
Code:
1. AQ 2.0.3                     12/16
2. Leela 0.11.0 Beta 11          4/16
Configuration:

League A:
Code:
1. Leela 0.10.0                 22/24
2. Rayon 4.6.0                  19/24
3. Oakfoam 0.2.1 NG-06          18/24
4. Hiratuka 10.37B (CPU)         9/24
5. DarkForest v2 MCTS 1.0        7/24
6. DarkGo 1.0                    5/24
7. Pachi DCNN 11.99              4/24
Configuration:

League B:
Code:
1. Ray 9.0.1                    31/36
2. Pachi DCNN 11.99             30/36
3. Dream Go 0.5.0               29/36
4. Leela Zero 0.9 (2018.01.01)  21/36
5. MoGo 4.86                    18/36
6. deltaGo 1.0.0                18/36
7. Fuego 1.1                    15/36
8. Michi C-2 1.4.2               8/36
9. Orego 7.08                    8/36
10. GNU Go 3.8                   2/36
Configuration:

League C:
Code:
1. GNU Go 3.8                   25/28
2. Hara 0.9                     18/28
3. Matilda 1.25                 16/28
4. Indigo 2009                  16/28
5. Dariush 3.1.5.7              15/28
6. Aya 6.34                     13/28
7. Fudo Go 3.0                   7/28
8. JrefBot 081016-2022           2/28
Configuration:

League D:
Code:
1. JrefBot 081016-2022          16/18
2. Iomrascálaí 0.3.2            15/18
3. Crazy Patterns 0008-13       13/18
4. Marcos Go 1.0                13/18
5. AmiGo 1.8                    13/18
6. Beancounter 0.1               8/18
7. Stop 0.9-005                  5/18
8. GoTraxx 1.4.2                 3/18
9. CopyBot 0.1                   2/18
10. Brown 1.0                    2/18
Configuration:

Links:

Best, Alex |
Author: | lemonsqueez [ Sun Jan 21, 2018 6:59 am ] |
Post subject: | Re: Engine Tournament |
Thanks for running these tournaments, impressive lineup! Right now the leagues are based on strength, more or less, if I understand correctly. Just an idea: how about a GPU league and a CPU league for the top programs? This is already mostly the case; what I mean is that for programs like Leela, which can do both, it'd be interesting to see how the CPU version fares. Not sure how practical this would be. Maybe you don't want to have to unplug your graphics card to prevent it from using the GPU =) |
Author: | as0770 [ Sun Jan 21, 2018 10:24 pm ] |
Post subject: | Re: Engine Tournament |
lemonsqueez wrote: Thanks for running these tournaments, impressive lineup! Right now the leagues are based on strength, more or less, if I understand correctly. You do. One or two engines play in both the upper and the lower league to have a virtual connection between the leagues. lemonsqueez wrote: Just an idea: how about a GPU league and a CPU league for the top programs? This is already mostly the case; what I mean is that for programs like Leela, which can do both, it'd be interesting to see how the CPU version fares. Not sure how practical this would be. Maybe you don't want to have to unplug your graphics card to prevent it from using the GPU =) I want to have a comparison between GPU and CPU engines, so I don't want to make separate leagues. In fact I tried to run the GPU engines in CPU mode as well; this works well with Leela. It would play in the same league as Leela GPU, but I don't want one engine playing twice in the same league. That would distort the results. You can find Leela CPU in the history, though. Rayon CPU is basically Ray 9.0.1, afaik. AQ won't work as a CPU engine here, and for Oakfoam as a CPU engine I would have to adjust some parameters, and there were problems running it. And btw, it is very, very weak. For other engines I would have to change the system configuration, but I don't want to mess up my system like that. So, all in all, I would like to run the engines in both modes, but I would face too many problems... |
Author: | as0770 [ Fri Jan 26, 2018 10:37 am ] |
Post subject: | Re: Engine Tournament |
This time I downsized the leagues to make room for new entries. In League B, Leela Zero is updated to v0.11 with the last 5x64 network (2018.01.17); after that they changed to a 6x128 network. There is also a new engine in League E: SimpleGo 0.4.3.

Leela vs. AQ
Code:
1. AQ 2.0.3                     12/16
2. Leela 0.11.0 Beta 11          4/16

League A:
Code:
1. Leela 0.10.0                 22/24
2. Rayon 4.6.0                  19/24
3. Oakfoam 0.2.1 NG-06          18/24
4. Hiratuka 10.37B (CPU)         9/24
5. DarkForest v2 MCTS 1.0        7/24
6. DarkGo 1.0                    5/24
7. Pachi DCNN 11.99              4/24

League B:
Code:
1. Leela Zero 0.11 (2018.01.17) 15/20
2. Pachi DCNN 11.99             13/20
3. DarkGo 1.0                   12/20
4. Dream Go 0.5.0               11/20
5. Ray 9.0.1                     7/20
6. MoGo 4.86                     2/20

League C:
Code:
1. MoGo 4.86                    18/20
2. deltaGo 1.0.0                14/20
3. Fuego 1.1                    13/20
4. Michi C-2 1.4.2               8/20
5. Orego 7.08                    5/20
6. GNU Go 3.8                    2/20

League D:
Code:
1. GNU Go 3.8                   25/28
2. Hara 0.9                     18/28
3. Matilda 1.25                 16/28
4. Indigo 2009                  16/28
5. Dariush 3.1.5.7              15/28
6. Aya 6.34                     13/28
7. Fudo Go 3.0                   7/28
8. JrefBot 081016-2022           2/28

League E:
Code:
1. JrefBot 081016-2022          16/20
2. Iomrascálaí 0.3.2            12/20
3. SimpleGo 0.4.3               11/20
4. Crazy Patterns 0008-13        7/20
5. Marcos Go 1.0                 7/20
6. AmiGo 1.8                     7/20

League F:
Code:
1. AmiGo 1.8                    19/20
2. Beancounter 0.1              15/20
3. Stop 0.9-005                 10/20
4. GoTraxx 1.4.2                 7/20
5. CopyBot 0.1                   6/20
6. Brown 1.0                     3/20

Configuration:
Links:

Best, Alex |
Author: | q30 [ Sat Jan 27, 2018 6:57 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: You have no idea what you are talking about. The standard deviation doesn't change with the time control. It depends on the randomness of the games, which changes with the time control... |
Author: | Cyan [ Sat Jan 27, 2018 10:10 am ] |
Post subject: | Re: Engine Tournament |
Some strong bots have been updated:
AQ v2.1.1
Leela 0.11.0
Ray 4.32
Pachi 12.00 |
Author: | as0770 [ Sat Jan 27, 2018 11:01 am ] |
Post subject: | Re: Engine Tournament |
q30 wrote: as0770 wrote: You have no idea what you are talking about. The standard deviation doesn't change with the time control. It depends on the randomness of the games, which changes with the time control... Then we would have to rewrite basic mathematical principles. |
Author: | as0770 [ Sat Jan 27, 2018 11:04 am ] |
Post subject: | Re: Engine Tournament |
Cyan wrote: Some strong bots have been updated: AQ v2.1.1 Leela 0.11.0 Ray 4.32 Pachi 12.00 Leela 0.11 is already running, and I'll wait for the official release of Pachi 12. I'll take a look at the others, thank you. Edit: I tried to compile Ray 4.32 without success. First I had to get some libraries from CNTK 2.1, although the readme says CNTK 2.3. Then I ran into the next round of error messages. The author is not much interested in making the installation easier, or even in keeping the readme up to date. That's no problem, but I'll leave Rn out until it is easier to install. |
Author: | pnprog [ Mon Jan 29, 2018 1:10 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: Edit: I tried to compile Ray 4.32 without success. First I had to get some libraries from CNTK 2.1, although the readme says CNTK 2.3. Then I ran into the next round of error messages. The author is not much interested in making the installation easier, or even in keeping the readme up to date. That's no problem, but I'll leave Rn out until it is easier to install. Yes, and sadly this does a disservice to this otherwise strong bot. It would be nice to have somebody well versed in Ubuntu/apt/PPA/compilation/deb packaging implement a dedicated PPA for Ubuntu and all the usual Go software. This PPA would include updated deb files for commonly used Go programs (Leela, Sabaki...). Hard-to-compile or hard-to-install programs like Ray would come as containers or snap packages (with all dependencies included). Not quite a comeback of Hikarunix, but still a big improvement. |
Author: | zakki [ Mon Jan 29, 2018 11:37 pm ] |
Post subject: | Re: Engine Tournament |
I usually use Windows, and Rn has no maintainer on Linux. Pull requests are welcome. |
Author: | pnprog [ Wed Jan 31, 2018 2:50 am ] |
Post subject: | Re: Engine Tournament |
zakki wrote: I usually use Windows, and Rn has no maintainer on Linux. Pull requests are welcome. I am confident that at some point somebody will show up and provide some help for Linux. Keep up the good work! |
Author: | as0770 [ Fri Feb 02, 2018 12:04 am ] |
Post subject: | Re: Engine Tournament |
Now in League A: Leela Zero 5773f44c (2018.01.26); it lost 5 games because of a ladder. Also, Leela is updated to v0.11.0.

Leela vs. AQ
Code:
1. AQ 2.0.3                     12/16
2. Leela 0.11.0 Beta 11          4/16

League A:
Code:
1. Leela 0.11.0                 18/20
2. Rayon 4.6.0                  15/20
3. Oakfoam 0.2.1 NG-06          12/20
4. Leela Zero 0.11 5773f44c      7/20
5. Hiratuka 10.37B (CPU)         6/20
6. DarkForest v2 MCTS 1.0        2/20

League B:
Code:
1. Leela Zero 0.11 c83e1b6e     15/20
2. Pachi DCNN 11.99             13/20
3. DarkGo 1.0                   12/20
4. Dream Go 0.5.0               11/20
5. Ray 9.0.1                     7/20
6. MoGo 4.86                     2/20

League C:
Code:
1. MoGo 4.86                    18/20
2. deltaGo 1.0.0                14/20
3. Fuego 1.1                    13/20
4. Michi C-2 1.4.2               8/20
5. Orego 7.08                    5/20
6. GNU Go 3.8                    2/20

League D:
Code:
1. GNU Go 3.8                   25/28
2. Hara 0.9                     18/28
3. Matilda 1.25                 16/28
4. Indigo 2009                  16/28
5. Dariush 3.1.5.7              15/28
6. Aya 6.34                     13/28
7. Fudo Go 3.0                   7/28
8. JrefBot 081016-2022           2/28

League E:
Code:
1. JrefBot 081016-2022          16/20
2. Iomrascálaí 0.3.2            12/20
3. SimpleGo 0.4.3               11/20
4. Crazy Patterns 0008-13        7/20
5. Marcos Go 1.0                 7/20
6. AmiGo 1.8                     7/20

League F:
Code:
1. AmiGo 1.8                    19/20
2. Beancounter 0.1              15/20
3. Stop 0.9-005                 10/20
4. GoTraxx 1.4.2                 7/20
5. CopyBot 0.1                   6/20
6. Brown 1.0                     3/20

Configuration:
Links:

Best, Alex |
Author: | q30 [ Sat Feb 03, 2018 3:04 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: q30 wrote: as0770 wrote: You have no idea what you are talking about. The standard deviation doesn't change with the time control. It depends on the randomness of the games, which changes with the time control... Then we would have to rewrite basic mathematical principles. So it would be good if we rewrite your understanding of basic mathematical principles... To begin with, the standard deviation is the square root of the variance (find the right English translation yourself), which can be determined as follows: https://wikimedia.org/api/rest_v1/media/math/render/svg/1d1610b913011b6744f23f47e0920974b7f78f58, where pi in our case depends, among other things, on the time control... |
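The linked image does not render in this text export. Given the surrounding argument, it most likely shows the standard deviation of a discrete distribution; a sketch of the formula q30 appears to be pointing at:

```latex
\sigma \;=\; \sqrt{\sum_{i=1}^{N} p_i \,(x_i - \mu)^2},
\qquad
\mu \;=\; \sum_{i=1}^{N} p_i \, x_i .
```

For a single game treated as a Bernoulli trial (win = 1, loss = 0) with win probability $p$, this reduces to $\sigma = \sqrt{p(1-p)}$, which is maximal at $p = 0.5$ and changes only slowly as $p$ moves away from it.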
Author: | as0770 [ Sat Feb 03, 2018 5:15 am ] |
Post subject: | Re: Engine Tournament |
q30 wrote: So it would be good if we rewrite your understanding of basic mathematical principles... To begin with, the standard deviation is the square root of the variance (find the right English translation yourself), which can be determined as follows: https://wikimedia.org/api/rest_v1/media/math/render/svg/1d1610b913011b6744f23f47e0920974b7f78f58, where pi in our case depends, among other things, on the time control... Nice that you tried to understand my points. Of course the probability _can_ change _slightly_ with the time control, but the result of a 1h match and a 2h match will be more or less the same. What you claim is that the result of a 2h match will show the relative strength more accurately than a 1h match, and that is nonsense. Two engines of equal strength will have a 50% chance of a 1-1, a 25% chance of a 0-2, and a 25% chance of a 2-0. If you double the time control from 1h to 2h, the overall winning probability will _maybe_ change to 51:49%. Experience with engine matches in chess is that you get basically the same results at 1 min/game and at 2h/game, as long as there is no significant bug. The difference between a 1h/game match and a 2h/game match is not measurable, and there is no reason why it should be different in Go. Even if the probability changes to 55:45%, you would need hundreds of games to prove the difference in strength. What I run is a tournament with 20 or 30 games. If I run the tournament twice I can get completely different results. This won't change with 2h/game or with pondering on (League A is 2h on 4 threads, btw). |
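The "hundreds of games" claim checks out with a quick back-of-the-envelope calculation. A sketch (not part of the original post; the function names are my own):

```python
import math

def stderr_winrate(n: int, p: float = 0.5) -> float:
    """Standard error of the win rate measured over n independent games."""
    return math.sqrt(p * (1 - p) / n)

def games_needed(p: float, margin: float) -> int:
    """Smallest n at which the standard error drops to the given margin."""
    n = 1
    while stderr_winrate(n, p) > margin:
        n += 1
    return n

# Over 20 games between equal engines, the measured win rate swings by
# about +/- 11 percentage points (one standard error):
print(round(stderr_winrate(20), 3))   # ~0.112

# To resolve a true 55% win rate from 50% at roughly two sigma, the
# standard error must fall to (0.55 - 0.50) / 2 = 0.025, which takes
# on the order of 400 games:
print(games_needed(0.55, 0.025))
```

This is why a 20- or 30-game league can legitimately reorder engines of similar strength from run to run, regardless of time control.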
Author: | lightvector [ Sat Feb 03, 2018 6:59 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: q30 wrote: So it would be good if we rewrite your understanding of basic mathematical principles... To begin with, the standard deviation is the square root of the variance (find the right English translation yourself), which can be determined as follows: https://wikimedia.org/api/rest_v1/media/math/render/svg/1d1610b913011b6744f23f47e0920974b7f78f58, where pi in our case depends, among other things, on the time control... Nice that you tried to understand my points. Of course the probability _can_ change _slightly_ with the time control, but the result of a 1h match and a 2h match will be more or less the same. What you claim is that the result of a 2h match will show the relative strength more accurately than a 1h match, and that is nonsense. Two engines of equal strength will have a 50% chance of a 1-1, a 25% chance of a 0-2, and a 25% chance of a 2-0. If you double the time control from 1h to 2h, the overall winning probability will _maybe_ change to 51:49%. Experience with engine matches in chess is that you get basically the same results at 1 min/game and at 2h/game, as long as there is no significant bug. The difference between a 1h/game match and a 2h/game match is not measurable, and there is no reason why it should be different in Go. Even if the probability changes to 55:45%, you would need hundreds of games to prove the difference in strength. What I run is a tournament with 20 or 30 games. If I run the tournament twice I can get completely different results. This won't change with 2h/game or with pondering on (League A is 2h on 4 threads, btw). Although it's best not to take this heuristic too seriously, because a nontrivial change is possible.
I haven't read it that closely, but my skim of the following thread https://github.com/gcp/leela-zero/issues/667 suggested that Leela Zero has sometimes gotten noticeably different results between very small numbers of playouts, like 5, and a larger number of playouts, like 1600, where the relative strength difference, and sometimes even the ordering of strength, changed between the neural nets. It's actually not surprising at all to me that Leela Zero in some cases could have quite a large difference in strength between tiny numbers of playouts and large numbers of playouts, enough to change the ordering between nets. For example, new candidate nets often appear to vary in strength on the order of multiple hundreds of Elo, so training is very noisy, and there's no reason to expect that the quality of the policy part of the neural net and the value part of the neural net always vary together in the same way. Thinking in those terms, it's pretty obvious that you're measuring something fairly different at 5 playouts than at 1600 playouts: with very few playouts you rely on the policy net much more heavily. I agree that if you're only running 20 or 30 games, then of course none of this matters; the noise in 20 to 30 games still dwarfs it. |
Author: | q30 [ Sat Feb 03, 2018 9:31 am ] |
Post subject: | Re: Engine Tournament |
Quote: as0770 wrote: Quote: q30 wrote: So it would be good if we rewrite your understanding of basic mathematical principles... To begin with, the standard deviation is the square root of the variance (find the right English translation yourself), which can be determined as follows: https://wikimedia.org/api/rest_v1/media ... 74b7f78f58, where pi in our case depends, among other things, on the time control... Nice that you tried to understand my points. Of course the probability _can_ change _slightly_ with the time control, but the result of a 1h match and a 2h match will be more or less the same. What you claim is that the result of a 2h match will show the relative strength more accurately than a 1h match, and that is nonsense. Two engines of equal strength will have a 50% chance of a 1-1, a 25% chance of a 0-2, and a 25% chance of a 2-0. If you double the time control from 1h to 2h, the overall winning probability will _maybe_ change to 51:49%. Experience with engine matches in chess is that you get basically the same results at 1 min/game and at 2h/game, as long as there is no significant bug. The difference between a 1h/game match and a 2h/game match is not measurable, and there is no reason why it should be different in Go. Even if the probability changes to 55:45%, you would need hundreds of games to prove the difference in strength. What I run is a tournament with 20 or 30 games. If I run the tournament twice I can get completely different results. This won't change with 2h/game or with pondering on (League A is 2h on 4 threads, btw). You are quite right if the same engine spars against itself. But even with two simple MC engines (which in self-play would show the chances you describe as the time per move goes to 0), there can still be a difference in strength (i.e. in chances) that depends on the time control, because of differences in the move-selection algorithm (and all the more so for more complex engines with more complex algorithms).
You can try comparing the results of two engines (of similar strength) under the time and thread settings you used for Leagues B-F with the results of the same engines sparring at 2 min per move on 4 threads... |
Author: | as0770 [ Sat Feb 03, 2018 1:07 pm ] |
Post subject: | Re: Engine Tournament |
q30 wrote: You are quite right if the same engine spars against itself. But even with two simple MC engines (which in self-play would show the chances you describe as the time per move goes to 0), there can still be a difference in strength (i.e. in chances) that depends on the time control, because of differences in the move-selection algorithm (and all the more so for more complex engines with more complex algorithms). You can try comparing the results of two engines (of similar strength) under the time and thread settings you used for Leagues B-F with the results of the same engines sparring at 2 min per move on 4 threads... You don't get the point. The statistical fluctuation is way too high to measure small differences in strength, and I won't play hundreds of games to prove you wrong. Once again, these are two matches between the same engines under the same conditions: as0770 wrote: Pachi vs. Hiratuka 8:8 Pachi vs. Hiratuka 2:14 This discussion doesn't make any sense. No more replies from me. |
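How wide that fluctuation is can be made concrete with the exact binomial distribution. A sketch (not part of the original post; the function name is mine):

```python
from math import comb

def score_prob(n: int, k: int, p: float = 0.5) -> float:
    """Probability that one engine wins exactly k of n games,
    given a per-game win probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# In a 16-game match between engines of truly equal strength, anything
# from 4:12 to 12:4 falls inside the ~95% range, so very different
# looking scores across reruns are routine:
n = 16
central = sum(score_prob(n, k) for k in range(4, 13))
print(round(central, 4))   # ~0.9787
```

In other words, with so few games the score tells you almost nothing about small strength differences, which is the point being made about the Pachi vs. Hiratuka reruns.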
Author: | pnprog [ Sat Feb 03, 2018 10:40 pm ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: Now in League A: Leela Zero 5773f44c (2018.01.26); it lost 5 games because of a ladder. Also, Leela is updated to v0.11.0. Thanks for running the tournament and sharing the results. It's also nice to have the list of internet links. |
Author: | as0770 [ Sun Feb 04, 2018 2:24 am ] |
Post subject: | Re: Engine Tournament |
lightvector wrote: Although it's best not to take this heuristic too seriously, because a nontrivial change is possible. I haven't read it that closely, but my skim of the following thread https://github.com/gcp/leela-zero/issues/667 suggested that Leela Zero has sometimes gotten noticeably different results between very small numbers of playouts, like 5, and a larger number of playouts, like 1600, where the relative strength difference, and sometimes even the ordering of strength, changed between the neural nets. It's actually not surprising at all to me that Leela Zero in some cases could have quite a large difference in strength between tiny numbers of playouts and large numbers of playouts, enough to change the ordering between nets. For example, new candidate nets often appear to vary in strength on the order of multiple hundreds of Elo, so training is very noisy, and there's no reason to expect that the quality of the policy part of the neural net and the value part of the neural net always vary together in the same way. Thinking in those terms, it's pretty obvious that you're measuring something fairly different at 5 playouts than at 1600 playouts: with very few playouts you rely on the policy net much more heavily. I agree that if you're only running 20 or 30 games, then of course none of this matters; the noise in 20 to 30 games still dwarfs it. Of course with 5 playouts there will be different results, but we are talking about 1h/game vs. 2h/game, which is about 7,000 vs. 14,000 playouts on my system.

It is also funny to follow the history when I replace or remove some engines. Look at Ray:

Code:
1. Ray 9.0.1                    29/32
2. Pachi DCNN 11.99             28/32
3. Leela Zero 0.9 (2018.01.01)  19/32
4. MoGo 4.86                    18/32
5. deltaGo 1.0.0                17/32
6. Fuego 1.1                    15/32
7. Michi C-2 1.4.2               8/32
8. Orego 7.08                    8/32
9. GNU Go 3.8                    2/32

Code:
1. Leela Zero 0.11 c83e1b6e     15/20
2. Pachi DCNN 11.99             13/20
3. DarkGo 1.0                   12/20
4. Dream Go 0.5.0               11/20
5. Ray 9.0.1                     7/20
6. MoGo 4.86                     2/20

And at DreamGo:

Code:
1. DreamGo 0.5.0                15/20
2. DarkForest v2 MCTS 1.0       12/20
3. Pachi DCNN 11.99             12/20
4. DarkGo 1.0                   10/20
5. Ray 9.0.1                     9/20
6. MoGo 4.86                     2/20

I do not replay the whole tournament; I just remove the old engines and add the new ones. |