Life In 19x19 http://lifein19x19.com/ |
|
Engine Tournament http://lifein19x19.com/viewtopic.php?f=18&t=13322 |
Page 7 of 20 |
Author: | as0770 [ Thu Dec 07, 2017 3:56 pm ] |
Post subject: | Re: Engine Tournament |
New entries in League D are Leela Zero with the network file from 2017-12-03 and Beancounter, my own attempt to write a Go engine.
Leela vs. AQ
Code:
1. AQ 2.0.3 12/16
2. Leela 0.11.0 Beta 11 4/16
League A:
Code:
1. Leela 0.10.0 22/24
2. Rayon 4.6.0 19/24
3. Oakfoam 0.2.1 NG-06 18/24
4. Hiratuka 10.37B (CPU) 9/24
5. DarkForest v2 MCTS 1.0 7/24
6. DarkGo 1.0 5/24
7. Pachi DCNN 11.99 4/24
League B:
Code:
1. Pachi DCNN 11.99 29/32
2. Ray 9.0.1 27/32
3. MoGo 4.86 21/32
4. deltaGo 1.0.0 17/32
5. Fuego 1.1 17/32
6. Leela Zero 0.1 15/32
7. Michi C-2 1.4.2 9/32
8. Orego 7.08 7/32
9. GNU Go 3.8 2/32
League C:
Code:
1. GNU Go 3.8 24/28
2. Hara 0.9 18/28
3. Dariush 3.1.5.7 16/28
4. Indigo 2009 15/28
5. Matilda 1.24 15/28
6. Aya 6.34 11/28
7. Fudo Go 3.0 11/28
8. JrefBot 081016-2022 2/28
League D:
Code:
1. JrefBot 081016-2022 18/20
2. Iomrascálaí 0.3.2 17/20
3. Crazy Patterns 0008-13 15/20
4. Marcos Go 1.0 15/20
5. AmiGo 1.8 15/20
6. Beancounter 0.1 10/20
7. Leela Zero 0.6 (2017-12-03) 7/20
8. Stop 0.9-005 5/20
9. GoTraxx 1.4.2 4/20
10. CopyBot 0.1 2/20
11. Brown 1.0 2/20
Best, Alex |
Author: | as0770 [ Fri Dec 08, 2017 12:16 pm ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: Code:
1. JrefBot 081016-2022 18/20
2. Iomrascálaí 0.3.2 17/20
3. Crazy Patterns 0008-13 15/20
4. Marcos Go 1.0 15/20
5. AmiGo 1.8 15/20
6. Beancounter 0.1 10/20
7. Leela Zero 0.6 (2017-12-03) 7/20
8. Stop 0.9-005 5/20
9. GoTraxx 1.4.2 4/20
10. CopyBot 0.1 2/20
11. Brown 1.0 2/20
Four more days of training for Leela Zero and this is the result:
Code:
1. JrefBot 081016-2022 18/20
2. Iomrascálaí 0.3.2 17/20
3. Crazy Patterns 0008-13 15/20
4. AmiGo 1.8 14/20
5. Marcos Go 1.0 13/20
6. Leela Zero 0.8 (2017-12-07) 12/20
7. Beancounter 0.1 8/20
8. Stop 0.9-005 5/20
9. GoTraxx 1.4.2 4/20
10. CopyBot 0.1 2/20
11. Brown 1.0 2/20 |
Author: | q30 [ Sat Dec 09, 2017 5:32 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: ...The difference between ponder on and ponder off matches is nearly irrelevant compared to all the other influences and for sure interfered by the statistical fluctuation when rating the engines with less than 100 games.
For engines of the same strength, pondering is a significant parameter in sparring... But the most significant difference (for playing strength) between these synthetic tests and real professional games is in the time control settings. |
Author: | as0770 [ Sun Dec 10, 2017 9:29 am ] |
Post subject: | Re: Engine Tournament |
q30 wrote: For engines of the same strength, pondering is a significant parameter in sparring...
Based on what tests? I have played thousands of games between computer engines. Even if one engine is pondering and the other is not, you need hundreds of games to see the difference. |
Author: | q30 [ Sat Dec 16, 2017 1:25 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: Based on what tests? ...
Based on tests with parameters close to real games (2'' per move, for example). |
Author: | as0770 [ Sun Dec 17, 2017 11:56 pm ] |
Post subject: | Re: Engine Tournament |
q30 wrote: as0770 wrote: Based on what tests? ... Based on tests with parameters close to real games (2'' per move, for example).
In chess, doubling the calculation time makes engines about 60 Elo points stronger. Engines of the same strength have about 50% ponder hits, so the difference between a pondering engine and a non-pondering engine is about 30 Elo. You need more than 1000 games to measure an Elo difference of 30. And that is for the case of ponder vs. no ponder. In Go the difference is even smaller because there are fewer ponder hits. Also, when both engines ponder, the difference between their pondering Elo gains is much less than 30 Elo. So, all in all, it is simply impossible to measure a difference in the Elo gain from pondering with such a small number of games. |
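A rough way to check the arithmetic above is the standard logistic Elo model, where a rating gap d gives the stronger side an expected score of 1/(1 + 10^(-d/400)). The Python sketch below is only a back-of-the-envelope illustration, not anyone's actual test harness. It assumes draws are negligible (so one game's score has a standard deviation of 0.5), uses a normal approximation for the 95% margin, and takes the post's linear estimate of 50% ponder hits times 60 Elo per time doubling.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Logistic Elo model: expected score of the side that is elo_diff stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score: float) -> float:
    """Inverse of expected_score: the rating gap implied by a score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(games: int, z: float = 1.96) -> float:
    """Approximate 95% half-width, in Elo, of a match between near-equal engines.

    Assumes no draws, so the per-game score has standard deviation 0.5,
    and uses the normal approximation to the binomial distribution.
    """
    half_width = z * 0.5 / math.sqrt(games)
    return elo_from_score(0.5 + half_width)

# The post's linear estimate: 50% ponder hits x 60 Elo per time doubling.
ponder_gain = 0.5 * 60.0

for n in (16, 100, 500, 1000):
    print(f"{n:5d} games: +/-{elo_margin(n):.0f} Elo "
          f"(effect to detect: {ponder_gain:.0f} Elo)")
```

Under these assumptions the margin is roughly ±186 Elo for a 16-game match, ±69 Elo for 100 games and still about ±22 Elo at 1000 games; it only drops below 30 Elo at around 500 games, which is in line with the post's estimate of the sample sizes involved.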
Author: | q30 [ Sat Dec 23, 2017 12:51 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: In chess, doubling the calculation time makes engines about 60 Elo points stronger...
The strength(time) dependency is not exactly linear, so the strength gained by doubling the time will depend on the absolute time value. Pondering may have no effect only on simple MC engines such as MoGo, where, for example, increasing the thinking time with "--earlyCut 0" doesn't make the engine play stronger. I'll test the pondering effect on MoGo, Pachi, Ray and Leela soon. |
Author: | as0770 [ Tue Dec 26, 2017 8:34 am ] |
Post subject: | Re: Engine Tournament |
q30 wrote: as0770 wrote: In chess, doubling the calculation time makes engines about 60 Elo points stronger... The strength(time) dependency is not exactly linear, so the strength gained by doubling the time will depend on the absolute time value.
30 years of computer chess statistics say something different. |
Author: | Cyan [ Wed Dec 27, 2017 9:37 am ] |
Post subject: | Re: Engine Tournament |
Leela Zero is much stronger now. Can you test it again, please? |
Author: | as0770 [ Wed Dec 27, 2017 10:06 am ] |
Post subject: | Re: Engine Tournament |
Cyan wrote: Leela Zero is much stronger now. Can you test it again, please?
I'd love to, but I am on vacation. I'll be back in 1-2 weeks. |
Author: | lightvector [ Wed Dec 27, 2017 1:03 pm ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: q30 wrote: as0770 wrote: In chess, doubling the calculation time makes engines about 60 Elo points stronger... The strength(time) dependency is not exactly linear, so the strength gained by doubling the time will depend on the absolute time value.
30 years of computer chess statistics say something different.
Are you sure? I'm pretty sure I recall more than one case of computer chess statistics, one informally posted on a forum, and one from some published paper, indicating slightly sublinear elo gains with log(time). Possibly other attempts I didn't see got more linear results, perhaps it depends a little on the engine and perhaps you only see it if you test a wide enough range. The order of magnitude differences were like a +35 elo difference for a given time multiplication factor becoming a +25 elo difference for that time multiplication factor between the ends of a range that was 5 or 6 orders of magnitude wide, or something like that (those numbers are all made up, I'm just trying to convey the rough scale of things that I fuzzily recall). So, not a big difference, but still a bit nonlinear. Unless I just made up those memories. |
Author: | as0770 [ Thu Dec 28, 2017 10:12 am ] |
Post subject: | Re: Engine Tournament |
lightvector wrote: Are you sure? I'm pretty sure I recall more than one case of computer chess statistics, one informally posted on a forum, and one from some published paper, indicating slightly sublinear elo gains with log(time). Possibly other attempts I didn't see got more linear results, perhaps it depends a little on the engine and perhaps you only see it if you test a wide enough range. The order of magnitude differences were like a +35 elo difference for a given time multiplication factor becoming a +25 elo difference for that time multiplication factor between the ends of a range that was 5 or 6 orders of magnitude wide, or something like that (those numbers are all made up, I'm just trying to convey the rough scale of things that I fuzzily recall). So, not a big difference, but still a bit nonlinear. Unless I just made up those memories.
Indeed, this is true. I left out that with faster hardware or longer time controls there is a slight decrease in the Elo gain. But we are talking about a decrease from a 70 Elo gain on a 286 30 years ago to a 60, maybe 50 Elo gain nowadays. This has to do with the increasing number of draws with nearly perfect play. |
Author: | q30 [ Fri Dec 29, 2017 8:15 am ] |
Post subject: | Re: Engine Tournament |
The results (with pondering - without pondering): MoGo 3 - 1; Pachi 3 - 1; Ray 3 - 1; Leela 3 - 1; in all 12 - 4 (details). I don't know about the quantitative result (in Elo), but there is definitely a qualitative effect, and in sparring between Go engines of equivalent strength, an engine with pondering may overtake an engine without pondering in the rating. |
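One way to put a number on a 12 - 4 total is an exact binomial test against the null hypothesis that pondering makes no difference, i.e. that every game is a 50/50 coin flip. The Python sketch below is illustrative only: it treats the 16 games as independent and interchangeable, ignores draws and colour, and does not correct for the near-100% ponder hits of self-play that are raised later in the thread.

```python
from math import comb, log10

def binomial_tail(wins: int, games: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(games, p): a one-sided p-value."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

def elo_from_score(score: float) -> float:
    """Rating gap implied by a score fraction under the logistic Elo model."""
    return -400.0 * log10(1.0 / score - 1.0)

# Pooled result from the post: with pondering 12, without pondering 4.
wins, games = 12, 16
p_one_sided = binomial_tail(wins, games)
# At p = 0.5 the distribution is symmetric, so doubling gives the two-sided value.
print(f"one-sided p = {p_one_sided:.3f}, two-sided ~ {2 * p_one_sided:.3f}")
print(f"point estimate: {elo_from_score(wins / games):+.0f} Elo")
```

Under these assumptions the one-sided p-value is 2517/65536, about 0.038 (about 0.077 two-sided), and the 12/16 score corresponds to a point estimate of roughly +191 Elo, which at this sample size comes with a very wide error bar.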
Author: | as0770 [ Mon Jan 01, 2018 4:29 pm ] |
Post subject: | Re: Engine Tournament |
q30 wrote: ... and in sparring between Go engines of equivalent strength, an engine with pondering may overtake an engine without pondering in the rating.
So we agree that the question is only relevant in matches between engines where one is able to ponder and the other engine is not? Fine. In your rating list, Hiratuka is the only engine that does not ponder. You once had DarkGo, which moves instantly; Hira is limited to one minute. So for both, the absolute time control and CPU power are much more relevant than whether the matches are played with ponder on or ponder off. And still our results are similar or equal. So what is your point in always calling other people's results "synthetic"? Your testing does not become more valuable by disparaging others. For me it does not make any sense to test engines designed for GPU support without a GPU, so I have to play with ponder off to get realistic results. |
Author: | as0770 [ Wed Jan 03, 2018 6:48 am ] |
Post subject: | Re: Engine Tournament |
Finally a Leela Zero update. v0.9 with the network file from 2018.1.1 makes it into League B and is now stronger than the human-trained version 0.1, which was placed 6th in League B with 15 points against the same opponents.
Leela vs. AQ
Code:
1. AQ 2.0.3 12/16
2. Leela 0.11.0 Beta 11 4/16
League A:
Code:
1. Leela 0.10.0 22/24
2. Rayon 4.6.0 19/24
3. Oakfoam 0.2.1 NG-06 18/24
4. Hiratuka 10.37B (CPU) 9/24
5. DarkForest v2 MCTS 1.0 7/24
6. DarkGo 1.0 5/24
7. Pachi DCNN 11.99 4/24
League B:
Code:
1. Ray 9.0.1 29/32
2. Pachi DCNN 11.99 28/32
3. Leela Zero 0.9 (2018.01.01) 19/32
4. MoGo 4.86 18/32
5. deltaGo 1.0.0 17/32
6. Fuego 1.1 15/32
7. Michi C-2 1.4.2 8/32
8. Orego 7.08 8/32
9. GNU Go 3.8 2/32
League C:
Code:
1. GNU Go 3.8 24/28
2. Hara 0.9 18/28
3. Dariush 3.1.5.7 16/28
4. Indigo 2009 15/28
5. Matilda 1.24 15/28
6. Aya 6.34 11/28
7. Fudo Go 3.0 11/28
8. JrefBot 081016-2022 2/28
League D:
Code:
1. JrefBot 081016-2022 16/18
2. Iomrascálaí 0.3.2 15/18
3. Crazy Patterns 0008-13 13/18
4. Marcos Go 1.0 13/18
5. AmiGo 1.8 13/18
6. Beancounter 0.1 8/18
7. Stop 0.9-005 5/18
8. GoTraxx 1.4.2 3/18
9. CopyBot 0.1 2/18
10. Brown 1.0 2/18
Best, Alex |
Author: | LetterRip [ Fri Jan 05, 2018 3:05 pm ] |
Post subject: | Re: Engine Tournament |
Any idea how much time per move and how many playouts were being used by LZ? |
Author: | as0770 [ Sat Jan 06, 2018 1:11 am ] |
Post subject: | Re: Engine Tournament |
LetterRip wrote: Any idea how much time per move and how many playouts were being used by LZ?
1 h/game on one thread, starting with 50 sec/move, which is around 7000 playouts. |
Author: | as0770 [ Sat Jan 06, 2018 7:32 am ] |
Post subject: | Re: Engine Tournament |
q30 wrote: The results (with pondering - without pondering): MoGo 3 - 1; Pachi 3 - 1; Ray 3 - 1; Leela 3 - 1; in all 12 - 4 (details). I don't know about the quantitative result (in Elo), but there is definitely a qualitative effect, and in sparring between Go engines of equivalent strength, an engine with pondering may overtake an engine without pondering in the rating.
One more thing about this test: if you play an engine against itself, the ponder hits are close to 100%. That means the pondering side will benefit a lot more than against other engines.
I also did some testing. In my League A there are two engines that do not ponder: Hiratuka and DarkGo. DarkGo plays instantly, so it doesn't matter whether the opponent is pondering, because the opponent doesn't get any time to ponder. So the only engine that may be affected is Hiratuka, and I tested Hiratuka against Pachi in 30 min games with ponder on and ponder off.
First round:
Ponder Off: Pachi vs. Hiratuka 5:11
Ponder On: Pachi vs. Hiratuka 8:8
This could indeed be a hint that Pachi becomes significantly stronger with pondering. Because I didn't believe this, I played the same match again under the same conditions. Now I got:
Ponder Off: Pachi vs. Hiratuka 6:10
Ponder On: Pachi vs. Hiratuka 2:14
That is exactly what I expected: the statistical fluctuation in matches between engines of similar strength is very high, just like rolling dice. The difference between pondering and not pondering is simply not measurable with such a small number of games. |
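The four 16-game results above can be compared with what a single, fixed strength difference would produce. The Python sketch below assumes, purely for illustration, that pondering has no effect and that Pachi's true winning probability against Hiratuka equals its pooled score over all 64 games (21/64). It then computes, from the exact binomial distribution, the central 95% range of Pachi wins in one 16-game match.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent games."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def central_interval(n: int, p: float, mass: float = 0.95) -> tuple[int, int]:
    """Win counts (lo, hi) bounding the central `mass` of Binomial(n, p)."""
    tail = (1.0 - mass) / 2.0
    cdf, lo, hi = 0.0, None, None
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if lo is None and cdf >= tail:
            lo = k
        if hi is None and cdf >= 1.0 - tail:
            hi = k
    return lo, hi

# Pachi's wins in the four 16-game matches reported above.
pachi_wins = {"off, round 1": 5, "on, round 1": 8, "off, round 2": 6, "on, round 2": 2}
games = 16
p_pooled = sum(pachi_wins.values()) / (games * len(pachi_wins))  # 21/64

lo, hi = central_interval(games, p_pooled)
print(f"pooled p = {p_pooled:.3f}, 95% range of Pachi wins: {lo}..{hi}")
for label, w in pachi_wins.items():
    print(f"{label}: {w} wins, inside range: {lo <= w <= hi}")
```

With these assumptions the range works out to 2..9 wins, so all four observed scores, including the 8:8 and 2:14 matches, are consistent with one unchanged strength difference (the 2:14 result sits right at the lower edge).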
Author: | q30 [ Sat Jan 20, 2018 8:18 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: q30 wrote: ... and in sparring between Go engines of equivalent strength, an engine with pondering may overtake an engine without pondering in the rating. So we agree that the question is only relevant in matches between engines where one is able to ponder and the other engine is not? ... So for both, the absolute time control and CPU power are much more relevant than whether the matches are played with ponder on or ponder off... So what is your point in always calling other people's results "synthetic"? ...
If both can ponder too, because one may ponder more effectively than the other... Yes. Only for results obtained from tests with unrealistic parameters, for example, time control. For engines without a big difference in strength, such results may differ from those of tests with realistic control parameters. |
Author: | q30 [ Sat Jan 20, 2018 8:29 am ] |
Post subject: | Re: Engine Tournament |
as0770 wrote: ... That is exactly what I expected: the statistical fluctuation in matches between engines of similar strength is very high, just like rolling dice. The difference between pondering and not pondering is simply not measurable with such a small number of games.
Try a time control equivalent to 2 min per move. In that case the fluctuations will be much smaller and the difference between pondering and not pondering will be measurable... |
All times are UTC - 8 hours [ DST ] |
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |