Life In 19x19http://lifein19x19.com/ Engine Tournamenthttp://lifein19x19.com/viewtopic.php?f=18&t=13322 Page 16 of 17

 Author: jann [ Tue Sep 10, 2019 10:05 am ] Post subject: Re: Engine Tournament Bill Spight wrote:How does the winning percentage of the stronger player change with, say, doubling the playouts? My guess is that there is an optimal number of playouts to discriminate between players.As the issue linked by jlt (and other similar reports) shows, there is no real upper optimum (within reasonable limits) - more search means more difference. (The artifact at extremes, like at very low playouts is a different effect - if the engine is only allowed to look at 5-10 nodes the rewards and penalties in choosing even a single one (in)efficiently can be very different).Doubling the playouts is a bit tricky question, because going from 100 to 200 is NOT completely the same utility-wise as going from 300 to 600. But still, past tests often shown remarkably consistent Elo gains per playout multiplication.as0770 wrote:Bills "joke" will help you understand why the winning chances will shift in direction to 50% with more playouts. Here it is your part to prove me wrong.Just look at the data above.Quote:The only question is: Does the number of playouts affect the statistical significanceAs you can see the stronger engine is expected to win more games under high-search conditions. For the weaker net to win a 400 game match by a chance upset, he needs the noise / random deviation to overcome the strengthwise expected advantage of the stronger player. Random deviation is constant for 400 games, the advantage of the stronger player is bigger with more playouts, hence the probability of getting the winner/stronger side wrong is less for the same number of games.

 Author: as0770 [ Tue Sep 10, 2019 1:43 pm ] Post subject: Re: Engine Tournament jann wrote:Just look at the data above.OMG. You completely mixed up different issues.The data are related to the strength of nets of different sizes. Of course their strength depends on the number of playouts.The issue of this topic is only the statistical significance of results...

 Author: jann [ Tue Sep 10, 2019 2:06 pm ] Post subject: Re: Engine Tournament The data is about how strength DIFFERENCE (thus the expected win%) between the same two nets changes with increasing playouts (for both nets, proportionally). There are data points even for nets of the same sizes (both here and elsewhere).

 Author: as0770 [ Wed Sep 11, 2019 9:04 am ] Post subject: Re: Engine Tournament jann wrote:The data is about how strength DIFFERENCE (thus the expected win%) between the same two nets changes with increasing playouts (for both nets, proportionally). There are data points even for nets of the same sizes (both here and elsewhere).Exactly. And the data say nothing about statistical significance related to the number of games.

 Author: Bill Spight [ Wed Sep 11, 2019 9:09 am ] Post subject: Re: Engine Tournament BTW, has anyone come up with an explanation for the U shape graphs? Thanks.

 Author: jann [ Wed Sep 11, 2019 10:28 am ] Post subject: Re: Engine Tournament Bill Spight wrote:BTW, has anyone come up with an explanation for the U shape graphs? Thanks. Note that it's not complete U shape, only the low extreme below ~10 nodes behaves differently. As I wrote my guess is granularity (which makes small errors or differences impossible, and changes the rewards and penalties for selection).With only a few nodes in total, you cannot get the order or move priority "slightly" wrong. Looking at a single useless node wastes 20% of the whole search if the total playouts is 5. Another thing to consider that if you choose the order of A and B inefficiently, this may cut off B completely if only a few nodes examined, while normally it only delays looking at B.

 Author: as0770 [ Wed Sep 11, 2019 10:30 am ] Post subject: Re: Engine Tournament Bill Spight wrote:BTW, has anyone come up with an explanation for the U shape graphs? Thanks. I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.But I am not an expert and didn't look closely at it

 Author: Bill Spight [ Wed Sep 11, 2019 11:47 am ] Post subject: Re: Engine Tournament as0770 wrote:Bill Spight wrote:BTW, has anyone come up with an explanation for the U shape graphs? Thanks. I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.But I am not an expert and didn't look closely at it Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.

 Author: as0770 [ Wed Sep 11, 2019 11:57 am ] Post subject: Re: Engine Tournament Bill Spight wrote:as0770 wrote:Bill Spight wrote:BTW, has anyone come up with an explanation for the U shape graphs? Thanks. I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.But I am not an expert and didn't look closely at it Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.Monte Carlo Search plays random moves to finish the game and usees the win/loss statistics to evaluate moves. With "statistical significant result" I mean the results of the Monte Carlo Search.

 Author: Mike Novack [ Wed Sep 11, 2019 12:11 pm ] Post subject: Re: Engine Tournament I'll try to make that clearer.The programs based upon MonteCarlo search are based upon the concept that IF (in a given position) the win rate of a number of random playouts is higher following move A than following move B then move A is a better move than move B.Statistical relevance (the need for enough of these random playouts) is because otherwise the observed difference (win % following A higher than win % following B) might just be chance, or worse,worng just by chance.

 Author: Bill Spight [ Wed Sep 11, 2019 12:37 pm ] Post subject: Re: Engine Tournament Bill Spight wrote:as0770 wrote:Bill Spight wrote:BTW, has anyone come up with an explanation for the U shape graphs? Thanks. I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.But I am not an expert and didn't look closely at it Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.as0770 wrote:jjMonte Carlo Search plays random moves to finish the game and usees the win/loss statistics to evaluate moves. With "statistical significant result" I mean the results of the Monte Carlo Search.Mike Novack wrote:I'll try to make that clearer.The programs based upon MonteCarlo search are based upon the concept that IF (in a given position) the win rate of a number of random playouts is higher following move A than following move B then move A is a better move than move B.Statistical relevance (the need for enough of these random playouts) is because otherwise the observed difference (win % following A higher than win % following B) might just be chance, or worse,worng just by chance.I'm not up on the internals of today's bots, but, IIRC, statistical arguments in the N-armed bandit solution have to do with choosing which node of the game tree to expand. Anyway, in simple estimation, the more instances you have the greater the convergence. The U shape indicates that something else is going on. My first guess is that with sufficient playouts new moves are uncovered which alter the evaluation, and that stronger programs are better at taking that difference in evaluation into account. But that does not explain a sustained divergence in the winning percentage of the stronger program. {shrug}

 Author: Bill Spight [ Thu Sep 12, 2019 7:28 am ] Post subject: Re: Engine Tournament lightvector wrote:Basically, this comes from the "stronger" neural net being better at evaluation. And in Go, time horizons are long enough that no feasible amount of search can save you in many cases if you are fundamentally misjudging a position. E.g. if you think a given territory is larger than it actually is because you miss straightforward endgame aji, or that a group is alive when actually the opponent has a throw-in ko and is just waiting to build threats and/or for the ko to be valuable enough, then you're in trouble. If it's early in the game, it could take all the way until the later midgame or endgame before it becomes worth the opponent's time to exploit it, and search obviously cannot read that far, so search cannot help you correct the misjudgment.Just to note that the examples given are things that humans, by our propensity for depth first local search, can and do uncover early in the game. Edit: Let me add that, because of the nature of go, in which pieces do not move, local reading often remains pertinent for many moves.

 Author: jann [ Thu Sep 12, 2019 10:45 am ] Post subject: Re: Engine Tournament There's actually another and very simple possible explanation why the stronger side's win% expectation keeps on moving further away from 50% with more playouts for both sides: the compounding effect.Being stronger usually means that a look at a position gives slightly more correct move ordering, the first (highest policy) moves are slightly more often better objectively. Let's say this advantage worth 1.01 (in whatever scale). Now let's say that in a low-search condition the average search depth is 3, while in a high-search condition it is 8. This means this advantage is raised to different powers, since it manifests at ALL depth levels (from top to leaf). So the stronger side's search looks at more relevant answers to more relevant answers to more relevant moves etc. The deeper the search, the greater the efficiency/relevance advantage is (by the time a leaf node reached, eg. raised from 1.03 to 1.083 above).(Edit: Since leaf nodes are the most numerous, their weight is high in the averages, thus their relevance strongly affect the accuracy of even the top evaluations.)

 Author: as0770 [ Thu Sep 12, 2019 11:06 pm ] Post subject: Re: Engine Tournament Bill Spight wrote:Anyway, in simple estimation, the more instances you have the greater the convergence. For every engine you can say the more playouts, the stronger. But the benefit is not linear. For above mentioned reason there is less benefit with a very small and a very high number of playouts.The graph compares engines with a different number of playouts. The benefit form 5 -> 10 po is likely less then from 500 -> 1000 po. The benefit from 500->1000 po is likely higher than from 500.000 to 1.000.000 po. This would explain the u-shape.

 Author: jann [ Fri Sep 13, 2019 4:43 am ] Post subject: Re: Engine Tournament Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).Also as I mentioned there were more similar reports, some of them compared dozens of nets (mostly the same size) and also found bigger Elo difference between the same two nets at the same playouts with more playouts.

 Author: as0770 [ Fri Sep 13, 2019 6:44 am ] Post subject: Re: Engine Tournament jann wrote:Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).Well, I see 2 graphs of nets with the same size, at one you can adumbrate some kind of a u shape, at the other one you can't. Not that much of a sample... Of course the same effect might happen with nets of the same size but different strength, but you can't disclaim that it is at least "less distinctive".

 Author: as0770 [ Sat Sep 14, 2019 2:08 am ] Post subject: Re: Engine Tournament as0770 wrote:jann wrote:Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).Well, I see 2 graphs of nets with the same size, at one you can adumbrate some kind of a u shape, at the other one you can't. Not that much of a sample... Of course the same effect might happen with nets of the same size but different strength, but you can't disclaim that it is at least "less distinctive".Thinking about that... If the hypothesis is correct, it has nothing to do with the size of the net, but with the factor of the number of playouts you need to make the engines play on the same strength. The more the factor tend to 1, the less distinctive is the u-shape. And I think you can interpret the graphs in this way.

 Author: jann [ Sat Sep 14, 2019 2:50 am ] Post subject: Re: Engine Tournament The size of the net and the low parity factor is closely related (larger nets are stronger but slower). Same-size nets tend to be closer in strength, that's why the curve is less steep. And again, there were plenty of other tests done beyond the single linked graph.

 Author: as0770 [ Sat Sep 14, 2019 2:53 am ] Post subject: Re: Engine Tournament jann wrote:The size of the net and the low parity factor is closely related (larger nets are stronger but slower). Same-size nets tend to be closer in strength, that's why the curve is less steep. And again, there were plenty of other tests done beyond the single linked graph.And where is the contradiction to what I wrote?

 Page 16 of 17 All times are UTC - 8 hours [ DST ] Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Grouphttp://www.phpbb.com/