It is currently Wed Oct 23, 2019 6:52 pm

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 334 posts ]  Go to page Previous  1 ... 13, 14, 15, 16, 17  Next
Author Message
Offline
 Post subject: Re: Engine Tournament
Post #301 Posted: Tue Sep 10, 2019 10:05 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
Bill Spight wrote:
How does the winning percentage of the stronger player change with, say, doubling the playouts? My guess is that there is an optimal number of playouts to discriminate between players.

As the issue linked by jlt (and other similar reports) shows, there is no real upper optimum (within reasonable limits) - more search means more difference. (The artifact at extremes, like at very low playouts is a different effect - if the engine is only allowed to look at 5-10 nodes the rewards and penalties in choosing even a single one (in)efficiently can be very different).

Doubling the playouts is a bit tricky question, because going from 100 to 200 is NOT completely the same utility-wise as going from 300 to 600. But still, past tests often shown remarkably consistent Elo gains per playout multiplication.

as0770 wrote:
Bills "joke" will help you understand why the winning chances will shift in direction to 50% with more playouts. Here it is your part to prove me wrong.

Just look at the data above.

Quote:
The only question is: Does the number of playouts affect the statistical significance

As you can see the stronger engine is expected to win more games under high-search conditions. For the weaker net to win a 400 game match by a chance upset, he needs the noise / random deviation to overcome the strengthwise expected advantage of the stronger player. Random deviation is constant for 400 games, the advantage of the stronger player is bigger with more playouts, hence the probability of getting the winner/stronger side wrong is less for the same number of games.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #302 Posted: Tue Sep 10, 2019 1:43 pm 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
Just look at the data above.


OMG. You completely mixed up different issues.

The data are related to the strength of nets of different sizes. Of course their strength depends on the number of playouts.

The issue of this topic is only the statistical significance of results...

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #303 Posted: Tue Sep 10, 2019 2:06 pm 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
The data is about how strength DIFFERENCE (thus the expected win%) between the same two nets changes with increasing playouts (for both nets, proportionally). There are data points even for nets of the same sizes (both here and elsewhere).

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #304 Posted: Wed Sep 11, 2019 9:04 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
The data is about how strength DIFFERENCE (thus the expected win%) between the same two nets changes with increasing playouts (for both nets, proportionally). There are data points even for nets of the same sizes (both here and elsewhere).


Exactly. And the data say nothing about statistical significance related to the number of games.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #305 Posted: Wed Sep 11, 2019 9:09 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

The race is not to the swift, nor the battle to the strong, but that's the way to bet. ;)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #306 Posted: Wed Sep 11, 2019 10:28 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
Bill Spight wrote:
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)

Note that it's not complete U shape, only the low extreme below ~10 nodes behaves differently. As I wrote my guess is granularity (which makes small errors or differences impossible, and changes the rewards and penalties for selection).

With only a few nodes in total, you cannot get the order or move priority "slightly" wrong. Looking at a single useless node wastes 20% of the whole search if the total playouts is 5. Another thing to consider that if you choose the order of A and B inefficiently, this may cut off B completely if only a few nodes examined, while normally it only delays looking at B.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #307 Posted: Wed Sep 11, 2019 10:30 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
Bill Spight wrote:
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)


I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.

But I am not an expert and didn't look closely at it :-)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #308 Posted: Wed Sep 11, 2019 11:47 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
as0770 wrote:
Bill Spight wrote:
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)


I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.

But I am not an expert and didn't look closely at it :-)


Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

The race is not to the swift, nor the battle to the strong, but that's the way to bet. ;)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #309 Posted: Wed Sep 11, 2019 11:57 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
Bill Spight wrote:
as0770 wrote:
Bill Spight wrote:
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)


I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.

But I am not an expert and didn't look closely at it :-)


Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.

Monte Carlo Search plays random moves to finish the game and usees the win/loss statistics to evaluate moves. With "statistical significant result" I mean the results of the Monte Carlo Search.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #310 Posted: Wed Sep 11, 2019 12:11 pm 
Lives in sente

Posts: 935
Liked others: 0
Was liked: 159
I'll try to make that clearer.

The programs based upon MonteCarlo search are based upon the concept that IF (in a given position) the win rate of a number of random playouts is higher following move A than following move B then move A is a better move than move B.

Statistical relevance (the need for enough of these random playouts) is because otherwise the observed difference (win % following A higher than win % following B) might just be chance, or worse,worng just by chance.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #311 Posted: Wed Sep 11, 2019 12:37 pm 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
Bill Spight wrote:
as0770 wrote:
Bill Spight wrote:
BTW, has anyone come up with an explanation for the U shape graphs? Thanks. :)


I'd say with a few playouts the factor for the smaller net is higher because in a Monte Carlo search the number of playouts below 100 don't help much. The number of playouts must be high enough to get a statistical significant result. The same at the end of the scale. If you have a statistical significant result even more playouts don't help much either. So there is an area inbetween where the smaller/weaker net can compensate less knowledge with more playouts.

But I am not an expert and didn't look closely at it :-)


Well, statistical significance is a matter of convention, not skill at go, so I don't see how that matters.

as0770 wrote:
jjMonte Carlo Search plays random moves to finish the game and usees the win/loss statistics to evaluate moves. With "statistical significant result" I mean the results of the Monte Carlo Search.


Mike Novack wrote:
I'll try to make that clearer.

The programs based upon MonteCarlo search are based upon the concept that IF (in a given position) the win rate of a number of random playouts is higher following move A than following move B then move A is a better move than move B.

Statistical relevance (the need for enough of these random playouts) is because otherwise the observed difference (win % following A higher than win % following B) might just be chance, or worse,worng just by chance.


I'm not up on the internals of today's bots, but, IIRC, statistical arguments in the N-armed bandit solution have to do with choosing which node of the game tree to expand.

Anyway, in simple estimation, the more instances you have the greater the convergence. The U shape indicates that something else is going on. My first guess is that with sufficient playouts new moves are uncovered which alter the evaluation, and that stronger programs are better at taking that difference in evaluation into account. But that does not explain a sustained divergence in the winning percentage of the stronger program. {shrug}

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

The race is not to the swift, nor the battle to the strong, but that's the way to bet. ;)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #312 Posted: Thu Sep 12, 2019 6:16 am 
Lives in gote

Posts: 300
Liked others: 59
Was liked: 257
Rank: maybe 2d
At the high end, my intuition is strongly that the playout advantage needed for a "weaker" net to match a sufficiently "stronger" neural net should continue to increase almost arbitrarily high the further you go.

Basically, this comes from the "stronger" neural net being better at evaluation. And in Go, time horizons are long enough that no feasible amount of search can save you in many cases if you are fundamentally misjudging a position. E.g. if you think a given territory is larger than it actually is because you miss straightforward endgame aji, or that a group is alive when actually the opponent has a throw-in ko and is just waiting to build threats and/or for the ko to be valuable enough, then you're in trouble. If it's early in the game, it could take all the way until the later midgame or endgame before it becomes worth the opponent's time to exploit it, and search obviously cannot read that far, so search cannot help you correct the misjudgment.

So the intuitive mental model I have is that you have some distribution of mistakes that vary widely in how fixable-by-search they are, immediate tactical blunders, versus short-term misjudgments that will reveal their impact quickly, versus medium-term misjudgments that can still be corrected with very deep reading, versus long-term misjudgments that are unfixable. At lower playouts, the weaker net sometimes wins games because the stronger net makes tactical blunders sometimes, and only a small playout advantage is sufficient to reach equality because that will greatly decrease the weaker net's blunder rate compared to the stronger net. As you increase playouts enough, more and more of the tactical mistakes get fixed, increasingly often leaving just the errors in judgment, which aren't fixable by search. And therefore the playout advantage for the weaker net needed to reach 50-50 increases almost arbitrarily, because as fewer and fewer of the mistakes become easily fixable by search, the weaker net keeps needing more and more and more of a playout factor to be able to fix enough of them to compenstate the fact that it's also making a constant unfixable rate of judgment errors compared to the stronger net.

I saw these curves a long time ago, so the results aren't news to me, but at the time I first read it, the surprising part of the "U shape" was the ratio being high at the beginning, which is of course not explained by the above intuition. I'm still pretty confident in the above intuitive explanation for the right side of the "U shape", It's very likely an entirely different phenomenon responsible for the left half. (Which I have some intuitions about now too but I won't touch on since they're harder to put into words for me).

Keep in mind that none of the above is a logical proof or anything. Indeed, it would probably not be hard to construct a strong neural net at low playouts that gets *weaker* the more you search, flagrantly violating the above argument! In the extreme, take a strong neural net and negate its value input, so that as it searches, it directly prefers bad results to good ones. At low playouts, it will be dominated by its initial move intuitions and priors (the "policy" output), which are still likely good moves and will usually beat the weaker net at similarly low playouts. But as it searches, those priors will get corrupted by the horrendous evaluations, leading it to increasingly prefer worse moves after a little search. (I haven't actually tested this, but I'm pretty sure something like this would happen). So the above is just an intuition about neural nets we train in practice, not something that is true by necessity.


This post by lightvector was liked by 2 people: Bill Spight, dfan
Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #313 Posted: Thu Sep 12, 2019 7:28 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
lightvector wrote:
Basically, this comes from the "stronger" neural net being better at evaluation. And in Go, time horizons are long enough that no feasible amount of search can save you in many cases if you are fundamentally misjudging a position. E.g. if you think a given territory is larger than it actually is because you miss straightforward endgame aji, or that a group is alive when actually the opponent has a throw-in ko and is just waiting to build threats and/or for the ko to be valuable enough, then you're in trouble. If it's early in the game, it could take all the way until the later midgame or endgame before it becomes worth the opponent's time to exploit it, and search obviously cannot read that far, so search cannot help you correct the misjudgment.


Just to note that the examples given are things that humans, by our propensity for depth first local search, can and do uncover early in the game. :)

Edit: Let me add that, because of the nature of go, in which pieces do not move, local reading often remains pertinent for many moves.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

The race is not to the swift, nor the battle to the strong, but that's the way to bet. ;)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #314 Posted: Thu Sep 12, 2019 10:45 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
There's actually another and very simple possible explanation why the stronger side's win% expectation keeps on moving further away from 50% with more playouts for both sides: the compounding effect.

Being stronger usually means that a look at a position gives slightly more correct move ordering, the first (highest policy) moves are slightly more often better objectively. Let's say this advantage worth 1.01 (in whatever scale). Now let's say that in a low-search condition the average search depth is 3, while in a high-search condition it is 8. This means this advantage is raised to different powers, since it manifests at ALL depth levels (from top to leaf). So the stronger side's search looks at more relevant answers to more relevant answers to more relevant moves etc. The deeper the search, the greater the efficiency/relevance advantage is (by the time a leaf node reached, eg. raised from 1.03 to 1.083 above).

(Edit: Since leaf nodes are the most numerous, their weight is high in the averages, thus their relevance strongly affect the accuracy of even the top evaluations.)

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #315 Posted: Thu Sep 12, 2019 11:06 pm 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
Bill Spight wrote:
Anyway, in simple estimation, the more instances you have the greater the convergence.


For every engine you can say the more playouts, the stronger. But the benefit is not linear. For above mentioned reason there is less benefit with a very small and a very high number of playouts.

The graph compares engines with a different number of playouts. The benefit form 5 -> 10 po is likely less then from 500 -> 1000 po. The benefit from 500->1000 po is likely higher than from 500.000 to 1.000.000 po. This would explain the u-shape.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #316 Posted: Fri Sep 13, 2019 4:43 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).

Also as I mentioned there were more similar reports, some of them compared dozens of nets (mostly the same size) and also found bigger Elo difference between the same two nets at the same playouts with more playouts.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #317 Posted: Fri Sep 13, 2019 6:44 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).


Well, I see 2 graphs of nets with the same size, at one you can adumbrate some kind of a u shape, at the other one you can't. Not that much of a sample... Of course the same effect might happen with nets of the same size but different strength, but you can't disclaim that it is at least "less distinctive".

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #318 Posted: Sat Sep 14, 2019 2:08 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
as0770 wrote:
jann wrote:
Even this graph shows the same curve for nets of the same sizes starting with the same number of playouts, where the strength difference is smaller thus the low-mid point is at playout parity (1:1).


Well, I see 2 graphs of nets with the same size, at one you can adumbrate some kind of a u shape, at the other one you can't. Not that much of a sample... Of course the same effect might happen with nets of the same size but different strength, but you can't disclaim that it is at least "less distinctive".


Thinking about that... If the hypothesis is correct, it has nothing to do with the size of the net, but with the factor of the number of playouts you need to make the engines play on the same strength. The more the factor tend to 1, the less distinctive is the u-shape. And I think you can interpret the graphs in this way.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #319 Posted: Sat Sep 14, 2019 2:50 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
The size of the net and the low parity factor is closely related (larger nets are stronger but slower). Same-size nets tend to be closer in strength, that's why the curve is less steep. And again, there were plenty of other tests done beyond the single linked graph.

Top
 Profile  
 
Offline
 Post subject: Re: Engine Tournament
Post #320 Posted: Sat Sep 14, 2019 2:53 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
The size of the net and the low parity factor is closely related (larger nets are stronger but slower). Same-size nets tend to be closer in strength, that's why the curve is less steep. And again, there were plenty of other tests done beyond the single linked graph.


And where is the contradiction to what I wrote?

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 334 posts ]  Go to page Previous  1 ... 13, 14, 15, 16, 17  Next

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: Google [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group