It is currently Wed Oct 23, 2019 4:02 pm

All times are UTC - 8 hours [ DST ]




 Post subject: Re: Engine Tournament
Post #281 Posted: Sun Sep 08, 2019 6:02 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
in a match with a small number of games but a big number of playouts, that was confirmed in a match with a big number of games but a small number of playouts,


Your main error in reasoning is this: the statistical significance of a result will not increase with a higher number of playouts but only with a higher number of games.

 Post subject: Re: Engine Tournament
Post #282 Posted: Sun Sep 08, 2019 6:58 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
as0770 wrote:
the statistical significance of a result will not increase with a higher number of playouts but only with a higher number of games.

This is not entirely correct. More playouts reduce the random factor in individual games somewhat, making the result more representative. OC, this is a weaker effect than the statistical validity coming from the number of samples (it only increases the weight of each sample towards 1, whereas a game at low playouts may only be worth 0.7, for example).

 Post subject: Re: Engine Tournament
Post #283 Posted: Sun Sep 08, 2019 8:35 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
as0770 wrote:
the statistical significance of a result will not increase with a higher number of playouts but only with a higher number of games.

This is not entirely correct. More playouts reduce the random factor in individual games somewhat, making the result more representative. OC, this is a weaker effect than the statistical validity coming from the number of samples (it only increases the weight of each sample towards 1, whereas a game at low playouts may only be worth 0.7, for example).


In a match with x playouts, the winning chance for an engine is y%. There is no such thing as a random factor. What you mean is: results with a higher number of playouts are more representative of the engine's strength. Of course that's true. But that doesn't mean you get a statistically significant result with fewer games.

 Post subject: Re: Engine Tournament
Post #284 Posted: Sun Sep 08, 2019 9:41 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
as0770 wrote:
What you mean is: results with a higher number of playouts are more representative of the engine's strength.


The number of playouts is one parameter of an engine's strength.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

The race is not to the swift, nor the battle to the strong, but that's the way to bet. ;)

 Post subject: Re: Engine Tournament
Post #285 Posted: Sun Sep 08, 2019 10:50 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
as0770 wrote:
There is no such thing as a random factor.

Without a random factor, the stronger net would always win (and the games might even be identical).

A winrate of, e.g., 54% may go up to 58% with quadruple playouts. This 58% yields slightly more statistical weight from the same number of games (because each sample weighs nearly 1, while at very low playouts game results are more random and thus weigh less than 1, i.e. they carry less information).

as0770 wrote:
Results with a higher number of playouts are more representative of the engine's strength. Of course that's true. But that doesn't mean you get a statistically significant result with fewer games.

The same number of more representative samples weighs more than the same number of less representative samples. Maybe a specific example makes this clearer: 102 games at 200 playouts are statistically less significant than 101 games at 2000 playouts (a weaker effect, as mentioned).
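A rough way to put numbers on this, assuming independent games and the normal approximation to the binomial: the z-score of a winrate q over n games against the 50% null hypothesis is (q - 0.5) * 2 * sqrt(n). The 54% and 58% winrates below are the illustrative figures from this post, assigned to 200 and 2000 playouts purely as a hypothetical; nothing in the thread measures them.

```python
import math


def z_score(winrate: float, n_games: int) -> float:
    """Normal-approximation z-score of a winrate against the 50% null hypothesis."""
    # Under the null, the winrate over n independent games has sd sqrt(0.25 / n).
    return (winrate - 0.5) / math.sqrt(0.25 / n_games)


def one_sided_p(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical winrates (illustration only, not measured data).
z_200 = z_score(0.54, 102)   # 102 games at 200 playouts
z_2000 = z_score(0.58, 101)  # 101 games at 2000 playouts
print(f"102 games at 54%: z = {z_200:.2f}, one-sided p = {one_sided_p(z_200):.3f}")
print(f"101 games at 58%: z = {z_2000:.2f}, one-sided p = {one_sided_p(z_2000):.3f}")
```

Under these assumed winrates the 101-game match has roughly twice the z-score of the 102-game match; whether winrates really move this way with playouts is what the rest of the thread argues about.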

 Post subject: Re: Engine Tournament
Post #286 Posted: Sun Sep 08, 2019 11:19 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
as0770 wrote:
There is no such thing as a random factor.

Without a random factor, the stronger net would always win (and the games might even be identical).


Now I see what you mean. But the stronger net won't necessarily win. The chance that it wins all games is, e.g., 60%, but there is also a chance of, e.g., 40% that it loses all games. The random factor is part of the match condition.

jann wrote:
A winrate of, e.g., 54% may go up to 58% with quadruple playouts. This 58% yields slightly more statistical weight from the same number of games (because each sample weighs nearly 1, while at very low playouts game results are more random and thus weigh less than 1, i.e. they carry less information).


In a match between A and B it might happen that A wins with x playouts, and B with x*4 playouts. Then A is stronger in games with x playouts and B in games with x*4 playouts. The winning chances can be different for all kinds of match conditions, and the results of a match are only valid for its own match condition. The statistical significance for every match condition depends exclusively on the number of games.

 Post subject: Re: Engine Tournament
Post #287 Posted: Sun Sep 08, 2019 12:16 pm 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
as0770 wrote:
The chance that it wins all games is, e.g., 60%, but there is also a chance of, e.g., 40% that it loses all games. The random factor is part of the match condition.

This is not quite what I meant. For the stronger net to lose, there must still be some random factor, something that can go against it. Without one, and without any bad luck, it won't lose.

Quote:
In a match between A and B it might happen that A wins with x playouts, and B with x*4 playouts. Then A is stronger in games with x playouts and B in games with x*4 playouts.

No, this is quite unlikely. It was observed that results at higher playouts usually match those at lower playouts, only with larger differences.

Quote:
The statistical significance for every match condition depends exclusively on the number of games.

No, it also depends on the quality/representativeness of the games. A less representative / more random game sample can be thought of as having an N% chance of being replaced by a random value (thus resulting in a lower number of effective samples).
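The "replaced by a random value" model can be written down directly. A minimal sketch, assuming each game is informative with probability w (won by the stronger side with probability p_true) and otherwise a fair coin flip; w and p_true are hypothetical parameters, not values measured in the thread. Under this model the expected z-score scales with w * sqrt(n), so matching the significance of n full-weight games takes about n / w**2 games rather than n / w.

```python
import math


def observed_winrate(p_true: float, w: float) -> float:
    """Expected winrate if a game reflects true strength with probability w
    and is a fair coin flip otherwise (hypothetical mixture model)."""
    return w * p_true + (1.0 - w) * 0.5


def expected_z(p_true: float, w: float, n_games: float) -> float:
    """Expected z-score against the 50% null after n_games independent games."""
    edge = observed_winrate(p_true, w) - 0.5  # equals w * (p_true - 0.5)
    return edge / math.sqrt(0.25 / n_games)


def games_for_same_z(n_games: int, w: float) -> float:
    """Games needed at weight w to match the expected z of n_games at weight 1."""
    # expected_z is proportional to w * sqrt(n), so equality needs n / w**2 games.
    return n_games / w ** 2


print(observed_winrate(0.58, 0.5))  # a 58% "true" edge shows up as about 54%
print(expected_z(0.58, 1.0, 400), expected_z(0.58, 0.5, 400))
print(games_for_same_z(400, 0.7))   # the 0.7 weight mentioned earlier in the thread
```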

 Post subject: Re: Engine Tournament
Post #288 Posted: Sun Sep 08, 2019 12:41 pm 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
There is an argument that increasing the number of playouts makes the result of each engine less representative, not more so. Taken to the extreme, with enough playouts each engine plays perfectly. From the results you can't tell which engine is which. ;)

 Post subject: Re: Engine Tournament
Post #289 Posted: Mon Sep 09, 2019 12:43 pm 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
as0770 wrote:
The chance that it wins all games is, e.g., 60%, but there is also a chance of, e.g., 40% that it loses all games. The random factor is part of the match condition.

This is not quite what I meant. For the stronger net to lose, there must still be some random factor, something that can go against it. Without one, and without any bad luck, it won't lose.


If engine A wins 60% against engine B, it is presumably stronger. If there is no random factor, the engines will play the same game again and again, and one engine will win all games. Hence the chance that the stronger engine A loses all games is 40%.
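The no-random-factor case can be simulated. A small sketch, assuming (hypothetically) a 60% chance for engine A as in the post: with independent games the match winrate settles near 60% as games are added, but if every game is a replay of the first one, the whole match is a single draw and extra games add no information.

```python
import random
import statistics


def match_winrate(p_win: float, n_games: int, deterministic: bool, rng: random.Random) -> float:
    """Fraction of games engine A wins in one simulated match.

    deterministic=True models engines with no random factor between games:
    the first game is replayed n_games times, so one draw decides every game.
    """
    if deterministic:
        return 1.0 if rng.random() < p_win else 0.0
    # Independent games: each game is its own Bernoulli(p_win) draw.
    wins = sum(1 for _ in range(n_games) if rng.random() < p_win)
    return wins / n_games


def spread(p_win: float, n_games: int, deterministic: bool,
           trials: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the match winrate over many repeated matches."""
    rng = random.Random(seed)
    results = [match_winrate(p_win, n_games, deterministic, rng) for _ in range(trials)]
    return statistics.pstdev(results)


# Hypothetical 60% chance for engine A, 100-game matches.
print(f"independent games: sd = {spread(0.6, 100, False):.3f}")  # theory: sqrt(0.24 / 100), about 0.049
print(f"replayed game:     sd = {spread(0.6, 100, True):.3f}")   # theory: sqrt(0.24), about 0.49
```

The second spread does not shrink however large n_games is, which is the sense in which such a match is one sample rather than a hundred.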

On the other hand, the random factor affects the result a great deal. A high random factor might force the stronger engine to play moves it doesn't like, and it may even lose because of that if the weaker engine handles such positions better. So the random factor is a very important point when interpreting the results of a match: you can get completely different results just by changing it, and its influence is unpredictable.

jann wrote:
Quote:
In a match between A and B it might happen that A wins with x playouts, and B with x*4 playouts. Then A is stronger in games with x playouts and B in games with x*4 playouts.

No, this is quite unlikely.


You can't ignore it just because it is unlikely. We are talking about determining the strength of an engine and how many games you need to get a statistically significant result. If you think you know the outcome of a match, you don't need to play it. And as soon as you set the time per move instead of the number of playouts, you will find matchups of small vs. big nets where the small net wins with little time and the big net wins with more time.

The same might happen with nets of equal size: one net understands ladders, while the other needs a certain number of playouts to read them out.

jann wrote:
It was observed that results at higher playouts usually match those at lower playouts, only with larger differences.


With every match you evaluate strength under a specific condition. What you are describing may be an effect seen when matching similar nets; it might be different with other types of engines. In fact, the opposite is true, as Bill easily proved. But that is not the point at all. The only point I am making concerns the statistical significance of a result.

That means if A vs. B ends 220:180, the chance that B is stronger under these match conditions is still > 2%, regardless of the number of playouts.
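For reference, this figure corresponds to a one-sided exact binomial test under the null hypothesis that both engines win 50% of games. Strictly, the result is the probability of a score at least this lopsided if the engines were equal, not the probability that B is stronger, but it is the number that the "> 2%" here and the later "97%" match. The only inputs are wins and games; playouts enter only through which match condition is being tested.

```python
from math import comb


def binomial_p_value(wins: int, games: int) -> float:
    """One-sided exact binomial p-value: P(X >= wins) when each game is a 50/50 coin flip."""
    tail = sum(comb(games, k) for k in range(wins, games + 1))
    # Integer arithmetic is exact; rounding happens only in this final division.
    return tail / 2 ** games


p = binomial_p_value(220, 400)
print(f"220:180 -> one-sided p = {p:.4f}, 1 - p = {1 - p:.4f}")
```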

And btw, good points don't begin with "it was observed that..." ;-)

jann wrote:
as0770 wrote:
The statistical significance for every match condition depends exclusively on the number of games.

No, it also depends on the quality/representativeness of the games. A less representative / more random game sample can be thought of as having an N% chance of being replaced by a random value (thus resulting in a lower number of effective samples).


The outcome of a game is like rolling a die: the result depends on probabilities, and rolling the die harder won't change the outcome. The results of games with more playouts are more important for us, no doubt, but to get a statistically significant result you don't need fewer games than with few playouts. This is simple mathematics.

 Post subject: Re: Engine Tournament
Post #290 Posted: Mon Sep 09, 2019 8:15 pm 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
as0770 wrote:
If there is no random factor, the engines will play the same game again and again, and one engine will win all games. Hence the chance that the stronger engine A loses all games is 40%.

Here you are contradicting yourself: you are still unconsciously assuming some random factor, since there are still "chances".

Quote:
In fact, the opposite is true, as Bill easily proved.

He was joking. :D (If I really need to spell this out: that artifact only happens at the very end of the scale, where there is really no random factor and no competition anymore.)

Quote:
jann wrote:
A less representative / more random game sample can be thought of as having an N% chance of being replaced by a random value (thus resulting in a lower number of effective samples).

The outcome of a game is like rolling a die: the result depends on probabilities, and rolling the die harder won't change the outcome. The results of games with more playouts are more important for us, no doubt, but to get a statistically significant result you don't need fewer games than with few playouts.

Please reread what I wrote. It's not that with more playouts you need fewer games (sample weight nearing 1); it's that with too few playouts you need more (sample weight < 1).

Last attempt to help you understand: suppose there is a special match condition where engine strength has only a minimal effect on results; who wins each game is almost completely random, but the stronger engine still has some tiny advantage. You play 400 games. The significance of the result will be very small: the sd (from the random part) will still be ±10 wins, and the useful, informative part will be dwarfed beside it (±1 or so). You don't want to measure random noise, you want to measure a signal ("more important for us"). Statistical significance measures how unlikely it is that your results were caused by pure chance rather than by that signal.
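The arithmetic behind the ±10 and ±1 figures, as a quick check assuming independent games: the number of wins over n games has standard deviation sqrt(n * p * (1 - p)), which is 10 for 400 near-even games, while an edge of 0.25 percentage points (a hypothetical value chosen to give about one extra win) is worth only about one win on average.

```python
import math


def noise_sd_wins(n_games: int, p: float = 0.5) -> float:
    """Standard deviation of the number of wins over n independent games."""
    return math.sqrt(n_games * p * (1.0 - p))


def signal_wins(n_games: int, winrate: float) -> float:
    """Expected number of wins above an even 50% split."""
    return n_games * (winrate - 0.5)


n = 400
tiny_edge = 0.5025  # hypothetical: about one extra win per 400 games
noise = noise_sd_wins(n)
signal = signal_wins(n, tiny_edge)
print(f"noise sd = {noise:.1f} wins, signal = {signal:.1f} wins, expected z = {signal / noise:.2f}")
```

An expected z of about 0.1 is far below any usual significance threshold, which is the sense in which the signal is dwarfed.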

 Post subject: Re: Engine Tournament
Post #291 Posted: Mon Sep 09, 2019 8:25 pm 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
jann wrote:
Quote:
In fact, the opposite is true, as Bill easily proved.

He was joking. :D (If I really need to spell this out: that artifact only happens at the very end of the scale, where there is really no random factor and no competition anymore.)


Actually, I was not joking. I do not know enough to engage in this debate. But there is a problem with talking about representativeness without defining it. My point was really made earlier. Search is part of the strength of a program. Making no search does not make much sense, because how search is done is one of the characteristics of today's top programs. But if you are comparing two programs without specifying their search parameters, which includes number of playouts, then what are you saying?

 Post subject: Re: Engine Tournament
Post #292 Posted: Mon Sep 09, 2019 8:46 pm 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
Bill Spight wrote:
Making no search does not make much sense, because how search is done is one of the characteristics of today's top programs. But if you are comparing two programs without specifying their search parameters, which includes number of playouts, then what are you saying?

That restricting search too much makes the results less informative and closer to random, and thus less reliable for identifying the stronger side. In typical cases, the observed winrate between A and B grows with the amount of search allowed (not linearly, OC).

 Post subject: Re: Engine Tournament
Post #293 Posted: Mon Sep 09, 2019 9:13 pm 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
jann wrote:
Bill Spight wrote:
Making no search does not make much sense, because how search is done is one of the characteristics of today's top programs. But if you are comparing two programs without specifying their search parameters, which includes number of playouts, then what are you saying?

That restricting search too much makes the results less informative and closer to random, and thus less reliable for identifying the stronger side. In typical cases, the observed winrate between A and B grows with the amount of search allowed (not linearly, OC).


But when you change the search parameters, you change the program being compared. You can't say that more search makes a program more what it is, unless you have defined the program that way.

LZ with 200k playouts is different from LZ with 100k playouts. It is stronger. And it is stronger not just because of randomness. I have shown, with a version of Leela a few years ago, that the strength difference is not random. You can use the non-randomness to identify likely errors by the version with fewer playouts.

 Post subject: Re: Engine Tournament
Post #294 Posted: Mon Sep 09, 2019 11:04 pm 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
I don't see why that would go against what I wrote, or against the basic statistical phenomenon: it is significantly more likely that the weaker side wins a 100-game match (by chance) under low-search conditions than under high-search conditions. (Provided the change of conditions can be made in an unbiased way, which in practice is only possible if the engines are similar and are allowed the same amount of search. Also, more search usually means less randomness, and a player is not rigidly defined by a fixed amount of search; even for humans there are things like time controls.)

 Post subject: Re: Engine Tournament
Post #295 Posted: Mon Sep 09, 2019 11:28 pm 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
jann wrote:
I don't see why that would go against what I wrote, or against the basic statistical phenomenon: it is significantly more likely that the weaker side wins a 100-game match (by chance) under low-search conditions than under high-search conditions. (Provided the change of conditions can be made in an unbiased way, which in practice is only possible if the engines are similar and are allowed the same amount of search. Also, more search usually means less randomness.)


Well, I do not think that your claim is precise enough, nor do I see any evidence. It may well be that, given two similar neural net programs, there are search conditions that distinguish between them best with regard to strength. But that is an empirical question. It is not proven, and IMHO, not plausible, that simply increasing the number of playouts will always provide better discrimination between the programs. You need to demonstrate that claim.

Edit: I am not sure, but is this your claim? Given two similar neural net programs playing a match against each other with a certain number of games (10,000 maybe?), the more playouts you give each program, the more games the stronger program will win.

If so, that's a demonstrable claim. Within practical limits, OC. :)
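One way to set up the check described here, assuming independent games: play the same pair at two playout settings (say 10,000 games each, the figure suggested above) and run a standard two-proportion z-test on the stronger engine's winrates. The function is a generic sketch; the win counts are whatever such an experiment produces, and none are given in the thread.

```python
import math


def two_proportion_z(wins_low: int, n_low: int, wins_high: int, n_high: int) -> float:
    """Two-proportion z-test: did the stronger engine's winrate rise at the higher setting?

    wins_low / n_low: the stronger engine's results at the lower playout count.
    wins_high / n_high: its results at the higher playout count.
    A large positive z supports the claim; a z near 0 does not.
    (Undefined if the pooled winrate is exactly 0 or 1.)
    """
    p_low = wins_low / n_low
    p_high = wins_high / n_high
    # Pooled winrate under the null hypothesis that playouts make no difference.
    pooled = (wins_low + wins_high) / (n_low + n_high)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_low + 1.0 / n_high))
    return (p_high - p_low) / se
```

Repeating the test across several doubling steps would show the shape of the curve, not just its direction.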

 Post subject: Re: Engine Tournament
Post #296 Posted: Tue Sep 10, 2019 4:50 am 
Dies in gote

Posts: 58
Liked others: 0
Was liked: 16
Bill Spight wrote:
Edit: I am not sure, but is this your claim? Given two similar neural net programs playing a match against each other with a certain number of games (10,000 maybe?), the more playouts you give each program, the more games the stronger program will win.

If so, that's a demonstrable claim. Within practical limits, OC. :)

Roughly, yes (along with the consequence for statistical significance). Exceptions are possible, OC (and the effect is nowhere near that linear), but in reality the same effect would often still occur under less restricted conditions; you just cannot observe it (an unbiased experiment is not possible if the two sides start with different amounts of search, for example). BTW, this is not my claim but actual LZ stats.

 Post subject: Re: Engine Tournament
Post #297 Posted: Tue Sep 10, 2019 8:03 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
jann wrote:
Bill Spight wrote:
Edit: I am not sure, but is this your claim? Given two similar neural net programs playing a match against each other with a certain number of games (10,000 maybe?), the more playouts you give each program, the more games the stronger program will win.

If so, that's a demonstrable claim. Within practical limits, OC. :)

Roughly, yes (along with the consequence for statistical significance). Exceptions are possible, OC (and the effect is nowhere near that linear), but in reality the same effect would often still occur under less restricted conditions; you just cannot observe it (an unbiased experiment is not possible if the two sides start with different amounts of search, for example). BTW, this is not my claim but actual LZ stats.


Thanks. :)

How does the winning percentage of the stronger player change with, say, doubling the playouts? My guess is that there is an optimal number of playouts to discriminate between players. What about games between players with sizable differences? Are there graphs somewhere of the systematic effects? Thanks. :)
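Part of the answer follows from the statistics alone, assuming independent games and the normal approximation: the number of games needed to separate two engines grows as 1 / (winrate - 0.5)**2, so shrinking the edge from 8 to 4 percentage points roughly quadruples the games required. The count below is where an average result would just reach significance (more games are needed to reach it reliably), and the sketch ignores the other half of the trade-off, that each game costs more at higher playouts, which would decide where an optimum lies.

```python
import math


def games_needed(winrate: float, z_target: float = 1.96) -> int:
    """Approximate games for an expected winrate to reach z_target (default: two-sided 5%)."""
    edge = winrate - 0.5
    if edge <= 0:
        raise ValueError("winrate must be above 50%")
    # expected z = edge * 2 * sqrt(n)  =>  n = (z_target / (2 * edge)) ** 2
    return math.ceil((z_target / (2.0 * edge)) ** 2)


for rate in (0.52, 0.54, 0.58, 0.60):
    print(f"winrate {rate:.0%}: about {games_needed(rate)} games")
```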

 Post subject: Re: Engine Tournament
Post #298 Posted: Tue Sep 10, 2019 8:46 am 
Lives in gote

Posts: 560
Liked others: 48
Was liked: 198
See this post of Friday9i, written on 6 Oct. 2018:
https://github.com/leela-zero/leela-zero/issues/1914

Comparison curves between two nets tend to be U-shaped: the stronger net is much better than the weaker one at low (<10) playouts, or at high (>1000 or >10000) playouts.

 Post subject: Re: Engine Tournament
Post #299 Posted: Tue Sep 10, 2019 9:09 am 
Honinbo

Posts: 8919
Liked others: 2682
Was liked: 3036
jlt wrote:
See this post of Friday9i, written on 6 Oct. 2018:
https://github.com/leela-zero/leela-zero/issues/1914

Comparison curves between two nets tend to be U-shaped: the stronger net is much better than the weaker one at low (<10) playouts, or at high (>1000 or >10000) playouts.


Many thanks. :)

I see that the graph is supposedly about game analysis instead of game play. Even now, as I have indicated elsewhere, I would not trust any bot for analysis below 10k. Maybe a broader search setting would be better for game analysis, I dunno. Edit: The point being that a broad search can uncover good plays that are not originally considered good enough to explore.

 Post subject: Re: Engine Tournament
Post #300 Posted: Tue Sep 10, 2019 9:24 am 
Lives with ko

Posts: 177
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
as0770 wrote:
If there is no random factor, the engines will play the same game again and again, and one engine will win all games. Hence the chance that the stronger engine A loses all games is 40%.

Here you are contradicting yourself: you are still unconsciously assuming some random factor, since there are still "chances".


First, this was an answer to "The stronger engine won't lose", which is wrong. I don't see a contradiction. Before playing a game, you don't know the outcome; you can estimate the chances by playing under other conditions or against other engines.

If engine A wins a 1000-playout match 100:0 and engine B wins a 1001-playout match 100:0, which one is stronger?

jann wrote:
Quote:
In fact, the opposite is true, as Bill easily proved.

He was joking. :D (If I really need to spell this out: that artifact only happens at the very end of the scale, where there is really no random factor and no competition anymore.)


You don't seem to have much experience with AI games. Usually the "draw factor" rises with more calculation time; this has been shown in many cases with alpha-beta (A/B) search engines. I see no reason why this should be different with Monte Carlo search. Bill's "joke" will help you understand why the winning chances shift towards 50% with more playouts. Here it is up to you to prove me wrong.

jann wrote:
Quote:
jann wrote:
A less representative / more random game sample can be thought of as having an N% chance of being replaced by a random value (thus resulting in a lower number of effective samples).

The outcome of a game is like rolling a die: the result depends on probabilities, and rolling the die harder won't change the outcome. The results of games with more playouts are more important for us, no doubt, but to get a statistically significant result you don't need fewer games than with few playouts.

Please reread what I wrote. It's not that with more playouts you need fewer games (sample weight nearing 1); it's that with too few playouts you need more (sample weight < 1).


Please explain the difference between "with few playouts you need more games" and "with many playouts you need fewer games".

jann wrote:
Last attempt to help you understand:

Thanks, but what you say doesn't explain anything that is in question.

jann wrote:
suppose there is a special match condition where engine strength has only a minimal effect on results; who wins each game is almost completely random, but the stronger engine still has some tiny advantage. You play 400 games. The significance of the result will be very small: the sd (from the random part) will still be ±10 wins, and the useful, informative part will be dwarfed beside it (±1 or so). You don't want to measure random noise, you want to measure a signal ("more important for us"). Statistical significance measures how unlikely it is that your results were caused by pure chance rather than by that signal.


I didn't say anything different. The only question is: does the number of playouts affect the statistical significance? And no matter what you do or say, for every condition there is a winning chance.

And my last attempt to help you understand: a result of 220:180 for A vs. B means the probability that A is better is 97%, no matter how many playouts.
