It is currently Thu Nov 21, 2019 9:38 pm

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 339 posts ]  Go to page Previous  1 ... 13, 14, 15, 16, 17
 Post subject: Re: Engine Tournament
Post #321 Posted: Sat Sep 14, 2019 4:28 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
jann wrote:
as0770 wrote:
There is, e.g., a 60% chance that it would win all games, but there is also, e.g., a 40% chance that it will lose all games. The random factor is part of the match condition.

This is not just what I meant. For the stronger net to lose, there still must be some random factor - something that can go against it. Without such, and without even being unlucky, it won't lose.


If Engine A wins 60% against Engine B, it is supposed to be stronger. If there is no random factor, the engines will play the same game again and again, and one engine will win all games. Hence the chance that the stronger Engine A loses all games is 40%.

On the other hand, the random factor affects the result very much. A high random factor might force the stronger engine to play moves that it doesn't like, and it may even lose because of that if the weaker engine can handle that better. So it is a very important point for interpreting the results of a match: you will get completely different results just by changing the random factor, and it is unpredictable what influence it has.

jann wrote:
Quote:
In a match between A and B it might happen that A wins with x playouts, and B with x*4 playouts. Then A is stronger in games with x playouts and B in games with x*4 playouts.

No, this is quite unlikely.


Just because it is unlikely doesn't mean you can ignore it. We are talking about determining the strength of an engine and how many games you need to get a statistically significant result. If you think you know the outcome of a match, you don't need to play it. And as soon as you set the time for each move instead of the number of playouts, you will find that there are matchups of small vs. big nets where the small net wins with little time and the big net wins with more time.

The same might happen with nets of equal size, one net understands ladders, the other one needs a special amount of playouts to calculate ladders.

jann wrote:
It was observed that higher playouts usually match the results of lower playouts, only with increased differences.


With every match you evaluate the strength in a specific condition. What you are talking about is maybe an effect when matching similar engine nets. It might be different with other types of engines. In fact the opposite is true as Bill easily proved. But that is not at all the point. The only point I am talking about is the statistical significance of a result.

That means if you have a result of A vs. B of 220:180, the chance that B is stronger under these match conditions is still > 2%, regardless of the number of playouts.
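The 220:180 figure can be checked with an exact binomial tail. The sketch below is only an illustration, assuming independent games, no draws, and the usual frequentist reading (the probability of seeing at least 220 wins in 400 games if both engines were really equal). The function name `tail_probability` is made up for this example.

```python
from math import comb

def tail_probability(wins: int, games: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(games, p), computed exactly."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Assumption: every game is an independent trial with the same win probability.
p_value = tail_probability(220, 400)
print(f"P(at least 220 wins out of 400 | equal strength) = {p_value:.4f}")
```

The printed value comes out between 2% and 3%, which is consistent with the "> 2%" above. Strictly speaking, this is the chance of such a score arising between equal engines, not the chance that B is the stronger one.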

And btw, good points don't begin with "it was observed that..." ;-)

jann wrote:
as0770 wrote:
The statistical significance for every match condition depends exclusively on the number of games.

No, it also depends on the quality/representativeness of the games. A less representative / more random game sample can be thought of as having an N% chance of being replaced by a random value (thus resulting in a lower number of effective samples).


The outcome of a game is like rolling a die, and the result depends on probabilities. It won't change the outcome if you roll the die harder. The results of games with more playouts are more important for us, no doubt, but to get a statistically significant result you don't need fewer games than with few playouts. This is simple mathematics.
...
The data are related to the strength of nets of different sizes. Of course their strength depends on the number of playouts.

The issue of this topic is only the statistical significance of results...

When I wrote that your tests are "synthetic" because of these small amounts of thinking time (playouts), and you answered that my tests aren't "statistically significant" with these numbers of games, I answered that your tests aren't "practically significant", because they can give different results than sparring with real time controls. I also wrote (I don't remember whether there or in a PM) that for pure MC engines, as the time (playouts) per move --> 0, play --> random and the match result --> 50%/50% regardless of engine strength (though the stronger engine can still score <50% because of statistical deviation).
I don't know whether the curves would be U-shaped in the same way if the x-axis were time (with constant PC performance) or playouts per move and the y-axis were win % (let alone for other neural nets and engines), as they are in the data (with numbers of visits) above (not all curves are U-shaped even there). But if these curves cross the 50% line, the results will depend on the number of playouts not only quantitatively, but also qualitatively...
Quote:
The number of playouts must be high enough to get a statistically significant result.

I am glad that you understood the main idea...

 Post subject: Re: Engine Tournament
Post #322 Posted: Sat Sep 14, 2019 4:43 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
jann wrote:
The size of the net and the low parity factor is closely related (larger nets are stronger but slower). Same-size nets tend to be closer in strength, that's why the curve is less steep. And again, there were plenty of other tests done beyond the single linked graph.

Larger nets are only potentially stronger, because they "think slower" not only when playing, but also when learning (example).

 Post subject: Re: Engine Tournament
Post #323 Posted: Sat Sep 14, 2019 4:50 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
Quote:
The number of playouts must be high enough to get a statistically significant result.

I am glad that you understood the main idea...


I am sorry to say that, but once again you didn't understand at all... This was related to Monte Carlo Tree search...

You can't participate in such discussions with Google Translator.

 Post subject: Re: Engine Tournament
Post #324 Posted: Sat Sep 14, 2019 5:44 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
q30 wrote:
Quote:
The number of playouts must be high enough to get a statistically significant result.

I am glad that you understood the main idea...


I am sorry to say that, but once again you didn't understand at all... This was related to Monte Carlo Tree search...

You can't participate in such discussions with Google Translator.

Almost all engines with neural nets use MC search too (and use its results for resigning); for example, in LZ: neural nets - visits (and nneval win values), MC - playouts (and win %)...

I try to write without any translator, but for some words and expressions I use https://www.translate.ru.

 Post subject: Re: Engine Tournament
Post #325 Posted: Sat Sep 14, 2019 6:19 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
Almost all engines with neural nets use MC search too (and use its results for resigning); for example, in LZ: neural nets - visits (and nneval win values), MC - playouts (and win %)...


You still didn't understand. It was related to "Monte Carlo Tree search" and not to "engines that use Monte Carlo Tree search".

 Post subject: Re: Engine Tournament
Post #326 Posted: Sat Sep 14, 2019 9:24 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
q30 wrote:
Almost all engines with neural nets use MC search too (and use its results for resigning); for example, in LZ: neural nets - visits (and nneval win values), MC - playouts (and win %)...


You still didn't understand. It was related to "Monte Carlo Tree search" and not to "engines that use Monte Carlo Tree search".


So you still think that, even for pure MC engines, it is more "statistically significant" to minimize the engines' thinking time (down to 0 in the limit) and maximize the number of games (up to infinity in the limit) to get a real idea of the engines' strength ratio, don't you?

 Post subject: Re: Engine Tournament
Post #327 Posted: Sat Sep 14, 2019 10:02 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
as0770 wrote:
q30 wrote:
Almost all engines with neural nets use MC search too (and use its results for resigning); for example, in LZ: neural nets - visits (and nneval win values), MC - playouts (and win %)...


You still didn't understand. It was related to "Monte Carlo Tree search" and not to "engines that use Monte Carlo Tree search".


So you still think that, even for pure MC engines, it is more "statistically significant" to minimize the engines' thinking time (down to 0 in the limit) and maximize the number of games (up to infinity in the limit) to get a real idea of the engines' strength ratio, don't you?


If you use little time/playouts, you can determine the strength with little time/playouts. If you want to know the strength with much time/playouts you have to play with much time/playouts. In both cases you need the same number of games to get a statistically significant result. Quite simple, isn't it?

 Post subject: Re: Engine Tournament
Post #328 Posted: Sun Sep 15, 2019 3:46 am 
Dies in gote

Posts: 63
Liked others: 0
Was liked: 16
as0770 wrote:
In both cases you need the same number of games to get a statistically significant result.

jann wrote:
As you can see the stronger engine is expected to win more games under high-search conditions. For the weaker net to win a 400 game match by a chance upset, he needs the noise / random deviation to overcome the strengthwise expected advantage of the stronger player. Random deviation is constant for 400 games, the advantage of the stronger player is bigger with more playouts, hence the probability of getting the winner/stronger side wrong is less for the same number of games.

Your basic oversight is only worrying about the absolute margin of error. But statistical significance is about the proportion between the signal to be observed and the margin of error, i.e. the relative error.
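The signal-versus-noise point can be made concrete with a small calculation. This is only a sketch, assuming independent games without draws. The 55% and 65% win rates are invented stand-ins for "low-search" and "high-search" conditions, not measured values, and `upset_probability` is a hypothetical helper name.

```python
from math import comb

def upset_probability(true_win_rate: float, games: int = 400) -> float:
    """Exact probability that the stronger engine wins fewer than half the games."""
    # For 400 games this sums over 0..199 wins; a 200:200 tie is not counted as an upset.
    losing_scores = range(0, (games + 1) // 2)
    return sum(
        comb(games, k) * true_win_rate**k * (1 - true_win_rate) ** (games - k)
        for k in losing_scores
    )

# Hypothetical win rates of the stronger engine under two search budgets.
for label, rate in (("low search", 0.55), ("high search", 0.65)):
    print(f"{label}: true win rate {rate:.0%}, "
          f"P(wrong winner after 400 games) = {upset_probability(rate):.2e}")
```

The spread of the score is almost the same in both cases (about 10 games for 400 games), but the expected lead is larger in the second case, so the chance of naming the wrong winner is much smaller there.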

 Post subject: Re: Engine Tournament
Post #329 Posted: Sun Sep 15, 2019 6:04 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
Your basic oversight is only worrying about the absolute margin of error.


Indeed, this was the subject of debate. Some think that you can measure the strength with a few games as long as the quality is good enough.

jann wrote:
As you can see the stronger engine is expected to win more games under high-search conditions. For the weaker net to win a 400 game match by a chance upset, he needs the noise / random deviation to overcome the strengthwise expected advantage of the stronger player. Random deviation is constant for 400 games, the advantage of the stronger player is bigger with more playouts, hence the probability of getting the winner/stronger side wrong is less for the same number of games.


That's because the strength difference seems to be bigger in high-search conditions than in low-search conditions (up to a point). The reason for that is a completely different topic.

 Post subject: Re: Engine Tournament
Post #330 Posted: Mon Sep 16, 2019 3:40 am 
Dies in gote

Posts: 63
Liked others: 0
Was liked: 16
as0770 wrote:
Some think that you can measure the strength with a few games as long as the quality is good enough.

I haven't seen such claims, but that's wrong as well. Confidence requires both quality and quantity - a fair amount of representative samples.

The reason I phrased it as low-quality games needing more samples (and not the reverse) is that the two directions are asymmetric. The effect of amplifying the signal is not necessarily always strong enough, may have rare exceptions, etc., so it's better to just think of sample weights ~= 1 if the playouts are decent enough. But the opposite is different. If there is even a POTENTIAL that your samples become more random, your signal weakens and the expected win rate falls toward 50%, so you already need more samples.
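The "N% chance of being replaced by a random value" picture can be turned into a rough estimate. This is a sketch under stated assumptions: a replaced game is a fair coin flip, games are independent, and "enough games" means the expected lead is at least `z` standard errors above 50% (with `z = 2.0` as an arbitrary threshold). The 60% win rate and both function names are made up for the example.

```python
from math import ceil

def diluted_win_rate(true_win_rate: float, random_fraction: float) -> float:
    """Win rate observed when a fraction of games is replaced by a fair coin flip."""
    return (1 - random_fraction) * true_win_rate + random_fraction * 0.5

def games_needed(win_rate: float, z: float = 2.0) -> int:
    """Games needed so the expected lead over 50% is z standard errors (normal approximation)."""
    edge = win_rate - 0.5
    return ceil(z**2 * win_rate * (1 - win_rate) / edge**2)

# Hypothetical 60% win rate, diluted by increasing amounts of randomness.
for random_fraction in (0.0, 0.25, 0.5):
    observed = diluted_win_rate(0.60, random_fraction)
    print(f"random fraction {random_fraction:.0%}: "
          f"observed win rate {observed:.3f}, games needed ~ {games_needed(observed)}")
```

Under these assumptions the required number of games grows roughly as 1 / (1 - random fraction)^2, so replacing half of the games by coin flips calls for about four times as many games.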

 Post subject: Re: Engine Tournament
Post #331 Posted: Sat Sep 21, 2019 1:44 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
If you use little time/playouts, you can determine the strength with little time/playouts. If you want to know the strength with much time/playouts you have to play with much time/playouts. In both cases you need the same number of games to get a statistically significant result. Quite simple, isn't it?

OK. And in this case, why did you begin this discussion after I wrote that your tests are "synthetic" because of the unrealistically low number of playouts in them? (In anticipation: the answer to this question isn't quite simple for me...)

 Post subject: Re: Engine Tournament
Post #332 Posted: Sun Sep 22, 2019 1:34 pm 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
jann wrote:
as0770 wrote:
Some think that you can measure the strength with a few games as long as the quality is good enough.

I haven't seen such claims


You can start reading at #227.

q30 wrote:
as0770 wrote:
If you use little time/playouts, you can determine the strength with little time/playouts. If you want to know the strength with much time/playouts you have to play with much time/playouts. In both cases you need the same number of games to get a statistically significant result. Quite simple, isn't it?

OK. And in this case, why did you begin this discussion after I wrote that your tests are "synthetic" because of the unrealistically low number of playouts in them? (In anticipation: the answer to this question isn't quite simple for me...)


What do you mean by "in this case"? I wrote neither anything new nor any contradiction. Seriously, I am not sure whether you just don't understand anything or are trying to fool us.

I won't start this argument once again. I played 1h and 2h games on 1-4 cores. This is not such a low number of playouts, especially since you later quote games with 3000 visits. Your comments were just disrespectful and not well-founded.

 Post subject: Re: Engine Tournament
Post #333 Posted: Sat Sep 28, 2019 5:01 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
q30 wrote:
as0770 wrote:
If you use little time/playouts, you can determine the strength with little time/playouts. If you want to know the strength with much time/playouts you have to play with much time/playouts. In both cases you need the same number of games to get a statistically significant result. Quite simple, isn't it?

OK. And in this case, why did you begin this discussion after I wrote that your tests are "synthetic" because of the unrealistically low number of playouts in them? (In anticipation: the answer to this question isn't quite simple for me...)


What do you mean by "in this case"? I wrote neither anything new nor any contradiction. Seriously, I am not sure whether you just don't understand anything or are trying to fool us.

I won't start this argument once again. I played 1h and 2h games on 1-4 cores. This is not such a low number of playouts, especially since you later quote games with 3000 visits. Your comments were just disrespectful and not well-founded.

In case you understand what you wrote...
On your contradiction, once again (the one you say you never wrote)... The game duration must depend on the number of moves: it must not be the same for a 91-move game and a 291-move game. I never used a visit limit in my tests. Maybe I quoted another test in some context... I limit only the time per move (except in one additional match with a limited number of playouts). I never try to find the number of visits in the shell output, but since playouts are printed to the shell continuously, it is more convenient to go by the number of playouts. You can see the playout numbers in my tests for LeelaZero with different categories of neural net weights here.

 Post subject: Re: Engine Tournament
Post #334 Posted: Sat Sep 28, 2019 5:48 am 
Lives in sente

Posts: 935
Liked others: 0
Was liked: 159
Please let's calm this down. I think it is clear to some of us that at least part of the problem is with language. I'm pretty sure not all of us are native English speakers.

 Post subject: Re: Engine Tournament
Post #335 Posted: Fri Nov 01, 2019 7:05 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
The best among the LeelaZero "welterweight" neural nets is 15b_245_72k_q, and it's stronger than the "lightweight" winner (details).

 Post subject: Re: Engine Tournament
Post #336 Posted: Fri Nov 01, 2019 9:40 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
On your contradiction, once again (the one you say you never wrote)... The game duration must depend on the number of moves: it must not be the same for a 91-move game and a 291-move game. I never used a visit limit in my tests. Maybe I quoted another test in some context... I limit only the time per move (except in one additional match with a limited number of playouts). I never try to find the number of visits in the shell output, but since playouts are printed to the shell continuously, it is more convenient to go by the number of playouts. You can see the playout numbers in my tests for LeelaZero with different categories of neural net weights here.


Whether one prefers a time setting of "games in 120 minutes" or "1s/move" is a matter of taste. Time in X means that an engine usually uses most of the time for the first 200 moves and plays faster when the game is already decided. There are good reasons for both options. No need to offend someone whichever way he does it.

Still I have no clue what contradiction you are talking about...

 Post subject: Re: Engine Tournament
Post #337 Posted: Sat Nov 09, 2019 5:09 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
as0770 wrote:
q30 wrote:
On your contradiction, once again (the one you say you never wrote)... The game duration must depend on the number of moves: it must not be the same for a 91-move game and a 291-move game. I never used a visit limit in my tests. Maybe I quoted another test in some context... I limit only the time per move (except in one additional match with a limited number of playouts). I never try to find the number of visits in the shell output, but since playouts are printed to the shell continuously, it is more convenient to go by the number of playouts. You can see the playout numbers in my tests for LeelaZero with different categories of neural net weights here.


Whether one prefers a time setting of "games in 120 minutes" or "1s/move" is a matter of taste. Time in X means that an engine usually uses most of the time for the first 200 moves and plays faster when the game is already decided. There are good reasons for both options. No need to offend someone whichever way he does it.

Still I have no clue what contradiction you are talking about...

About time:
1) "Engine usually uses" doesn't mean the engine tests are equivalent...
2) At move 200 the game may not be decided yet...
So one option may be good for playing against humans, and the other for engine tests.

The contradiction is that a 1h game on 1 core and a 2h game on 4 cores can't be tests for determining a single rating of engines, because:
Quote:
If you use little time/playouts, you can determine the strength with little time/playouts. If you want to know the strength with much time/playouts you have to play with much time/playouts.

 Post subject: Re: Engine Tournament
Post #338 Posted: Sat Nov 09, 2019 7:54 am 
Lives with ko

Posts: 179
Liked others: 15
Was liked: 23
Rank: Beginner
q30 wrote:
"Engine usually uses" doesn't mean the engine tests are equivalent...


Your understanding of equivalency without variety would mean you can only play two games.

q30 wrote:
The contradiction is that a 1h game on 1 core and a 2h game on 4 cores can't be tests for determining a single rating of engines,


This has nothing to do with what I wrote. Once again, this kind of discussion doesn't work with online translators.

 Post subject: Re: Engine Tournament
Post #339 Posted: Sat Nov 16, 2019 2:15 am 
Dies with sente

Posts: 91
Liked others: 1
Was liked: 1
Rank: 30 kyu
I use a translator only to translate some words into English while I'm posting a message.
So, if you want most people here (who have already written that they aren't native English speakers) to understand you correctly, please try to use simple, unambiguous terminology without any beautiful but superfluous words and phrases.
