jann wrote:
as0770 wrote:
The chance that it would win all games is, e.g., 60%, but there is also a chance of, e.g., 40% that it will lose all games. The random factor is part of the match condition.
This is not quite what I meant. For the stronger net to lose, there must still be some random factor, something that can go against it. Without one, and without even being unlucky, it won't lose.
If Engine A wins 60% against Engine B, it is supposed to be stronger. If there is no random factor, the engines will play the same game again and again, and one engine will win all games. Hence the chance that the stronger Engine A loses all games is 40%.
On the other hand, the random factor affects the result a great deal. A high random factor might force the stronger engine to play moves that it doesn't like, and it may even lose because of that if the weaker engine handles this better. So it is a very important point for interpreting the results of a match: you will get completely different results just by changing the random factor, and its influence is unpredictable.
jann wrote:
Quote:
In a match between A and B it might happen that A wins with x playouts, and B with x*4 playouts. Then A is stronger in games with x playouts and B in games with x*4 playouts.
No, this is quite unlikely.
Just because it is unlikely, you can't ignore it. We are talking about determining the strength of an engine and how many games you need to get a statistically significant result. If you think you know the outcome of a match, you don't need to play it. And as soon as you set the time for each move rather than the number of playouts, you will find that there are matchups of small vs. big nets where the small net wins with little time and the big net wins with more time.
The same might happen with nets of equal size: one net understands ladders, while the other needs a certain number of playouts to calculate them.
jann wrote:
It was observed that higher playouts usually match the results of lower playouts, only with increased differences.
With every match you evaluate the strength under a specific condition. What you are talking about may be an effect that appears when matching similar engine nets. It might be different with other types of engines. In fact, the opposite is true, as Bill easily proved. But that is not the point at all. The only point I am talking about is the statistical significance of a result.
That means that if A vs. B ends 220:180, the chance that B is stronger under these match conditions is still > 2%, regardless of the number of playouts.
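As a rough check of the 220:180 figure, here is a minimal Python sketch. It assumes that draws are ignored, that games are independent, and that "A is not actually stronger" is modelled as every game being a fair coin flip; the function name `tail_prob` is only illustrative. It computes the exact one-sided binomial tail, i.e. the chance of A scoring at least 220 out of 400 if the two engines were in fact equal.

```python
from math import comb

def tail_prob(wins: int, games: int, p: float = 0.5) -> float:
    """Return P(X >= wins) for X ~ Binomial(games, p)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Assumption: equal engines, so each game is a 50/50 coin flip.
p_value = tail_prob(220, 400)
print(f"{p_value:.4f}")
```

The value lands between 2% and 3%. Note that the number of playouts does not appear anywhere in the calculation; only the score and the number of games do.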
And btw, good points don't begin with "it was observed that..." ;-)
jann wrote:
as0770 wrote:
The statistical significance for every match condition depends exclusively on the number of games.
No, it also depends on the quality/representativeness of the games. A less representative / more random game sample can be thought of as having an N% chance of each result being replaced by a random value (thus resulting in a lower number of effective samples).
The outcome of a game is like rolling a die, and the result depends on probabilities. It won't change the outcome if you roll the die harder. The results of games with more playouts are more important for us, no doubt, but to get a statistically significant result you don't need fewer games than with few playouts. This is simple mathematics.
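Both views can be put into numbers with a small Python sketch under a normal approximation. It is only an illustration: the helper names, the 60% true win rate, and the model in which a fraction `noise` of games is decided by a coin flip are assumptions, not measurements. For a given observed score, the required number of games depends only on that score, but noisier games pull the observed score toward 50%, so more games are needed to detect the same underlying difference.

```python
from math import ceil

def games_needed(win_rate: float, z: float = 2.0) -> int:
    """Games needed for win_rate to sit z standard errors above 50%."""
    # Under equal strength the standard error of the score is 0.5 / sqrt(n).
    n = (z * 0.5 / (win_rate - 0.5)) ** 2
    # Round first so floating-point noise does not push ceil() up by one.
    return ceil(round(n, 9))

def diluted_rate(true_rate: float, noise: float) -> float:
    """Observed win rate if a fraction `noise` of games is a coin flip."""
    return (1 - noise) * true_rate + noise * 0.5

# Assumed true win rate of 60%; noise levels are arbitrary examples.
for noise in (0.0, 0.25, 0.5):
    rate = diluted_rate(0.60, noise)
    print(noise, round(rate, 3), games_needed(rate))
```

In this model, halving the margin over 50% quadruples the number of games needed. With z = 2, a 55% score needs about 400 games, which is consistent with the 220:180 example.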
...
The data are related to the strength of nets of different sizes. Of course their strength depends on the number of playouts.
The issue of this topic is only the statistical significance of results...
When I wrote that Your tests are "synthetic" with these small amounts of thinking time (playouts), and You answered that my tests aren't "statistically significant" with these numbers of games, I answered that Your tests aren't "practically significant", because they can give different results than sparring with real time control. I also wrote (I don't remember whether there or in PM) that in the case of pure MC engines, as the amount of time (playouts) per move --> 0, the game --> random and the match result --> 50%/50% regardless of engine strength (but the stronger engine can score <50% because of statistical deviation).
I don't know whether the curves would be U-shaped in the same way as in the data above (with numbers of visits; not all curves are U-shaped even there) if the x-axis were time (with constant PC performance) or playouts per move and the y-axis were win %, let alone for other neural nets and engines. But if these curves cross the straight 50% line, the results will depend on the number of playouts not only quantitatively but also qualitatively...
The number of playouts must be high enough to get a statistically significant result.
I am glad that You understood the main idea...