All times are UTC - 8 hours [ DST ]




 Post subject: Re: Engine Tournament
Post #361 Posted: Sat Jun 19, 2021 3:45 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
Now that the LeelaZero distributed training project has finished, the strongest weight file turns out to be leelaz-model-swa-4-32000_quantized.txt, not the one in best-network.gz (details).
It won 55 games and lost 39 (59%:41%), while best-network won 43 games and lost 39 (52%:48%). This sample is not very small, and it shows that chasing large statistics at the expense of matching real playing conditions may lead to slightly wrong results.

 Post subject: Re: Engine Tournament
Post #362 Posted: Sat Jul 10, 2021 1:11 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
Ranking of SAI weight files as of the beginning of 2021 (details; rank in parentheses):
"bantamweight" <= 2 ^ 23 B (< 12 MiB) - rw19x19.txt; (5)
"featherweight" 2 ^ 24 B (12 - 24 MiB) - I don't have one;
"lightweight" 2 ^ 25 B (24 - 48 MiB) - b12a30551826858ce24a21e48cf4c20fea3c25bacba00c01c9763d020908185e; (4)
"welterweight" 2 ^ 26 B (48 - 96 MiB) - 4433b9162a5ad473120d0731b951b649829e0c155e0590f9f1e51a808f5a3263; (3)
"middleweight" 2 ^ 27 B (96 - 192 MiB) - 87022c79f36dc8dc81c7b7aa1f1f250858b44d137434304b910dd3a60612716a; (2)
"light heavyweight" 2 ^ 28 B (192 - 384 MiB) - af94bb0c79cc88d2ce8ca6459f5d1b603e9053182d5a47e457cc3af9180f66f1; (1)
"heavyweight" 2 ^ 29 B (384 - 768 MiB) - I don't have one;
"super heavyweight" >= 2 ^ 30 B (> 768 MiB) - I don't have one.
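
For anyone who wants to sort their own weight files into these categories, here is a minimal sketch of the size ranges listed above. The function name and the table layout are hypothetical, and the handling of a file that sits exactly on a boundary (for example exactly 12 MiB) is an assumption, since the list does not say which side it falls on.

```python
MIB = 1024 * 1024

# (exclusive upper bound in MiB, category name), taken from the list above.
# Assumption: a file exactly on a boundary goes to the heavier category.
CATEGORIES = [
    (12, "bantamweight"),
    (24, "featherweight"),
    (48, "lightweight"),
    (96, "welterweight"),
    (192, "middleweight"),
    (384, "light heavyweight"),
    (768, "heavyweight"),
]


def weight_category(size_bytes):
    """Return the "weight category" for a weight file of the given size in bytes."""
    size_mib = size_bytes / MIB
    for upper_mib, name in CATEGORIES:
        if size_mib < upper_mib:
            return name
    # Anything at or above 768 MiB falls into the top category.
    return "super heavyweight"


print(weight_category(95 * MIB))  # welterweight
```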

 Post subject: Re: Engine Tournament
Post #363 Posted: Sat Jul 31, 2021 1:04 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
SAI is weaker than LeelaZero (details).

 Post subject: Re: Engine Tournament
Post #364 Posted: Sat Aug 14, 2021 1:11 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
Ranking of GTP engines running without a GPU, as of the beginning of 2021 (details):

Top level
1) KataGo
2) LeelaZero
3) SAI

High level
4) Leela
5) Rayon
6) Zenith
7) Pachi_DCNN
8) Hiratuka

Middle level
9) Ray
10) Pachi
11) MoGo

 Post subject: Re: Engine Tournament
Post #365 Posted: Sat Oct 02, 2021 4:11 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
KataGo v.1.8.0 - v.1.9.1: 15 - 15 (details).

 Post subject: Re: Engine Tournament
Post #366 Posted: Sat Nov 13, 2021 3:31 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
LeelaZero "next" branch v. 24.08.21 - release v. 0.17: 25 - 29 (details).

 Post subject: Re: Engine Tournament
Post #367 Posted: Sat Dec 04, 2021 5:31 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
KataGo v.1.10.0 - v.1.9.1: 19 - 17 (details).

 Post subject: Re: Engine Tournament
Post #368 Posted: Sat Jan 15, 2022 3:51 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
The strongest SAI weight file of 2021 is 527ae617c8a61caae4473a69b8eb1411175fc2c3bcd35d13ca42dbf5a98090fa (details).

 Post subject: Re: Engine Tournament
Post #369 Posted: Sat Jan 29, 2022 2:48 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
I don't understand the purpose of the SAI project: this LeelaZero fork keeps updating its weight file, yet it is weaker than LeelaZero's year-old ones (details); it keeps updating its source files, yet it offers no binary releases compiled for different CPU and GPU types (as KataGo does)...
The 2021 ranking of SAI "weight categories" is as follows (rank in parentheses):
"bantamweight" <= 2 ^ 23 B (< 12 MiB) - rw19x19.txt (5)
"featherweight" 2 ^ 24 B (12 - 24 MiB) - I don't have one
"lightweight" 2 ^ 25 B (24 - 48 MiB) - b12a30551826858ce24a21e48cf4c20fea3c25bacba00c01c9763d020908185e (4)
"welterweight" 2 ^ 26 B (48 - 96 MiB) - 4433b9162a5ad473120d0731b951b649829e0c155e0590f9f1e51a808f5a3263 (3)
"middleweight" 2 ^ 27 B (96 - 192 MiB) - 87022c79f36dc8dc81c7b7aa1f1f250858b44d137434304b910dd3a60612716a (2)
"light heavyweight" 2 ^ 28 B (192 - 384 MiB) - 527ae617c8a61caae4473a69b8eb1411175fc2c3bcd35d13ca42dbf5a98090fa (1)
"heavyweight" 2 ^ 29 B (384 - 768 MiB) - I don't have one
"super heavyweight" >= 2 ^ 30 B (> 768 MiB) - I don't have one

 Post subject: Re: Engine Tournament
Post #370 Posted: Sat Feb 12, 2022 2:54 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
The KataGo weight files from the end of 2021 are stronger than those from the beginning of the year (details).
The 2021 ranking of KataGo weight files by "category" is as follows (rank in parentheses):
"bantamweight" <= 2 ^ 23 B (< 12 MiB) - g170e-b10c128-s1141046784-d204142634.bin (6)
"featherweight" 2 ^ 24 B (12 - 24 MiB) - I don't have one
"lightweight" 2 ^ 25 B (24 - 48 MiB) - g170e-b15c192-s1672170752-d466197061.bin (5)
"welterweight" 2 ^ 26 B (48 - 96 MiB) - g170e-b20c256x2-s5303129600-d1228401921.bin (4)
"middleweight" 2 ^ 27 B (96 - 192 MiB) - kata1-b40c256-s10638505984-d2592890214.bin (1)
"light heavyweight" 2 ^ 28 B (192 - 384 MiB) - g170-b30c320x2-s4824661760-d1229536699.bin (2)
"heavyweight" 2 ^ 29 B (384 - 768 MiB) - kata1-b60c320-s5026470912-d2583431160.bin (3)
"super heavyweight" >= 2 ^ 30 B (> 768 MiB) - I don't have one

 Post subject: Re: Engine Tournament
Post #371 Posted: Sat Feb 12, 2022 9:35 am 
Lives in sente

Posts: 758
Liked others: 114
Was liked: 916
Rank: maybe 2d
Partly cross-posting from a GitHub thread where q30 linked this result:
Quote:
I'm glad you're enthusiastic, but I still don't understand why you insist on using such a tiny number of games (only 4 per network!!) and justifying it on the basis of wanting to serve "end users".

If that is the only computation power you can afford, sure. I absolutely respect and appreciate doing the best one can with limited resources. No problem! :)

But instead if it's a deliberate choice to use fewer games to better match what end users would experience, then it's silly. Rather than deliberately using an error-prone measurement because you think most users will not notice, it's certainly at least no harm to use an accurate measurement (more games) and report the accurate difference. Then each user can decide for themselves if the accurately-reported difference is big enough to care about.

Four games per test is especially few. Consider a bot A that beats B 60% of the time. I would guess most people would consider that not a huge difference, but still a respectable one. However, with only 4 games, the chance that B beats A 3-1 or 4-0 is about 18%! So there is an 18% chance you'd come up with the entirely backwards conclusion.
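
The 18% figure can be checked with a short binomial calculation. This is only a sketch under the assumptions of the example: each game is independent, there are no draws, and A wins any single game with probability 0.6. The function name is made up for illustration.

```python
from math import comb


def prob_b_wins_at_least(k, games, p_a):
    """Probability that B wins at least k of `games` independent games
    when A wins each single game with probability p_a (no draws)."""
    p_b = 1 - p_a
    return sum(
        comb(games, w) * p_b**w * p_a ** (games - w)
        for w in range(k, games + 1)
    )


# Chance that the weaker bot B wins a 4-game test 3-1 or 4-0.
print(round(prob_b_wins_at_least(3, 4, 0.6), 4))  # 0.1792
```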

You've argued many times in the past that "end users" will only use the bot for few games themselves, therefore the way to make the best recommendation is to test using only a few games because it better matches the usage, rather than tests with a large number of games. We can see by the following example that such logic isn't very good:

  • Suppose we did do a 4 game test and we did get a 3-1 result in favor of B (getting a result that was only 18% likely is very possible!).
  • Suppose we also did a 1000 game test and this time, the result was that A won 613 games and B won 387 games.

Consider a user who plans to use either bot A or bot B in a tournament where it will play 4 games, and they want the bot with the best chance of doing well. Based on the above two tests, which bot should we recommend to them? Should we trust the 4 game test and recommend B because the tournament will also be 4 games, therefore a 4-game test is the most reliable? Or should we trust the 1000 game test and recommend A because the 1000 game test is overall a more accurate measurement?

Obviously we should recommend bot A to them!

We can see here a clear demonstration that the principle "if end users will only notice larger differences and will only be using the bot for a very few games, then the best way to make a good recommendation is to also run tests using only a very few games" is a bad principle. The way to make a good recommendation to an end user that will run few games is to test many times more games than they will use.
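
One way to see how much more the 1000 game test says than the 4 game test is to put an approximate 95% confidence interval around A's win rate in each. The sketch below uses the Wilson score interval, which is my choice for illustration and not something from the original discussion; it assumes independent games without draws, and the function name is hypothetical.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a per-game win rate.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / games + z * z / (4 * games * games))
        / denom
    )
    return centre - half_width, centre + half_width


# A's results in the two tests from the example above.
print(wilson_interval(1, 4))       # 4 game test: A won 1 of 4
print(wilson_interval(613, 1000))  # 1000 game test: A won 613 of 1000
```

Under these assumptions the 4 game interval stretches across both sides of 50%, so it cannot tell us which bot is stronger, while the 1000 game interval stays entirely above 50%.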

 Post subject: Re: Engine Tournament
Post #372 Posted: Sat Feb 12, 2022 3:32 pm 
Lives in sente

Posts: 758
Liked others: 114
Was liked: 916
Rank: maybe 2d
Maybe a last try to explain it more intuitively: suppose you are trying to serve only users who only care about a large enough effect that they might notice it themselves in 3-5 games.

If you run only 3-5 games yourself, you simulate what a single such user would notice. This is already of some usefulness. But we know there will be significant variation in the results different users will experience, because the way we are measuring has some randomness.

So even if that single user would notice some difference or not, due to this random variation and luck maybe some other users would get a different result. If we want our result to be confidently useful to many users, not just one user, we should simulate many users, not just one user. Maybe we could simulate what 5-10 different users each would see.

In other words, we might want to run 3-5 games, 5-10 times. And there you go. :) As long as we can afford it, if we want to be reliable and responsible in our conclusion, we should run at least several times *more* games than the minimum a single user would need to have a chance to notice something.
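
The "simulate many users" idea can be sketched as a small Monte Carlo experiment. Everything specific here is an assumption for illustration: the 60% per-game win rate for the stronger bot A, independent games without draws, 4 games per user, and the function name itself.

```python
import random


def wrong_conclusion_rate(p_a, games_per_user, users, trials=20000, seed=0):
    """Estimate how often the pooled games of `users` simulated users show
    the truly stronger bot A with strictly fewer wins than bot B.

    Ties are not counted as a wrong conclusion.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_games = games_per_user * users
    wrong = 0
    for _ in range(trials):
        a_wins = sum(rng.random() < p_a for _ in range(total_games))
        if 2 * a_wins < total_games:
            wrong += 1
    return wrong / trials


single_user = wrong_conclusion_rate(0.6, games_per_user=4, users=1)
ten_users = wrong_conclusion_rate(0.6, games_per_user=4, users=10)
print(single_user, ten_users)
```

With these assumptions the backwards conclusion shows up noticeably less often once the games of ten simulated users are pooled than for a single user.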

 Post subject: Re: Engine Tournament
Post #373 Posted: Sat Feb 19, 2022 6:47 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
So, if you distribute the test time as follows: number of games -> infinity and time per game -> 0, then you get the most accurate results. Did I understand your point of view correctly?

 Post subject: Re: Engine Tournament
Post #374 Posted: Sat Apr 16, 2022 3:07 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
KataGo v.1.10.0-v.1.11.0: 17-15 (details).

_________________
Go board with strong engines attached at several strength levels, from random to "God-like" play (instructions for automated attachment in Russian) & sparring games of Go engines

 Post subject: Re: Engine Tournament
Post #375 Posted: Sat Sep 10, 2022 5:49 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
Ray has become even weaker; it is now weaker than MoGo (details)...


 Post subject: Re: Engine Tournament
Post #376 Posted: Sat Jan 14, 2023 5:57 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
SAI continues to regress (details). Thanks to everyone who donates their computing time to that...


 Post subject: Re: Engine Tournament
Post #377 Posted: Sat Feb 18, 2023 4:53 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
KataGo became stronger after another year of training (details). The "heavyweight" file is now a bit stronger (12-8) than the two-year-old "light heavyweight" one...
The 2022 ranking of KataGo "weight categories" is as follows (rank in parentheses):
"bantamweight" <= 2 ^ 23 B (< 12 MiB) - g170e-b10c128-s1141046784-d204142634.bin (6)
"featherweight" 2 ^ 24 B (12 - 24 MiB) - I don't have one
"lightweight" 2 ^ 25 B (24 - 48 MiB) - g170e-b15c192-s1672170752-d466197061.bin (5)
"welterweight" 2 ^ 26 B (48 - 96 MiB) - g170e-b20c256x2-s5303129600-d1228401921.bin (4)
"middleweight" 2 ^ 27 B (96 - 192 MiB) - kata1-b40c256-s12350780416-d3055274313.bin (1)
"light heavyweight" 2 ^ 28 B (192 - 384 MiB) - g170-b30c320x2-s4824661760-d1229536699.bin (3)
"heavyweight" 2 ^ 29 B (384 - 768 MiB) - kata1-b60c320-s6782286336-d3070935549.bin (2)
"super heavyweight" >= 2 ^ 30 B (> 768 MiB) - I don't have one


 Post subject: Re: Engine Tournament
Post #378 Posted: Sat Feb 18, 2023 5:50 pm 
Beginner

Posts: 17
Liked others: 1
Was liked: 10
Try using the new b18c384nbt-uec.bin.gz net; it's 95 MB in size and currently the strongest available net. As of now it's only available through KataGo's GitHub page, but in a couple of days or weeks it should replace 60b as the main training net. You'll need KataGo 1.12 to run it: https://github.com/lightvector/KataGo/releases/tag/v1.12.4

 Post subject: Re: Engine Tournament
Post #379 Posted: Wed Feb 22, 2023 5:04 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
I use the sizes of the unpacked files. It will replace the weight file in the same "weight category"...

 Post subject: Re: Engine Tournament
Post #380 Posted: Sat Mar 04, 2023 6:46 am 
Lives with ko

Posts: 145
Liked others: 1
Was liked: 1
Rank: 30 kyu
The newer (Eigen) version of KataGo is, once again, slightly (statistically insignificantly) weaker than the older one: v1.12.4 - v1.11.0: 15 - 17 (details)...
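
As a rough check of the "statistically insignificant" remark, here is a sketch of an exact two-sided binomial test for a 15 - 17 result. It assumes independent games without draws and tests against the hypothesis that both versions are equally strong; the function name is made up for illustration.

```python
from math import comb


def two_sided_p(wins, games, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under win probability p."""
    def pmf(k):
        return comb(games, k) * p**k * (1 - p) ** (games - k)

    observed = pmf(wins)
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    total = sum(pmf(k) for k in range(games + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


print(round(two_sided_p(15, 32), 2))  # 0.86
```

A p-value this close to 1 means a 15 - 17 split is entirely ordinary for two equally strong engines.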

