KataGo Distributed Training and new networks

And · **#81**

@akigo many thanks!!!

And · **#82**

BadukAI v1.4 40b(original version s604, maxPlayouts 1, numSearchThreads 1) - Zenith 7 9d, komi 0, 2:2
BadukAI v1.4 40b(optimized version s604, maxPlayouts 1, numSearchThreads 1) - Zenith 7 9d, komi 0, 1:3
all games BadukAI plays white

BadukAI (optimized version) - Zenith:

VietGo · **#83**

Katago s616 70 visits, 1 playout, NumberOfSearchThread: 1 vs Golaxy Lion two star(3700 elo rating): 1-0.
Incredible power of the strongest network.

And · **#84**

BadukAI v1.5 40b(optimized version s604, maxPlayouts 1, numSearchThreads 1) - Zenith 7 9d, komi 0, 3:1
all games BadukAI plays white

go4thewin · **#85**

Wow, so am I understanding correctly that out of 8 games with no komi, the optimized net won 4? So with 1 playout it is stronger than Zen 7 9d. That is really incredible for such a fast net. It really makes KataGo very mobile-friendly to play against, especially with the kyu_rank option. I think the non-optimized net was too big and slow for some very old or underpowered phones. Thanks for the tests!

And · **#86**

go4thewin wrote:

Wow, so am I understanding correctly that out of 8 games with no komi, the optimized net won 4?

Yes

And · **#87**

BadukAI v1.5 40b(optimized version s604, maxPlayouts 1, numSearchThreads 1) - CS Zero 9d, komi 0, 0:5
all games BadukAI plays white
BadukAI resigned on move 74, 112, 128, 74, 96

And · **#88**

BadukAI v1.5 40b(optimized version s604, maxPlayouts 1, numSearchThreads 1) - CS Pro 5d(android), H5

go4thewin · **#89**

VietGo wrote:

Katago s616 70 visits, 1 playout, NumberOfSearchThread: 1 vs Golaxy Lion two star(3700 elo rating): 1-0.
Incredible power of the strongest network.

@VietGo, you may like this: KataGo s634 at 1 playout beat Golaxy Cow 9d.

The newer s640 at 1 playout (SGF not uploaded) may be stronger than Golaxy Cow 9d: it won as both Black and White, 2-0.

And · **#90**

Is it possible to estimate the limit from this graph?
https://katagotraining.org By Data Rows (linear)
Or to calculate its value at, for example, 10G rows? (Of course, it is the relative values that are interesting.)

lightvector · **#91**

If you're going to calculate any values or do any estimation, the log scale is the proper way to do it, not the linear scale.

In general across a wide variety of games, to first order Elo grows logarithmically with the amount of learning, or the amount of computation time invested per move, or the size of a model or a pattern or feature database, etc. If it makes it more intuitive for you, you could also mentally conceptualize it as there being some underlying amount of "computation" (where learned knowledge is like a cached form of computation too) that scales linearly with all these things, and then your Elo is just the logarithm of that value.

So if you are going to extrapolate anything based on the existing data, the correct relationship to start with to try to extrapolate would be Elo versus log(data), not Elo versus data.
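As a rough illustration of fitting Elo against log(data), here is a minimal sketch using an ordinary least-squares line. The data points are made up purely to show the mechanics and are not real KataGo ratings; the function names are hypothetical, and a straight line in log(data) is only the first-order model described above, so any extrapolation far beyond the measured range should be treated with caution.

```python
import math

def fit_elo_vs_log_data(rows, elos):
    """Least-squares fit of elo = a + b * log10(rows). Returns (a, b)."""
    xs = [math.log10(r) for r in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(elos) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, elos))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_elo(a, b, rows):
    """Elo predicted by the fitted line at a given number of data rows."""
    return a + b * math.log10(rows)

# Made-up example points (NOT real KataGo data): each doubling of
# data rows adds 150 Elo, so the points are exactly linear in log(rows).
example_rows = [1e8, 2e8, 4e8, 8e8]
example_elos = [1000.0, 1150.0, 1300.0, 1450.0]

a, b = fit_elo_vs_log_data(example_rows, example_elos)

# Under this model the natural summary is "Elo gained per doubling of data".
gain_per_doubling = b * math.log10(2)

# Extrapolating to 10G rows under the same (assumed) log-linear trend.
elo_at_10g = predict_elo(a, b, 10e9)
```

Note that a log-linear fit has no finite limit: under this model Elo keeps growing, just more and more slowly per added row, which is why the relative gain per doubling is a more useful number than a projected ceiling.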

Maharani · **#92**

spook wrote:

@Maharani,
Right now, it's just the latest.

I use an API to query the list of networks. And I wasn't really sure how to determine which one's the "strongest confidently rated" network.
If I would have to make an educated guess, it would be the latest network with "log_gamma_uncertainty" < 0.05.
Maybe somebody reading this, can confirm or deny.

From this thread: https://lifein19x19.com/viewtopic.php?p=263568#p263568

lightvector · **#93**

Strongest confidently rated network filters down to networks whose standard deviation of Elo uncertainty is less than 100... which is almost all of them, this usually just excludes a new net that barely has any data yet... and picks the net with the highest (mean - 2 * standard deviation).
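The rule described above can be sketched in a few lines. This is only an illustration of the stated criterion, not the site's actual code: the `Net` class, its field names, and the example values are hypothetical, and how these fields map onto what the website's API returns is not covered here.

```python
from dataclasses import dataclass

@dataclass
class Net:
    name: str        # hypothetical identifier
    elo_mean: float  # estimated Elo
    elo_std: float   # standard deviation of the Elo uncertainty

def strongest_confidently_rated(nets, max_std=100.0, num_std=2.0):
    """Drop nets whose uncertainty is too large, then pick the net
    with the highest conservative bound (mean - num_std * std)."""
    candidates = [n for n in nets if n.elo_std < max_std]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.elo_mean - num_std * n.elo_std)

# Made-up example values, for illustration only.
example = [
    Net("net-a", 13200.0, 12.0),   # lower bound 13176
    Net("net-b", 13230.0, 40.0),   # lower bound 13150
    Net("net-c", 13400.0, 150.0),  # excluded: std is not below 100
]
best = strongest_confidently_rated(example)
```

In this example `net-b` has the higher mean, but `net-a` is selected because its lower bound is higher, and `net-c` is never considered because it has too little data behind its rating.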

ez4u · **#94**

It seems to me that the " Strongest confidently-rated network": kata1-b40c256-s6485784576-d1573360039 declined significantly as a result of recent rating games. As of this post it is showing "13226.9 ± 12.4 - (3,292 games)". However, just a couple of days ago it was above 13250 [+/-??? (I don't remember)] based on 2,000 games or so. At that time it appeared as a clear anomaly in the rating graph on the project page as the only net over 13,250.

Is there anywhere that we can see the historical development of each net's rating?
Do we need a new definition of "confidently-rated"?

lightvector · **#95**

No, we don't need a new definition.

Let's clarify what the purpose of this selection metric is - to pick a reliably good network with high confidence:

Even if the gap is smaller now, "kata1-b40c256-s6485784576-d1573360039" is still rated as being among the strongest of all nets before and nearby it, right?

And also the error bar on that net is small, so even if it turns out not to be the strongest, with very high likelihood it's not one of the nets which performs unusually poorly, right?

So in both ways, the selection criterion is doing its job well. With decent reliability, it picks out a recent and strongest or nearly-strongest net compared to its neighbors, and with high reliability avoids really bad nets, despite major uncertainty in the ratings relative to the magnitude of differences it's attempting to discriminate between.

Additionally, keep in mind about Elo values in general:

In general, across almost all Elo systems, pay more attention to Elo differences than to absolute Elo numbers. This holds for Elo systems generally, except perhaps ones that take extreme pains to maintain stability across time. You can see how Go server and association ratings are all over the place relative to one another, as well as sometimes having inflation or deflation over time. In the chess world things are more stable, but there is still sometimes a little noise or drift, and mild inconsistency between systems. And every published research paper also uses an Elo scale whose absolute offset is incomparable to that of any other paper. In all cases, the differences are more meaningful than the absolute numbers.

In KataGo's case, the anchor point of the graph right now is arbitrarily chosen as 0 = random, and new rating games are played all the time, even between very old nets. If new games indicate that, back when KataGo was moving through "DDK" level, some span of nets gained only 2000 Elo instead of 2050, the entire rating graph above it will shift by 50 Elo, even though nothing practical has changed about our belief in the strength of the current nets. So the absolute number really, really doesn't matter here.

And, a note about Elo locality:
Even more than ignoring absolutes and paying attention to just differences, in any Elo system you ever find in practice, you usually should only consider the local differences reliable - the ratings difference between a player and other players near them. For larger differences, they are the transitive sum of smaller differences, rather than directly measured. So when P1 is 1150 Elo better than P2 in *any* practical system (not just KataGo), that should be understood to mean something like:

"P1 is measured to approximately win 3:1 against players who win 3:1 against players who win 3:1... against P2", in total iterated 6 times.

It does NOT mean:

"P1 is measured to approximately win 750:1 against P2".

Because in practice, no Elo system will have the games to measure that accurately. Plus, we know that Elo itself is only an approximation of reality. In truth "skill level" is more complex and multidimensional, and precisely one of the places that approximation starts becoming unreliable is in very large differences. So that means that the interpretation of the vertical confidence bands in KataGo's rating graph is a bit subtle. The confidence bands around the nets should be understood to be confidence bands with respect to the Elos of the population of nets around it, say, within the nearby +/- 300 Elo or so. If the local population as a whole moves up or down by more, it doesn't matter.
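For anyone who wants to check the arithmetic behind the 3:1 and 750:1 figures, here is a small sketch. It assumes the conventional logistic Elo scale, where a difference of 400 points corresponds to 10:1 odds; the numbers quoted above are consistent with that convention, but the function names are just illustrative.

```python
import math

def elo_diff_from_odds(odds):
    """Elo difference implied by win odds of odds:1 (400 points = 10:1)."""
    return 400.0 * math.log10(odds)

def odds_from_elo_diff(diff):
    """Win odds (as odds:1) implied by an Elo difference."""
    return 10.0 ** (diff / 400.0)

# One 3:1 step is worth roughly 191 Elo.
step = elo_diff_from_odds(3.0)

# Six chained 3:1 steps add up to roughly 1145 Elo, close to 1150.
total = 6 * step

# Reading 1150 Elo as a single direct matchup gives odds of about 750:1,
# a number no realistic set of rating games ever measures directly.
naive_odds = odds_from_elo_diff(1150.0)
```

The two readings agree on paper (3 multiplied by itself six times is 729, close to 750), but only the chained version reflects what the rating games actually measured.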

lightvector · **#96**

Or, if you want the TLDR:

Only the Elo differences between nets are meaningful. If tomorrow I were to reanchor or recalibrate the graph and it causes everyone's Elo to shift by 1000 points, but all the differences between the latest nets stay about the same, it does not mean something like 13200+/-25 was wrong by 1000 points. Because the absolute Elos are not meaningful, only the Elo differences.

Bill Spight · **#97**

lightvector wrote:

Because in practice, no Elo system will have the games to measure that accurately. Plus, we know that Elo itself is only an approximation of reality. In truth "skill level" is more complex and multidimensional, and precisely one of the places that approximation starts becoming unreliable is in very large differences. So that means that the interpretation of the vertical confidence bands in KataGo's rating graph is a bit subtle. The confidence bands around the nets should be understood to be confidence bands with respect to the Elos of the population of nets around it, say, within the nearby +/- 300 Elo or so. If the local population as a whole moves up or down by more, it doesn't matter.

(Emphasis mine.)

Because skill level is multidimensional, it is only partially ordered. Hence, cases where player A usually beats player B, who usually beats player C, who usually beats player A are not uncommon. And because of shared history, it is not unusual for groups of players to have similar strengths and weaknesses. To some extent a player's rating will thus depend upon who they play against.

Yakago · **#98**

This brings to mind when Remi Coulom started Goratings.org: someone commented that they found it wrong that the ratings of the top go players were lower than those of top chess players.

After some discussion about why this didn't make sense (I don't know if it was on Twitter or in some post here), Remi replied, sarcastically I suppose, that he had adjusted all ratings upwards by 1000 points to 'show the superiority of go players over chess players'.

gowan · **#99**

It seems to me that one way to compare ratings between chess and go would be to look at the histograms of the two games and match up the percentile levels. Of course that is assuming that the distributions of the ratings of the two games are similar, but on the face of it I would expect the distributions should be similar. There would be some technical issues, such as that the world Elo chess ratings include the pro grandmasters while go ratings such as AGA or European amateur associations do not include pros.

And · **#100**

New b60 net https://media.katagotraining.org/upload ... 314.bin.gz
