Can We Stop Calling Kata "scoreMean" Points?

dfan · Post by **dfan** » Thu Dec 12, 2019 8:08 pm

lightvector wrote: So, my thought is to try to make KataGo estimate B instead. And, I could also continue estimating A too, but it would be extra overhead in the search to carry both around, so my inclination is to just not have A once we have B. Unless people think it should keep reporting both? Thoughts?

I think it's worth calculating and exposing both values at least for a little while, because the comparison between them in different sorts of situations could provide some insight. I do agree that B seems more meaningful and useful, though.

Yakago · Post by **Yakago** » Fri Dec 13, 2019 1:17 am

I would say that it's a bit 'bloaty' to have two score estimates, even if the comparison could provide insight in some situations.

I think the 'B' version is to be preferred, and would 'solve' this issue up to the inaccuracy of the network.

I think it should be understandable that the 'points' we see are based on the preferred line of play, and during analysis we would be able to see that the two lines of play differ in winrate and points.

TelegraphGo · Post by **TelegraphGo** » Fri Dec 13, 2019 3:35 am

Marcel Grünauer wrote:
lightvector wrote:Suppose you play the bot against itself 100 times and you find that on average it loses by 20 points in some position (winning a few games barely, losing most games by a lot). Suppose that 20 points was precisely what the bot had given as its "final score difference estimate" in that position. Great, right?

Suppose you dig further into the example and determine that actually, if the bot had just played move X, it would lose only by about 4 points - the resulting endgame is stable, and although it's not clear how to play it exactly optimally, it's highly clear that it's not going to vary by more than +/- 1 point under any reasonable lines of play. If you had 4 more points, then you'd have 50-50 winning chances playing move X. And the bot also agrees. The *reason* why the bot did not play move X and instead chose Y was that X led to an easy and predictable loss, whereas move Y is a complex and uncertain move that gives some slim winning chances instead of zero, but on average seems to lead to a much bigger loss.

Doesn't that mean that a score estimate should be qualified with a probability?

In the example, it would mean "move X loses the game by 4 points with 100% certainty" (i.e., winrate 0%) and "move Y loses the game by 20 points with 50% certainty" and "move Y wins by 1 point ('barely') maybe 5% of the time".

Statistics is not my strong suit so I'm sure my example is flawed, but I hope it conveys what I mean.
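One way to make the quoted example concrete is to treat each move as a distribution of final scores and report the mean, the spread, and the winrate together. The sketch below does that in Python. The outcome lists are invented to roughly match lightvector's description (X is the stable loss by about 4, Y is the gamble that averages about 20 behind); they are not KataGo output, and `summarize` is a hypothetical helper.

```python
# Hypothetical final scores from the bot's point of view (negative = loss).
# The numbers are made up to resemble the quoted example, not measured data.
outcomes_x = [-3] * 25 + [-4] * 50 + [-5] * 25  # stable endgame, about -4
outcomes_y = [1] * 5 + [-21] * 95               # rare narrow win, usual big loss


def summarize(outcomes):
    """Return mean score, population standard deviation, and winrate."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    variance = sum((s - mean) ** 2 for s in outcomes) / n
    winrate = sum(1 for s in outcomes if s > 0) / n
    return {"mean": mean, "stdev": variance ** 0.5, "winrate": winrate}


summary_x = summarize(outcomes_x)
summary_y = summarize(outcomes_y)

for name, summary in (("X", summary_x), ("Y", summary_y)):
    print(
        f"move {name}: mean {summary['mean']:+.1f}, "
        f"stdev {summary['stdev']:.1f}, winrate {summary['winrate']:.0%}"
    )
```

With these made-up samples, Y has the better winrate and the much worse mean score, which is the tension the example describes: a single averaged number hides which of the two situations you are in.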

If you want an AI's opinion on which move is easy for AI to handle in an AI v. AI match, then you shouldn't be looking at KataGo scores. That's literally exactly the metric that winrate percentages are designed to give. ELF, Leela-Zero, and maybe some other AI are (I believe) a little stronger than KataGo, and thus probably better at giving percentages. You should be keeping in mind that none of these AI can tell us how easy a move is for humans to handle.

The way that AI complicates games is different than the way humans complicate games - AI is much more confident in its ability (and thus its opponent's ability) to invade than the typical human, for example. If you want to learn how to create complications that are hard for humans to deal with while losing slightly, KataGo by itself is probably not the way.

KataGo's purpose is to give useful score estimates. I see no need to dilute that; just let KataGo do KataGo's job well. I'm very excited to see the B-style network, and very impressed that lightvector seems to think it won't be that hard to create.

spook · Post by **spook** » Fri Dec 13, 2019 7:45 am

lightvector wrote: I think B is more useful.

I agree.

lightvector wrote: So, my thought is to try to make KataGo estimate B instead. And, I could also continue estimating A too, but it would be extra overhead in the search to carry both around, so my inclination is to just not have A once we have B. Unless people think it should keep reporting both? Thoughts?

Out with the old, in with the new.

xela wrote: What software did you use to make these graphs?

It is a preview of the next ZBaduk release. For brevity (to reduce spam here): https://github.com/lightvector/KataGo/issues/57.

lightvector · Post by **lightvector** » Fri Dec 13, 2019 10:46 am

I'm going to keep both internally, since I'm actually a bit nervous that simply swapping it out would break something mathematically principled in the formulation of winloss utility + score utility. So the old value will continue to be used in the utility computation (utility is the name for what KataGo aims to maximize, which blends winning and score).
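As a rough illustration of what "blends winning and score" can mean, here is a toy utility in Python. This is not KataGo's actual formula: the `tanh` squashing, `score_weight`, and `score_scale` are all assumptions chosen only to show how the weight on score can change which move looks best.

```python
import math

# Hypothetical final scores for two candidate moves (negative = loss).
outcomes_x = [-3] * 25 + [-4] * 50 + [-5] * 25  # safe, small loss
outcomes_y = [1] * 5 + [-21] * 95               # gamble, usually a big loss


def toy_utility(outcomes, score_weight, score_scale=20.0):
    """Average a made-up per-game utility: win/loss term plus a score term.

    Win/loss contributes +1, 0, or -1; the score term is the final score
    squashed into (-1, 1) by tanh and scaled by score_weight.
    """
    total = 0.0
    for score in outcomes:
        winloss = (score > 0) - (score < 0)
        total += winloss + score_weight * math.tanh(score / score_scale)
    return total / len(outcomes)


for weight in (0.1, 0.5):
    u_x = toy_utility(outcomes_x, weight)
    u_y = toy_utility(outcomes_y, weight)
    better = "X" if u_x > u_y else "Y"
    print(f"score_weight={weight}: U(X)={u_x:.3f}, U(Y)={u_y:.3f}, prefers {better}")
```

In this toy setup a small score weight prefers the gamble Y, while a larger one prefers the safe loss X, so which score quantity feeds the score term is not a cosmetic choice.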

But I'm going to outright replace the "scoreMean" value which is what different GUIs are showing to the user. The old value will be hanging around in an extra new field of kata-analyze if some GUI app really really wants to show it.
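For GUI authors wondering what this means in practice, a minimal sketch of reading "scoreMean" out of analysis output might look like the following. It assumes the output is a line of whitespace-separated key/value pairs per candidate move, each starting with `info` and ending with a `pv` move list. The sample line is invented for illustration, and fields other than "scoreMean" (including the spelling `scoreStdev`) should be checked against the KataGo documentation.

```python
def parse_analysis_line(line):
    """Split one analysis line into a list of per-move dicts of strings."""
    records = []
    for chunk in line.split("info ")[1:]:
        tokens = chunk.split()
        record = {}
        i = 0
        while i + 1 < len(tokens):
            key = tokens[i]
            if key == "pv":
                # Everything after "pv" is the principal variation.
                record["pv"] = tokens[i + 1:]
                break
            record[key] = tokens[i + 1]
            i += 2
        records.append(record)
    return records


# Invented sample line; the values are not real engine output.
sample = (
    "info move Q16 visits 120 winrate 0.47 scoreMean -1.8 scoreStdev 9.2 "
    "order 0 pv Q16 D4 "
    "info move D4 visits 80 winrate 0.46 scoreMean -2.1 scoreStdev 9.5 "
    "order 1 pv D4 Q16"
)

records = parse_analysis_line(sample)
for record in records:
    print(record["move"], float(record["scoreMean"]))
```

A parser written this way keeps every key it sees, so an extra field carrying the old value would show up in the dict without code changes.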

The computation of the old value is actually also changing nontrivially due to some architectural changes in the neural net's outputs. In the latest test run of KataGo, I actually found the value to *underestimate* differences, rather than overestimate them! (Which I guess supports the point that this value is not very stable between different versions.)

Gomoto · Post by **Gomoto** » Fri Dec 13, 2019 3:05 pm

lightvector, it is great that we have you around in this forum and that you give us some views on the inside of your work.

spook · Post by **spook** » Sat Dec 14, 2019 6:48 pm

lightvector wrote: But I'm going to outright replace the "scoreMean" value which is what different GUIs are showing to the user.

Does it also have an indirect influence on the calculation of the stddev field?

Life In 19x19
