Can We Stop Calling Kata "scoreMean" Points?

dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by dfan »

lightvector wrote:So, my thought is to try to make KataGo estimate B instead. And, I could also continue estimating A too, but it would be extra overhead in the search to carry both around, so my inclination is to just not have A once we have B. Unless people think it should keep reporting both? Thoughts?
I think it's worth calculating and exposing both values at least for a little while, because the comparison between them in different sorts of situations could provide some insight. I do agree that B seems more meaningful and useful, though.
Yakago
Dies in gote
Posts: 53
Joined: Tue Jan 16, 2018 10:39 am
GD Posts: 0
Has thanked: 2 times
Been thanked: 12 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by Yakago »

I would say that it's a bit 'bloaty' to have two score estimates, even if it could provide insight in some situations.

I think the 'B' version is to be preferred, and would 'solve' this issue up to the inaccuracy of the network.

I think it should be understandable that the 'points' we see are based on the preferred line of play, and during analysis we would be able to see that the two lines of play differ in winrate and points.
TelegraphGo
Lives with ko
Posts: 131
Joined: Sat Oct 05, 2019 12:32 am
Rank: AGA 4 dan
GD Posts: 0
Universal go server handle: telegraphgo
Has thanked: 1 time
Been thanked: 18 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by TelegraphGo »

Marcel Grünauer wrote:
lightvector wrote:Suppose you play the bot against itself 100 times and you find that on average it loses by 20 points in some position (winning a few games barely, losing most games by a lot). Suppose that 20 points was precisely what the bot had given as its "final score difference estimate" in that position. Great, right?

Suppose you dig further into the example and determine that actually, if the bot had just played move X, it would lose only by about 4 points - the resulting endgame is stable, and although it's not clear how to play it exactly optimally, it's highly clear that it's not going to vary by more than +/- 1 point under any reasonable lines of play. If you had 4 more points, then you'd have 50-50 winning chances playing move X. And the bot also agrees. The *reason* why the bot did not play move X and instead chose Y was that X led to an easy and predictable loss, whereas move Y is a complex and uncertain move that gives some slim winning chances instead of zero, but on average seems to lead to a much bigger loss.
Doesn't that mean that a score estimate should be qualified with a probability?

In the example, it would mean "move X loses the game by 4 points with 100% certainty" (i.e., winrate 0%) and "move Y loses the game by 20 points with 50% certainty" and "move Y wins by 1 point ('barely') maybe 5% of the time".

Statistics is not my strong suit so I'm sure my example is flawed, but I hope it conveys what I mean.
If you want an AI's opinion for which move is easy for AI to handle in an AI v. AI match, then you shouldn't be looking at KataGo scores. That's literally exactly the metric that percentages are designed to give. ELF, Leela-Zero, and maybe some other AI are (I believe) a little stronger than KataGo, and thus probably better at giving percentages. You should be keeping in mind that none of these AI can tell us how easy a move is for humans to handle.

The way that AI complicates games is different than the way humans complicate games - AI is much more confident in its ability (and thus its opponent's ability) to invade than the typical human, for example. If you want to learn how to create complications that are hard for humans to deal with while losing slightly, KataGo by itself is probably not the way.

KataGo's purpose is to give useful score estimates. I see no need to dilute that, just let KataGo do KataGo's job well. I'm very excited to see the B-style network, and very impressed that lightvector seems to think it won't be that hard to create.
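To make the X versus Y example quoted above concrete, here is a minimal Python sketch. The outcome distributions are made-up numbers chosen to roughly match the example (X: a stable loss by about 4, Y: a 5% chance of a one-point win and otherwise a big loss), and the function names are hypothetical. Reading the "old" number as the self-play average and the "B-style" number as the points needed to make the game even is my interpretation of the example, not KataGo's actual computation.

```python
from fractions import Fraction

# Made-up final-score distributions from the bot's point of view:
# each entry is (final score, probability). Not real KataGo data.
OUTCOMES = {
    "X": [(-5, Fraction(1, 3)), (-4, Fraction(1, 3)), (-3, Fraction(1, 3))],
    "Y": [(1, Fraction(1, 20)), (-21, Fraction(19, 20))],
}

def winrate(dist, bonus=0):
    """Win probability if the bot were given `bonus` extra points (ties count half)."""
    total = Fraction(0)
    for score, prob in dist:
        adjusted = score + bonus
        if adjusted > 0:
            total += prob
        elif adjusted == 0:
            total += prob / 2
    return total

def mean_score(dist):
    """Plain average of the final score."""
    return sum(score * prob for score, prob in dist)

def preferred_move(bonus=0):
    """The move a winrate-maximizing bot would pick, given `bonus` extra points."""
    return max(OUTCOMES, key=lambda move: winrate(OUTCOMES[move], bonus))

def selfplay_mean():
    """Average score when the bot follows its preferred move (the 'loses by 20' number)."""
    return mean_score(OUTCOMES[preferred_move()])

def fair_bonus(max_bonus=50):
    """Smallest whole-point bonus that gives the bot at least a 50% winrate."""
    for bonus in range(max_bonus + 1):
        best = preferred_move(bonus)
        if winrate(OUTCOMES[best], bonus) >= Fraction(1, 2):
            return bonus
    return None

print(preferred_move(), float(selfplay_mean()), fair_bonus())
```

With these toy numbers the bot prefers Y, the self-play average is about -20, and the bonus needed for an even game is 4 points, reached by switching to X. That is the gap between the two kinds of estimate the thread is arguing about.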
spook
Lives with ko
Posts: 151
Joined: Thu Jul 24, 2014 1:34 pm
Rank: 2d
GD Posts: 0
KGS: LordVader
Location: Belgium
Has thanked: 11 times
Been thanked: 48 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by spook »

lightvector wrote: I think B is more useful.
I agree.
lightvector wrote: So, my thought is to try to make KataGo estimate B instead. And, I could also continue estimating A too, but it would be extra overhead in the search to carry both around, so my inclination is to just not have A once we have B. Unless people think it should keep reporting both? Thoughts?
Out with the old, in with the new.
xela wrote:What software did you use to make these graphs?
It is a preview of the next ZBaduk release. For brevity (to reduce spam here): https://github.com/lightvector/KataGo/issues/57.
Enjoy LeeLaZero and KataGo from your web browser, without installing anything!
https://www.zbaduk.com
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by lightvector »

I'm going to keep both internally, since I'm a bit nervous that some mathematical principle in the formulation of winloss utility + score utility would break if I simply swapped one for the other. So the old value will continue to be used in the utility computation (utility is the name for what KataGo aims to maximize, which blends winning and score).
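As a rough illustration of what "blends winning and score" could look like, here is a minimal Python sketch. The weights, the tanh squashing, and the function name are all assumptions made for illustration; the post does not give KataGo's actual formula.

```python
import math

def blended_utility(winrate, expected_score,
                    winloss_weight=1.0, score_weight=0.3, score_scale=20.0):
    """Hypothetical blend of a win/loss term and a saturating score term.

    winrate        -- probability of winning, in [0, 1]
    expected_score -- expected final score difference, in points
    The weights and score_scale are made-up values, not KataGo settings.
    """
    # Win/loss term mapped from [0, 1] to [-1, 1].
    winloss_utility = winloss_weight * (2.0 * winrate - 1.0)
    # Score term saturates, so huge margins cannot outweigh winning.
    score_utility = score_weight * math.tanh(expected_score / score_scale)
    return winloss_utility + score_utility

# An even position scores zero; a bigger expected margin helps only a little.
print(blended_utility(0.5, 0.0))
print(blended_utility(0.9, 5.0), blended_utility(0.9, 50.0))
```

In a blend like this the score term is fed an averaged score, which is presumably why swapping in a differently defined number deserves some caution.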

But I'm going to outright replace the "scoreMean" value which is what different GUIs are showing to the user. The old value will be hanging around in an extra new field of kata-analyze if some GUI app really really wants to show it.
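For GUI authors, a minimal Python sketch of reading both values from a kata-analyze style line might look like the following. The sample line is hand-written for illustration, not real engine output, and `scoreSelfplay` as the name of the extra field is an assumption, since the post does not name it.

```python
def parse_info_blocks(line):
    """Split an analysis line into one dict of key/value strings per 'info' block.

    Assumes each block is a run of 'key value' pairs, with 'pv' taking
    every remaining token in the block as the principal variation.
    """
    blocks = []
    for chunk in line.split("info ")[1:]:
        tokens = chunk.split()
        fields = {}
        i = 0
        while i + 1 < len(tokens):
            key = tokens[i]
            if key == "pv":
                fields["pv"] = tokens[i + 1:]
                break
            fields[key] = tokens[i + 1]
            i += 2
        blocks.append(fields)
    return blocks

# Hand-written sample, not real output; field names are assumptions.
SAMPLE = (
    "info move Q16 visits 120 winrate 0.05 scoreMean -4.0 "
    "scoreSelfplay -19.9 scoreStdev 9.5 pv Q16 D4 "
    "info move R4 visits 30 winrate 0.00 scoreMean -4.1 "
    "scoreSelfplay -4.0 scoreStdev 1.0 pv R4"
)

for block in parse_info_blocks(SAMPLE):
    shown = float(block["scoreMean"])          # value GUIs display
    old = float(block.get("scoreSelfplay", "nan"))  # old-style value, if present
    print(block["move"], shown, old)
```

Using `get` with a fallback keeps the GUI working against engine versions that do not emit the extra field.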

The computation of the old value is also changing nontrivially due to some architectural changes in the neural net's outputs. In the latest test run of KataGo, I actually found the value to *underestimate* differences, rather than overestimate them! (Which I guess supports the point that this value is not very stable between different versions.)
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by Gomoto »

lightvector, it is great that we have you around in this forum and that you give us some views on the inside of your work.
spook
Lives with ko
Posts: 151
Joined: Thu Jul 24, 2014 1:34 pm
Rank: 2d
GD Posts: 0
KGS: LordVader
Location: Belgium
Has thanked: 11 times
Been thanked: 48 times

Re: Can We Stop Calling Kata "scoreMean" Points?

Post by spook »

lightvector wrote: But I'm going to outright replace the "scoreMean" value which is what different GUIs are showing to the user.
Does it also have an indirect influence on the calculation of the stddev field?