Can We Stop Calling Kata "scoreMean" Points?

emerus · Post by **emerus** » Wed Dec 11, 2019 3:16 pm

xela wrote:I really think an example would help.

3rd game I opened: Game here

Not +/-10, but I am not going to look very hard for something that I've seen in at least 1/10 of the games I open in KataGo. If you are a user of KataGo and haven't noticed this by now, then you should look for it.

How often do you think professionals in the post-AI age actually have such a large (>5 scoreMean) deficit by move 41? KataGo thinks it is like 10% of the time. It is ludicrous to me.

edit:

lightvector wrote:How about I just make the next training run of KataGo include a prediction target that consists of "what number of points would be needed to make the estimated winning chance close to 50-50" rather than "what is the average difference in final points that will result from self-play" and use this prediction as the value to report to users instead?

With some thought, I think I have settled on a training method that I think should be effective for this.

I chose this forum for my plea/rant because I know you are active (I also thought about your GitHub). I do think a simple clarity fix would go a long way, though maybe the cat is already out of the bag. Any improvement (especially this one) is also great. ^^

Gomoto · Post by **Gomoto** » Wed Dec 11, 2019 3:32 pm

emerus, the straw man fallacy is that you imply people who talk about the KataGo score as "points" are dumb because they are using a certain analogy that does not exist according to you, when in fact talking about points is just a convenient way to compare the values of plays.

I am by the way not offended in any way by your argument, I just think it is wrong.

Gomoto · Post by **Gomoto** » Wed Dec 11, 2019 3:35 pm

lightvector, I think the KataGo score is a fine tool as it is.

Gomoto · Post by **Gomoto** » Wed Dec 11, 2019 3:53 pm

The KataGo score is a really good indication of how many points score difference there will be at the end of the game if both players make optimal plays (never the case) or similar size mistakes (often the case if both players have a similar strength).

emerus, if you think this is a misleading way to analyze go games with KataGo and should be avoided or improved, then I am guilty.

emerus · Post by **emerus** » Wed Dec 11, 2019 4:19 pm

Gomoto wrote:emerus, if you think this is a misleading way to analyze go games with KataGo and should be avoided or improved, then I am guilty.

It isn't a misleading way to study. It's a useful, good tool.

Comments like "White is ahead by 3 points on move 6" are misleading. I am not sure how often you (or other forum regulars) interact with 2k-10k players who use KataGo. They usually believe these are points, or that KataGo is at least trying to determine points (in the cases where they are aware it isn't an easy/possible task). This is what is misleading.

I honestly do not know where calling them points began; I assume it is because nothing intuitive or catchy was proposed. "Endgame scoreMean estimations based on training data" is a mouthful and doesn't have quite the ring to it.

edit:

Gomoto wrote:The KataGo score is a really good indication of how many points score difference there will be at the end of the game if both players make optimal plays (never the case) or similar size mistakes (often the case if both players have a similar strength).

Had to break this quote after re-reading it. How can you say that KataGo score is a "really good indication" ...? Do you know what optimal plays are? KataGo doesn't. The fact that the scoreMean is from training data and not match data also clearly says that it isn't even trying to use optimal plays to gather this value.

Eh, last point for emphasis. It is a fine tool. It surely beats AI %'s. We can strive for better, and at the very least make it clearer what exactly the value that the tool is giving you means.

Gomoto · Post by **Gomoto** » Wed Dec 11, 2019 4:28 pm

I understand your argument, but I cannot see any kind of "danger" or "harm" right now that an incomplete or inaccurate concept of points (as we are talking about in this context) could do to a weaker player. (Especially because there is no defined accurate score of the game without playing the variations to the end to determine the score of each possible variation.) But I am open to expanding my view here.

I am a teacher by profession and a local politician by the way; perhaps that explains my preference for catchy phrases.

Perhaps I should take more care not to act as a go populist.

Bill Spight · Post by **Bill Spight** » Wed Dec 11, 2019 5:42 pm

emerus wrote:
Bill Spight wrote: Points do not already exist, in the sense that scores do. They are not scores, nor are they estimates of scores. They have a definition, but are intractable to calculate before the endgame.
They can be calculated before endgame simply (speaking of move values). If you remove an opponent's stone from the board in Chinese rules, you deny them a point and if you capture a prisoner in Japanese rules, you gain a point. It is clear that they are not as intractable as you make it sound.

If that's what you were talking about as points, sure, they may currently exist, but they are even more misleading than KataGo's score estimates, except at the end of the game. Not that they are useless. For instance, if you find that you are behind by 10 guaranteed points, you have to find 10 points somewhere else. But that does not mean that your opponent has the lead. You may be well ahead.

emerus · Post by **emerus** » Wed Dec 11, 2019 6:07 pm

Bill Spight wrote:
emerus wrote:
Bill Spight wrote: Points do not already exist, in the sense that scores do. They are not scores, nor are they estimates of scores. They have a definition, but are intractable to calculate before the endgame.
They can be calculated before endgame simply (speaking of move values). If you remove an opponent's stone from the board in Chinese rules, you deny them a point and if you capture a prisoner in Japanese rules, you gain a point. It is clear that they are not as intractable as you make it sound.
If that's what you were talking about as points, sure, they may currently exist, but they are even more misleading than KataGo's score estimates, except at the end of the game. Not that they are useless. For instance, if you find that you are behind by 10 guaranteed points, you have to find 10 points somewhere else. But that does not mean that your opponent has the lead. You may be well ahead.

Misleading? How so? You can observe them objectively. When you observe them and what you do with that data is up to you.

Misleading was chosen in the original post because people are regularly referring to KataGo's score estimation function as points. Points are something that predates KataGo by a few thousand years... sure, the term is not as clear as a pedant would like but it is a term that already has a usage. It is misleading to call something else by that same term, as you've agreed.

Bill Spight · Post by **Bill Spight** » Wed Dec 11, 2019 7:01 pm

emerus wrote:It is misleading to call something else by that same term, as you've agreed.

No, I have not agreed to that. Words typically have more than one meaning. Human language is a wonderful thing. I rather expect that the people you were talking to who talked about points were interested in evaluating go positions. KataGo score estimates do that. In addition, your critique used early opening positions with no points at all, just possibly estimated points. There is nothing wrong with estimated points, and nothing wrong with calling them points, as humans will.

Yakago · Post by **Yakago** » Thu Dec 12, 2019 3:31 am

lightvector wrote:How about I just make the next training run of KataGo include a prediction target that consists of "what number of points would be needed to make the estimated winning chance close to 50-50" rather than "what is the average difference in final points that will result from self-play" and use this prediction as the value to report to users instead?

With some thought, I think I have settled on a training method that I think should be effective for this.

I thought about mentioning this yesterday, but didn't want to push since it seems you already have several good projects/ideas to pursue. I didn't expect you to come and suggest it yourself!

But that would definitely be a nice addition, if it's feasible to implement. It seems reasonable that you can change the komi, ask whether this new output value gives close to 50% for the current network, and update the weights accordingly.

It's a more intuitive number with respect to saying that something is an 'x point mistake' early in the game.
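To make the difference between the two targets concrete, here is a toy sketch. It is not KataGo's code or training method; the list of margins is made up, and `white_winrate` is a hypothetical helper. It only illustrates that "the average final margin" and "the number of points that would make the game 50-50" can be quite different numbers when a few outcomes are lopsided.

```python
from statistics import mean, median

# Hypothetical final margins (white minus black, in points) from imagined
# continuations of a single position. These numbers are invented.
margins = [2, 3, 3, 4, 4, 5, 5, 6, 30, 38]


def white_winrate(outcomes, handicap_points):
    """Fraction of outcomes white still wins after conceding handicap_points."""
    wins = sum(1 for m in outcomes if m > handicap_points)
    return wins / len(outcomes)


# "Average difference in final points" over the sample.
score_mean = mean(margins)

# Points white could concede so that the outcomes split about 50-50.
# For a plain sample like this one, that is the median.
fair_shift = median(margins)

print("mean margin:", score_mean)
print("50-50 shift:", fair_shift)
print("white winrate after conceding the 50-50 shift:", white_winrate(margins, fair_shift))
print("white winrate after conceding the mean:", white_winrate(margins, score_mean))
```

In this invented sample two blowouts drag the average up to 10, while conceding only 4.5 points is already enough to make the outcomes split evenly.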

Jujube · Post by **Jujube** » Thu Dec 12, 2019 5:20 am

Just me chiming in: I don't find it misleading, and I know that "points" is a heuristic to quantify the magnitude of ahead/behind. It doesn't directly mean 3 points, here they are, C16, C17, C18, count 'em. That would be a stretch to interpret the behaviour in that way, in my opinion.

Uberdude · Post by **Uberdude** » Thu Dec 12, 2019 6:32 am

Bill Spight wrote:
emerus wrote:
Bill Spight wrote: Points do not already exist, in the sense that scores do. They are not scores, nor are they estimates of scores. They have a definition, but are intractable to calculate before the endgame.
They can be calculated before endgame simply (speaking of move values). If you remove an opponent's stone from the board in Chinese rules, you deny them a point and if you capture a prisoner in Japanese rules, you gain a point. It is clear that they are not as intractable as you make it sound.
If that's what you were talking about as points, sure, they may currently exist, but they are even more misleading than KataGo's score estimates, except at the end of the game. Not that they are useless. For instance, if you find that you are behind by 10 guaranteed points, you have to find 10 points somewhere else. But that does not mean that your opponent has the lead. You may be well ahead.

emerus wrote: Misleading? How so? You can observe them objectively. When you observe them and what you do with that data is up to you.

When you (emerus) talk of points do you mean:
- minimal guaranteed territory (i.e. even if opponent gets the gote endgames in the area you still get these points). I think Myungwan Kim 9p tended to count like this in his videos and called it "confirmed territory".
- expected local territory (i.e. if an endgame move is your sente but opponent's gote you assume you get the sente, if gote for both then split the difference, if ambiguous, or boundaries are not pure endgame but have life and death and aji implications with other areas then very hard)
- expected territory plus a point quantification of the value of influence (e.g. projecting 2 points of territory in front of a wall), which is essentially what I was trying to do in counting the early game position at viewtopic.php?p=243147#p243147, but with simplifying assumptions of similar stones cancelling out so the absolute value is off, just the difference.
- something else?

For example, how many points is a lone 4-4 stone? Or a 3-4 stone? Or a 3-4 5-3 shimari? In terms of guaranteed territory a 4-4 has 0 points. Whilst a 3-3 has maybe 4 points. But in terms of "quantification of value on the same scale as points" as in the third definition a 4-4 is obviously similar to that 3-3 if not a little better.

xela · Post by **xela** » Thu Dec 12, 2019 6:34 am

emerus wrote:
xela wrote:I really think an example would help.
3rd game I opened: Game here

Not +/-10, but I am not going to look very hard for something that I've seen in at least 1/10 of the games I open in KataGo. If you are a user of KataGo and haven't noticed this by now, then you should look for it.

How often do you think professionals in the post-AI age actually have such a large (>5 scoreMean) deficit by move 41? KataGo thinks it is like 10% of the time. It is ludicrous to me.

OK, thanks! Now you're a much stronger player than me, so probably I'm about to learn something important here. But so far I still feel as though I'm missing something. To me it's not looking all that ludicrous.

For anyone else who wants to check it out: we're looking at this game --

Position at move 41:

[go]$$Bc19m41
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . X O . O X . . |
$$ | . . . . . . . . . . O . . . O X . . . |
$$ | . . . O . . . . . X . . X . . . X . . |
$$ | . . . . . . . . . . . X . O O . . . . |
$$ | . . . . . . . . . . . . . O X X . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . X . . . . . . . . . . . X . . . . |
$$ | . . . O . . . . . . . . . . . O O . . |
$$ | . . X O . . . . . . . . . X . . X . . |
$$ | . . X O . X . O . . O . X . . X . . . |
$$ | . . X X O O . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]

At move 40, KataGo on my machine with 20,000 playouts has white just over 7 "points" ahead, with 71% winrate. In other words, white has caught up on the board. We know this is possible because white used to win sometimes in the no-komi era. And recently Bill has posted 26 games where one player is at a 90% winrate during the opening.

Back to our Li-Chen game: KataGo doesn't like move 41, so the "average score" changes to W+8, 72% winrate. (On a small number of playouts it actually says W+9, but the number adjusts past a few thousand playouts.) In the first 40 moves, there's no single move that KataGo thinks is a blunder, it's more a matter of several small "errors" adding up to a white lead. I can see a few black moves that go against Uberdude's descriptions of AI style on these forums -- move 5, black approaching a 4-4 instead of making an enclosure from 3-4; move 7 pincer; move 27 hane, so no surprise that KataGo judges things this way.

The idea that "white has caught up on the board" is something I find useful. Looking at the diagram, I can see that black has territory in three places, whereas all that white has is potential -- a framework on the right, maybe a chance to attack black's stones at the top, and first move at top left. So KataGo is trying to teach me that this potential is almost exactly equal to a certain amount of solid territory.

Then KataGo thinks there are some mistakes by both players in the next few moves after 41, and by move 100, black has caught up again. The rest of the game is pretty dramatic. White does indeed attack the black group at the top, and there's a capturing race in the centre. KataGo thinks white doesn't get enough out of the attack, and at move 122 it's looking like a won game for black. But then if KataGo is to be believed, move 153 is a blunder: black just needed to connect against a peep but tried to be too clever, and it's suddenly a close game again. There are a few more swings back and forth in the early endgame. The final result is W+0.5. Overall an interesting game, thanks for sharing this one!

I tried with some other bots. ELF is known for giving more extreme winrates. But here, ELF says W is up 72% at move 40 and 78% at move 41, still less extreme than Bill's examples. LZ with network number 242 has 76% and 76%: it doesn't think move 41 is bad, but agrees that black has fallen behind earlier. An older, gentler LZ (network 157) has 65% and 67%. They're all telling much the same story.

So what's the misleading bit here? Is it that it looked like white was "miles ahead" yet it ended up as a very close game? Are you saying that the position is even at move 41 and all the AIs are giving us the wrong judgement, and that you don't think it's likely that black made mistakes early in the game and then white made mistakes later? Or are you happy with a 70% or 80% winrate at move 40 but don't like to see this translated into a score difference?

Personally I actually would expect to see large swings in the opening, and more than 1/10 of the time. I suspect that many pros aren't going to be happy playing safe, conventional opening moves all the time. There will often be at least one person at the board who thinks they are stronger than the opponent (or better prepared, or luckier on that day) and that the best way to get the win is to unbalance the game. So you take a risk and depart from the usual patterns -- if it pays off, you secure a massive territory or kill a group, you're +15 or more, the opponent resigns. If it doesn't pay off, you're -15 and you're the one resigning. A lot of games do end by resignation, so it seems obvious that even pros make significant mistakes in well over 10% of their games. Why shouldn't some of those mistakes happen before move 40?

Seriously, these are genuine questions, I'm not trying to criticise you. But it's obvious that your instincts are very different from mine here (and you spoke earlier of "anyone who understands networks or computer programming" -- I've done a fair bit of study on those topics), so I want to see what I can learn from this conversation. Thanks again for replying to my first question and showing us an interesting game.

Uberdude · Post by **Uberdude** » Thu Dec 12, 2019 7:48 am

How many points (of what definition) do we humans think white is ahead at move 41 in that position? I'll have a go at counting, somewhere between the minimal and expected approaches.

Bottom left: I'd say black can expect 9 points, though I know that this corner can often end up as 5-6 points if black resolutely ignores white moves in the area until necessary to live.
Bottom right: I assume white gets sente s5 descend and hane connect, and m2 kosumi. 12 points.
Top right: is black q19 sente, or should we assume white can get r19 yose? (But then black o19 is sente, and as white might be wanting to connect up there she wouldn't do that.) So I will count q19 as sente for 2 extra points for black in the corner, so 10 points.
If we expect white to spend her sente on stopping l17 dying, then black has no points there.

Black total 31 points.

[go]$$Bcm41
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . S S |
$$ | . . . . . . . . . . . . X O . O X S S |
$$ | . . . . . . . . . . O . . . O X . S S |
$$ | . . . O . . . . . X . . X . . . X S S |
$$ | . . . . . . . . . . . X . O O . . S S |
$$ | . . . . . . . . . . . . . O X X . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . X . . . . . . . . . . . X . . . . |
$$ | S . . O . . . . . . . . . . . O O . . |
$$ | S S X O . . . . . . . . . X . . X . . |
$$ | S S X O . X . O . . O . X . . X S . . |
$$ | S S X X O O . . . . . . . S S S S S S |
$$ | S S . . . . . . . . . . . . S S S S S |
$$ +---------------------------------------+[/go]

For white, how to count the lone 4-4? 6 points is a rule-of-thumb I've come across before and seems vaguely reasonable, and it's about half of the 12 points a small shimari could be counted as (but that's not counting the value of its influence). It is of course 0 minimal territory.
Lower side is about 6 points. It has potential to be more towards the centre, but also to be less if black does mean things inside using n3. My feeling is if we also want to quantify the value of influence it should be a little more than 6, maybe 8 say. There is a cut at e3 though. So let's stick to 6.
Right side: assume black gets s4 sente descend yose. If black gets a-d in sente then it's about 6 points, but if white gets to block r12 then white gets maybe 6 more. But the slide is not really sente for black for a while, so it's not black's privilege but more like a mutual gote, which would halve the 6 as current value; but it's more senteish for black, so let's give white a third of the 6. 8 points total.
Upper side 0 points.
6 + 6 + 8 + 7.5 komi = 27.5

[go]$$Bcm41
$$ +---------------------------------------+
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . X O . O X . . |
$$ | . . . . . . . . . . O . . . O X . . . |
$$ | . . . O . . . . . X . . X . . . X . . |
$$ | . . . . . . . . . . . X . O O . . . . |
$$ | . . . . . . . . . . . . . O X X . . . |
$$ | . . . . . . . . . . . . . . . . 1 . . |
$$ | . . . . . . . . . . . . . . . O . . . |
$$ | . . . . . . . . . . . . . . . . . a . |
$$ | . . . . . . . . . . . . . . . . b c . |
$$ | . . . . . . . . . . . . . . . O . d . |
$$ | . . . . . . . . . . . . . . . . S S S |
$$ | . . . . . . . . . . . . . . . O S S S |
$$ | . . X . . . . . . . . . . . X . . . . |
$$ | . . . O . . . . . . . . . . . O O . . |
$$ | . . X O . . . . . . . . . X . . X . . |
$$ | . . X O . X . O . . O . X . . X . . . |
$$ | . . X X O O . S S S . . . . . . . . . |
$$ | . . . . . . S S S . . . . . . . . . . |
$$ +---------------------------------------+[/go]

So in terms of a minimal-to-expected territory count I get black 3.5 ahead (31 against 27.5). I think the error bars on my counting like this are about +/- 5. There's also the relative strengths of the 3 weak groups on the top side to factor in, and the potential of the 4-4 (e.g. if white's next moves are m18 n17 j17 then maybe a handful of white points start to appear on the top side). So nothing terribly conclusive.
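For anyone who wants to re-run the tally with their own estimates, the count can be written out in a few lines. The area values are the ones listed in the post; the dictionary and variable names are only for illustration. With the figures as listed, the totals are 31 for black and 27.5 for white, a difference of 3.5.

```python
# Area estimates taken from the count above, in points.
black_areas = {
    "bottom left": 9,
    "bottom right": 12,
    "top right": 10,       # includes 2 extra points for q19 sente
    "top side group": 0,
}

white_areas = {
    "lone 4-4": 6,
    "lower side": 6,
    "right side": 6 + 6 / 3,   # 6 points plus a third of the possible extra 6
    "upper side": 0,
}

KOMI = 7.5

black_total = sum(black_areas.values())
white_total = sum(white_areas.values()) + KOMI
margin = black_total - white_total   # positive means black is ahead

print("black:", black_total)
print("white (with komi):", white_total)
print("black minus white:", margin)
```

Swapping in different values for the 4-4 stone or the right side shows quickly how much of the result hangs on those two guesses.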

spook · Post by **spook** » Thu Dec 12, 2019 8:19 am

Just one thing I would like to add to this discussion:

Unlike a winrate, the score estimation shouldn't anticipate risks too much in my opinion.
Because if it does, the perfect estimation may always lead to extremely small margins like B+0.5.
After all, if you anticipate risks, half a point is enough to win.

Life In 19x19
