The two statements:
"AI says Black is about 2 points ahead"
"AI would be unsure which side it prefers if Black were given a penalty of about 2 points"
...can mean the same thing under one interpretation, but they can also mean different things under another. The whole point of the second phrasing is to make it easy for people to gravitate to the most accurate interpretation. (Let's brush under the rug for the moment that, due to how MCTS works, even the second statement isn't exactly accurate; for example, one can sometimes get winrates and scores with mismatching signs. The second statement is a good enough approximation for now.)
The first statement sounds like a claim of an objective fact about Black's lead. But in what sense? Is it a claim about the game-theoretic optimal value of the position? Not really: in general we have no idea what the optimal value of an arbitrary 19x19 game position is, and AI is likely far enough from optimal that even if we knew it, it wouldn't necessarily be useful. For example, it's quite plausible that there are positions where two equally-matched top bots would win more as White, achieve a positive average score as White, and prefer White (at their levels), but where the game-theoretic optimal score is positive for Black. The absoluteness of the statement also makes it easy for the listener to forget the fact that different players may value a position differently, and leaves it ambiguous what standard the score is being measured against.
The absoluteness of the statement also immediately leads to the question of how certain that statement is. What is the chance it is "wrong", or what is the range of uncertainty? It also leads one to wonder why bots mostly don't provide meaningful confidence ranges on these scores, and hides the fact that asking for confidence ranges in the first place is sort of a category error (and the natural question being a category error, and thereby nonsensical, is partly why such ranges are hard to provide).
The second statement is much harder to misinterpret:
* It's immediately clearer that "2 points" is not an objective claim about the position itself; it's a claim about the bot's subjective preference in that position if Black lost 2 points.
* It's immediately clearer what it's not: it's not the game-theoretic value. It might not be the average score that would result if you *actually* took the bot and played a million full games with itself from that position (the bot's preference may or may not match up with this kind of rollout). It's not necessarily how you should value the position, but it might be, and you can be smart about it (e.g. are you talking about pro-level play or amateur-level play from that position? Is that score predicated on an inhumanly precise sequence of play, or does anything work? Does the bot seem to have any blind spots in upcoming tactics? etc.).
* It is clearer now why asking for a confidence range on that specific score output "Black +2" is sort of a category error. How likely is it to be correct? It's correct. This is the number that reflects that bot's preference when it was run with those settings on that hardware for that amount of time with that random seed. It's correct because it was intended to be a measure of the bot's preference in that instance, and it was indeed a measure of the bot's preference in that instance. What is the confidence range on it? 0, I guess. Either that, or the question is categorically nonsensical. Or, if we want to talk about average error with respect to something else (e.g. the score difference at which a random 1 dan human amateur would achieve a 50% win chance, or perhaps the amount such a player would win by on average, or the bot's own new judgment after 10 self-play turns), we need to specify what that something else is.
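To make the "penalty at which the bot is indifferent" reading concrete, here is a minimal sketch of how one could search for it. The `black_winrate` function here is a toy stand-in (a logistic curve hard-coded to assume Black is about 2 points ahead); a real implementation would instead query an engine's evaluation at each trial komi. The bisection itself only assumes the bot's winrate decreases monotonically as Black's penalty grows.

```python
import math

def black_winrate(penalty: float) -> float:
    """Hypothetical stand-in for a bot evaluation: Black's winrate given an
    extra komi penalty on Black. A real engine would be queried here; this
    toy logistic curve just assumes an underlying lead of 2 points."""
    assumed_lead = 2.0
    return 1.0 / (1.0 + math.exp(-(assumed_lead - penalty) / 1.5))

def indifference_penalty(lo: float = -50.0, hi: float = 50.0, tol: float = 1e-6) -> float:
    """Bisect for the penalty on Black at which the bot is unsure which
    side it prefers, i.e. where black_winrate(penalty) crosses 0.5.
    Assumes winrate is monotonically decreasing in the penalty."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if black_winrate(mid) > 0.5:
            lo = mid  # Black still preferred: penalize more
        else:
            hi = mid
    return (lo + hi) / 2

print(round(indifference_penalty(), 3))  # prints 2.0 for this toy curve
```

Note that the number this produces inherits all the caveats above: it is a measurement of *this* bot's preference under *these* settings, not an objective fact about the position.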
I hope that made sense. I guess this is all off-topic for the thread, though.
