This 'n' that

Talk about improving your game, resources you like, games you played, etc.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: This 'n' that

Post by Bill Spight »

Thanks, Ed. :)

What is the result?
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
EdLee
Honinbo
Posts: 8859
Joined: Sat Apr 24, 2010 6:49 pm
GD Posts: 312
Location: Santa Barbara, CA
Has thanked: 349 times
Been thanked: 2070 times

Post by EdLee »

Hi Bill :)
:blackeye:
[go]$$W dead
$$ -------------------
$$ | . 2 3 X . . . . .
$$ | 7 O 1 X . . . . .
$$ | 5 4 6 X . . . . .
$$ | 8 O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
[go]$$W ko?
$$ ------------------
$$ | 7 2 4 X . . . . .
$$ | 6 O 1 X . . . . .
$$ | 5 . 3 X . . . . .
$$ | . O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
[go]$$W Life
$$ ------------------
$$ | . 2 4 X . . . . .
$$ | 3 O 1 X . . . . .
$$ | . . 5 X . . . . .
$$ | . O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
:study:

Re: This 'n' that

Post by Bill Spight »

Hi, Ed.

What else? :)

Post by EdLee »

Hi Bill,
Reverts to another var:
[go]$$W
$$ -------------------
$$ | . 4 . X . . . . .
$$ | 5 O 1 X . . . . .
$$ | . 2 3 X . . . . .
$$ | . O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]

Re: This 'n' that

Post by Bill Spight »

Hi, Ed.

What else? :)
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: This 'n' that

Post by lightvector »

Bill Spight wrote:Hi, Ed.

What else? :)
Are you referring to the variation where black goes for ko, white repeatedly declines, captures a farmer's hat, and dies due to shortage of liberties? :study: :)

Re: This 'n' that

Post by Bill Spight »

Hi, lightvector. :)

That's certainly an important variation.

Re: This 'n' that

Post by Bill Spight »

OK, here we go. :)
[go]$$Wc Approach ko
$$ ------------------
$$ | . 6 . X . . . . .
$$ | 4 O 1 X . . . . .
$$ | 5 2 3 X . . . . .
$$ | . O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
Ed's play, :w1:, is correct. Then :b2: - :b6: makes an approach ko. :)
[go]$$Wc Direct ko
$$ ------------------
$$ | . 5 . X . . . . .
$$ | 4 O 1 X . . . . .
$$ | 7 2 3 X . . . . .
$$ | 6 O X , . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
:w5: makes a direct ko, which on average is not as good as the approach ko.
[go]$$Wc White dies
$$ ------------------
$$ | 9 5 . X . . . . .
$$ | 4 O 1 X . . . . .
$$ | 8 2 3 X . . . . .
$$ | 6 O X , . . . . .
$$ | 7 O X . . . . . .
$$ | . O X . . . . . .
$$ | . O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
This is lightvector's line. :)
[go]$$Bcm10 White dies, continued
$$ ------------------
$$ | O O . X . . . . .
$$ | . O O X . . . . .
$$ | 1 3 O X . . . . .
$$ | . O X , . . . . .
$$ | O O X . . . . . .
$$ | . O X . . . . . .
$$ | 2 O X . . . . . .
$$ | X X X . . . . . .
$$ | . . . . . . . . .[/go]
:b10: is the vital point. :w11: makes a one-point eye. But then :b12: makes a Golden Cock Stands on One Leg shape, which White cannot take because of damezumari. :)

----

My most difficult tsumego, I think. It was inspired by a problem by emeraldemon. See viewtopic.php?t=2271

Post by EdLee »

Thanks, Bill.

Re: This 'n' that

Post by Bill Spight »

On winrate estimates, territory estimates, margins of error, and the last play

OC, as humans we are used to territory estimates, but we have been through the hype about how top bots think differently from humans, and better, in some mysterious way, about the probability of winning the game. Unless we are talking about certain situations such as the 5x5 board, where we know that the probability is 100% that Black wins with perfect play, and even reasonably good play, or the late endgame where we can figure out perfect play, or a pro vs. pro game where one player leads by, say, 50 pts. and the largest play gains 10 pts., there is no a priori knowable probability of winning the game. A posteriori, we could have a position played to the end many times by certain players, or by players of comparable levels, and get winrate estimates that way, but we do not know how well those winrates would generalize, and to whom. In general, as the skill of the players decreases towards random play, the winrates get closer to 50%. And the bots do not estimate winrates in that manner, anyway. The mystery of winrates is baked into the cake. We really do not know enough about the factors involved. Perhaps there will be a Ph.D. dissertation about winrates in the near future. :) (BTW, I have found another example where Elf is way wrong about the value of a play by a top player — Dosaku in this case. More later. :))

My purpose here is not to cast doubt on winrate estimates. They are useful. It was the hype that got me started, but that has pretty well blown over. One problem that still remains is that of their margins of error. If a top bot estimates, given sufficient playouts — and we don't know how many that is, either — that one play has a winrate 10% worse than that of the bot's top choice, we can be pretty sure that it is a mistake. OTOH, if the winrate estimate is only 2% worse, we have little assurance that it is an error. I have recently downsized my margin of error for Elf to 4%, but that is still an educated guess. Nobody has worked out the margins of error for winrate estimates, and I doubt that anybody is going to do so anytime soon. The margin of error may be important for a human attempting to interpret winrate estimates, but any bot that picks a play with a smaller winrate estimate, given sufficient playouts, is likely to play worse. And today's bots are written to win games, not analyze positions.

Now, when we can actually work out territory estimates, we can determine the margins of error. For example, if a gote gains 5 pts., its margin of error is 5 pts., as well, since we do not know who will make the play. Assuming correct play, that is. If the players make mistakes, the margin of error could be greater. But the gain is not a territory estimate, it is something that we find out when we make the estimate. Now, some bots make territory estimates as well as winrate estimates. This is good, but, AFAIK, they do not yet estimate the margin of error of the territory estimates. In terms of the whole board the gain from making the largest gote or reverse sente is the temperature. If we are going to use territory estimates, we need temperature estimates, as well.

That brings me to the topic of the last play. If I am 1 pt. behind and make a play that gains 3 pts., then I am 2 pts. ahead. The opponent might still win. But if my play was the last play of the game, then I win. Such a situation would be unusual, because the temperature would drop from 3 to 0, and such a large temperature drop is unusual in go. The average drop in temperature between moves is less than 1 pt. It is probably less than 0.1 pt. But larger temperature drops do occur. For instance, suppose that after my play the temperature dropped by 2 pts., i.e., to 1 pt. Then I would still (very likely) win, since I would be 2 pts. ahead and the best my opponent could do would be to gain 1 pt., not enough to catch up. (A very unusual ko situation could still give her the win, since the margin of error for ko positions is greater than their temperature.) The play just before a significant temperature drop is also called a last play.
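Bill's arithmetic here is simple enough to write down. The following is a hedged sketch (the function names are mine, and it ignores ko situations, which he notes separately as having a larger margin of error than their temperature):

```python
# A sketch of the last-play arithmetic above (names are mine, not Bill's; ko excluded).
# If I trail by 1 point and my play gains 3 points, I lead by 2. If the temperature
# then drops to t, the opponent's best remaining play gains at most t points, so my
# lead survives whenever my post-move lead exceeds t.

def lead_after_my_play(current_lead: float, my_gain: float) -> float:
    """Score margin from my perspective after a play that gains `my_gain` points."""
    return current_lead + my_gain

def i_still_win(current_lead: float, my_gain: float, temperature_after: float) -> bool:
    """True if the opponent's largest remaining gain cannot erase my lead (no ko)."""
    return lead_after_my_play(current_lead, my_gain) > temperature_after

# The example from the text: 1 point behind, a 3-point play, temperature drops to 1.
assert lead_after_my_play(-1, 3) == 2
assert i_still_win(-1, 3, temperature_after=1)      # 2-point lead vs 1-point plays: win
assert not i_still_win(-1, 3, temperature_after=3)  # no drop: opponent may catch up
```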

In fact, one of the traditional dogmas of go is that of getting the last big play of the opening. Now, what that play is is not well defined, but good players can usually sense it, and sense the related temperature drop, as well. Unless the bots prove that it is hokum, which I don't think they will. ;) In fact, I have found an example where I think the bots back up the idea of a significant temperature drop in the opening. :D It has to do with the 5-3 approach to the 3-4 point.

Now, humans have known, or at least strongly suspected, that the 5-3 approach to the 3-4 point is not as big, as a rule, as the original 3-4 play itself. Certainly by the 19th century the idea was that, usually at move 4, White should play the 5-3 approach to a 3-4 stone before occupying an empty corner, even though occupying the empty corner was better objectively, because White needed to complicate the game to overcome Black's advantage. Today, with komi, the empty corner beckons, although approaching a 3-4 stone, even at move 2, is not unknown. Writing in the mid-20th century, even Takagawa could not unequivocally say that the approach at move 2 was a mistake. Obviously, the 3-4 makes more territory, on average, but the 5-3 has more influence towards the center and the side. Which is better? Probably the 3-4, but quien sabe?

In the 17th century the 5-3 approach to the 3-4 stone was common at move 2. Did the players think that the 5-3 stone was objectively not quite as good? Maybe so, but Dosaku played a number of games as White where he played the 5-3 in all four corners, playing it as the first play in empty corners. Did he think that the 5-3 was objectively as good as, or better than, the 3-4? Obviously, he was extremely skilled at utilizing the influence of the 5-3, but would he have played it to occupy an empty corner if he were playing against himself?

Well, Elf has an opinion, expressed in terms of winrates. What does Elf say?

In a game against Yasui Chitetsu (GoGoD 1671-08-25a) Dosaku played the 5-3 approach as :w2: against Chitetsu's 3-4 :b1:, a very common opening at the time. Elf estimates that the approach loses 5½% versus a 4-4 play in an empty corner. (I don't regard winrate estimates as precise enough to warrant reporting tenths of a point difference near 50%. Half point precision is good enough, IMHO. :)) Next, Chitetsu played :b3: as a two space pincer against :w2:, which was also common back then. Elf regards :b3: as a 4% winrate error. (Within decades human players had dropped the :b3: pincer, which indicates that they also had come to regard it as an error. When both bots and humans think a play is a mistake, it probably is. ;)) Dosaku played :w4: on the 5-3 in the adjacent corner closest to :b3:. Elf considers it a 7% error. OK, Elf considers the 5-3 to be a mistake, whether as an approach to the 3-4 or as the first play in an empty corner. What does this have to do with the last play, if anything?

OK. Today's bots consider the corners to be worth more, by comparison with the sides, than humans do. In the late 20th century we were starting to see humans devalue the sides a little. For instance, the sanrensei was devalued, but the nirensei was still considered good. Even today, the bots like the nirensei. ;) But we see plays on the side that top humans played without a second thought regarded as losing 10% by today's top bots. Shoulder hits, side attachments, or other plays against enclosures are usually considered to be bigger than extensions on the side. This represents a big difference in opening theory. IOW, the temperature of the corners remains hotter than the temperature of the sides for longer than we humans have thought. A temperature drop is coming up. ;)

GoGoD 1665-00-00a, Aoki Guseki (W) vs. Dosaku. :w4: plays the 5-3 approach instead of occupying the last empty corner. Elf estimates a winrate loss of 6½%.

GoGoD 1667-12-05b, Castle Game, Honinbo Doetsu (W) vs. Yasui Chitetsu. :w2: is a 5-3 approach, estimated loss of 5½%, :b3: plays on the 3-4 in an open corner. :w4: approaches on the 5-3. Estimated loss: only 2%. :o (But there are two empty corners.)

GoGoD 1669-07-16, Dosaku (W) vs. Doetsu. :w2: is a 5-3 approach. Estimated winrate loss: 6½%. :b5: is a 5-3 approach. Estimated winrate loss: 2%. (Two empty corners.) :w8: is a 5-3 approach instead of occupying the last empty corner. Estimated winrate loss: 7½%.

If I were writing an article or thesis, I would, OC, examine many instances, either of actual games or of computer generated positions. And I have looked at more games than I report here. The number of empty corners seems to matter to the winrate loss estimate of the 5-3 approach. Here is my hypothesis as to why.

Winrate loss estimates depend, not only upon the play made, but upon the alternative, presumably best, play. The value of the 5-3 approach in each corner is approximately the same in each case, I assume. Then the difference in winrates reflects the difference in the value of occupying an empty corner, assuming that that is the best play. When there is only one empty corner, that difference is around 6½% in terms of winrates. But when there are two empty corners, they are miai, if not exactly so. And then the difference is pretty much the loss in the corner of the 5-3 approach versus the play after the two corners are occupied, which comes to around 2%. The difference of around 4½% reflects a temperature drop after the last empty corner is occupied. Occupying the last empty corner is significant.

When there are three empty corners, there is some uncertainty about who will get to occupy the last empty corner, at least as bots calculate winrates. That uncertainty reduces the winrate estimate of the loss of the 5-3 approach by around 1½%.
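The decomposition above can be laid out as back-of-the-envelope arithmetic. The figures below are the approximate Elf estimates quoted in this post; treating their differences as a temperature drop is Bill's hypothesis, not anything a bot reports:

```python
# Rough arithmetic behind the hypothesis above, using the Elf numbers quoted in
# this post. The decomposition into a "temperature drop" is the post's assumption.
loss_one_empty_corner = 0.065   # ~6.5%: 5-3 approach instead of the last empty corner
loss_two_empty_corners = 0.02   # ~2%: two empty corners remain, (nearly) miai

temperature_drop = loss_one_empty_corner - loss_two_empty_corners
assert round(temperature_drop, 3) == 0.045  # ~4.5% attributed to the last-corner drop

# With three empty corners, uncertainty over who occupies the last one is said
# to shave ~1.5% off the one-corner figure:
loss_three_empty_corners = loss_one_empty_corner - 0.015
assert round(loss_three_empty_corners, 3) == 0.05  # ~5%, near the quoted 5.5-6.5% range
```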

OC, if I did the research I could get better estimates, and there may be other factors to consider. :) But I think these results are suggestive. There does seem to be a last play effect in the opening, namely occupying the last empty corner. It comes earlier than humans have thought, but there may be another significant temperature drop a bit later on at the threshold of the middle game, and yet another at the cusp of the endgame. :)

Re: This 'n' that

Post by lightvector »

Bill, regarding winrates specifically, when you say you want a margin of error, presumably you are talking about the error in the bot's estimate relative to something. What precisely is that something?
  • Obviously it's not "theoretical perfect play", because under perfect play the position must either be entirely won or entirely lost, so the true winrate will either be 100% or it will be 0%. In that case, the error in a winrate estimate like 60% would of course be precisely either 60% or 40%, and it would generally be impossible to determine which.
  • Is it "the probability that the bot would win from here against itself, using the actual self-play settings and parameters used in training"? Well, that could be either 100% or 0% too! Because it is not atypical for bots to randomize self-play only early in the game; for the rest of the game they might actually just deterministically choose the move that got the best search results. Or maybe they randomize just a little there too, in which case it might not be exactly 100% or 0%, but it could still vary wildly, depending sensitively on the details. And these details don't actually matter much! The neural net during training sees pretty much the same thing either way: it sees a game with mostly good moves ending in a win or loss. You're not going to go back and replay exactly that same game again, so it doesn't matter if the later moves were deterministic or not. And it would be weird if what we wanted was an error estimate relative to something that might vary so sensitively with respect to details of training that actually don't matter much.
  • Is it "the average probability that randomly chosen professional human players would win from here against other randomly chosen pro opponents"? Well in that case the error is going to be often vastly greater than small numbers like 4%, as human pro players routinely lose highly-winning games or win highly-losing games or make other huge swings from strong bots' perspectives. And of course you need to consider possible issues like move A is definitely better than B for bots and the bots are "right" to evaluate it so, maybe it's even better in some "objective" sense, but move B actually leads to better practical chances for a human because relative to human strengths/weaknesses, move A makes it both harder for you and easier for your opponent to handle the resulting fight.
  • Is it "the winrate that the bot itself will report in the future after more moves are played", with the hope that with more moves the bot can better judge whether it was 'right' or 'wrong'? In that case, you need to specify some sort of time horizon. With a way-too-long horizon, of course we're back to 100% or 0%, because that's what it will be at the end of the game. With a very short horizon, though, you're measuring short-term fluctuation noise. So you want some intermediate horizon, but which horizon is tricky, as it may take highly variable numbers of moves for the bot to realize, depending on whether the potential judgment/misjudgment involves a short-term fight or a long-term shape that will only come into play much later in the game. Either way, you still actually need to say what the time horizon you care about is (possibly different for different situations?). And of course, it's not guaranteed that the numbers you get will apply to humans, who have different strengths and biases.
  • Or maybe you actually do mean the move-to-move fluctuation noise, i.e. you want something like the error with respect to "the winrate that the bot itself will report on the very next move"? That's pretty easy to quantify, but it doesn't seem like an ideal metric. If the bot rates move A 5% higher than move B, and you play both A and B on the board, the winrate will then fluctuate a bit for each, but the magnitude of that fluctuation isn't necessarily tied to whether A is really a "better move" than B. As earlier, it depends on things like whether it's a short-term tactic the bot can realize imminently, or a longer-term judgment difference that won't get resolved soon. And of course, here too it's not guaranteed that the numbers you get will apply to humans.
Or do you mean something else entirely? Apologies if you've explained this already somewhere and I missed it. :)

Basically, it's kind of hard to think about how one would add an error estimate (or how one would research how to add it) when not sure what precisely that error is supposed to measure in the first place.

Post by EdLee »

under perfect play the position must either be entirely won or entirely lost, so the true winrate will either be 100% or it will be 0%.
Probably matters little, if at all, to this point: but how do we know perfect play doesn't always lead to no-result (e.g. triple ko, etc.)?

Re: This 'n' that

Post by Bill Spight »

lightvector wrote:Bill, regarding winrates specifically, when you say you want a margin of error, presumably you are talking about the error in the bot's estimate relative to something. What precisely is that something?
"That's not my department, says Wernher Von Braun." — Tom Lehrer

;)

Color me old-fashioned, but when I come up with an approximate measure, I am interested in its error function. Now, the inventors of the winrate estimate have good reasons for not providing an error function. For one, it's not the only thing they use to choose plays. For another, the number of playouts or visits indicates the degree of confidence in the winrate estimate. For another, for choosing the best play, the order is more important than the absolute value.

{This paragraph may be skipped.} My first foray into game-related evaluation was coming up with a point count for Quick Tricks in contract bridge. I took a Chebyshev approach and minimized the maximum error, given knowledge of the total of the point counts of the two partners' hands and certain assumptions about the play. Starting with the errors is what enabled me to come up with the evaluation function. :) (I had not assumed a point count, it just worked out that way. ;))

However, human reviewers are obviously interested in winrate estimates and, from my point of view, are hampered by the lack of error estimates. If LZ, Elf, or KataGo says that my play has a winrate estimate 2% lower than its first choice, does that mean that my play was a mistake? It is apparent from reading reviews that some people even think that a difference of ½% is significant (in the playing sense, not the statistical sense), something that strikes me as absurd. There are other questions that I have, as an analyst, but this is a basic question that human reviewers have, and they have no guidance in the matter.

Now, it would be possible to use the data from bots to come up with margins of error, however defined, but 1) it would take a good bit of time and effort, 2) you would have to make assumptions that people could challenge, 3) the landscape keeps changing as bots improve and new methods may be devised. Look at the exciting progress of chess engines, several years after they got better than humans. Those who devise bots have not provided margins of error, and I doubt that they will, any time soon. Perhaps some academic will do the research.
Is is "the probability that the bot would win from here against itself, using the actual self-play settings and parameters used in training?".
If I understand dfan correctly, that's pretty much the idea. But that's not how the estimates are derived. ;)
Or maybe you actually do mean the move-to-move fluctuation noise, i.e. you want something like the error with respect to "the winrate that the bot itself will report on the very next move"?
That's pretty much the reinforcement learning approach, isn't it? A winrate estimate estimates the winrate estimate after the next move is played. But, IIUC, that is not tested directly, either. Rather, the test is how well the bot plays the whole game, not how well it evaluates each position or play. It is a player, not an analyst.
That's pretty easy to quantify, but it doesn't seem like an ideal metric.
True enough. But when you see a winrate estimate with 700 playouts and after the next play, which is the bot's first choice, the new winrate estimate differs by 2% with 12,000 playouts, you have to suspect that the margin of error with 700 playouts is at least 2%. ;)
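For a crude sanity check on that suspicion: if the playouts were independent samples (they are not — search is guided by the net and the playouts are correlated, so this is only a floor, and the real uncertainty is larger), the binomial standard error gives:

```python
import math

# A back-of-the-envelope floor, not how bots report winrates: if the n playouts
# were independent coin flips with success probability p, the standard error of
# the reported winrate would be sqrt(p*(1-p)/n). MCTS playouts are correlated
# and guided, so the true uncertainty exceeds this.

def naive_winrate_stderr(p: float, n_playouts: int) -> float:
    return math.sqrt(p * (1.0 - p) / n_playouts)

print(round(naive_winrate_stderr(0.5, 700), 3))    # ~0.019, i.e. ~2% at 700 playouts
print(round(naive_winrate_stderr(0.5, 12000), 3))  # ~0.005 at 12,000 playouts
```

Even this optimistic model puts roughly 2% of noise on a 700-playout estimate, consistent with the suspicion above.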

A few years ago, with a little cleverness I compared winrate estimates for Leela 11 with 100k playouts per position (not each option) versus 200k playouts where I could argue that the difference between the two was not random, but the result of evaluation errors with 100k playouts, and came up with a minimum margin of error of around 3%. Nowadays, OC, who cares about Leela 11's margin of error?
And of course, here too it's not guaranteed that the numbers you get will apply to humans.
True enough. But if a bot's winrate margin of error is 3% with superhuman play, surely it is larger when applied to human play.
Basically, it's kind of hard to think about how one would add an error estimate (or how one would research how to add it) when not sure what precisely that error is supposed to measure in the first place.
Sure. The researcher has to specify what he means. Hard to do when the developers talk as though there were such a thing as the probability of winning the game. You have to make assumptions, and the developers may not even know what the assumptions are. Or maybe they don't want to say. ;)
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: This 'n' that

Post by ez4u »

Could someone explain the relationship between the lower confidence bound (LCB) and upper confidence bound (UCB) and the winrate? I have naively thought that the change to using LCB in LZ 0.17 was in a sense a conservative adjustment for the degree of uncertainty in the winrate. Is this completely off base?
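For what it's worth, one common construction of such a bound (a hedged sketch; LZ 0.17's actual code may differ in the statistics and constants it uses) is the mean winrate minus a multiple of the standard error, so that rarely-visited moves are discounted more heavily:

```python
import math

# Hedged sketch of a lower confidence bound on a move's winrate: treat each
# visit's backed-up value as a sample and report mean - z * stderr. Selecting
# moves by LCB prefers moves whose winrate stays high even after discounting
# for how few visits (how much uncertainty) they have.

def lcb(mean_winrate: float, sample_std: float, visits: int, z: float = 1.96) -> float:
    if visits < 2:
        return float("-inf")  # too few visits to trust the estimate at all
    return mean_winrate - z * sample_std / math.sqrt(visits)

# A well-visited 52% move can outrank a barely-visited 55% move:
print(lcb(0.52, 0.25, visits=10000) > lcb(0.55, 0.25, visits=50))  # True
```

In that sense, yes: it is a conservative adjustment of the raw winrate for its uncertainty.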
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: This 'n' that

Post by Tryss »

Bill Spight wrote:Sure. The researcher has to specify what he means. Hard to do when the developers talk as though there were such a thing as the probability of winning the game. You have to make assumptions, and the developers may not even know what the assumptions are. Or maybe they don't want to say. ;)
For bots like LZ, the winrate given by the network is an interpolation (for this position) based on the results of positions encountered in self play by previous networks.

Basically, you feed the algorithm positions and results, and it fits a function (the network) to these datapoints. Then you apply this function to all the positions you encounter.

Playouts just apply this function to positions further in the tree, and the "final winrate" is the winrate of the last position in the "best line" (if I'm not mistaken).
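The fitting step described above can be caricatured in a few lines. This toy stands in for the neural network (a hedged sketch with a made-up one-number feature; real value nets are deep networks over raw board positions, and real search blends many leaf evaluations):

```python
# Schematic of "feed positions and results, fit a function, apply it to new
# positions". A toy 1-feature model: predict P(win) by averaging game results
# in coarse feature buckets, i.e. pure interpolation of the training data.

def fit_value_function(samples):
    """samples: list of (feature, result) pairs, result 1.0 = win, 0.0 = loss."""
    buckets = {}
    for feature, result in samples:
        b = round(feature)  # coarse bucketing stands in for generalization
        buckets.setdefault(b, []).append(result)
    return lambda f: sum(buckets[round(f)]) / len(buckets[round(f)])

# "Self-play data": positions near feature=2 mostly won, near feature=-3 lost.
samples = [(2.1, 1.0), (1.9, 1.0), (2.2, 0.0), (-3.0, 0.0), (-2.8, 0.0)]
value = fit_value_function(samples)
print(value(2.0))  # ~0.667: interpolated winrate for positions near feature=2
```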