MikeKyle analyses Hoshi, low approach, low 1 space pincer

sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

MikeKyle wrote:Elf really dislikes jumping out a lot of the time. It's a mistake (>3%) on 58% of occasions when it was played by 9d pros. Elf's response is a close call between 'a' and 'b':
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . b a . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . 1 . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . . . . . |
$$ . . . . . . , . . . |
$$ . . . . . . . . . . |[/go]
'a' is the common human move. It was probably among the first 'joseki' moves I learned. Elf broadly seems to think that it's fine - it's rarely much of a mistake at all.
'b' is one that personally I wasn't aware of at all before AI (I don't have my game database to hand to check if Go Seigen played it 80 years ago).

When I've briefly looked at AI playouts or pro games where it features, I haven't really got a flavour of the different features of 'b' compared to 'a'. Do any stronger players have any ideas of what the positives and negatives are with this choice?
I don't pretend to have an answer to your original question (besides the intuition that bots may prefer the looser extension instead of the human one space jump for efficiency reasons, just like they prefer looser corner enclosures while humans used to play tighter ones).

However, I find this position very interesting for the following reason: humans tend to follow up with "a" in the following diagram (giving black the choice of either living peacefully on the right side while letting white develop thickness in the center, or fighting by pushing on the 4th line and cutting), while AlphaGo and LeelaZero strongly prefer white "b" instead, forcing a fight. Maybe bots prefer that since it allows the opponent fewer options?

This seems to be another blind spot in human positional evaluation.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . . X . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . a . . . |
$$ . . . . . . , b . . |
$$ . . . . . . . . . . |[/go]
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Bill Spight »

sorin wrote:This seems to be another blind spot in human positional evaluation.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . . X . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . a . . . |
$$ . . . . . . , b . . |
$$ . . . . . . . . . . |[/go]
Human blind spot? Do top bots have a decided preference one way or another? I.e., > 3% for LZ, >5% for Elf?
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

Bill Spight wrote:
sorin wrote:This seems to be another blind spot in human positional evaluation.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . . X . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . a . . . |
$$ . . . . . . , b . . |
$$ . . . . . . . . . . |[/go]
Human blind spot? Do top bots have a decided preference one way or another? I.e., > 3% for LZ, >5% for Elf?
By "human blind spot" I meant that pros only seem to play the counter-attack at "b" in special circumstances (in the pre-2016 games), and they seem to think "a" is the natural move, while bots seem to think "b" is the natural move.

The win-rate deltas are pretty small, see details embedded in the screenshots below; for LeelaZero the number of visits for "b" (which is what decides which candidate move is chosen) is overwhelming, compared to "a", in both cases I looked at.
AlphaZero gives "b" 2% more compared to "a", but it doesn't publish the number of visits; and it only has one of the two positions I looked at in their online tool database.
Attachments
I also tried the kakari oriented differently - AlphaGo doesn't have this position in their online tool; LeelaZero strongly prefers the counter-attack again (by number-of-visits).
lz2.png (462.29 KiB) Viewed 17365 times
For the same position as the one compared with AlphaGo, LeelaZero rates "b" only very slightly higher than "a" winrate-wise, but looking at the number of visits we can see that it will "almost never" play "a".
lz.png (470.23 KiB) Viewed 17365 times
AlphaGo rates "b" almost 2% higher than "a" in this case.
alphago.png (259.45 KiB) Viewed 17365 times
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Bill Spight »

sorin wrote:
Bill Spight wrote:
sorin wrote:This seems to be another blind spot in human positional evaluation.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . . X . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . a . . . |
$$ . . . . . . , b . . |
$$ . . . . . . . . . . |[/go]
Human blind spot? Do top bots have a decided preference one way or another? I.e., > 3% for LZ, >5% for Elf?
By "human blind spot" I meant that pros only seem to play the counter-attack at "b" in special circumstances (in the pre-2016 games), and they seem to think "a" is the natural move, while bots seem to think "b" is the natural move.

The win-rate deltas are pretty small, see details embedded in the screenshots below; for LeelaZero the number of visits for "b" (which is what decides which candidate move is chosen) is overwhelming, compared to "a", in both cases I looked at.
AlphaZero gives "b" 2% more compared to "a", but it doesn't publish the number of visits; and it only has one of the two positions I looked at in their online tool database.
Thanks. :) I wonder if both human and bot preferences are the result of path dependency, of different sorts: historical for humans, computational for bots. IIRC, in chess Emanuel Lasker said if you find a good move, look for a better one. MCTS bots, it seems, don't do that so much.

Playing around with Deep Leela, when I compare plays I try to make the number of playouts more equal by forcing the program to make each play, rather than comparing the original winrate estimates. I played out the main variations on AlphaGo Teach until tenuki (a 3-3 corner invasion in both cases) and had Deep Leela make the same plays. Both programs preferred the pincer, Deep Leela by 2%, AlphaGo by 1% (down from an original 2% preference). Even if the bots started out trying both plays fairly equally, the results could favor the pincer to a moderate depth, even if only by a small amount, and it would make sense not to waste time searching the variations starting with "a".
MikeKyle
Lives with ko
Posts: 205
Joined: Wed Jul 26, 2017 2:27 am
Rank: EGF 2k
GD Posts: 0
KGS: MKyle
Has thanked: 49 times
Been thanked: 36 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by MikeKyle »

Thanks Sorin, Bill Spight for your thoughts.
sorin wrote: ..
AlphaZero gives "b" 2% more compared to "a"
..
Are you referring to the original AlphaGo teaching tool? I was under the impression that the teaching tool was based on AlphaGo Master, i.e. somewhere around the version that beat Ke Jie. I'd love to be proven wrong, but I didn't think that we had any resources based on AlphaGo Zero or AlphaZero except for the bot vs bot games that they published? (and of course the papers, leading to all these brilliant bots we now have!)

If we are comparing AG Master vs Elf/LZ then it's an interesting comparison. Elf certainly seems to be more opinionated than Master. I'm perfectly happy to be challenged on this, but I think that the zero method, with its emphasis on building a good network rather than MCTS, means that I trust it more in the opening. I think that while Master on Google's hardware would have no problem beating Elf/LZ on sensible PC hardware, I might actually trust the zero bots more on opening theory. I would trust Master more on reading, ladders/blindspots, life and death, and I accept that the lines between these are always blurred. Elf/LZ also offer the benefit of being able to probe their choices, which helps to investigate, as Bill points out, and make sure the bot isn't missing something important further down the tree. This might be another discussion though.
Bill Spight wrote: ..
Human blind spot? Do top bots have a decided preference one way or another? I.e., > 3% for LZ, >5% for Elf?
..
My preferred metric is to set the non-trivial mistake threshold at 3% and look at a range of pro board positions and see how frequently the move is a mistake (above this threshold.)
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . . X . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . a . . . |
$$ . . . . . . , b . . |
$$ . . . . . . . . . . |[/go]
From the sgf at the top of the thread (according to Elfv1):
'a':
52 out of 108 times this was a mistake(48.1%) (median delta is -2.88, half of deltas in the range -4.68 to -1.44)
'b':
there is less data - fewer 9d vs 9d games include this move
0 out of 10 times this was a mistake(0.0%) (median delta is -1.04, half of deltas in the range -1.39 to -0.49)
If anyone is interested in seeing all the win rate deltas:
(x - excluded due to one player being >80% in the lead when the move was played)
'a' (move 010_03)
-0.95
-4.26
-2.22
-2.88
-2.26
-2.78
0.08
-9.64
-1.96
-1.98
-6.52
-1.34
-5.23
-2.05
-9.51
-4.93
-1.42
-3.86
-0.57
-2.24
-2.77
-8.53
-4.36
-1.33
-1.43
-1.73
-1
-2.71
-9.24
-2.91
-7.94
-10.07
-1.44
-2.13
-1.27
-4.33
-3.48
-7.82
-1.73
-2.2
0.11
-3.86
-8.78
-5.85
-7.36
-6.04
-4.41
1.09
-3.05
-4.69
0.12
-1.8
-9.17
-1.46 x
-1.46
-3.57
-1.07
-6.81
-2.31
-3.31
-2.49
0.91
0.84
1.71
-3.01
-4.51
-4.55
-0.44
-0.52
-3.55
-3.74
-1.05
-4.9
-7
-5.12
-13.07
-8.16
-2.49
-3.11
-2.62
-4.02
-3.11
-8.3
-4.85
-0.12
-2.27
-6.12
-2.13
-4.68
-10.02 x
-4.91
-3.23
-2.88
-3.19
-3.51
-4.37
-4.59
-4.26
-1.61
-2.67
0.25
1.04
0.48
-1.85
-11.52
-1.54
0.17
0.2
1.17
1.39
'b' (move 014_01)
-0.35
-1.24 x
-1.77
-0.87
-1.74
-2.23 x
-1.45
-0.36
-0.92
-1.21
-0.04
-1.17
It's tough to be conclusive with only 10 board positions for b, but it looks to me like Elfv1 has a systematic preference for the counter pincer.
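For anyone who wants to recompute the summary lines from the delta lists, here is a minimal sketch. It assumes that a "mistake" means a delta strictly below -3 and that the "half of deltas" range is the inclusive quartile range; both are guesses about the original script, and `summarize` and `b_deltas` are names made up for this example. With the ten non-excluded 'b' deltas it gives a range that rounds to the posted -1.39 to -0.49 and a median close to the posted -1.04.

```python
from statistics import median, quantiles

MISTAKE_THRESHOLD = -3.0  # assumption: a drop of more than 3 points is a mistake


def summarize(deltas, threshold=MISTAKE_THRESHOLD):
    """Return count, mistake rate, median and middle-half range of winrate deltas."""
    mistakes = sum(1 for d in deltas if d < threshold)
    # Inclusive quartiles (linear interpolation between data points); assumed method.
    q1, _, q3 = quantiles(deltas, n=4, method="inclusive")
    return {
        "positions": len(deltas),
        "mistakes": mistakes,
        "mistake_rate": 100.0 * mistakes / len(deltas),
        "median": median(deltas),
        "middle_half": (q1, q3),
    }


# The ten 'b' deltas from the list above, with the two "x" entries left out.
b_deltas = [-0.35, -1.77, -0.87, -1.74, -1.45, -0.36, -0.92, -1.21, -0.04, -1.17]
summary_b = summarize(b_deltas)
print(summary_b)
```

The same function can be fed the 'a' list (again dropping the two "x" entries) to check the 52 out of 108 figure.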
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Bill Spight »

According to AlphaGo Teach, there is a potentially serious human blind spot in a later position in this joseki. That is, humans prefer, by a ratio of 5 to 1 (49 to 10 by Waltheri's database), a play that is likely to be a mistake.
[go]$$Bcm11 Obvious wedge
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , 8 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . . . . . . . . . . . |
$$ | . . . . . X . 1 . . . . 7 . . . . . . |
$$ | . . . O . . . O X 3 5 . . . . X . . . |
$$ | . . . . . X . O 2 4 . 6 . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
Given Black's thickness on the bottom side, this wedge, :w18:, has to be a good play, right? Lee Changho played it against Cho Hunhyun in 1996. Well, AlphaGo doesn't like it. It gives Black an estimated win rate of 44.6%, 3.2% higher (worse for White) than its chosen play. That difference suggests that :w18: is a mistake.

What's the problem with it?
[go]$$Bcm19 Attack
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . , . . . . . X . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . , O . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . 3 . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . . . . . . . . . . . |
$$ | . . . . . X 1 X . . . . X . . . . . . |
$$ | . . . O . . . O X X X . . . . X . . . |
$$ | . . . . . X . O O O . O . 2 . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
Cho Hunhyun continued with a sequence AlphaGo likes. He connected at :b19: with sente and then :b21: attacked White in the bottom left.

What should White have played instead of the wedge?
[go]$$Wcm18 Connect
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . d b . . |
$$ | . . . O . . . . . , . . . . . X c . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . . . a 6 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . 2 4 . . . . . . . . . . . |
$$ | . . . . . X 1 X . . . . X . . . . . . |
$$ | . . . O . 5 3 O X X X . . . . X . . . |
$$ | . . . . . X . O O O . O . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
:w18: - :w22: connects White's groups, obviously forestalling an attack against either one. Then :b23: takes a big point on the right side. This actually is in line with the go proverb to make urgent plays before big plays, but humans haven't particularly seen it that way. In Waltheri's database only two humans, Kato Masao and Zhang Wendong, picked AlphaGo's play. Then their opponents played at "a", but that's another question.

BTW, how does AlphaGo Teach continue from here, to deal with Black's imposing moyo? With the 3-3 invasion at "b", of course. :o But doesn't that allow Black to build up his moyo with a block at "c"? Maybe so, but AlphaGo blocks at "d". :shock: Like I said, I don't understand this game. :lol:

BTW, AlphaGo doesn't particularly like :b17:, the 5th line keima, giving it a winrate of only 41.5%. Well, that kind of makes sense, if the moyo isn't that big a deal. What does AlphaGo like? Why, the 3-3 invasion in the top left corner, of course! ;-) It gives it a winrate of 44.0%, 2.5% better than the keima. :b17: is another potential human blind spot, chosen by humans more than 95% in Waltheri's database. Needless to say, humans chose the 3-3 invasion 0% of the time. That will change. ;)

Edit: And if the moyo isn't that big a deal, so that the keima is at least questionable, then maybe playing the press to build thickness isn't that big a deal, either. So maybe the pincer is better, eh?
Tryss
Lives in gote
Posts: 502
Joined: Tue May 24, 2011 1:07 pm
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
Has thanked: 1 time
Been thanked: 153 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Tryss »

With a recent LZ network, LZ wants to play the same way (as white) and also wishes to block at d with black. If black blocks the "old way", the continuation looks like this:
[go]$$Wcm
$$ ---------------------------------------
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . 5 . . . . . |
$$ | . . . . . . . . . . . . 7 . . 3 1 . . |
$$ | . . . O . . . . . , . . . . 4 X 2 . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . , . . . . . , . . . 6 . . X . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . X X . . . . . . . . . . . |
$$ | . . . . . X O X . . . . X . . . . . . |
$$ | . . . O . O O O X X X . . . . X . . . |
$$ | . . . . . X . O O O . O . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ ---------------------------------------[/go]
And LZ thinks white is fine, with a 61.5% winrate.

White also likes the R4 (or R9) invasion a couple of moves later (when it gets sente back).
sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

Bill Spight wrote:I wonder if both human and bot preferences are the result of path dependency, of different sorts: historical for humans, computational for bots. IIRC, in chess Emanuel Lasker said if you find a good move, look for a better one. MCTS bots, it seems, don't do that so much.
That is a fascinating question! In particular, I wonder how many unexplored things get left behind in the process of training a Zero bot. It seems obvious that, with a limited amount of computational resources available during training, some moves will only be analyzed "superficially", and the bot then builds a "misconception" (relatively speaking) that later translates into not considering enough some moves that could turn out to be better (if judged by the full game tree).
sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

MikeKyle wrote:Thanks Sorin, Bill Spight for your thoughts.
sorin wrote: ..
AlphaZero gives "b" 2% more compared to "a"
..
Are you referring to the original alphago teaching tool? I was under the impression that the teaching tool was based on AlphaGo Master ie. somewhere around the version that beat Ke Jie.
Yes, this tool: https://alphagoteach.deepmind.com/
You must be right: given when this was released, I guess it is based on the version of AlphaGo that had some influence from human games.

Which means to me that the places in the opening where it differs fundamentally from human play are even more interesting, and those will be even more differentiated in the AlphaZero version.
MikeKyle wrote: I'd love to be proven wrong, but I didn't think that we had any resources based on AlphaGo Zero or AlphaZero except for the bot vs bot games that they published? (and of course the papers, leading to all these brilliant bots we now have!)
Oh, how I would love to prove you wrong! :-)
On the other hand, it matters less and less - since we have access to open source bots which eventually will reach (and surpass) the published AlphaGo version.
sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

Bill Spight wrote:According to AlphaGo Teach, there is a potentially serious human blind spot in a later position in this joseki. That is, humans prefer, by a ratio of 5 to 1 (49 to 10 by Waltheri's database), a play that is likely to be a mistake.

[...]

What should White have played instead of the wedge?

[...]

BTW, how does AlphaGo Teach continue from here, to deal with Black's imposing moyo?
AlphaGo proverb #1: "Wedge is not an option."

AlphaGo proverb #2: "There is no such thing as an 'imposing moyo'."

Sorry, I couldn't resist the temptation to anthropomorphise AlphaGo :-)

(EDIT: maybe it is really just proverb #0: "Because there is no such thing as an 'imposing moyo', wedge is not an option" :-) )
dfan
Gosei
Posts: 1598
Joined: Wed Apr 21, 2010 8:49 am
Rank: AGA 2k Fox 3d
GD Posts: 61
KGS: dfan
Has thanked: 891 times
Been thanked: 534 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by dfan »

Bill Spight wrote:IIRC, in chess Emanuel Lasker said if you find a good move, look for a better one.
Indeed! Don't forget dfan's corollary: "If you find a bad move, look for a better one too."
MCTS bots, it seems, don't do that so much.
Of course it is an open question whether they do this as much as they should, but this question (called "exploration vs exploitation") is definitely one of the dominant issues for anyone studying reinforcement learning (in which agents choose actions and learn from experience), including the DeepMind folks. It is really unclear how you should balance the two behaviors! AlphaZero and its progeny use an approach that tries to maximize effort towards lines with the greatest "upper confidence bound": that is, the moves that seem to have the maximum plausible upside. As it explores one good move, the maximum plausible upside of it will often decrease (because it learns more about it, so the uncertainty of its evaluation, both positive and negative, goes down), which causes it to then devote more energy to other moves that still have some unexplored promise.

Of course you can play with these parameters and algorithms all you like, and people have, but it's difficult. There are many instances on the Leela Zero project page of the following play in three acts:
  • Someone notices that Leela Zero got too excited about a suboptimal move in a certain position and didn't sufficiently explore another promising one.
  • They suggest a modification to Leela Zero's parameters to make it explore more in certain circumstances, with the result that Leela Zero now correctly finds the optimal move in that test position.
  • The change turns out to make Leela Zero weaker overall.
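To make the "upper confidence bound" idea concrete, here is a rough sketch of a PUCT-style selection rule of the kind described in the AlphaZero family of papers: each candidate is scored by its mean value plus a bonus that grows with its policy prior and shrinks as it gets visited. This is not Leela Zero's actual code. The priors, the constant `c_puct`, the fixed noise-free evaluations and the treatment of unvisited moves are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    prior: float        # policy prior (made-up value for illustration)
    true_value: float   # fixed evaluation returned on every visit (assumption)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self):
        # Unvisited moves are scored as 0 here; real engines handle this differently.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(move, parent_visits, c_puct):
    """Mean value plus an exploration bonus that shrinks as the move is visited."""
    bonus = c_puct * move.prior * math.sqrt(max(1, parent_visits)) / (1 + move.visits)
    return move.mean_value() + bonus


def run_search(moves, total_visits, c_puct=1.5):
    """Repeatedly visit whichever candidate currently has the highest score."""
    for _ in range(total_visits):
        parent_visits = sum(m.visits for m in moves)
        best = max(moves, key=lambda m: puct_score(m, parent_visits, c_puct))
        best.visits += 1
        best.value_sum += best.true_value


# Two candidates whose values differ by only 0.3 points, with unequal priors.
pincer = Candidate("pincer", prior=0.60, true_value=0.433)
press = Candidate("press", prior=0.10, true_value=0.430)
run_search([pincer, press], total_visits=2000)
print(pincer.visits, press.visits)
```

In this toy setup the visit counts end up lopsided even though the two values are almost equal, mostly because the bonus is scaled by the prior. Raising `c_puct` or flattening the priors spreads the visits more evenly, which is the kind of parameter change the three-act play above is about.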
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Bill Spight »

dfan wrote:
Bill Spight wrote:IIRC, in chess Emanuel Lasker said if you find a good move, look for a better one.
Indeed! Don't forget dfan's corollary: "If you find a bad move, look for a better one too."
;) :D
MCTS bots, it seems, don't do that so much.
Of course it is an open question whether they do this as much as they should, but this question (called "exploration vs exploitation") is definitely one of the dominant issues for anyone studying reinforcement learning (in which agents choose actions and learn from experience), including the DeepMind folks. It is really unclear how you should balance the two behaviors! AlphaZero and its progeny use an approach that tries to maximize effort towards lines with the greatest "upper confidence bound": that is, the moves that seem to have the maximum plausible upside. As it explores one good move, the maximum plausible upside of it will often decrease (because it learns more about it, so the uncertainty of its evaluation, both positive and negative, goes down), which causes it to then devote more energy to other moves that still have some unexplored promise.

Of course you can play with these parameters and algorithms all you like, and people have, but it's difficult. There are many instances on the Leela Zero project page of the following play in three acts:
  • Someone notices that Leela Zero got too excited about a suboptimal move in a certain position and didn't sufficiently explore another promising one.
  • They suggest a modification to Leela Zero's parameters to make it explore more in certain circumstances, with the result that Leela Zero now correctly finds the optimal move in that test position.
  • The change turns out to make Leela Zero weaker overall.
Thanks. I was responding to the remarkable imbalance in this case of the number of visits per candidate move by Leela Zero. In the first position shown by sorin, the pincer at 43.3% winrate gets 13K visits, while the press at 43.0% gets 43 visits. In addition there are 7 more candidate moves, the worst winrate among them being 42.2%, and the most visits being 811. In terms of the goal of winning the game, it is hard to tell which of those 9 moves is better, but the bots consistently spend much more time considering the pincer and end up choosing it. That suggests a path dependency, but bots with different histories do the same thing. :scratch: OC, all a bot has to do is choose a good enough move. :)

In the second position there are also nine candidates shown. The pincer has the highest winrate of 44.2% and the highest visit count of 12K. In this case the three candidates in the top left corner have decent visit counts of more than 1K, and two of the plays have win rates of 44.1%. Still, the smallest winrate is 43.0%. Any of those candidates could be best. But the bots like the pincer. ;)

I did not mean to suggest that the bots did not make the best choices to play well. That is a different question. But, as you know, my advice to humans who are learning from bots is don't strain after gnats. OC, it would help to know the margin of error of a bot's winrate estimates, but nobody knows that. It is quite clear that it is at least 2% for AlphaGo.

----

On the question of missing the right play, in playing around with Deep Leela I have discovered that yes, just as it seemed, Leela 11 makes mistakes in the endgame. This afternoon I was going over a pro game where White resigned when DL said that its winrate was hovering around 30%. Just to see how the winrate changed as the end of the game approached, I let Deep Leela play on. White actually won the game when Black let White make a Bent Four in the Corner. :lol: Actually, for some time Black could have thrown in to make a ko, but eventually White made a bent four.
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Uberdude »

Regarding the 2-space jump in the corner b after the pincer and jump, I can't claim to understand it but I have played around with it a bit in Lizzie and tried it in a few games. Here are some possible ways it can be a pro rather than a con.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . . b a . X . . . |
$$ . . . . . . . . . . |
$$ . . . . . 1 . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . . . . . |
$$ . . . . . . , . . . |
$$ . . . . . . . . . . |[/go]
1) It's easy to tenuki white's next one-point jump at 5, which is quite likely to happen in the natural flow if you counter-pincer with 3, because it didn't get ahead with the happy follow-up at a that you would have with 2 one line to the right. (But actually, a general lesson from bots is that even that follow-up is not as good as I thought, i.e. the jump to 5 is not clearly sente even against black's one-point jump.)
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ . . . . . . . . . . |
$$ , . a 2 . . X . . . |
$$ . . . . . . . . . . |
$$ . . . 5 . 1 . O . . |
$$ . . . . . . . . . . |
$$ . . . . . 4 . X . . |
$$ . . . . . . . . . . |
$$ . . . . . . , 3 . . |
$$ . . . . . . . . . . |[/go]
2) If at a later point white ends up playing 3-3 and you want to block the top side it's more efficient to be further away. Obviously 4 will often prefer to separate at a if there is a good attack on the outside stones, but even then white can get a fairly comfortable corner life (even with the high one space jump) so you give up a lot of cash for that speculative profit, and if the outside is already settled you probably prefer this direction switch.
[go]$$W
$$ --------------------+
$$ . . . . . . . . . . |
$$ . . . . . . . 6 5 . |
$$ . . . . . . 4 1 7 . |
$$ , . . X . . X 2 3 . |
$$ . . . . . . . . a . |
$$ . . . . . O . O . . |
$$ . . . . . . . . . . |
$$ . . . . . . . X . . |
$$ . . . . . . . . . . |
$$ . . . . . . , . . . |
$$ . . . . . . . . . . |[/go]
3) Black closing the corner with the iron pillar later is more efficient and therefore more worthwhile to spend a move on: bigger (not-quite) territory. The iron pillar also threatens the 2nd line connection to the pincer stone, so it is a nice way to take profit, probing whether the opponent wants to play a less valuable move in gote to prevent that emergency connection. Here's a half board example (the solid connection at 5 rather than the extend joseki is also interesting, to avoid giving momentum to settle):
[go]$$B
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . O . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . X . 3 5 . . . . . . . . . . |
$$ | . . . O . . . 2 1 , . . . . . X . . . |
$$ | . . . 8 . X . O 4 . . 6 . 7 . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ | . . . . . . . . . . . . . . . . . . . |
$$ +---------------------------------------+[/go]
Of course there are also cons, the most obvious being the thinness of 2-space jump to various attaches and cuts. LZ is sometimes willing to allow them to be split if you get a solid chunky corner territory out of it (e.g. from a ponnuki).
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by Bill Spight »

One thing that bothers me about the highly unbalanced comparison between plays is that, when the winrate estimates are close between two plays, shouldn't the number of visits for second place be roughly the same as the number of visits for the eventual winner of the comparison? That is, in general the more visits a play gets, the smaller the margin of error of its evaluation should be. So it seems that we have a case where the margins of error of two competing evaluations overlap, and instead of reducing the larger margin of error, much more effort is spent reducing the smaller margin of error. You could get it down to a point and there would still be overlap.
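A back-of-the-envelope way to put numbers on this, using the figures quoted earlier in the thread (pincer 43.3% with 13K visits, press 43.0% with 43 visits): if each visit were an independent win/loss sample, the standard error of a winrate p after n visits would be sqrt(p(1-p)/n). That independence assumption is certainly wrong for MCTS with a value network, and `binomial_standard_error` is just a name picked for this sketch, so treat the result as a rough illustration only.

```python
import math


def binomial_standard_error(winrate, visits):
    """Standard error of a proportion, assuming independent win/loss samples."""
    return math.sqrt(winrate * (1.0 - winrate) / visits)


# Figures quoted in the thread for the first Leela Zero position.
pincer_se = binomial_standard_error(0.433, 13_000)
press_se = binomial_standard_error(0.430, 43)

print(f"pincer: 43.3% +/- {100 * pincer_se:.2f} points")
print(f"press:  43.0% +/- {100 * press_se:.2f} points")
```

Under that crude model the press estimate is many times less certain than the pincer estimate, so the 0.3 point gap between them sits well inside the uncertainty of the less-visited move.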
sorin
Lives in gote
Posts: 389
Joined: Wed Apr 21, 2010 9:14 pm
Has thanked: 418 times
Been thanked: 198 times

Re: MikeKyle analyses Hoshi, low approach, low 1 space pince

Post by sorin »

Bill Spight wrote:One thing that bothers me about the highly unbalanced comparison between plays is that, when the winrate estimates are close between two plays, shouldn't the number of visits for second place be roughly the same as the number of visits for the eventual winner of the comparison? That is, in general the more visits a play gets, the smaller the margin of error of its evaluation should be. So it seems that we have a case where the margins of error of two competing evaluations overlap, and instead of reducing the larger margin of error, much more effort is spent reducing the smaller margin of error. You could get it down to a point and there would still be overlap.
The relative difference in the number of visits may actually be "noise", since the total number of visits is still quite small in my analysis earlier in this thread. Maybe someone with a stronger computer or more time can let LeelaZero do a deeper analysis; with many more total visits, the pattern that you expect may hold, i.e. moves with relatively close winrates would also have relatively close numbers of visits...