
All times are UTC - 8 hours [ DST ]




 Post subject: it's not just tenuki
Post #1 Posted: Thu Sep 22, 2016 6:12 pm 
Lives in gote
User avatar

Posts: 392
Liked others: 23
Was liked: 43
Rank: NR
MCTS bots play the percentages - that's what statistical sampling means.
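A minimal, runnable sketch of what "playing the percentages" by statistical sampling means. A toy Nim game (take 1-3 stones; taking the last stone wins) stands in for a go position here, since a full board engine would run to pages; real MCTS bots add a search tree on top, but the sampling core looks like this:

Code:
import random

def random_playout(pile, mover_is_me):
    """Finish the game with uniformly random moves.
    Returns True if 'I' win (take the last stone)."""
    while True:
        pile -= random.randint(1, min(3, pile))
        if pile == 0:
            return mover_is_me
        mover_is_me = not mover_is_me

def winrate(pile_after_my_move, playouts=2000):
    """Estimate my win percentage by random sampling."""
    if pile_after_my_move == 0:
        return 1.0  # taking the last stone wins outright
    wins = sum(random_playout(pile_after_my_move, mover_is_me=False)
               for _ in range(playouts))
    return wins / playouts

def choose_move(pile):
    # "Play the percentages": no plan and no notion of a forced
    # sequence, just the move with the highest sampled winrate.
    return max(range(1, min(3, pile) + 1),
               key=lambda take: winrate(pile - take))

print(choose_move(10))  # tends to print 2, leaving a multiple of 4

Note what is absent: any plan. The bot simply maximizes an estimated win percentage, which is why its choices can look inexplicable to us.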


Attachments:
a.sgf [1.44 KiB]
Downloaded 466 times


Last edited by djhbrown on Tue May 02, 2017 12:35 am, edited 1 time in total.
 Post subject: Re: it's not just tenuki
Post #2 Posted: Fri Sep 23, 2016 7:56 am 
Lives in sente

Posts: 1037
Liked others: 0
Was liked: 180
djhbrown wrote:
MCTS bots play the percentages - that's what statistical sampling means ... because no opponent with half a brain would be so dumb as to tenuki during a forced sequence ... they don't have emotional reactions like we humans - it's only that it looks like that to us. Whereas, in fact, they are Quixotic all the time, even when playing at their best ...

And it's not just tenuki, as i found out the hard way this morning against Hirabot33 (and Lee Sedol found out the hard way in game 2 against Alphago).

At move 32 in the above game, Hirabot33 played what looked to me to be an unbelievably stupid move, and followed it up with the even more banal 34. What on earth was going on in its little bot-mind??


It is difficult to understand why you sometimes seem to grasp that the bots aren't "thinking" like we do, while at other times you seem to imagine that human-style thinking is the only way to go.

You shouldn't assume that "statistical sampling" (by including lines a "thinking human" wouldn't try) is necessarily bad. The bots might have different strengths. They don't need a "plan" but analyze each position afresh. That means they might be better at making use of little bits of aji scattered over the board, none of which individually appears to offer very much (and so the human can't plan around them) but which collectively add up to an advantage that will eventually materialize.

Those odd moves and odd tenukis might be good moves, just ones too difficult for a human to see the point of because the benefit is remote. There isn't a SPECIFIC plan that the move affects.

Maybe the way to look at this is to think back to when you would have had a hard time understanding a correct (good) tenuki. Say you are playing out a joseki sequence and all of a sudden the opponent tenukis. At some point you learned to look at that (odd) move and recognize that if you didn't respond you would suffer a disadvantage at that location, but that the move was also a ladder breaker affecting the way you were playing the joseki. As a human player you were able to recognize the "plan" involved. In other words, the human opponent could conceive of that particular tenuki, and you could recognize why it would work.

But now suppose it was one of these bots doing that. The reason might not be a potential ladder in the area you are now playing, but the likelihood of several other ladders in areas not currently being played in (and the collective value of those might be more than the local loss in the area being played in).


This post by Mike Novack was liked by: daal
 Post subject: Re: it's not just tenuki
Post #3 Posted: Fri Sep 23, 2016 8:39 am 
Oza
User avatar

Posts: 2777
Location: Seattle, WA
Liked others: 251
Was liked: 549
KGS: oren
Tygem: oren740, orenl
IGS: oren
Wbaduk: oren
djhbrown wrote:
At move 32 in the above game, Hirabot33 played what looked to me to be an unbelievably stupid move, and followed it up with the even more banal 34. What on earth was going on in its little bot-mind??


32 looked like an obvious move to me. 34 requires a little bit of reading. I'm not sure you can compare this at all to Lee Sedol vs AlphaGo.

 Post subject: Re: it's not just tenuki
Post #4 Posted: Fri Sep 23, 2016 9:54 am 
Lives in sente

Posts: 902
Location: Fort Collins, CO
Liked others: 319
Was liked: 287
Rank: AGA 3k
Universal go server handle: jeromie
While I'm only about your level, and it's difficult to tell for sure, it looks to me like HiraBot gets a good result even if you play correctly, because there are forcing moves that build white's outside wall. The bot has "decided" that the solidification of black's territory is worth the outside gain. Of course, if black makes a mistake the program may as well take the profit that is offered.

I think this is similar to many of AlphaGo's moves: the software has calculated that a small local loss is worth the global gain. This is what has made professionals and amateurs alike inspect the games in great detail: AlphaGo evaluates positions differently than most humans do, and that means we have an opportunity to learn.

Remember that the neural networks that restrict the moves AlphaGo considers were developed through many, many iterations of self-play. Since the ability of white and black to follow complex lines would be entirely equal, trick moves would be unlikely to show favorable results under these conditions.
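A schematic sketch of that restriction step; the move names and prior probabilities below are invented for illustration, and the real policy network scores every point of the board rather than a handful of labelled moves:

Code:
def candidate_moves(legal_moves, prior, k=3):
    """Keep only the k moves the policy network rates most
    plausible; the search never even examines the rest."""
    return sorted(legal_moves, key=prior.get, reverse=True)[:k]

# Invented priors: self-play training has taught the net which
# shapes usually work, so "trick moves" get vanishingly small priors.
prior = {"hane": 0.41, "extend": 0.30, "tenuki": 0.12,
         "trick move": 0.01, "first-line descent": 0.002}
print(candidate_moves(list(prior), prior, k=3))  # ['hane', 'extend', 'tenuki']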

I do think that many of the problems you are describing have been a part of existing bots, especially when the outcome of the game is mostly decided. For the most part, the addition of neural networks has limited this problem when the game is still competitive. But we must tread lightly as we begin studying the play of professional level bots. We shouldn't accept every move just because the bot played it (perhaps it is displaying some of the problems you highlight!), but neither should we reject moves because we don't immediately understand them (perhaps the move is right after all). Amateurs who play stronger players are familiar with this tension every time they play!


This post by jeromie was liked by: Bill Spight
 Post subject: Re: it's not just tenuki
Post #5 Posted: Fri Sep 23, 2016 5:36 pm 
Lives in gote
User avatar

Posts: 392
Liked others: 23
Was liked: 43
Rank: NR
jeromie wrote:
it looks to me like HiraBot gets a good result even if you play correctly because there are forcing moves to build white's outside wall.



My interest is this: Where does AI go from here?


Attachments:
b.sgf [1.1 KiB]
Downloaded 412 times


Last edited by djhbrown on Tue May 02, 2017 12:37 am, edited 1 time in total.
 Post subject: Re: it's not just tenuki
Post #6 Posted: Fri Sep 23, 2016 9:33 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
I thought AlphaGo was still improving to this day through its self play.

_________________
be immersed

 Post subject:
Post #7 Posted: Fri Sep 23, 2016 9:54 pm 
Honinbo
User avatar

Posts: 8859
Location: Santa Barbara, CA
Liked others: 349
Was liked: 2076
GD Posts: 312
Quote:
I thought AlphaGo was still improving to this day through its self play.
A reasonable assumption; likewise that DM continues to add architectural improvements to it.

 Post subject: Re: it's not just tenuki
Post #8 Posted: Fri Sep 23, 2016 10:57 pm 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
djhbrown wrote:
One obvious way to improve Alphago is to add yet more processors to increase the size of the samples, and maybe a few more gerzillion self-play RL exercises (although i feel that RL's hill-climbing levels out pretty quickly). Alphago uses about 2000 parallel processors, whereas Zen and others are limited to about 4 or so. That's an increase of nearly three orders of magnitude (2000/4 = 500) and may be worth 2 or even 3 stones at their level. Or it may not. We won't know until they play each other.


In its commercial version, Zen is limited to 8 cores (the 2013 version; the deep-learning version can parallelize across even more processors), and Crazy Stone is limited to 64 cores (the deep-learning version; Remi answered this himself, it was 32 cores in the 2015 version). For the experimental versions, Zen uses two Xeon E5-2623 v3 CPUs and four GeForce GTX TITAN X GPUs, while Crazy Stone uses a Xeon with 18 cores and 36 threads at 2.9 GHz. But I agree that this hardware can't even compare to the single-machine version of AlphaGo (48 CPUs, 8 GPUs).

djhbrown wrote:
in http://papers.ssrn.com/sol3/papers.cfm? ... id=2818149 i showed that (2) just a little commonsense would have guided Alphago to finding a workable defence to Lee's magic wedge in game 4.


I think you already know that DeepMind eradicated the game 4 bug by training AlphaGo even more; what do you think about this method? They said the bug was a 'horizontal effect' but did not elaborate on that term. It's as if they're not quite sure either.

 Post subject: Re: it's not just tenuki
Post #9 Posted: Sat Sep 24, 2016 12:11 am 
Lives in gote
User avatar

Posts: 392
Liked others: 23
Was liked: 43
Rank: NR
i would imagine that DM are currently more focussed on producing something useful in image analysis for differential diagnosis of medical conditions.


Last edited by djhbrown on Tue May 02, 2017 12:39 am, edited 1 time in total.
 Post subject: Re: it's not just tenuki
Post #10 Posted: Sat Sep 24, 2016 12:57 am 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
djhbrown wrote:
pookpooi wrote:
I think you already know that DeepMind eradicated the game 4 bug by training AlphaGo even more; what do you think about this method? They said the bug was a 'horizontal effect' but did not elaborate on that term. It's as if they're not quite sure either.

i didn't know that; if you know of a public statement to that effect, please share it. as to "horizontal effect", i agree with them. as i said before, it's a kind of "horizon effect" - but a horizon of width rather than of depth. in the case of game 4 black 79, Alphago hadn't looked wide enough.


Here https://www.reddit.com/r/baduk/comments ... _is_fixed/

and here https://www.youtube.com/watch?v=LX8Knl0g0LE it's the last question of the Q&A section, so it's near the end of the video
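As an aside on the "horizon of width" idea quoted above: a depth horizon means the search stopped too early along a line; a width horizon means the winning move was never among the candidates at all. A toy illustration, in which every move name and number is invented:

Code:
# The true best reply exists, but the policy prior ranks it so low
# that a search restricted to the top-k candidates never tries it.
true_value = {"block": -0.3, "atari": -0.5, "wedge": 0.9}
prior      = {"block": 0.55, "atari": 0.40, "wedge": 0.0001}

def search_value(k):
    considered = sorted(true_value, key=prior.get, reverse=True)[:k]
    return max(true_value[m] for m in considered)

print(search_value(2))  # -0.3: the wedge lies beyond the width horizon
print(search_value(3))  #  0.9: widen the candidate set and it appears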

 Post subject: Re: it's not just tenuki
Post #11 Posted: Sat Sep 24, 2016 2:57 am 
Lives in gote
User avatar

Posts: 392
Liked others: 23
Was liked: 43
Rank: NR
thanks for the links, pookpooi.

re: fixing the bug

i enjoyed Fan Hui's anecdote about imagining that they wanted to wire him up to probe his brain while he was playing Go. :)


Last edited by djhbrown on Tue May 02, 2017 12:39 am, edited 1 time in total.
 Post subject: Re: it's not just tenuki
Post #12 Posted: Sat Sep 24, 2016 4:25 pm 
Oza
User avatar

Posts: 2777
Location: Seattle, WA
Liked others: 251
Was liked: 549
KGS: oren
Tygem: oren740, orenl
IGS: oren
Wbaduk: oren
For fun, I ran the game through to see what Crazy Stone and Zen thought. They also looked at Hirabot's moves 32 and 34 early on and then started moving away from them. So shape-wise, they are good moves to consider on a first pass, but the stronger bots decide not to play them.

 Post subject: Re: it's not just tenuki
Post #13 Posted: Sat Sep 24, 2016 9:18 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
djhbrown wrote:
as to "fixing the bug", it is conceivable that more RL trials would improve performance, but there is evidence that RL tails off asymptotically [1], so i guess they found a different way, by simply presenting the position after white 78 to the policy network, telling it that Kim's move of L10 is the correct reply. And telling the value network that the position after black L10 is a win for black.


Actually, I think Aja said that the "bug" was fixed simply by continuing self play. They didn't explicitly give information tailored to the situation, and let it just keep improving itself. Later, they presented the same board position, and the new version of AlphaGo found the correct answer.

RL may tail off at an asymptote, but I don't think AlphaGo has reached that point yet. So far, it appears to have continued improvement simply through self play.
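The two fixes under discussion differ mainly in where the training examples come from. A toy sketch, in which a three-weight logistic model stands in for the value network and all features and labels are invented:

Code:
import math, random

def predict(w, x):  # toy "value network": probability black wins
    return 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def sgd_step(w, x, target, lr=0.1):
    """One gradient step nudging predict(w, x) toward target."""
    err = predict(w, x) - target
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]

# Targeted patch (djhbrown's suggestion): hand-label the one known-bad
# position (invented features) as a win for black and train on it.
game4_position = [1.0, -0.5, 0.3]
for _ in range(100):
    w = sgd_step(w, game4_position, target=1.0)

# What Aja described: no hand labels at all; keep generating self-play
# games, train on their outcomes, and let the fix emerge on its own.
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]  # stand-in position
    w = sgd_step(w, x, target=1.0 if sum(x) > 0 else 0.0)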

_________________
be immersed

 Post subject: Re: it's not just tenuki
Post #14 Posted: Sat Sep 24, 2016 10:35 pm 
Lives in gote
User avatar

Posts: 392
Liked others: 23
Was liked: 43
Rank: NR
If you were in charge of training Alpha, and had new data from 5 games against one of the world's best players, it would be rather remiss of you not to tell Alpha to learn from that experience and instead just hope she would learn enough solely through self-play.


Last edited by djhbrown on Tue May 02, 2017 12:41 am, edited 1 time in total.
 Post subject: Re: it's not just tenuki
Post #15 Posted: Sat Sep 24, 2016 10:48 pm 
Lives in sente

Posts: 727
Liked others: 44
Was liked: 218
GD Posts: 10
If I were in charge of AlphaGo, then I'd do what you recommend: directly feed it the correct positions and force AlphaGo to learn.
But the real question is, is it that easy? Which is more convenient for the programmers: doing that, or letting AlphaGo's self-play correct it? DeepMind knows; I don't.

 Post subject: Re: it's not just tenuki
Post #16 Posted: Sun Sep 25, 2016 12:01 pm 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
Sounds to me like you are just speculating, djhbrown. Of course I am, too, but at least it's an opinion based on what Aja said. To me, the power behind their approach is the limited domain knowledge. I don't see a reason to stray from that philosophy.

Besides, if the current version of AlphaGo is really as strong as they say, self-play provides better-quality games than games against Lee Sedol.

_________________
be immersed

 Post subject: Re: it's not just tenuki
Post #17 Posted: Sun Sep 25, 2016 2:15 pm 
Lives in sente

Posts: 1037
Liked others: 0
Was liked: 180
djhbrown wrote:


but that would be the DCNN equivalent of patching the code to fix a single case; it would not remedy the systemic underlying design flaw, which i perceive to be a lack of focus due to a lack of a conceptual overview - a lack of positional judgement!


I think you might be helped by understanding the difference between a neural net not (yet) getting something right and a bug in its implementation. That a neural net cannot yet do something (cannot "correctly" evaluate the function) for an input it has not yet been trained on is not a "bug". Nor does correcting this one case fix ONLY that one case. Were that the situation, neural nets wouldn't be good for very much.

In the beginning (before training) a neural net can't do anything. It is then trained (its weights adjusted; for the moment ignore how) so that for each input from its training set it produces the correct output. Again ignoring the process, except to point out that the adjustments must not only get the new input/result pair correct but must also not mess up all the previous input/result pairs. What happens (what a neural net is good for) is that the net will not only give the correct results for the input/result pairs it has been trained on; it becomes likely that, given an input it has never seen before (one it has NOT been trained on), it will also give the correct result.

So fixing this one KNOWN "error" is actually likely to fix other errors not yet encountered.
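A concrete toy version of that point, with a single shared weight standing in for the whole net and all numbers invented: because training adjusts shared weights rather than memorizing the case, correcting one known error also corrects similar inputs the net has never seen.

Code:
import math

def predict(w, x):          # toy one-weight "network"
    return 1 / (1 + math.exp(-w * x))

w = -2.0                    # starts out wrong for positive inputs
known_error, unseen_similar = 1.0, 1.2
print(predict(w, known_error), predict(w, unseen_similar))  # both ~0.1

for _ in range(200):        # train ONLY on the known error (target 1.0)
    w -= 0.5 * (predict(w, known_error) - 1.0) * known_error

# The shared weight moved, so the never-trained input is fixed too.
print(predict(w, known_error), predict(w, unseen_similar))  # both ~0.99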
