Re: AlphaGo Zero: Learning from scratch
Posted: Fri Oct 20, 2017 2:14 pm
by fwiffo
A really common technique in ML is to reduce the "learning rate" as a model starts to converge, and it produces bumps in model performance exactly like that. So it probably didn't learn specific knowledge at that point, or anything more important than other parts of the learning process; there was just a momentary acceleration of learning.
For those confused, learning rate is a parameter used in gradient descent (the standard algorithm used for training machine learning models these days). It's a bit misleadingly named; it's the size of the steps that the model should make as it's progressively walking down the gradient of the loss function. There is usually an empirically discovered "optimal" learning rate for any given model that gets the best performance. Either a higher or lower learning rate results in slower learning. And the ideal learning rate usually decreases as the model learns.
For those who are now more confused, there is a number that you can tweak when training models to make them work better, and it causes bumps in graphs when you tweak it.
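What fwiffo describes is easy to see in a toy example. Below is a minimal sketch of gradient descent with a step-decay learning-rate schedule on the loss f(x) = x^2; the function names, schedule, and numbers are purely illustrative and have nothing to do with DeepMind's actual code:

```python
# Minimal gradient descent on f(x) = x^2 with a step-decay learning rate.
# All values here are illustrative, not anyone's real training settings.

def grad(x):
    # Gradient of the loss f(x) = x^2
    return 2 * x

def train(steps=300, lr=0.1, decay_at=(100, 200), decay_factor=0.1):
    x = 10.0  # starting parameter value
    losses = []
    for step in range(steps):
        if step in decay_at:
            lr *= decay_factor  # the schedule drops the learning rate here
        x -= lr * grad(x)       # one gradient-descent step
        losses.append(x * x)    # record the loss after the step
    return losses

losses = train()
```

Plotting `losses` shows a visible kink at each decay step: the slope of the curve changes abruptly because the step size changed, not because the model "learned" anything special at that moment.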
Re: AlphaGo Zero: Learning from scratch
Posted: Sat Oct 21, 2017 10:50 am
by Pio2001
Bill Spight wrote:Even rather dumb programs can learn the rules in that sense through self-play and reinforcement learning. Illegal moves are penalized, that is enough.
The pictures posted say that the program already had the "basic rules" as input when it started learning.
Re: AlphaGo Zero: Learning from scratch
Posted: Sat Oct 21, 2017 5:43 pm
by Pippen
RobertJasiek wrote:Elsewhere, I read about a plan to learn "the" rules on its own. a) If this relies on an input of existing played games, it is possible. b) If there is nothing but the playing material, there cannot be a learning of _the_ rules - there can only be a learning of possible rulesets of possible games that might be played with the playing material.
IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost. The really interesting thing will be if they apply their algorithm to real-life problems, because there you don't have precise rules or definite results, and much more complexity (e.g. Go's sample space has about 10^170 elements, but soccer should easily have 10^(10^170), even if you look at it discretely).
Re: AlphaGo Zero: Learning from scratch
Posted: Sun Oct 22, 2017 12:30 pm
by alphaville
Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.
What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a representation of the board and stone locations).
Re: AlphaGo Zero: Learning from scratch
Posted: Sun Oct 22, 2017 12:47 pm
by pookpooi
alphaville wrote:Pippen wrote:IMO it is impossible to learn Go from scratch, i.e. from just the board and stones. The program could learn the playing rules, but because there are countably infinitely many ways to score and to determine the outcome of a game, it would get lost.
What do you mean? AlphaGo Zero did just that: it learned Go from scratch (from just a representation of the board and stone locations).
I think the 'rules' were meant to be absent from 'IMO it is impossible to learn Go from scratch'
Re: AlphaGo Zero: Learning from scratch
Posted: Mon Oct 23, 2017 12:12 pm
by fwiffo
Some go knowledge was involved, but indirectly. A winner is determined (by a simple, non-ML portion of the program) at the terminal state of each game. For training the network, the position on the board is given, along with a history of recent moves, and an estimated winning probability given various possible moves. The network is trained to predict the likelihood of next moves and the probabilities for the eventual winner. It learns what board configurations are likely wins or losses, and how to get there. (this is a simplification)
There is still a part external to the neural network that has enough of the rules in order to handle capture, to be able to score the game, to handle ko rules, etc. The AI is not asked to reinvent the rules of go (any number of other games could be played on a go board too). It's more precise to say that there was no go strategy hardwired in and no human games to learn from.
So the network learns how to select moves that maximize the probability of arriving at a winning condition. It doesn't itself determine which side won the game. How its go knowledge is represented in the network (and whether it represents something like rules) is probably not interpretable.
How ko and other illegal moves are handled is not in the paper, but there are several ways to do it (e.g. simply masking illegal moves out of the network's predictions, disqualifying a player who plays them and scoring the game as a loss, or imposing a penalty of some kind).
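The first of those options, masking illegal moves out of the network's move probabilities, can be sketched in a few lines. This is a generic illustration, not AlphaGo's actual implementation; the three-move "board" is a stand-in for the real 19x19+1 policy output:

```python
import math

def masked_policy(logits, legal):
    # logits: raw scores from the policy network, one per candidate move
    # legal: booleans marking which of those moves are legal right now
    masked = [l if ok else float("-inf") for l, ok in zip(logits, legal)]
    # Softmax over the masked scores: exp(-inf) = 0, so illegal moves
    # get exactly zero probability and the rest renormalize to sum to 1.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_policy([1.0, 2.0, 0.5], [True, False, True])
# The illegal second move ends up with probability 0.
```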
This is similar to Deepmind's Atari game demonstrations. The network is given raw pixels, and the score. It's not told the rules of Breakout or whatever, it just learns how to make moves to get to the highest score.
Re: AlphaGo Zero: Learning from scratch
Posted: Mon Oct 23, 2017 4:12 pm
by Pippen
Obviously AG had to be given 1) the basic rules, 2) counting & scoring, and 3) the goal of the game in advance. You cannot learn that as an AI. We as humans can, because we can look outside of the game and learn 1)-3) from that "higher perspective"; e.g. when we repeatedly see that someone smiles with a trophy after having surrounded more points on the board than his opponent, we can figure out that in this game you win by having more points. AG doesn't have this perspective, but soon it might.
Re: AlphaGo Zero: Learning from scratch
Posted: Tue Oct 24, 2017 10:14 am
by Recusant
I've looked in the various threads about AlphaGo Zero and failed to find mention of the article linked below. It begins with some famous Go history then goes on to quote some thoughts from Michael Redmond and others.
"The AI That Has Nothing to Learn From Humans" | The Atlantic
Re: AlphaGo Zero: Learning from scratch
Posted: Tue Oct 24, 2017 10:34 am
by pookpooi
I've read almost every article on Zero. This article has the advantage that it includes Go history and is very well written, but the Redmond, Shi Yue, and Lockhart comments are actually about the Master version, not Zero. I'd take comments on Zero from an expert in a non-go-related field over old information any time. But yes, this article was posted on reddit baduk and many people liked it, and many people on lifein19x19 will like it too, that's for sure.
https://www.reddit.com/r/baduk/comments ... om_humans/
Re: AlphaGo Zero: Learning from scratch
Posted: Tue Oct 24, 2017 6:54 pm
by Bill Spight
I found the following exchange on reddit interesting (
https://www.reddit.com/r/MachineLearnin ... ittwieser/ )
cassandra wrote:Do you think that AlphaGo would be able to solve Igo Hatsuyôron's problem 120, the "most difficult problem ever", i.e. winning a given middle-game position, or confirm an existing solution (e.g.
http://igohatsuyoron120.de/2015/0039.htm)?
David_SilverDeepMind wrote:
We just asked Fan Hui about this position. He says AlphaGo would solve the problem, but the more interesting question would be if AlphaGo found the book answer, or another solution that no one has ever imagined. That's the kind of thing which we have seen with so many moves in AlphaGo’s play!
As if Fan Hui would know!

Besides, Fan Hui is a cheerleader for AlphaGo. I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree. Chess programs are more search oriented than AlphaGo, and, despite their superhuman strength, sometimes miss plays that humans have found in actual play.
Re: AlphaGo Zero: Learning from scratch
Posted: Wed Oct 25, 2017 12:59 am
by Garf
Pippen wrote:Obviously AG had to be given 1) the basic rules, 2) counting & scoring, and 3) the goal of the game in advance. You cannot learn that as an AI. We as humans can, because we can look outside of the game and learn 1)-3) from that "higher perspective"; e.g. when we repeatedly see that someone smiles with a trophy after having surrounded more points on the board than his opponent, we can figure out that in this game you win by having more points. AG doesn't have this perspective, but soon it might.
There's no need for AG to have this perspective, and it did not. It just needs someone to tell it at the end of the game "you won" or "you lost". It can then figure out who won by itself eventually - that's the value network.
Of course they just implemented that in the program rather than have a human arbiter hand out trophies, but it's not an intrinsic part of the program. They could have made it pop up two buttons for Aja and have him count the final positions out.
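Garf's point, that the only go-specific signal the learner needs is a win/loss label at the end of each game, can be sketched with a toy tabular value update. This is a stand-in for the value network, not AlphaGo's architecture; all names and numbers are illustrative:

```python
# Every position that occurred in a finished game is nudged toward the
# game's final outcome z (+1 for a win, -1 for a loss). Over many games,
# positions that mostly appear in wins drift toward +1.

def update_values(values, game_positions, winner_z, lr=0.1):
    # values: dict mapping a position key -> current win estimate in [-1, 1]
    # winner_z: the terminal win/loss label supplied from outside the learner
    for pos in game_positions:
        v = values.get(pos, 0.0)
        values[pos] = v + lr * (winner_z - v)  # move estimate toward outcome
    return values

values = {}
# Suppose position "A" appears only in won games:
for _ in range(50):
    update_values(values, ["A"], winner_z=1.0)
# values["A"] is now close to +1.
```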

Re: AlphaGo Zero: Learning from scratch
Posted: Wed Oct 25, 2017 3:36 am
by HermanHiddema
Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.
I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.
The position is artificial, and very specifically crafted, and the way to solve it is not really related to general go strength. It requires specific domain knowledge, such as hanezeki, rather than things like intuition, shape, or positional judgement. The fact that three amateurs could improve on previous professional work should be ample evidence that playing strength is not the main requirement.
Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.
Re: AlphaGo Zero: Learning from scratch
Posted: Wed Oct 25, 2017 3:49 am
by RobertJasiek
HermanHiddema wrote:I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles
AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.
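The "simple gotes in decreasing order" heuristic Robert mentions is easy to state in code. This toy sketch assumes plain, independent gote endgames with no sente, follow-ups, or ko complications; under those assumptions, taking the largest remaining gote each turn is optimal:

```python
def play_gotes(gote_sizes):
    # Both players alternate, each taking the largest remaining simple gote.
    # Returns the point margin for the player who moves first.
    order = sorted(gote_sizes, reverse=True)
    my_score = sum(order[0::2])   # first player takes the 1st, 3rd, ... gotes
    opp_score = sum(order[1::2])  # opponent takes the 2nd, 4th, ... gotes
    return my_score - opp_score

# With gotes of size 8, 5, 3, 2: I take 8 and 3 (11), opponent 5 and 2 (7).
margin = play_gotes([8, 5, 3, 2])
```

Bill's tedomari puzzles are interesting precisely because this greedy ordering stops being optimal once corridors and follow-ups enter the picture.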
Re: AlphaGo Zero: Learning from scratch
Posted: Wed Oct 25, 2017 7:29 am
by HermanHiddema
RobertJasiek wrote:HermanHiddema wrote:I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles
AlphaGo Zero uses reading on top of the NN so it could find the correct solutions in most cases of such small puzzles. The question is rather whether it would find them for puzzles with enough reading complexity. Would adding 30 simple corridors already provide sufficient confusion? If it has learnt to play simple gotes in decreasing order, corridors with a rich end before empty corridors and empty corridors in order of decreasing length, such puzzles might be played correctly. Without having learnt such, the brute force reading complexity is too great.
Yes, I do mean those puzzles where the end of the game is beyond a reasonable horizon for tree search. The small ones it could well solve.
The term corridor is another good example of an abstraction that is not likely to be part of AlphaGo's repertoire. The advantage of such a specific abstraction is far too small (1 point max for tedomari) to find its way into a generic go-playing neural net (though you could perhaps train one if, for all the training inputs, the win depended on tedomari). It would be drowned out by other patterns.
Re: AlphaGo Zero: Learning from scratch
Posted: Wed Oct 25, 2017 8:05 am
by Bill Spight
HermanHiddema wrote:Bill Spight wrote:I doubt whether AlphaGo would solve that problem without spending a very long time on it, to build a humungous search tree.
I do not just doubt AlphaGo could solve it, I think it is flat out impossible that it would even get close to a solution. The tree would be too large.
Maybe so. But give it a few days and a few terabytes and who knows?
However, I do think that AlphaGo's machine learning algorithm could solve it by training on that specific problem. Maybe even in one day. Maybe even starting from scratch. (But starting from AlphaGo Zero would probably be quicker.)
HermanHiddema wrote:Similarly, I do not think AlphaGo would be able to reliably solve any of Bill's tedomari puzzles, again because it requires very different skills not directly related to playing strength.
As you and Robert have pointed out, my puzzles are small enough and easy enough that any of the strong programs could probably solve them very quickly. But large problems like those in Mathematical Go are another story.
