Life In 19x19
http://lifein19x19.com/

Studying Rémi Coulom's paper
http://lifein19x19.com/viewtopic.php?f=18&t=16059

Author:  Knotwilg [ Tue Sep 11, 2018 1:59 am ]
Post subject:  Studying Rémi Coulom's paper

I'm studying Rémi Coulom's paper https://hal.inria.fr/inria-00116992/document

It may already have been discussed here or on Stack Overflow / Reddit ... but there are a bunch of smart people in here, maybe even Rémi himself. Although I'm interested in the AI (or should I say machine learning) aspect of it, I will naturally approach it from a go angle.

No questions for now.


Things I have already figured out for myself, I think:

- minimax algorithm: a positive score means player A wins, a negative score means player B wins. When it's A's move, the program chooses the child with the maximum value; when it's B's move, the minimum.
- alpha-beta pruning: alpha is the minimum (worst) score A is already assured of, beta is the maximum score A can still hope for, i.e. the worst score B is assured of. When alpha becomes greater than or equal to beta, the node is discarded from further search, because such a node cannot be reached by best play from both sides (a quick sketch in code follows this list)
- a "backup operator" is nothing else than the mechanism of computing the value (score) of a node from the values of its children nodes, starting from the leaves of the tree and going "back up". The name had me confused because I thought it was something that computed a "backup value" for the real value.
- "averaging" is a generalized term for evaluation. In traditional programs, an evaluation function computes a value for the leaf by using heuristics such as piece loss or central position in chess. I assume that with the introduction of randomization, which is core to Monte-Carlo, such evaluation has in general a degree of averaging over the random selection underneath the leaf
- the "count" of a node is the number of times the algorithm passed through the node
- an "internal node" is one for which the count is higher than a threshold value (Coulom sets this value = the number of points on the goban, which he claims to be empirically validated as a good compromise)
- an "external node" is one with a lower count than the threshold

The core idea is to no longer separate "backing up" and "averaging", and, instead of completely cutting off less promising nodes, to search them considerably less than the promising ones. This introduces "the probability of being the best move" as the score.
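
To make the contrast with pruning concrete, here is a rough sketch of that selection idea: simulations are allocated in proportion to a (hypothetical) estimate of each child's probability of being the best move, so weak moves get searched much less but are never cut off entirely. The prob_best estimates and the pick_child helper are my own placeholders, not Coulom's actual selection formula.

Code:
import random

def pick_child(children, prob_best):
    # children: list of child nodes
    # prob_best: dict mapping each child to an estimate of P(child is the best move)
    # Weak children are rarely picked, but never pruned outright.
    total = sum(prob_best[c] for c in children)
    r = random.uniform(0.0, total)
    acc = 0.0
    for child in children:
        acc += prob_best[child]
        if r <= acc:
            return child
    return children[-1]   # guard against floating-point rounding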

(to be continued)

Author:  Knotwilg [ Tue Sep 11, 2018 3:14 am ]
Post subject:  Re: Studying Rémi Coulom's paper

First question:

In the chapter on selectivity, Coulom says: "n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

I read this as, "when we go up and down the search tree to generate new nodes and random continuations, the probabilities (of being the best move) for the nodes change"

Is this a correct interpretation?

Author:  Tryss [ Tue Sep 11, 2018 3:55 am ]
Post subject:  Re: Studying Rémi Coulom's paper

I think it's rather "the probability of exploring each node changes as the tree exploration proceeds".

In this algorithm, when you're at a (fixed) node, the probability of exploring each child depends on the tree explored so far, not just on the position at this node.

Author:  yoyoma [ Tue Sep 11, 2018 12:08 pm ]
Post subject:  Re: Studying Rémi Coulom's paper

"n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

From a certain position, if you only do rollouts from there (no tree expansion), you will sample from a stationary distribution. You will get e.g. 50% wins, and this win probability does not change over time. But if you start constructing a tree, use some algorithm to go down the tree, and only do your rollouts from the leaves, things can change. The search may discover a way to kill a large group of stones. Now the search will focus on lines that always kill the large group, and only after the group is dead do the rollouts start. Now instead of 50% wins you get 90% wins. So the distribution has changed.

Compare to the multi-armed bandit model that these algorithms build on. Each arm has a fixed payoff probability.
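
Here is a toy illustration of that stationary case, assuming made-up arm probabilities and the standard UCB1 index (my own example, not from the paper): because each arm's payoff probability is fixed, the observed win rates simply converge, whereas the win rate observed at a node of a growing search tree can drift as the tree below it changes.

Code:
import math, random

def ucb1(true_probs, n_pulls=10000):
    # Stationary Bernoulli bandit: each arm pays off with a fixed probability.
    n_arms = len(true_probs)
    wins = [0] * n_arms
    plays = [0] * n_arms
    for t in range(1, n_pulls + 1):
        if t <= n_arms:
            arm = t - 1                     # play every arm once first
        else:
            arm = max(range(n_arms),        # then pick the best UCB1 index
                      key=lambda a: wins[a] / plays[a]
                                    + math.sqrt(2 * math.log(t) / plays[a]))
        plays[arm] += 1
        wins[arm] += 1 if random.random() < true_probs[arm] else 0
    return [w / p for w, p in zip(wins, plays)]

print(ucb1([0.4, 0.5, 0.6]))   # the estimates converge to the fixed payoffs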

BTW my guess is "external nodes" are leaf nodes and "internal nodes" are those that already have children. Back when this paper was written and rollouts were used, it was common to do ~100 rollouts from a leaf before deciding it was time to expand that leaf and incorporate it into the tree search plus backup procedure.
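
A rough sketch of that expand-after-N-visits policy, with placeholder pieces: the threshold, the uniformly random descent, the rollout, and the GameState interface are all my own simplifications, not the paper's actual selection or backup rule.

Code:
import random

EXPAND_THRESHOLD = 100   # e.g. ~100 rollouts, or the number of points on the goban

class Node:
    def __init__(self, state):
        self.state = state
        self.children = []   # empty list => "external" (leaf) node
        self.visits = 0
        self.wins = 0

def rollout(state):
    # Placeholder random playout: play random moves to the end and score for A.
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return 1 if state.score() > 0 else 0

def simulate(root):
    # One iteration: descend the tree, maybe expand a leaf, roll out, back up.
    node, path = root, [root]
    while node.children:                       # "internal" nodes: keep descending
        node = random.choice(node.children)    # placeholder selection rule
        path.append(node)
    if node.visits >= EXPAND_THRESHOLD and not node.state.is_terminal():
        node.children = [Node(node.state.play(m)) for m in node.state.legal_moves()]
        if node.children:
            node = random.choice(node.children)
            path.append(node)
    result = rollout(node.state)
    for n in path:                             # the "backup": propagate the result
        n.visits += 1
        n.wins += result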
