
All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 4 posts ] 
 Post subject: Studying Rémi Coulom's paper
Post #1 Posted: Tue Sep 11, 2018 1:59 am 
Oza

Posts: 2408
Location: Ghent, Belgium
Liked others: 359
Was liked: 1019
Rank: KGS 2d OGS 1d Fox 4d
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
I'm studying Rémi Coulom's paper https://hal.inria.fr/inria-00116992/document

It may already have been discussed here or on Stack Overflow / Reddit ... but there are a bunch of smart people in here, maybe even Rémi himself. Although I'm interested in the AI (or should I say machine learning) aspect of it, I will naturally approach it from a go angle.

No questions for now.


Things I have already figured out for myself, I think:

- minimax algorithm: a positive score means player A wins, a negative score means player B wins. When it's A's move, the program chooses the node with the maximum value, when it's B's move, the minimum.
- alpha-beta pruning: alpha is the minimum (worst-case) score A is assured of, beta is the maximum score B can be held to (B's worst case). When alpha becomes greater than or equal to beta, the node is discarded from further search, because such a node cannot be reached under best play by both
- a "backup operator" is nothing else than the mechanism of computing the value (score) of a node from the values of its children nodes, starting from the leaves of the tree and going "back up". The name had me confused because I thought it was something that computed a "backup value" for the real value.
- "averaging" is a generalized term for evaluation. In traditional programs, an evaluation function computes a value for the leaf by using heuristics such as piece loss or central position in chess. I assume that with the introduction of randomization, which is core to Monte-Carlo, such evaluation has in general a degree of averaging over the random selection underneath the leaf
- the "count" of a node is the number of times the algorithm passed through the node
- an "internal node" is one for which the count is higher than a threshold value (Coulom sets this value = the number of points on the goban, which he claims to be empirically validated as a good compromise)
- an "external node" is one with a lower count than the threshold
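To make the first two items above concrete, here's a minimal sketch of minimax with alpha-beta pruning (my own illustration, not code from the paper). The tree representation and scores are made up for the example.

```python
# Illustrative minimax with alpha-beta pruning (not from the paper):
# a game tree is a nested list; leaves are scores from player A's view.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):   # leaf: return its evaluation
        return node
    if maximizing:                       # player A picks the maximum
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # B already has a better option elsewhere
                break                    # prune the remaining children
        return value
    else:                                # player B picks the minimum
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Example: A to move at the root of a depth-2 tree.
print(alphabeta([[3, 5], [1, 9]], float("-inf"), float("inf"), True))  # 3
```

In the example, after the first subtree guarantees A a score of 3, the search of the second subtree stops as soon as B can force a score of 1 there: that line can never occur under best play.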

The core idea is to no longer separate "backing up" and "averaging", and instead of completely cutting off less promising nodes, to search them considerably less than more promising nodes. This introduces "the probability of being the best move" as the score.
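A toy sketch of that "search less promising moves less" idea (my own illustration, not Coulom's actual formula): children are sampled in proportion to their estimated probability of being the best move, so weak moves still get an occasional look instead of being pruned outright. The moves and probabilities here are invented.

```python
import random

# Soft selectivity sketch: pick each child with a probability that grows
# with its estimated chance of being the best move (illustrative only).
def pick_child(children):
    # children: list of (move, prob_best) pairs; probabilities sum to 1
    r = random.random()
    cumulative = 0.0
    for move, p in children:
        cumulative += p
        if r <= cumulative:
            return move
    return children[-1][0]

counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10000):
    counts[pick_child([("A", 0.7), ("B", 0.25), ("C", 0.05)])] += 1
# "A" is searched most, but "B" and "C" still receive some simulations
```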

(to be continued)

 Post subject: Re: Studying Rémi Coulom's paper
Post #2 Posted: Tue Sep 11, 2018 3:14 am 
Oza

Posts: 2408
Location: Ghent, Belgium
Liked others: 359
Was liked: 1019
Rank: KGS 2d OGS 1d Fox 4d
KGS: Artevelde
OGS: Knotwilg
Online playing schedule: UTC 18:00 - 22:00
First question:

In the chapter on selectivity, Coulom says: "n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

I read this as, "when we go up and down the search tree to generate new nodes and random continuations, the probabilities (of being the best move) for the nodes change"

Is this a correct interpretation?

 Post subject: Re: Studying Rémi Coulom's paper
Post #3 Posted: Tue Sep 11, 2018 3:55 am 
Lives in gote

Posts: 502
Liked others: 1
Was liked: 153
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
I think it means "the probability of exploring each node changes as the tree is explored" instead.

In this algorithm, when you're at a (fixed) node, the probability of exploring each child depends on the tree explored so far, not just on the position at that node.


This post by Tryss was liked by: Knotwilg
 Post subject: Re: Studying Rémi Coulom's paper
Post #4 Posted: Tue Sep 11, 2018 12:08 pm 
Lives in gote

Posts: 653
Location: Austin, Texas, USA
Liked others: 54
Was liked: 216
"n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

From a certain position, if you only do rollouts from there (no tree expansion), you sample from a stationary distribution: you get, say, 50% wins, and this win probability does not change over time. But if you start constructing a tree, use some algorithm to go down it, and do your rollouts only from the leaves, things can change. The search may discover a way to kill a large group of stones; it will then focus on lines that always kill the large group, doing its rollouts only after the group is dead. Now instead of 50% wins you get 90% wins, so the distribution has changed.

Compare this to the multi-armed bandit model that these algorithms build on: each arm has a fixed payoff probability.
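That stationarity is easy to see in a minimal bandit-arm simulation (my own sketch; the payoff probability is made up):

```python
import random

# Minimal multi-armed bandit arm: a fixed payoff probability, so repeated
# pulls sample from a stationary distribution (unlike a growing search tree).
def pull(p_win, rng):
    return 1 if rng.random() < p_win else 0

rng = random.Random(42)
arm_p = 0.5                       # this never changes over time
wins = sum(pull(arm_p, rng) for _ in range(100000))
print(wins / 100000)              # stays close to 0.5, however long you sample
```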

BTW my guess is "external nodes" are leaf nodes and "internal nodes" are those that have children already. Back when this paper was written and rollouts were used, it was common to do ~100 rollouts from a leaf before deciding it was time to expand that leaf and incorporate it into the tree search plus backup procedure.
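A sketch of that expand-after-N-visits rule (an assumption on my part: the threshold of 100, the fixed three children, and the coin-flip "rollout" are placeholders, not the paper's actual settings):

```python
import random

EXPAND_THRESHOLD = 100   # visits before a leaf is incorporated into the tree

class Node:
    def __init__(self):
        self.count = 0        # times the search passed through this node
        self.wins = 0
        self.children = None  # None while the node is still a leaf ("external")

    def visit(self, rng):
        self.count += 1
        if self.children is None:
            if self.count >= EXPAND_THRESHOLD:
                # enough rollouts: expand the leaf into an internal node
                self.children = [Node() for _ in range(3)]
            result = 1 if rng.random() < 0.5 else 0  # stand-in for a rollout
        else:
            # internal node: descend (uniformly at random, for simplicity)
            result = rng.choice(self.children).visit(rng)
        self.wins += result
        return result

root = Node()
rng = random.Random(0)
for _ in range(500):
    root.visit(rng)
# after 100 visits the root becomes internal and later visits descend to children
```

A real implementation would of course select children by their statistics rather than uniformly; this only shows the external/internal transition driven by the visit count.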


This post by yoyoma was liked by 3 people: Rémi, Waylon, zermelo