It is currently Fri Jul 19, 2019 5:43 pm

 All times are UTC - 8 hours [ DST ]

 Page 1 of 1 [ 4 posts ]
 Print view Previous topic | Next topic
Author Message
 Post subject: Studying Rémi Coulom's paper #1 Posted: Tue Sep 11, 2018 1:59 am
 Lives in sente

Posts: 1306
Location: Ghent, Belgium
Liked others: 184
Was liked: 582
Rank: Bel 2d KGS 3d TG 4d
KGS: Artevelde
Tygem: Knotwilg
I'm studying Rémi Coulom's paper https://hal.inria.fr/inria-00116992/document

It may already have been discussed here or in stack overflow / reddit ... but there's a bunch of smart people in here, maybe even Rémi himself and although I'm interested in the AI (or should I say machine learning) aspect of it, I will naturally approach it from a go angle.

No questions for now.

Things I have already figured out for myself, I think:

- minimax algorithm: a positive score means player A wins, a negative score means player B wins. When it's A's move, the program chooses the node with the maximum value, when it's B's move, the minimum.
- alpha-beta pruning: alpha is the minimum (worst) score A is assured of, beta is the maximum (worst) score B is assured of. When alpha becomes greater than beta, the node is discarded for further research because such a node cannot be reached by best play for both
- a "backup operator" is nothing else than the mechanism of computing the value (score) of a node from the values of its children nodes, starting from the leaves of the tree and going "back up". The name had me confused because I thought it was something that computed a "backup value" for the real value.
- "averaging" is a generalized term for evaluation. In traditional programs, an evaluation function computes a value for the leaf by using heuristics such as piece loss or central position in chess. I assume that with the introduction of randomization, which is core to Monte-Carlo, such evaluation has in general a degree of averaging over the random selection underneath the leaf
- the "count" of a node is the number of times the algorithm passed through the node
- an "internal node" is one for which the count is higher than a threshold value (Coulom sets this value = the number of points on the goban, which he claims to be empirically validated as a good compromise)
- an "external node" is one with a lower count than the threshold

The core idea is to no longer separate "backing up" and "averaging", and instead of completely cutting of less promising nodes, search them considerably less than promising nodes. This introduces "the probability of being the best move" as the score.

(to be continued)

Top

 Post subject: Re: Studying Rémi Coulom's paper #2 Posted: Tue Sep 11, 2018 3:14 am
 Lives in sente

Posts: 1306
Location: Ghent, Belgium
Liked others: 184
Was liked: 582
Rank: Bel 2d KGS 3d TG 4d
KGS: Artevelde
Tygem: Knotwilg
First question:

In the chapter on selecitivy, Coulom says: "n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

I read this as, "when we go up and down the search tree to generate new nodes and random continuations, the probabilities (of being the best move) for the nodes change"

Is this a correct interpretation?

Top

 Post subject: Re: Studying Rémi Coulom's paper #3 Posted: Tue Sep 11, 2018 3:55 am
 Lives in gote

Posts: 403
Liked others: 1
Was liked: 110
Rank: KGS 2k
GD Posts: 100
KGS: Tryss
I think it's "the probability to explore each node change with the tree exploration" instead.

In this algorithm, when you're at a (fixed) node, the probability to explore each node depend on the explored tree, not just on the position at this node.

 This post by Tryss was liked by: Knotwilg
Top

 Post subject: Re: Studying Rémi Coulom's paper #4 Posted: Tue Sep 11, 2018 12:08 pm
 Lives in gote

Posts: 614
Location: Austin, Texas, USA
Liked others: 43
Was liked: 205
"n-armed bandit algorithms and stochastic optimization assume stationary distributions of evaluations, which is not the case when searching recursively"

From a certain position if you only do rollouts from there (no tree expansion), you will sample from a stationary distribution. You will get e.g. 50% wins, and this win probability does not change over time. But if you start constructing a tree and use some algorithm to go down the tree, and only from the leaves do your rollouts from those leaves, things can change. The search may discover a way to kill a large group of stones. Now the search will focus on lines that always kill the large group, and only after the group is dead do the rollouts. Now instead of 50% wins you get 90% wins. So the distribution has changed.

Compare to the multi armed bandit model that these algorithms build on. Each arm has a fixed payoff probability.

BTW my guess is "external nodes" are leaf nodes and "internal nodes" are those that have children already. Back when this paper was written and rollouts were used, it was common to do ~100 rollouts from a leaf before deciding it was time to expand that leaf and incorporate it into the tree search plus backup procedure.

 This post by yoyoma was liked by 3 people: Rémi, Waylon, zermelo
Top

 Display posts from previous: All posts1 day7 days2 weeks1 month3 months6 months1 year Sort by AuthorPost timeSubject AscendingDescending
 Page 1 of 1 [ 4 posts ]

 All times are UTC - 8 hours [ DST ]

#### Who is online

Users browsing this forum: Google Feedfetcher and 1 guest

 You cannot post new topics in this forumYou cannot reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot post attachments in this forum

Search for:
 Jump to:  Select a forum ------------------ Life In 19x19.com General Topics    Introductions and Guidelines    Off Topic    Announcements    General Go Chat    Beginners    Amateurs    Professionals       Lee Sedol vs Gu Li    Go Rules    Forum/Site Suggestions and Bugs    Creative writing    Tournaments       Ride share to tournaments Improve Your Game    Game Analysis    Study Group    Teachers/Club Leaders       Teacher advertisements    Study Journals L19²GO (Malkovich)    1-on-1 Malkovich games    Big Brother Malkovich games    Rengo Games    Other versions of turn-based games Go Gear    Go Books    Go Book Reviews    Computer Go    Gobans and other equipment    Trading Post    New Products/Upgrades/Sales Go Club Forums    Go Club Discussions       Honinbo Go League    American Go Association Forum       Go Congress 2011 volunteers       AGA volunteers ( non-congress)    Australian Go Association    European Go Federation Forum    Singapore Weiqi Association    KGS    ASR League    IGS    OGS    Tygem    WBaduk    Turn Based Servers    Insei League Events    Kaya.gs       King of the Hill