I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is.
I added Monte Carlo to my engine, and now it plays even worse
Thank you. I didn't understand what author of the article (Magnus Persson?) wanted to say. It seems that English isn't his native language. Or I'm not a smart guyLi Kao wrote: Those articles are a good starting point:
http://senseis.xmp.net/?UCT
Code: Select all
The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with
UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits))
Thanks a lot!emeraldemon wrote:The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:
http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf
Code: Select all
/ \
/\ /\
Code: Select all
/ \
/\ /\
\
/
\
/
...
Code: Select all
/ \
/\ /\
\
Code: Select all
winrate := Wins/Visitsemeraldemon wrote:What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:
Code: Select all
O
/ \
(a1) (b1)Code: Select all
O
/ \
(a1) (b1)
(0/1)(1/1)Code: Select all
O
/ \
(a1) (b1)
(0/1)(1/1)
/ | \
a2 b2 c2Code: Select all
O
/ \
(a1) (b1)
(0/1)(3/4)
/ | \
a2 b2 c2
0 1 1Since the komi is usually half integral draws are not possible(at least with superko). But if you're using a game where there are draws you could count it as half a win.ChessGo wrote:Why winrate takes into account only wins?
I think the problem is that we have over simplified the description? What you have concluded would be the case if the algorithm at each node were to CERTAINLY take the path currently higher. But suppose that were a matter of probability? The greater the difference between the two paths the more likely the higher will be taken but a smaller (but non zero) chance remains that the lower will be tried. Then if as you suggest that (currently) lower path is "the only move" eventually its count will increase until it isn't the lower count.ChessGo wrote:
Sorry, I'm still not sure that UCT works good.
....................
So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.
You are correct. It contains two parts: Win probability and SQRT(ln...)Li Kao wrote:To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.
Adding a term for the error when choosing the move doesn't make sense. You don't want to play a move with an uncertain result. So winrate is definitely the better one of the two.ChessGo wrote: Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?