Monte Carlo (upper confidence bounds applied to trees)

ChessGo · Post by **ChessGo** » Fri Oct 22, 2010 9:41 am

Hello all!
I would much appreciate if someone could explain me what exactly "upper confidence bounds applied to trees" is.

I added Monte Carlo to my engine, and now it plays even worse

Li Kao · Post by **Li Kao** » Fri Oct 22, 2010 11:22 am

It's an expansion rule for monte carlo. You always need to decide from which point in the move tree you want to start the monte carlo playout.
You can use the node where WinningProbability+SomeConstant/Sqrt(PlayoutsOfThatNode) is maximal.
This tries to balance playing out the most promising node with exploring new untried regions of the tree. 1/Sqrt(PlayoutsOfThatNode) scales like the standard deviation. So the expression is something like ExpectancyValue+C*StandardDeviation, so the real value of the move is probably below the estimated value.
There are several papers out there. And I think SL has some articles too.

But MonteCarlo only works well if the moves in playouts aren't aren't evenly distributed, but "good looking" moves are played more often.

edit:
Those articles are a good starting point:
http://senseis.xmp.net/?MonteCarlo
http://senseis.xmp.net/?UCT

emeraldemon · Post by **emeraldemon** » Fri Oct 22, 2010 3:31 pm

The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf

ChessGo · Post by **ChessGo** » Sat Oct 23, 2010 6:19 am

Li Kao wrote: Those articles are a good starting point:
http://senseis.xmp.net/?UCT

Thank you. I didn't understand what author of the article (Magnus Persson?) wanted to say. It seems that English isn't his native language. Or I'm not a smart guy

Code: Select all

The sum of the winrate and the number (UCTvalue) for each candidate move n is computed with

 UCTValue(parent, n) = winrate + sqrt((ln(parent.visits))/([b]5[/b]*n.nodevisits))

The question is why 5?

ChessGo · Post by **ChessGo** » Sat Oct 23, 2010 6:22 am

emeraldemon wrote:The most comprehensive explanation is Sylvain Gelly's thesis, where he introduces it and explains how Mogo works:

http://www.lri.fr/~gelly/paper/SylvainGellyThesis.pdf

Thanks a lot!

The questions still remain. Am I correct thinking that the program creates a tree search (with which depth, by the way) and then plays simulations for terminal nodes?

Li Kao · Post by **Li Kao** » Sat Oct 23, 2010 8:15 am

One simple variant of MCT works like this:
1) From the existing tree choose the node that has the maximum UCT score
2) Run a MC playout from that node on
3) Add the first move of the MC playout as a new childnode
repeat

Important for high playing strength is that the playout is guided by a good heuristic.

Another trick is that because go moves don't depend that much on order one can score the result of one playout not only for its first move but for other moves as well(see RAVE).

emeraldemon · Post by **emeraldemon** » Sat Oct 23, 2010 11:39 am

What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:

Code: Select all

 / \
/\ /\

And I run a simulation from the leftmost branch:

Code: Select all

 / \
/\ /\
\
/
\
/
...

I keep the first move of that simulation as the next node added to the tree:

Code: Select all

 / \
/\ /\
\

hope that helps, sorry I can't make the trees look right

ChessGo · Post by **ChessGo** » Mon Oct 25, 2010 9:11 pm

Why winrate takes into account only wins?

Code: Select all

winrate := Wins/Visits

What is better:

3 wins, 0 draws, 7 loses

or

2 wins, 6 draws, 2 loses

?

ChessGo · Post by **ChessGo** » Mon Oct 25, 2010 11:50 pm

emeraldemon wrote:What Li Kao said. The depth changes slowly: each simulation adds 1 node to the tree (usually). So if the tree looks like this:

Sorry, I'm still not sure that UCT works good.

For example, I have a position with two possible moves (a1 and b1):

Code: Select all

    O
  /   \
(a1) (b1)

First random game (using a1 as first move) returns 0, second (b1) returns 1:

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(1/1)

Ok, let's try to explore b1 move (the program likes it, because uct of a1 = 0):

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(1/1)
     / | \
   a2 b2 c2

a2 returns 0, b2 and c2 returns 1:

Code: Select all

    O 
  /   \
(a1) (b1)
(0/1)(3/4)
     / | \
   a2 b2 c2
   0   1  1

So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.

Li Kao · Post by **Li Kao** » Tue Oct 26, 2010 12:57 am

ChessGo wrote:Why winrate takes into account only wins?

Since the komi is usually half integral draws are not possible(at least with superko). But if you're using a game where there are draws you could count it as half a win.

Mike Novack · Post by **Mike Novack** » Tue Oct 26, 2010 6:22 am

ChessGo wrote:

Sorry, I'm still not sure that UCT works good.
....................
So the program will explore b2 and c2 (starting from b1), but maybe a2 is the only refuting move. Maybe a1 is better, but the program will continue to explore b1, because utc of a1 is 0 and utc of b1 is more than 0.

I think the problem is that we have over simplified the description? What you have concluded would be the case if the algorithm at each node were to CERTAINLY take the path currently higher. But suppose that were a matter of probability? The greater the difference between the two paths the more likely the higher will be taken but a smaller (but non zero) chance remains that the lower will be tried. Then if as you suggest that (currently) lower path is "the only move" eventually its count will increase until it isn't the lower count.

Given enough tries. These algorithms work only when the number of trials is very large. And the devil in the details, performance might very well depend on just how that diffence in count affects the probablity of which path taken.

Li Kao · Post by **Li Kao** » Tue Oct 26, 2010 7:04 am

To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.

ChessGo · Post by **ChessGo** » Tue Oct 26, 2010 11:36 am

Li Kao wrote:To counter that problem the UCT score has two parts. One is the winning probability, and one represents the which decreases as the node gets explored. And this term causes unexplored nodes to played. I think there exists a prove that MCTS gives the same results as normal minimax search when the number of playouts goes to infinity.

You are correct. It contains two parts: Win probability and SQRT(ln...)

Let me ask one more question please.

Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?

Li Kao · Post by **Li Kao** » Tue Oct 26, 2010 11:46 am

ChessGo wrote: Ok, we played all simulations. Which move the program must choose? The one with maximim winrate (Wins/Visits) or the one with maximum winrate + SQRT?

Adding a term for the error when choosing the move doesn't make sense. You don't want to play a move with an uncertain result. So winrate is definitely the better one of the two.
It might even make sense to subtract something representing uncertainty so it prefers moves it has well analyzed. But since I've never actually written a MCTS program, I don't know about that.

Life In 19x19

Monte Carlo (upper confidence bounds applied to trees)

Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)

Re: Monte Carlo (upper confidence bounds applied to trees)