dfan wrote:Terminal positions in the tree search are actually evaluated by the game rules. However, a position is not considered terminal until both players have passed. The engines do know how to pass, but I don't know how often it comes up in the tree search. Since they all use Chinese rules, there is no penalty for continuing to play on for a while after the dame are filled.
As an illustration, here is a recent self-play game of LZ (network #166) with resign disabled:
dfan wrote:
AlphaGo did actual Monte Carlo simulations to the end of the game. AlphaGo Zero and all its descendants (AlphaZero, Leela Zero, ELF OpenGo, etc.) do not, despite continuing to use the (now misleading) term "Monte Carlo Tree Search".
Thanks. But Leela still uses Monte Carlo playouts, right? I noticed "MCWR" in Bojanic's analysis of the Metta-Ben David game, and figured that stood for Monte Carlo win rate. That's one reason for my confusion on that issue.
Leela and Crazy Stone (from 2016 on) are based on AlphaGo and perform Monte Carlo playouts to the end of the game.
Leela Zero and ELF OpenGo are based on AlphaGo Zero and only expand one tree node at a time.
I wish very much that everyone involved had chosen better names.
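The difference dfan describes boils down to how a leaf of the search tree gets evaluated. A minimal sketch of the two styles, using a toy stand-in for a Go position (the `ToyPosition` class and its methods are made up purely for illustration, not any engine's actual interface):

```python
import random

class ToyPosition:
    """Trivial stand-in for a Go position: the 'game' ends after a
    fixed number of moves and the final score is random. It exists
    only so the two evaluation styles below have something to run on."""
    def __init__(self, moves_left=5):
        self.moves_left = moves_left
    def copy(self):
        return ToyPosition(self.moves_left)
    def is_terminal(self):
        return self.moves_left == 0
    def legal_moves(self):
        return ["a", "b", "c"]
    def play(self, move):
        self.moves_left -= 1
    def score(self):
        return random.choice([+1, -1])

def rollout_value(position):
    """Classic Monte Carlo leaf evaluation (AlphaGo, Leela, Crazy Stone):
    play moves to the end of the game and score the terminal position."""
    pos = position.copy()
    while not pos.is_terminal():
        pos.play(random.choice(pos.legal_moves()))
    return pos.score()

def network_value(position, value_net):
    """AlphaGo Zero-style leaf evaluation (Leela Zero, ELF OpenGo):
    no playout at all; the net's value head estimates the result directly."""
    return value_net(position)
```

The Zero-style engines keep the surrounding tree search but replace the playout with the single `value_net` call, which is why "Monte Carlo Tree Search" has become a misleading name for what they do.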
dfan wrote:
Leela Zero and ELF OpenGo are based on AlphaGo Zero and only expand one tree node at a time.
I wish very much that everyone involved had chosen better names.
As I understand it, the tree search is still probabilistic, since the nodes to expand are selected according to the probabilities given by the policy net. So is it still a 'Monte Carlo' style of tree search?
dfan wrote:
Leela Zero and ELF OpenGo are based on AlphaGo Zero and only expand one tree node at a time.
As I understand it, the tree search is still probabilistic, since the nodes to expand are selected according to the probabilities given by the policy net. So is it still a 'Monte Carlo' style of tree search?
Node expansion in AlphaGo Zero is deterministic. At each iteration it chooses the "best" node, as determined by a combination of the value network, values propagated up the tree from descendant nodes, the policy network, and an exploration factor (where nodes that have been visited less often get a bonus). The search algorithm continues to move to the node with the highest score in a greedy manner until it hits a leaf.
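That greedy descent can be sketched in a few lines. This is the PUCT-style selection rule the AlphaGo Zero family uses; the `Node` class, field names, and the `c_puct` constant are illustrative, not Leela Zero's actual code:

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior       # policy-net probability for the move
        self.visits = 0
        self.value_sum = 0.0     # values backed up from descendants
        self.children = []

def puct_score(child, parent_visits, c_puct=1.5):
    """Score used to pick which child to descend into.
    q: average of the values backed up through this child so far.
    u: exploration bonus, large for moves the policy net likes
       and for moves that have been visited little."""
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def select_leaf(node):
    """Deterministic, greedy descent: at every node take the child
    with the highest PUCT score until reaching a leaf to expand."""
    while node.children:
        node = max(node.children, key=lambda ch: puct_score(ch, node.visits))
    return node
```

Note there is no random draw anywhere: the policy net's probabilities enter only as the `prior` term in the score, so given the same tree the same leaf is always selected.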
dfan wrote:Node expansion in AlphaGo Zero is deterministic. At each iteration it chooses the "best" node, as determined by a combination of the value network, values propagated up the tree from descendant nodes, the policy network, and an exploration factor (where nodes that have been visited less often get a bonus). The search algorithm continues to move to the node with the highest score in a greedy manner until it hits a leaf.
Best first search. Just like the old days.
The Adkins Principle: At some point, doesn't thinking have to go on?
— Winona Adkins