KataGo V1.3

Limeztone · Post by **Limeztone** » Sun Mar 01, 2020 4:45 am

lightvector, could you please briefly explain how KataGo uses maxPlayouts and maxVisits?
(I think there is a bit of confusion here)

jann · Post by **jann** » Sun Mar 01, 2020 7:42 am

inbae wrote:IMHO, benchmarks should be done in playout parity, not in visit parity.
...
Playout parity, on the other hand, is more appropriate for measuring strength of engines, since number of playouts is proportional to time spent.

The reason you test with a fixed amount of search instead of a fixed amount of time is to make the test independent of external factors like hw speed or code optimizations, and to focus on network strength. With fixed playouts you reintroduce some such factors: you reward the side with better tree reuse, and you randomize the amount of effective search for each position. Such a wider test can also be useful, but may not always be appropriate.

xela wrote:I think a lot of people tend to use "visits" and "playouts" interchangeably.

I think so too, and I doubt "visit" would necessarily mean tree reuse, and "playout" ignoring reuse. But LZ started to use them like this, so this is often implied (IIRC 1 playout = 1 actually performed simulation, 1 visit = 1 simulation whether from reuse or actually performed now).

If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits.

Visits usually refer to the visit count of root node, so this is less relevant.

Uberdude · Post by **Uberdude** » Sun Mar 01, 2020 8:04 am

xela wrote:I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)

If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits. <snip/>

Xela, I think you've got this wrong. My understanding is that playouts and visits (at least as the terms are used in "bot is configured at x visits/playouts per move") both count the same thing (one more leaf node in the tree of explored variations), but playouts are a delta per move, whilst visits are the total including tree reuse from previous moves. So playouts <= visits: x playouts will increase visits by x, but visits can start at > 0 when playouts for that move is 0. Setting playouts = x means for each move add an extra x nodes to the tree and then play the best move; visits = x means keep adding nodes to the tree (which could be non-empty if the opponent played an expected move) until there are x and then play the best move. A worked example with playouts=4:

Move 1: Bot is black to play on empty board

Code:

playout 1: B q4  
Variation tree (visits = 1):
    Empty board
    /
  B q4        

playout 2: B d4 
Variation tree (visits = 2):
    Empty board
    /     |
  B q4   B d4

playout 3: B q4 W d16 ie add w d16 as move 2 to existing node in tree of 1 B q4.
Variation tree (visits = 3):
    Empty board
    /     |
  B q4   B d4
   |
  W d16

playout 4: B q16 
Variation tree (visits = 4):
    Empty board
    /     |     \
  B q4   B d4   B q16
   |
  W d16

As playouts was set to 4 the bot stops exploring the tree, and picks the move with (probabilistic bias on) best averaged value from network (ie for B q4 it is an average of how good B q4 position is and B q4 W d16 position is), say it picks B d4.

Move 2, opponent human or another bot instance plays W q16.

Move 3. Is B d4 W q16 in the existing tree? No, so the search for move 3 starts with an empty tree, ie 0 initial visits.

Code:

Initial position B d4 W q16
playout 1: B d16
Variation tree (visits = 1)
      B d4 W q16
    /
  B d16

playout 2: W q4 after B d16
Variation tree (visits = 2)
      B d4 W q16
    /
  B d16
    |
  W q4

playout 3: B d17
Variation tree (visits = 3)
      B d4 W q16
    /      |
  B d16   B d17
    |
  W q4

playout 4: B r17 after B d16 W q4 
Variation tree (visits = 4)
      B d4 W q16
    /      |
  B d16   B d17
    |
  W q4
    |
  B r17

In this example the bot has read relatively deeply down one line rather than broadly many choices.
In the Lizzie UI B d16 would show 3 playouts and B d17 would show 1, because there are 3 nodes in the tree starting at d16 and 1 from d17. Bot chooses B d16 for move 3.

Move 4. If opponent played W q3 for move 4, i.e. an unexplored move, then for move 5 the bot will be in a similar position to move 3, with no tree reuse. But let's say white does play q4, the previously explored move.

Move 5. Board position is B d4 W q16 B d16 W q4. Is this in the existing tree? Yes! So the tree starts with some visits (nodes) already in it:

Code:

Initial tree before any playouts (visits = 1):  
   B d4 W q16 B d16 W q4
   /
  B r17

Playout 1: B o17
Variation tree (visits = 2):  NB we have 2 visits after 1 playout
   B d4 W q16 B d16 W q4
   /        |
  B r17   B o17

Playout 2: W r16 after B r17
Variation tree (visits = 3)
   B d4 W q16 B d16 W q4
   /        |
  B r17   B o17
   |
  W r16

Playout 3: B q17 after B r17 W r16
Variation tree (visits = 4)
   B d4 W q16 B d16 W q4
   /        |
  B r17   B o17
   |
  W r16
   |
  B q17

If the bot was configured with visits = 4 instead of playouts = 4, then the search would stop here now that 4 visits are reached even though it only did 3 playouts for this move (reusing 1 visit from before). But with playouts = 4:

Code:

Playout 4: B r3
Variation tree (visits = 5)
   B d4 W q16 B d16 W q4
   /        |       \
  B r17   B o17    B r3
   |
  W r16
   |
  B q17
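The stopping rule described above can be written down as a small sketch. This is only an illustration of the bookkeeping in the worked example, not any engine's actual code: the function names and the "stop when either limit is reached" behaviour for the case where both limits are set are assumptions made for the sketch.

```python
from typing import Optional


def playouts_this_move(reused_visits: int,
                       max_visits: Optional[int] = None,
                       max_playouts: Optional[int] = None) -> int:
    """New playouts performed this move before the search stops.

    reused_visits: nodes already in the tree from earlier searches.
    Assumption for this sketch: if both limits are set, the search
    stops as soon as either one is reached.
    """
    limits = []
    if max_visits is not None:
        # A visits limit counts reused nodes, so fewer new ones are needed.
        limits.append(max(0, max_visits - reused_visits))
    if max_playouts is not None:
        # A playouts limit ignores whatever was reused.
        limits.append(max_playouts)
    if not limits:
        raise ValueError("set at least one of max_visits or max_playouts")
    return min(limits)


def visits_after_search(reused_visits: int, playouts: int) -> int:
    """Total nodes in the tree once the search for this move ends."""
    return reused_visits + playouts


# Move 5 of the example: 1 node (B r17) is reused.
print(playouts_this_move(1, max_visits=4))    # 3 new playouts, tree ends at 4
print(playouts_this_move(1, max_playouts=4))  # 4 new playouts, tree ends at 5
```

With reused_visits = 0 (moves 1 and 3 in the example) the two settings give the same number of playouts.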

Bill Spight · Post by **Bill Spight** » Sun Mar 01, 2020 8:24 am

xela wrote:I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)

If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits.

Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate. Now, the standard is to use the value network to estimate the winrates of unexpanded nodes, so that usage has dropped out.

Bill Spight · Post by **Bill Spight** » Sun Mar 01, 2020 8:44 am

Uberdude wrote:
xela wrote:I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)

If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits. <snip/>
Xela, I think you've got this wrong. My understanding is playouts and visits (at least as the terms are used as "bot is configured at x visits/playouts per move") are both counting the same thing (one more leaf node in the tree of explored variations) but playouts are a delta per move, whilst visits are the total across tree reuse. playouts <= visits. x playouts will increase visits by x, but visits can start at > 0 when playouts for that move is 0. Setting playouts = x means for each move add an extra x nodes to the tree and then play the best move, visits = x means keeping adding nodes to the tree (which could be non-empty if opponent played an expected move) until there are x and then play the best move. A worked example with playouts=4:

Move 1: Bot is black to play on empty board
Code:
playout 1: B q4  
Variation tree (visits = 1):
    Empty board
    /
  B q4        

playout 2: B d4 
Variation tree (visits = 2):
    Empty board
    /     |
  B q4   B d4

playout 3: B q4 W d16 ie add w d16 as move 2 to existing node in tree of 1 B q4.
Variation tree (visits = 3):
    Empty board
    /     |
  B q4   B d4
   |
  W d16

playout 4: B q16 
Variation tree (visits = 4):
    Empty board
    /     |     \
  B q4   B d4   B q16
   |
  W d16
As playouts was set to 4 the bot stops exploring the tree, and picks the move with (probabilistic bias on) best averaged value from network (ie for B q4 it is an average of how good B q4 position is and B q4 W d16 position is), say it picks B d4.

At the end, what about the node, B q4? Does it have 2 visits, because it has been visited twice, but only 1 playout, since only 1 playout has been made from it?
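One way to make the question concrete is to count per node under the convention Uberdude uses for the Lizzie display (a move's count is the number of nodes in the tree starting at it), which is the same as adding one visit to every node on the path of each playout. The sketch below applies that convention to the move 1 tree; the function name and the path representation are made up for illustration, and it does not claim to show how any particular engine reports these numbers.

```python
from collections import Counter


def visit_counts(playouts):
    """Per-node visit counts.

    Each playout is given as the path (tuple of moves) from the root
    to the newly added leaf. Every node on that path, including the
    root (the empty tuple), gains one visit.
    """
    counts = Counter()
    for path in playouts:
        for depth in range(len(path) + 1):
            counts[path[:depth]] += 1
    return counts


# The four playouts of move 1 in the quoted example.
move1 = [("B q4",), ("B d4",), ("B q4", "W d16"), ("B q16",)]
counts = visit_counts(move1)

print(counts[()])                 # root: 4
print(counts[("B q4",)])          # B q4: 2 (itself plus W d16 below it)
print(counts[("B q4", "W d16")])  # W d16: 1
```

Under this convention B q4 ends with a count of 2, while B d4 and B q16 each end with 1.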

jann · Post by **jann** » Sun Mar 01, 2020 8:52 am

Bill Spight wrote:Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate.

Those were often called rollouts (ie. to the end) instead.

inbae · Post by **inbae** » Sun Mar 01, 2020 9:51 am

@xela, I think Uberdude already has explained it in detail, so I will not confuse you with another set of technical explanations.

@jann, The fixed playouts vs visits issue has nothing to do with hardware or code optimization (maybe transposition can be an exception), so the only remaining factor is tree reuse, and that is precisely the reason why I am against fixed visits tests. Considering that tree reuse is implemented in most of the engines, the only two major factors involved are policy sharpness and PUCT parameters. Since PUCT parameters affect fixed visits tests as well, the only remaining thing to be considered is the policy sharpness, which is a direct result of NN inference.

jann · Post by **jann** » Sun Mar 01, 2020 10:20 am

The more things you test at the same time (ie. network strength plus policy sharpness / tree reuse intensity) the harder to measure those things independently (same as with hw and other external factors).

This is no problem if you are sure that all those factors will work exactly the same way for later use as for the test (again same with hw - if possible it's best to test on time parity directly on the target hw). But in practice this is not always the case, thus testing all factors independently and as narrowly as possible is a viable alternative.

inbae · Post by **inbae** » Sun Mar 01, 2020 10:37 am

jann wrote:The more things you test at the same time (ie. network strength plus policy sharpness / tree reuse intensity) the harder to measure those things independently (same as with hw and other external factors).

The policy sharpness and therefore tree reuse as well are strongly bound to the nature of the NN. I have no idea why you consider them as external factors.

Limeztone · Post by **Limeztone** » Sun Mar 01, 2020 10:53 am

jann wrote:
Bill Spight wrote:Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate.
Those were often called rollouts (ie. to the end) instead.

I think some confuse playouts with rollouts, which are a somewhat different thing.

The reason I asked lightvector is that I wanted to know whether there was some special consideration specifically for KataGo.

Normally I think limiting playouts limits the computing effort for each move made, while limiting visits limits the search space (which could be affected dramatically by the tree reuse) for each move made.

Comparing bots/nets at playout parity gives both bots the same computing power (excluding hardware differences), which seems a good idea to me.

The effect of limiting the search space instead is not so clear to me.

jann · Post by **jann** » Sun Mar 01, 2020 11:01 am

inbae wrote:The policy sharpness and therefore tree reuse as well are strongly bound to the nature of the NN. I have no idea why you consider them as external factors.

I wrote:

This is no problem if you are sure that all those factors will work exactly the same way for later use as for the test

For example, tree reuse may work quite differently for high-visit and low-visit scenarios (I'm not saying it necessarily will, but it is possible). Then test results that included the extent of tree reuse may become less relevant than narrower ones.

Limeztone wrote:The effect of limiting the search space instead is not so clear to me.

Like above, focus on fewer things and make results more robust and portable. But both narrower and wider tests have advantages and disadvantages (if you can test directly on target hw and conditions it's best to do just that, without synthetic limits).

lightvector · Post by **lightvector** » Sun Mar 01, 2020 11:12 am

Also one thing people sometimes forget:

Stating the fixed number of playouts you used per move is NOT enough to give a constant hardware-independent strength. You also have to specify how many threads you used to generate that many playouts.

Generally, holding playouts constant, increasing threads decreases strength. And also, the precise behavior of multithreading is hardware-dependent. So if you really want hardware-independence, technically you can only use 1 thread with fixed playouts.

inbae · Post by **inbae** » Sun Mar 01, 2020 12:10 pm

jann wrote:For example, tree reuse may work quite differently for high-visit and low-visit scenarios.

How?

jann wrote:Then test results that included tree reuse extent may become less relevant than narrower ones.

I'm not sure what you mean by "wide" and "narrow" here. And the search tree will be reused in fixed visits tests as well unless you somehow disable tree reuse explicitly.

jann · Post by **jann** » Sun Mar 01, 2020 12:39 pm

The problem is not tree reuse itself, but whether the test results depend on tree reuse / its extent.

For example, if you clear the tree each move, fixed playout tests are heavily affected (the same amount of playouts / work will do less effective search) while fixed visit tests are less so (single threaded at least).

Another example is when you find an otherwise weaker side ahead because of a higher extent of tree reuse (thus effectively more but weaker search). Then you repeat the test in a different visit/playout range, and find that these two factors now compensate each other less, and the other side comes out ahead.

But again, I'm not saying fixed visit tests (or narrower tests in general) are always better - advantages and disadvantages, as above.

Limeztone · Post by **Limeztone** » Sun Mar 01, 2020 3:58 pm

jann wrote:For example, if you clear the tree each move, fixed playout tests are heavily affected

As I understand visits vs playouts, if you clear the tree for every move made, visits and playouts become the same.
If you don't have any tree reuse there is no difference between playouts and visits.
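A rough sketch of that point, using the reuse numbers from Uberdude's example (0, 0 and 1 reused nodes at moves 1, 3 and 5). The function name and the per-move list of reused nodes are hypothetical, chosen only to show the bookkeeping, and do not come from any engine.

```python
def simulate(reused_per_move, limit, mode):
    """Return (new_playouts, final_visits) for each move.

    mode="playouts": always add `limit` new nodes.
    mode="visits":   add nodes until the tree holds `limit` nodes.
    """
    results = []
    for reused in reused_per_move:
        if mode == "playouts":
            new = limit
        elif mode == "visits":
            new = max(0, limit - reused)
        else:
            raise ValueError(f"unknown mode: {mode}")
        results.append((new, reused + new))
    return results


with_reuse = [0, 0, 1]  # moves 1, 3 and 5 of the worked example
no_reuse = [0, 0, 0]    # tree cleared before every move

print(simulate(with_reuse, 4, "playouts"))  # [(4, 4), (4, 4), (4, 5)]
print(simulate(with_reuse, 4, "visits"))    # [(4, 4), (4, 4), (3, 4)]
print(simulate(no_reuse, 4, "playouts") == simulate(no_reuse, 4, "visits"))  # True
```

With reuse, a playouts limit keeps the work per move constant and lets the tree size vary, while a visits limit keeps the tree size constant and lets the work vary. With the tree cleared every move, the two give identical results.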

Life In 19x19
