KataGo V1.3
Re: KataGo V1.3
inbae wrote:
    IMHO, benchmarks should be done in playout parity, not in visit parity.
    ...
    Playout parity, on the other hand, is more appropriate for measuring strength of engines, since number of playouts is proportional to time spent.

The reason you test with a fixed amount of search instead of a fixed amount of time is to make the test independent of external factors like hw speed or code optimizations, and to focus on network strength. With fixed playouts you reintroduce some such factors: you reward the side with better tree reuse, and you randomize the amount of effective search for each position. Such a wider test can also be useful, but may not always be appropriate.

xela wrote:
    I think a lot of people tend to use "visits" and "playouts" interchangeably.

I think so too, and I doubt "visit" would necessarily mean tree reuse, and "playout" ignoring reuse. But LZ started to use them like this, so this is often implied (IIRC 1 playout = 1 actually performed simulation, 1 visit = 1 simulation whether from reuse or actually performed now).

xela wrote:
    If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits.

Visits usually refer to the visit count of the root node, so this is less relevant.
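
To make the two readings concrete, here is a minimal Python sketch. It is a toy, not code from Leela Zero or KataGo, and the `Node` class and `run_playout` helper are made up for illustration. Each playout walks from the root to a leaf and adds one visit to every node on the path, so summed over all nodes one playout yields several visits, while the root's own visit count goes up by exactly one per playout.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy search-tree node (hypothetical; real engines store far more)."""
    visits: int = 0
    children: dict = field(default_factory=dict)


def run_playout(root, path):
    """Walk `path` (a list of moves) from the root, creating missing nodes,
    and add one visit to every node touched, including the root.
    Returns how many per-node visit increments this one playout caused."""
    node = root
    node.visits += 1
    touched = 1
    for move in path:
        node = node.children.setdefault(move, Node())
        node.visits += 1
        touched += 1
    return touched


root = Node()
increments = [
    run_playout(root, ["B q4"]),
    run_playout(root, ["B d4"]),
    run_playout(root, ["B q4", "W d16"]),
]

print(root.visits)                   # visit count of the root after 3 playouts
print(sum(increments))               # per-node increments summed over the tree
print(root.children["B q4"].visits)  # B q4 lay on two of the three paths
```

When people say "x visits per move" they normally mean the first number (the root's count), which is why the per-node sum rarely matters in practice.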
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: KataGo V1.3
xela wrote:
    I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)
    If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits. <snip/>

Xela, I think you've got this wrong. My understanding is that playouts and visits (at least as the terms are used in "bot is configured at x visits/playouts per move") both count the same thing (one more leaf node in the tree of explored variations), but playouts are a delta per move, whilst visits are the total across tree reuse from previous moves. playouts <= visits. x playouts will increase visits by x, but visits can start at > 0 when playouts for that move is 0. Setting playouts = x means: for each move, add an extra x nodes to the tree and then play the best move. Setting visits = x means: keep adding nodes to the tree (which could be non-empty if the opponent played an expected move) until there are x, and then play the best move. A worked example with playouts=4:
Move 1: Bot is black to play on empty board
Code:
playout 1: B q4
Variation tree (visits = 1):
      Empty board
       /
    B q4
playout 2: B d4
Variation tree (visits = 2):
      Empty board
       /    |
    B q4   B d4
playout 3: B q4 W d16 (ie add W d16 as move 2 under the existing node B q4)
Variation tree (visits = 3):
      Empty board
       /    |
    B q4   B d4
     |
    W d16
playout 4: B q16
Variation tree (visits = 4):
      Empty board
       /    |    \
    B q4   B d4   B q16
     |
    W d16
As playouts was set to 4, the bot stops exploring the tree and picks the move with (probabilistic bias on) the best averaged value from the network (ie for B q4 it is an average of how good the B q4 position is and how good the B q4 W d16 position is); say it picks B d4.
Move 2. The opponent (a human or another bot instance) plays W q16.
Move 3. Is B d4 W q16 in the existing tree? No, so the search for move 3 starts with an empty tree, ie 0 initial visits.
Code:
Initial position B d4 W q16
playout 1: B d16
Variation tree (visits = 1)
      B d4 W q16
       /
    B d16
playout 2: W q4 after B d16
Variation tree (visits = 2)
      B d4 W q16
       /
    B d16
     |
    W q4
playout 3: B d17
Variation tree (visits = 3)
      B d4 W q16
       /     |
    B d16   B d17
     |
    W q4
playout 4: B r17 after B d16 W q4
Variation tree (visits = 4)
      B d4 W q16
       /     |
    B d16   B d17
     |
    W q4
     |
    B r17
In the Lizzie UI B d16 would show 3 playouts and B d17 would show 1, because there are 3 nodes in the tree starting at d16 and 1 from d17. Bot chooses B d16 for move 3.
Move 4. If the opponent played W q3 for move 4, ie an unexplored move, then for move 5 the bot would be in a similar position to move 3, with no tree reuse. But let's say white does play q4, the previously explored move.
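
The per-move numbers above are just subtree sizes. A small Python sketch of that count, using nested dicts as a stand-in for the search tree (the representation and the `subtree_size` helper are assumptions for illustration, not how Lizzie or any engine stores its tree):

```python
def subtree_size(children):
    """Count one node plus all of its descendants.

    `children` is the dict of that node's children, {move: grandchildren}.
    """
    return 1 + sum(subtree_size(child) for child in children.values())


# Tree after the four playouts of move 3, rooted at "B d4 W q16".
tree_move3 = {
    "B d16": {"W q4": {"B r17": {}}},
    "B d17": {},
}

per_move = {move: subtree_size(kids) for move, kids in tree_move3.items()}
print(per_move)                # nodes under each candidate move
print(sum(per_move.values()))  # total visits at the root
```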
Move 5. Board position is B d4 W q16 B d16 W q4. Is this in the existing tree? Yes! So the tree starts with some visits (nodes) already in it:
Code:
Initial tree before any playouts (visits = 1):
      B d4 W q16 B d16 W q4
       /
    B r17
Playout 1: B o17
Variation tree (visits = 2): NB we have 2 visits after 1 playout
      B d4 W q16 B d16 W q4
       /     |
    B r17   B o17
Playout 2: W r16 after B r17
Variation tree (visits = 3)
      B d4 W q16 B d16 W q4
       /     |
    B r17   B o17
     |
    W r16
Playout 3: B q17 after B r17 W r16
Variation tree (visits = 4)
      B d4 W q16 B d16 W q4
       /     |
    B r17   B o17
     |
    W r16
     |
    B q17
Code:
Playout 4: B r3
Variation tree (visits = 5)
      B d4 W q16 B d16 W q4
       /     |     \
    B r17   B o17   B r3
     |
    W r16
     |
    B q17
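
The stopping rule behind the example can be sketched in a few lines of Python. This is only an illustration of the two limits as described in this post; `search_move` and its parameter names are invented here and are not any engine's real API.

```python
def search_move(reused_visits, max_playouts=None, max_visits=None):
    """Return (new_playouts, total_visits) for one move.

    reused_visits: nodes carried over from the previous search (0 if none).
    max_playouts:  cap on playouts performed for this move.
    max_visits:    cap on the tree size, reused nodes included.
    """
    if max_playouts is None and max_visits is None:
        raise ValueError("set at least one limit")
    new_playouts = 0
    total = reused_visits
    while True:
        if max_playouts is not None and new_playouts >= max_playouts:
            break
        if max_visits is not None and total >= max_visits:
            break
        new_playouts += 1  # one more leaf added to the tree
        total += 1
    return new_playouts, total


# Reused visits at moves 1, 3 and 5 of the worked example.
reused = {1: 0, 3: 0, 5: 1}

by_playouts = {m: search_move(r, max_playouts=4) for m, r in reused.items()}
by_visits = {m: search_move(r, max_visits=4) for m, r in reused.items()}
print(by_playouts)
print(by_visits)
```

With playouts=4 the bot does 4 playouts at move 5 and ends on 5 visits, as in the last tree. With visits=4 it would have stopped one playout earlier, at the tree shown for Playout 3.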
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: KataGo V1.3
xela wrote:
    I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)
    If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits.

Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: KataGo V1.3
Uberdude wrote:
    xela wrote:
        I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)
        If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits. <snip/>

    Xela, I think you've got this wrong. My understanding is playouts and visits (at least as the terms are used as "bot is configured at x visits/playouts per move") are both counting the same thing (one more leaf node in the tree of explored variations) but playouts are a delta per move, whilst visits are the total across tree reuse. playouts <= visits. x playouts will increase visits by x, but visits can start at > 0 when playouts for that move is 0. Setting playouts = x means for each move add an extra x nodes to the tree and then play the best move, visits = x means keeping adding nodes to the tree (which could be non-empty if opponent played an expected move) until there are x and then play the best move. A worked example with playouts=4:
    Move 1: Bot is black to play on empty board
    Code:
    playout 1: B q4
    Variation tree (visits = 1):
          Empty board
           /
        B q4
    playout 2: B d4
    Variation tree (visits = 2):
          Empty board
           /    |
        B q4   B d4
    playout 3: B q4 W d16 ie add w d16 as move 2 to existing node in tree of 1 B q4.
    Variation tree (visits = 3):
          Empty board
           /    |
        B q4   B d4
         |
        W d16
    playout 4: B q16
    Variation tree (visits = 4):
          Empty board
           /    |    \
        B q4   B d4   B q16
         |
        W d16
    As playouts was set to 4 the bot stops exploring the tree, and picks the move with (probabilistic bias on) best averaged value from network (ie for B q4 it is an average of how good B q4 position is and B q4 W d16 position is), say it picks B d4.

At the end, what about the node, B q4? Does it have 2 visits, because it has been visited twice, but only 1 playout, since only 1 playout has been made from it?
Re: KataGo V1.3
Bill Spight wrote:
    Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate.

Those were often called rollouts (ie. to the end) instead.
-
inbae
- Dies in gote
- Posts: 25
- Joined: Tue Feb 04, 2020 11:07 am
- GD Posts: 0
- KGS: inbae
- Been thanked: 7 times
Re: KataGo V1.3
@xela, I think Uberdude already has explained it in detail, so I will not confuse you with another set of technical explanations.
@jann, the fixed playouts vs visits issue has nothing to do with hardware or code optimization (maybe transposition can be an exception), so the only remaining factor is tree reuse, and that is precisely the reason why I am against fixed visits tests. Considering that tree reuse is implemented in most engines, the only two major factors involved are policy sharpness and PUCT parameters. Since PUCT parameters affect fixed visits tests as well, the only remaining thing to be considered is the policy sharpness, which is a direct result of NN inference.
Re: KataGo V1.3
The more things you test at the same time (ie. network strength plus policy sharpness / tree reuse intensity), the harder it is to measure those things independently (same as with hw and other external factors).
This is no problem if you are sure that all those factors will work exactly the same way for later use as for the test (again same with hw - if possible it's best to test on time parity directly on the target hw). But in practice this is not always the case, so testing all factors independently and as narrowly as possible is a viable alternative.
-
inbae
- Dies in gote
- Posts: 25
- Joined: Tue Feb 04, 2020 11:07 am
- GD Posts: 0
- KGS: inbae
- Been thanked: 7 times
Re: KataGo V1.3
jann wrote:
    The more things you test at the same time (ie. network strength plus policy sharpness / tree reuse intensity) the harder to measure those things independently (same as with hw and other external factors).

The policy sharpness, and therefore tree reuse as well, are strongly bound to the nature of the NN. I have no idea why you consider them external factors.
-
Limeztone
- Dies in gote
- Posts: 63
- Joined: Sun Jan 12, 2020 9:28 pm
- GD Posts: 0
- Has thanked: 8 times
- Been thanked: 4 times
Re: KataGo V1.3
jann wrote:
    Bill Spight wrote:
        Historically, i.e., a few years ago, in MCTS playouts were made, not from the root, but from an unexpanded node, in order to estimate its winrate.

    Those were often called rollouts (ie. to the end) instead.

I think some confuse playouts with rollouts, which are a somewhat different thing.
I asked lightvector because I wanted to know if there was some special consideration specifically for KataGo.
Normally I think limiting playouts limits the computing effort for each move made, while limiting visits limits the search space (which could be affected dramatically by tree reuse) for each move made.
Comparing bots/nets at playout parity gives both bots the same computing power (excluding hardware differences), which seems a good idea to me.
The effect of limiting the search space instead is not so clear to me.
Re: KataGo V1.3
inbae wrote:
    The policy sharpness and therefore tree reuse as well are strongly bound to the nature of the NN. I have no idea why you consider them as external factors.

I wrote:
    This is no problem if you are sure that all those factors will work exactly the same way for later use as for the test

For example, tree reuse may work quite differently for high-visit and low-visit scenarios (I'm not saying it necessarily will, but it is possible). Then test results that included tree reuse extent may become less relevant than narrower ones.

Limeztone wrote:
    The effect of limiting the search space instead is not so clear to me.

Like above, focus on fewer things and make results more robust and portable. But both narrower and wider tests have advantages and disadvantages (if you can test directly on target hw and conditions it's best to do just that, without synthetic limits).
-
lightvector
- Lives in sente
- Posts: 759
- Joined: Sat Jun 19, 2010 10:11 pm
- Rank: maybe 2d
- GD Posts: 0
- Has thanked: 114 times
- Been thanked: 916 times
Re: KataGo V1.3
Also one thing people sometimes forget:
Stating the fixed number of playouts you used per move is NOT enough to pin down a constant, hardware-independent strength. You also have to specify how many threads you used to generate that many playouts.
Generally, holding playouts constant, increasing threads decreases strength. And also, the precise behavior of multithreading is hardware-dependent. So if you really want hardware-independence, technically you can only use 1 thread with fixed playouts.
-
inbae
- Dies in gote
- Posts: 25
- Joined: Tue Feb 04, 2020 11:07 am
- GD Posts: 0
- KGS: inbae
- Been thanked: 7 times
Re: KataGo V1.3
jann wrote:
    For example, tree reuse may work quite differently for high-visit and low-visit scenarios.

How?

jann wrote:
    Then test results that included tree reuse extent may become less relevant than narrower ones.

I'm not sure what you mean by "wide" and "narrow" here. And the search tree will be reused in fixed visits tests as well unless you somehow disable tree reuse explicitly.
Re: KataGo V1.3
The problem is not tree reuse itself, but whether the test results depend on tree reuse and its extent.
For example, if you clear the tree each move, fixed playout tests are heavily affected (the same amount of playouts / work will do less effective search) while fixed visit tests are less so (single threaded at least).
Another example is when you find an otherwise weaker side ahead because of a higher extent of tree reuse (thus effectively more but weaker search). Then you repeat the test in a different visit/playout range and find that these two factors now compensate each other less, and the other side comes out ahead.
But again, I'm not saying fixed visit tests (or narrower tests in general) are always better - advantages and disadvantages, as above.
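
A rough way to see the first example is to separate "work" (playouts actually performed) from "effective search" (root visits when the move is chosen). The Python sketch below does that for a single move; the `per_move` helper and the numbers 800 and 300 are purely illustrative assumptions, and it ignores multithreading and the fact that real reuse varies from move to move.

```python
def per_move(limit_kind, limit, carried_over):
    """Return (work, effective_search) for one move.

    work:             playouts actually performed this move.
    effective_search: root visits when the move is chosen.
    carried_over:     visits inherited from the previous search
                      (hypothetical value; 0 models a cleared tree).
    """
    if limit_kind == "playouts":
        work = limit
    elif limit_kind == "visits":
        work = max(limit - carried_over, 0)
    else:
        raise ValueError(f"unknown limit kind: {limit_kind}")
    return work, carried_over + work


LIMIT = 800    # assumed per-move limit
CARRIED = 300  # assumed reuse when the tree is kept

for kind in ("playouts", "visits"):
    kept = per_move(kind, LIMIT, CARRIED)
    cleared = per_move(kind, LIMIT, 0)
    print(f"{kind:8s} kept={kept} cleared={cleared}")
```

Under these assumptions, clearing the tree leaves the work of a fixed playout test unchanged but shrinks its effective search, while a fixed visit test keeps the same effective search and only does more work to get there.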
-
Limeztone
- Dies in gote
- Posts: 63
- Joined: Sun Jan 12, 2020 9:28 pm
- GD Posts: 0
- Has thanked: 8 times
- Been thanked: 4 times
Re: KataGo V1.3
jann wrote:
    For example, if you clear the tree each move, fixed playout tests are heavily affected

As I understand visits vs playouts, if you clear the tree for every move made, visits and playouts become the same.
If you don't have any tree reuse there is no difference in playouts and visits.
Last edited by Limeztone on Sun Mar 01, 2020 4:28 pm, edited 1 time in total.