KataGo V1.3
Re: KataGo V1.3
I never said it SHOULD. There is nothing wrong with wider tests that are affected by more factors, even time-based tests. But such results are inherently harder to interpret and less portable, and IF/WHEN you are unable to test under the exact conditions of later use, narrower tests that are affected by fewer factors, and are thus more consistent, can be more informative with less danger of being misleading. Advantages and disadvantages.
-
inbae
- Dies in gote
- Posts: 25
- Joined: Tue Feb 04, 2020 11:07 am
- GD Posts: 0
- KGS: inbae
- Been thanked: 7 times
Re: KataGo V1.3
jann wrote: But such results are inherently harder to interpret and less portable, and IF/WHEN you are unable to test under the exact conditions of later use, narrower tests that are affected by fewer factors, and are thus more consistent, can be more informative with less danger of being misleading. Advantages and disadvantages.
I think you are trying to oversimplify things. As I have said above, at least all of [network, playouts/visits, number of threads] should be considered, and given those, tests should be reproducible to an extent. Ultimately strength matters, and tests without such details or context are less likely to be informative for users.
Re: KataGo V1.3
If you can test under the exact target conditions, by all means do so. If you cannot, and need to extrapolate from tests performed under different conditions, you are better off with more consistent tests that each measure only one specific factor.
-
inbae
- Dies in gote
- Posts: 25
- Joined: Tue Feb 04, 2020 11:07 am
- GD Posts: 0
- KGS: inbae
- Been thanked: 7 times
Re: KataGo V1.3
I consider fixed-playout tests more suitable for such purposes, since every engine will reuse the search tree, and the number of playouts is supposed to correlate better with the time limit than the number of visits does.
-
Limeztone
- Dies in gote
- Posts: 63
- Joined: Sun Jan 12, 2020 9:28 pm
- GD Posts: 0
- Has thanked: 8 times
- Been thanked: 4 times
Re: KataGo V1.3
Limeztone wrote: As I understand visits vs playouts is that if you clear the tree for every move made, visits and playouts become the same.
jann wrote: Thus a big change for a playout based test (which was affected by a random search bonus without this).
What is random about it?
Bonus compared to what?
Quote: the same net with the same maxPlayouts could be different in strength depending on the number of threads (or executed on different hardware)
Quote: More search threads means weaker search (less freedom in which nodes to visit/expand).
Oh, thanks! Of course
-
xela
- Lives in gote
- Posts: 652
- Joined: Sun Feb 09, 2014 4:46 am
- Rank: Australian 3 dan
- GD Posts: 200
- Location: Adelaide, South Australia
- Has thanked: 219 times
- Been thanked: 281 times
Re: KataGo V1.3
Thanks Uberdude for the very clear and detailed explanation! In the light of that, the other comments are making a lot more sense now :-) To summarise:
- "Visits" is often used to mean total number of visits to the root node.
- If analysing a single position in isolation, visits (in this sense) and playouts are the same thing.
- When playing a game, the convention is that "visits" includes tree reuse from previous moves but "playouts" doesn't.
- Limiting an engine to "n visits per move" means that if the visit count (including tree reuse) is less than n then you keep adding playouts until you get to n.
- Limiting an engine to "n playouts per move" means that you do n playouts every move in addition to any tree reuse.
- With tree reuse, visits/second is a higher number than playouts/second because you're counting the re-used visits again.
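The counting convention in the list above can be sketched in a few lines of Python. This is only an illustration of the distinction as summarised here, not code from KataGo or any other engine; the function name `search_budget` and its parameters are made up for the example.

```python
def search_budget(limit_kind, n, reused_visits):
    """Return (new_playouts, total_root_visits) for one move.

    limit_kind:    "visits" or "playouts" (hypothetical labels for this sketch)
    n:             the per-move limit
    reused_visits: visits already in the subtree kept from the previous search
    """
    if limit_kind == "visits":
        # Top up the reused tree until the root has n visits in total.
        new_playouts = max(0, n - reused_visits)
    elif limit_kind == "playouts":
        # Always run n fresh playouts on top of whatever was reused.
        new_playouts = n
    else:
        raise ValueError("limit_kind must be 'visits' or 'playouts'")
    return new_playouts, reused_visits + new_playouts


print(search_budget("visits", 1000, 300))    # (700, 1000)
print(search_budget("playouts", 1000, 300))  # (1000, 1300)
print(search_budget("visits", 1000, 0))      # (1000, 1000)
```

With no reused tree (a single position analysed in isolation) the two limits give the same numbers, which matches the second bullet.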
jann wrote: Another example is when you find an otherwise weaker side ahead, because of higher extent of tree reuse (thus effectively more but weaker search).
This one I'm still finding hard to imagine. Tree reuse happens when the opponent plays a move that you've already explored, so you can reuse that part of the tree. Tree reuse is maximised when the opponent plays the most explored move, which is often the move that you assess as best. So if the "weaker" net is getting a lot of tree reuse, that means the "stronger" net is consistently choosing the same move that the "weaker" net would have picked. It looks to me like the so-called "stronger" net in this scenario isn't actually that much stronger?
Re: KataGo V1.3
jann wrote: Another example is when you find an otherwise weaker side ahead, because of higher extent of tree reuse (thus effectively more but weaker search).
xela wrote: This one I'm still finding hard to imagine. Tree reuse happens when the opponent plays a move that you've already explored, so you can reuse that part of the tree. Tree reuse is maximised when the opponent plays the most explored move, which is often the move that you assess as best.
The point here is that the smaller the branching factor of a policy, the narrower the tree it builds in memory and the greater the potential for reuse. So if the weaker net only looks at 2 moves everywhere (vs, say, 3 for the stronger one), it may be weaker, with blind spots, but it will benefit more from tree reuse.
As mentioned earlier, this may account for, say, a 1.5x search speed advantage. If you don't know this specifically and only have the result of a test match at 1000 playouts, you are less likely to correctly predict the result of a 10000-playout match (your actual use case). Basically you are in the same situation as if you had run a time-based test on unknown hardware: there is an unclear speed-related factor that affects your results in unknown ways and to an unknown extent, and not necessarily in the same way during the test as during later usage (speed vs strength works very differently at low search than at high search).
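As a rough illustration of why a narrower search can gain more from tree reuse, here is a toy model in Python. It assumes (unrealistically) that visits split evenly over `b` candidate moves at every node, that the move actually played is always one of those candidates, and that a fixed number of playouts is added on every move. No real engine distributes visits this way; the function name and the numbers are purely illustrative.

```python
def steady_state_visits(playouts_per_move, b, moves=50):
    """Total root visits per move after `moves` moves in the toy model.

    playouts_per_move: fresh playouts added every move (fixed-playout limit)
    b:                 assumed number of moves the search splits visits over
    """
    total = float(playouts_per_move)  # first move: nothing to reuse yet
    for _ in range(moves):
        # The subtree that survives is the one below our move AND the
        # opponent's reply, i.e. two plies down: 1/b of 1/b of the tree.
        reused = total / (b * b)
        total = playouts_per_move + reused
    return total


for b in (2, 3, 5):
    v = steady_state_visits(1000, b)
    print(b, round(v), round(v / 1000, 3))
# 2 1333 1.333
# 3 1125 1.125
# 5 1042 1.042
```

In this toy setting the side that only considers 2 moves ends up searching with about a third more visits than its nominal 1000 playouts, while the side considering 3 moves gains only about an eighth. The size of the effect in a real engine depends on how the policy and search actually concentrate visits.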
The wider your test is and the more factors you allow to affect its result, the more tests you need to perform to get the same knowledge/confidence (because you first need to infer each individual factor from the results). Again, this is for the case where your use case / condition is significantly different and you cannot test on it directly (otherwise you don't need to know individual factors and are fine with a single test there, since you can be sure all factors will work the same way during the test as during later usage).
jann wrote: ...whichever side wins at 1000 visits will likely also win at 10000 visits.
Limeztone wrote: How do you reach that conclusion?
Look above where we talked about the linked scalability graph. Same winner at 1000 visits as at 10000 visits = curves don't cross the 1.0 line. Unknown bonus from tree reuse = different winner at 1000 vs 2000 visits than at 10000 vs 20000 visits = some curves cross the 2.0 line etc.
-
go4thewin
- Lives with ko
- Posts: 150
- Joined: Thu Jan 23, 2020 6:09 am
- Rank: 25 kyu
- GD Posts: 0
- Has thanked: 200 times
- Been thanked: 30 times
Re: KataGo V1.3
Will future versions have a way to set minimum playouts per move?
new 20b 800v against elf2 800v on cgos: 30-1 wow
http://www.yss-aya.com/cgos/19x19/cross ... 3v800.html
s191 had 75% win rate
set katago to play at your level https://docdro.id/sHZU1ti or experiment with gtp4zen ( https://rb.gy/kx2ilb )
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: KataGo V1.3
50 game test
KataGo 1.3.3 (g170e-b20c256x2-s2430231552-d525879064) v. LZ017 (#268)
1600 visits for both (~ time parity)
Katago wins 37-13 (74%)
twogtp 1.5.1, no error, no duplicate game, all games by resignation.
Stats : (Katago always shows as W, because of the command -alternate)
-
And
- Gosei
- Posts: 1464
- Joined: Tue Sep 25, 2018 10:28 am
- GD Posts: 0
- Has thanked: 212 times
- Been thanked: 215 times
Re: KataGo V1.3
Vargo, do you consider the games lost because of ladders? Are there any statistics on whether the number of games lost to ladders decreases as LZ networks are promoted? And how does the number of playouts affect it?
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: KataGo V1.3
And wrote: the ladder, do you consider?
No, I haven't looked at the games. If you want, you can see for yourself. KataGo is W in every even-numbered game, and the 13 games lost by KataGo are n° 1,12,15,16,20,21,23,24,25,32,34,41,47.
-
splee99
- Dies with sente
- Posts: 101
- Joined: Thu Nov 15, 2012 9:46 pm
- Rank: KGS 2 D
- GD Posts: 0
- Has thanked: 2 times
- Been thanked: 16 times
Re: KataGo V1.3
When a bot knows ladder, many interesting things happen, like this one.
Attachments: kata-265.sgf, Untitled-go.png
-
Vargo
- Lives in gote
- Posts: 337
- Joined: Sat Aug 17, 2013 5:28 am
- GD Posts: 0
- Has thanked: 22 times
- Been thanked: 97 times
Re: KataGo V1.3
Another 50 game test with the exact same commands
KataGo 1.3.3 (g170e-b20c256x2-s2430231552-d525879064) v. LZ017 (#268)
1600 visits for both (~ time parity)
Katago wins 34-16 = 68% (last time it was 37-13 = 74%)
twogtp 1.5.1, no error, no duplicate game, all games by resignation.
Stats : (Katago always shows as W, because of the command -alternate, so, B+R means Katago lost)
The games :
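For anyone wondering how much to read into 37-13 versus 34-16 from the same setup, a quick way to check is a confidence interval on each win rate. The sketch below uses the standard Wilson score interval and the usual logistic Elo formula; the helper names are made up for this example, and it treats every game as an independent coin flip, which is only an approximation for engine matches.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half


def elo_diff(p):
    """Elo difference implied by an expected score p (0 < p < 1)."""
    return 400 * math.log10(p / (1 - p))


# Results reported in this thread, plus both runs pooled together.
for label, wins, games in [("run 1", 37, 50), ("run 2", 34, 50), ("pooled", 71, 100)]:
    lo, hi = wilson_interval(wins, games)
    print(f"{label}: {wins}/{games} = {wins / games:.0%}, "
          f"95% interval {lo:.1%} to {hi:.1%}, "
          f"about {elo_diff(wins / games):+.0f} Elo")
```

Each 50-game interval is wide enough to contain the other run's win rate, so under these assumptions the gap between 74% and 68% is well within what chance alone would produce.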