KataGo V1.3

jann · Post by **jann** » Fri Feb 28, 2020 9:47 am

lightvector wrote: And how much longer "long" would need to be, or if in fact the 30 or 40 block nets have been trained enough to even have better scaling yet? Nobody has tested yet.

If there is a significant strength difference (seen at equal visits), I'd be very surprised if the scaling effect didn't appear in high-visit games (exponential policy benefit, IMO).

go4thewin · Post by **go4thewin** » Fri Feb 28, 2020 1:42 pm

Any chance for a 15b bot trained on 30b games in the future? Maybe it would eventually get stronger than Elf2 at playout parity? Thanks for great bots!
KataGo 1.3.3 s243 20b, 1 playout, vs gtp4zen zen6 7d: 4-0. Wow.
KataGo 1.3.3 s243 20b, 350 playouts, 1 thread, vs LZ 125, 1 thread, 4000 playouts [9d amateur, beat a pro]: 3-1. The two bots were dead even.
Leela white: 350po3.sgf (5 KiB)

Kata white: 350po4.sgf (5.42 KiB)

Lastly, 20b s243 vs 20b s191, both at 16 playouts, 1 thread, engine 1.3.3: 4-2.
They are pretty even on my machine. With continued extended training, it will be interesting to watch progress!

Limeztone · Post by **Limeztone** » Sat Feb 29, 2020 4:35 am

go4thewin wrote: Any chance for a 15b bot trained on 30b games in the future? Maybe it would eventually get stronger than Elf2 at playout parity?

What makes you think that KataGo with the current best 15 block net is not stronger than Elf v2 at playout parity? In my opinion it already is.

go4thewin · Post by **go4thewin** » Sat Feb 29, 2020 4:50 am

My tests showed them about even. I forget how I tested, so it might not be right. CGOS has 15b at 100 playouts even with Kata 1.3.2 s191 at 50 playouts, to give a strength estimate. Leela Zero did 15b extended training with the 40b net, and it was effective for them: the 15b extended-training net is one of their strongest nets. It takes time, though; the 30b bot has to get very strong first for the 15b to make any progress.

Vargo · Post by **Vargo** » Sat Feb 29, 2020 9:36 am

Katago 1.3.3 , Networks :
20b : b20d4686.txt.gz
30b : b30d5259.bin.gz
40b : b40d5243.bin.gz

9x9 : 100 game tests at visits parity (1600 visits). Gogui-twogtp 1.5.1 , 1xGTX1080

Katago 1.3.3 20b v. Katago 1.3.3 30b
no error, 2 duplicate games,
Katago 1.3.3 30b wins 85-13 (86.7 %)

Katago 1.3.3 20b v. Katago 1.3.3 40b
no error, 14 duplicate games,
Katago 1.3.3 40b wins 69-17 (80.2 %)

For 9x9, at visits parity, 30b and 40b seem much stronger than 20b.

9x9 : 100 game tests at time parity. Gogui-twogtp 1.5.1 , 2xGTX 1080Ti (2s/move, corresponding to ~ 8000 visits for 20b, and to ~3000-3500 visits for 30b or 40b )

Katago 1.3.3 20b v. Katago 1.3.3 30b
no error, 3 duplicate games,
Katago 1.3.3 20b wins 55-42 (56.7 %)

Katago 1.3.3 20b v. Katago 1.3.3 40b
no error, 2 duplicate games,
Katago 1.3.3 20b wins 57-41 (58.2 %)

30b and 40b seem not too far from 20b at time parity.
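For anyone who wants to put these win-loss counts on a common scale, here is a minimal Python sketch (not part of the original test setup). It assumes the usual logistic Elo model and a 95% Wilson score interval; the function names are made up for illustration, and duplicate games are left out, as in the counts above.

```python
import math

def win_rate(wins, losses):
    """Fraction of decided games won by the first side."""
    return wins / (wins + losses)

def elo_diff(p):
    """Elo difference implied by a score fraction p (0 < p < 1), logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def wilson_interval(wins, losses, z=1.96):
    """Wilson score interval for the win rate (z=1.96 is roughly 95%)."""
    n = wins + losses
    p = wins / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Win-loss counts copied from the 9x9 results above (winner listed first).
results = {
    "visits parity, 30b over 20b": (85, 13),
    "visits parity, 40b over 20b": (69, 17),
    "time parity, 20b over 30b": (55, 42),
    "time parity, 20b over 40b": (57, 41),
}

for label, (w, l) in results.items():
    p = win_rate(w, l)
    lo, hi = wilson_interval(w, l)
    print(f"{label}: {p:.1%} (interval {lo:.1%} to {hi:.1%}), "
          f"about {elo_diff(p):+.0f} Elo")
```

Under these assumptions, the two time-parity intervals still include 50%, so with about 100 games those gaps are hard to tell apart from noise, while the visits-parity results stay clearly above 50%.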


And · Post by **And** » Sat Feb 29, 2020 9:56 am

Vargo, why didn't you use the stronger network, g170e 20 block s2.43G?

And · Post by **And** » Sat Feb 29, 2020 10:01 am

Can anyone explain the meaning of the visits parity tests? I understand this for training networks, but for the user, what's the point?

Vargo · Post by **Vargo** » Sat Feb 29, 2020 10:10 am

And wrote: Vargo why didn't you use a stronger network g170e 20 block s2.43G?

Maybe I'll try tomorrow with 20b s2.43G.

And · Post by **And** » Sat Feb 29, 2020 10:14 am

It would be interesting to see 19x19! Thanks for your tests!

go4thewin · Post by **go4thewin** » Sat Feb 29, 2020 11:39 am

And wrote: Can anyone explain the meaning of the visits parity tests? I understand this for training networks, but for the user, what's the point?

Visits, especially with one thread, are reproducible on different hardware. That is very good if you don't want the bot playing at its strongest, like getting gtp4zen to play at 3 kyu instead of 3 dan. It means a different time per move on different hardware, but the visits parameter is the same and reproducible on any hardware. I know exactly how many playouts KataGo needs to play at a pro level, but the time is different on different hardware. Similarly, I know how to get it to play at 4 dan OGS (1 playout). You can play against a set strength level, especially with non-zero bots trained on SGFs.

And · Post by **And** » Sat Feb 29, 2020 1:06 pm

go4thewin, this is clear: https://github.com/breakwa11/GoAIRatings
But what is the point of, for example, a match between a 20-block network and a 40-block network at visits parity? It is obvious that the 40-block network is more powerful but uses much more time!

jann · Post by **jann** » Sat Feb 29, 2020 1:13 pm

With a visit parity test you measure the strength difference between nets. A net that appears stronger at 500 visits will likely appear stronger at 1500 visits as well.

With a time parity test you measure the intermixed strength and speed difference between nets, together with any hw or code speed difference. A setup that appears stronger at 10s/move could also appear weaker at 30s/move (or on different hw) because different network strengths scale differently with more visits (stronger nets tend to benefit more).

The advantage of knowing the strength difference and the speed difference as two independent values comes when you need to predict the result for a different hw and time setting (where you cannot test directly).
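As a rough illustration of the speed half of that split, the sketch below converts a time budget into an approximate visit budget per net. It is only a sketch: it assumes visits grow linearly with time and ignores tree reuse, the helper names are hypothetical, and the rates are back-calculated from the approximate 9x9 time-parity figures posted earlier in the thread (about 8000 visits in 2s for 20b, about 3000-3500 for 30b).

```python
def visits_per_move(visits_per_second, seconds_per_move):
    """Approximate visit budget, assuming visits scale linearly with time."""
    return visits_per_second * seconds_per_move

def speed_ratio(fast_rate, slow_rate):
    """How many times more visits the faster net gets in the same time."""
    return fast_rate / slow_rate

# Rates implied by the 2 s/move figures quoted in the thread (approximate).
rate_20b = 8000 / 2.0   # visits per second for the 20-block net
rate_30b = 3250 / 2.0   # midpoint of the 3000-3500 range for the 30-block net

for seconds in (1, 2, 10, 30):
    v20 = visits_per_move(rate_20b, seconds)
    v30 = visits_per_move(rate_30b, seconds)
    print(f"{seconds:>2}s/move: 20b ~{v20:.0f} visits, 30b ~{v30:.0f} visits")

print(f"20b gets about {speed_ratio(rate_20b, rate_30b):.1f}x the visits of 30b")
```

The speed ratio stays fixed on given hardware, so whether the larger net wins at a longer time control depends on how its visit-parity advantage scales as both budgets grow, which is the part that has to be measured separately.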

inbae · Post by **inbae** » Sat Feb 29, 2020 2:07 pm

IMHO, benchmarks should be done at playout parity, not at visit parity. While fixed-visits tests can represent the quality of analysis by engines in some controlled sense, this is not necessarily related to real-world strength. The number of visits is heavily dependent on search tree reuse, and is influenced by policy sharpness. A visit parity test may not be very different from a playout parity test when two very similar engines (like LZ with two different networks) are playing against each other, but it becomes dubious when the two engines are very different (like LZ vs KG). Playout parity, on the other hand, is more appropriate for measuring the strength of engines, since the number of playouts is proportional to time spent.

xela · Post by **xela** » Sun Mar 01, 2020 12:50 am

I think a lot of people tend to use "visits" and "playouts" interchangeably. (The Lizzie interface doesn't help, showing "playouts" and "visits/second" where both are measuring the same thing.)

If there's a difference, my understanding is that "one playout" is one round of exploring from the root to a leaf node, and one playout adds one visit to every node along the way, so that one playout = multiple visits. With this definition, I'd expect visits per second to be more or less constant (ignoring tree reuse), and playouts per second to vary according to the tree depth (which is influenced by policy sharpness). A deep tree with little branching means that each playout requires a lot of visits, so that you get fewer playouts per second. A shallow tree with lots of branching will give you shorter branches on average, so more playouts per second. Tree reuse will affect both numbers.
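To make that definition concrete, here is a toy Python sketch. It follows only the tentative definition in this post (one playout walks from the root to a leaf and adds one visit to every node on the way); it is not how LZ or KataGo actually count, and the fixed trees, the `Node` class, and the helper names are all invented for illustration, with no expansion or network evaluation.

```python
class Node:
    """Toy search-tree node that only tracks a visit count."""
    def __init__(self):
        self.visits = 0
        self.children = []

def make_chain(depth):
    """Deep tree: a single line of `depth` nodes below the root."""
    root = Node()
    node = root
    for _ in range(depth):
        child = Node()
        node.children.append(child)
        node = child
    return root

def make_star(width):
    """Shallow tree: `width` leaf children directly below the root."""
    root = Node()
    root.children = [Node() for _ in range(width)]
    return root

def playout(root):
    """One root-to-leaf pass; returns how many node visits it added."""
    node = root
    node.visits += 1
    touched = 1
    while node.children:
        # Stand-in selection rule: descend into the least-visited child.
        node = min(node.children, key=lambda c: c.visits)
        node.visits += 1
        touched += 1
    return touched

def total_visits(root):
    """Sum of visit counts over the whole tree."""
    return root.visits + sum(total_visits(c) for c in root.children)

deep, shallow = make_chain(5), make_star(4)
print("visits added per playout, deep tree:", playout(deep))
print("visits added per playout, shallow tree:", playout(shallow))
```

With this counting, a playout in the deep chain touches more nodes than one in the shallow star, which is the sense in which a roughly constant rate of node visits would translate into fewer playouts per second on deeper trees.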

You might notice that Lizzie on an empty board will give you large numbers of "visits/second" (actually playouts in my terminology here), but when you add a few stones, the "visits/second" drops.

inbae, am I using the words in the same way as you, or do you have different definitions?

Vargo · Post by **Vargo** » Sun Mar 01, 2020 4:15 am

100 game test 19x19 : KataGo 1.3.3 b20d52587 v. KataGo 1.3.3 b30d5259 at time parity

1s/move, corresponding to ~2000 visits for 20b and to ~800 visits for 30b, twogtp 1.5.1, no error, no duplicate game, all games by resignation.
b20 wins 67-33
