Thanksy501 wrote: That looks amazing!
On a minor note, it looks like handicap bot games are broken at the moment.

Sorry for the inconvenience; I'll look into this this evening.
Maharani wrote: Yet another question... sorry. How come the winrates displayed on the board for 7 komi are different (lower) than the winrates given in the chart? I don't remember this happening for 7.5 komi reviews.

It doesn't seem to be a "hidden Leela". It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?
EDIT: I think the reason is that the displayed-on-the-board values are an average of KataGo 7 komi New Zealand rules and Leela Zero 7.5 komi Chinese rules, except that the Leela Zero values are hidden in the chart?
Jæja wrote: @spook: what networks are being used by ZBaduk at the moment? I'd like to make a comparison of the number of playouts between ZBaduk and my GPU-less MacBook.

1: It updates the Leela Zero network several times per day, and always uses the latest.
Maharani wrote: It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?

As for why there is a deviation between the 2 charts: this is "by design".
spook wrote: I don't think anybody was using that, right?

Not me!
spook wrote: 2: For KataGo it currently uses g104-b20c256-s447913472-d241840887.zip

Do you plan to upgrade to networks with a larger block size, e.g. g170-b40c256x2-s1349368064-d524332537.zip? I can imagine the computational burden will become too much for your backend at some point.
spook wrote: Line colors:
- The blue series shows the evaluations of KataGo.
- A red series would show the evaluations of Leela Zero.
- The gray series is an evaluation by ZBaduk, which merges the statistics of the other 2 bots.
What may be surprising is that merging is not just an average. There are some cases where averages give terrible results.
Example:
- Move A = 58% according to bot1, but bot2 doesn't consider it --> average = 58%
- Move B = 60% according to bot1, and only 50% according to bot2 --> average = 55%
Both bots prefer move B, but move A still has the higher average.
So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision" before averaging them with Leela Zero's estimations. Does that make sense?

Apples and oranges.
Bill Spight wrote: Apples and oranges. The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.

Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something.
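The move A vs. move B pitfall from spook's example can be sketched in a few lines of Python. This is a hypothetical illustration; `naive_merge` and the data layout are assumptions for the sketch, not ZBaduk's actual code.

```python
def naive_merge(bot1, bot2):
    """Average winrates per move; a move missing from one bot simply
    keeps the other bot's estimate unchanged."""
    merged = {}
    for move in set(bot1) | set(bot2):
        values = [bot[move] for bot in (bot1, bot2) if move in bot]
        merged[move] = sum(values) / len(values)
    return merged

# spook's example: bot2 never considered move A at all.
bot1 = {"A": 58.0, "B": 60.0}
bot2 = {"B": 50.0}

merged = naive_merge(bot1, bot2)
# Move A averages to 58.0 and move B to 55.0, so the merged ranking
# prefers A even though each bot individually prefers B.
print(max(merged, key=merged.get))  # A
```

This is exactly why some normalization of one bot's values before averaging (as spook describes) is needed: a move one bot never searched gets an unearned advantage under a plain average.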
spook wrote: Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something.

Not a bad idea. Perhaps we need to train an AI for it.
By the way, do you have a source for that "500 rollout" limit of ELF?

That would be me. To quote myself (viewtopic.php?p=248628#p248628; emphasis added later):

Moi wrote: {In the Elf commentaries on GoGoD games}: If the play in the game was Elf's top choice, they {the Elf team} indicated that, and sometimes added variations. If it was not Elf's top choice, they always included a variation with that choice, along with the Black winrate estimate and the number of playouts.

The game play also has a winrate estimate and number of playouts, but the two may not be related. For instance, sometimes the game play was not on Elf's radar and has 0 playouts. Well, you can't get a good winrate estimate from 0 playouts. Where does that estimate come from? Inspection reveals that it comes from the winrate estimate of Elf's reply to the game move.

How confident can we be of that estimate? The number of playouts reflects the confidence we can place in the estimate. There is no general agreement as to how confident we can be with a certain number of playouts, but, for the purpose of analysis, I have my doubts about fewer than 10k playouts. With analysis I am not just interested in finding a good play, but in comparing different plays, a distinct task. With fewer than 100 playouts, Elf seems to take the winrate estimate from Elf's reply, just as it does with 0 playouts. With several hundred playouts Elf takes the estimate from the move itself, not Elf's reply. I do not know the threshold number above which Elf does that.
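The selection rule inferred above could be written out as a small sketch. The threshold constant and function name are illustrative assumptions; the quote itself says the actual cutoff somewhere above a few hundred playouts is unknown.

```python
# Sketch of the rule Bill infers from the Elf commentaries: below some
# playout threshold, the winrate attached to a game move appears to be
# taken from Elf's reply rather than from the move itself.
PLAYOUT_THRESHOLD = 100  # assumption; the real cutoff is not known

def displayed_winrate(move_playouts, move_winrate, reply_winrate):
    """Return the winrate the commentary would attach to a game move."""
    if move_playouts < PLAYOUT_THRESHOLD:
        # Too few playouts to trust the move's own estimate:
        # fall back to the winrate of Elf's reply to this move.
        return reply_winrate
    return move_winrate

print(displayed_winrate(0, None, 43.2))     # 43.2 (from the reply)
print(displayed_winrate(5000, 57.8, 41.0))  # 57.8 (the move's own estimate)
```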
spook wrote: So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision", before averaging them with Leela Zero estimations. Does that make sense?

Yes and no: does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?
Maharani wrote: Does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?

That's exactly what it does.
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing!

@spook: I'm sorry for repeating myself, but I was wondering if you could share your thoughts about this?
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing!

Not only do I think this is a great idea; I think it would help promote ZBaduk a bit as well.