Life In 19x19 :: ZBaduk - LeeLa Zero and KataGo from your webbrowser

Author:	spook [ Tue Feb 25, 2020 8:23 am ]
Post subject:	Re: ZBaduk - LeeLa Zero from your webbrowser
y501 wrote: That looks amazing

Thanks, just the feedback I was hoping for!

On a minor note, it looks like handicap bot games are broken at the moment. Sorry for the inconvenience, I will look into this this evening.

Author:	Maharani [ Wed Feb 26, 2020 4:02 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero from your webbrowser
Maharani wrote: Yet another question... sorry. How come the winrates displayed on the board for 7 komi are different (lower) than the winrates given in the chart? I don't remember this happening for 7.5 komi reviews. EDIT: I think the reason is that the displayed-on-the-board values are an average of KataGo 7 komi New Zealand rules and Leela Zero 7.5 komi Chinese rules, except that the Leela Zero values are hidden in the chart?

Doesn't seem to be a "hidden Leela". It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?

https://i.ibb.co/5K98kQn/Screen-Shot-20 ... -33-PM.png

Additionally, you can see from the screenshot that ZBaduk occasionally adds an imaginary white move before the first black move of the game when I save. It doesn't happen every time I save, but maybe every other time.

Author:	Jæja [ Mon Mar 09, 2020 4:04 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
@spook: what networks are being used by ZBaduk at the moment? I'd like to make a comparison of the number of playouts between ZBaduk and my GPU-less MacBook.

Author:	spook [ Mon Mar 09, 2020 5:22 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
I just installed an update of ZBaduk. This release contains only bugfixes, but a lot of them. You can expect a release with new functionality probably on Friday.

Included:
- Fix for the bug of "white passes" at the start of a file.
- Some fixes for SGF parsing.
- The Edge web browser should be supported now.
- A fix for unexpected resigns at the end of a bot game.

I did actually remove the "Game Editor" tool, which is really just a slimmed-down version of the "Smart Review" tool. I don't think anybody was using that, right?

Author:	spook [ Mon Mar 09, 2020 5:31 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Jæja wrote: @spook: what networks are being used by ZBaduk at the moment? I'd like to make a comparison of the number of playouts between ZBaduk and my GPU-less MacBook.

1: It updates the Leela Zero network several times per day, and always uses the latest.
2: For KataGo it currently uses g104-b20c256-s447913472-d241840887.zip

ZBaduk also uses a caching mechanism for the first moves of the game (the most popular positions). It has one cache for KataGo and one for Leela Zero (both only apply to 7.5 komi with Chinese rules). So, for the first moves of the game ZBaduk will be very fast, because it just uses stored statistics.

However, this also has a slight disadvantage: the cache can be slightly outdated. It can still contain data from a previous network version, because cached statistics are only replaced when a new analysis exceeds their number of playouts.
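The replacement rule described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `CachedStats` and `update_cache`, the position key, and the sample numbers are all hypothetical, and ZBaduk's real cache may work differently in detail.

```python
from dataclasses import dataclass


@dataclass
class CachedStats:
    network: str    # network version that produced the statistics (hypothetical field)
    playouts: int   # number of playouts behind the statistics
    winrate: float  # stored winrate estimate, in percent


def update_cache(cache, position, fresh):
    """Store `fresh` only if it has more playouts than the cached entry.

    Returns True when the cache was changed.
    """
    old = cache.get(position)
    if old is None or fresh.playouts > old.playouts:
        cache[position] = fresh
        return True
    return False


cache = {}
update_cache(cache, "empty board", CachedStats("net-old", 5000, 46.0))

# A newer network with fewer playouts does not displace the old entry,
# which is how the cache can end up holding slightly outdated data.
replaced = update_cache(cache, "empty board", CachedStats("net-new", 1200, 47.5))
print(replaced, cache["empty board"].network)
```

Under this rule an entry from an older network stays in place until some later analysis of the same position collects more playouts than the stored one.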

Life In 19x19 http://lifein19x19.com/

ZBaduk - LeeLa Zero and KataGo from your webbrowser http://lifein19x19.com/viewtopic.php?f=9&t=16563	Page 7 of 12

Author:	spook [ Mon Mar 09, 2020 5:52 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero from your webbrowser
Maharani wrote: It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?

As for why there is a deviation between the 2 charts: this is "by design".

Line colors:
- The blue series: the evaluations of KataGo.
- A red series: would show the evaluations of Leela Zero.
- The gray series: an evaluation by ZBaduk, which merges the statistics of the other 2 bots.

What may be surprising is that merging is not just an average.
There are some cases where averages give terrible results.

Example:
- Move A = 58% according to bot1, but bot2 doesn't consider it. --> average = 58%
- Move B = 60% according to bot1, and for bot2 only 50% --> average = 55%
Both bots prefer move B, but still move A has a higher average.

So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision",
before averaging them with Leela Zero estimations.
Does that make sense?

But what I actually think would be better: if there are only KataGo statistics, the gray line shouldn't be shown, in my opinion. I'll see what I can do about that.
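The pitfall in the example above can be reproduced in a few lines. This is a minimal sketch of the naive average only, assuming that a bot which did not consider a move is simply left out of that move's average; it does not show ZBaduk's actual merging or normalization, whose details are not given here. The name `naive_average` is made up for the illustration.

```python
def naive_average(estimates):
    """Average the winrates of the bots that actually evaluated the move."""
    seen = [w for w in estimates if w is not None]
    return sum(seen) / len(seen)


# Winrates in percent as [bot1, bot2]; None means the bot did not consider the move.
moves = {
    "A": [58.0, None],
    "B": [60.0, 50.0],
}

averages = {move: naive_average(est) for move, est in moves.items()}
best = max(averages, key=averages.get)
print(averages, best)
```

The naive average ranks move A (58%) above move B (55%), even though bot1 prefers B and bot2 never looked at A.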

Author:	Jæja [ Tue Mar 10, 2020 1:36 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
spook wrote: I don't think anybody was using that, right?

Not me!

spook wrote: 2: For KataGo it currently uses g104-b20c256-s447913472-d241840887.zip

Do you plan to upgrade to networks with more blocks, e.g. g170-b40c256x2-s1349368064-d524332537.zip? I can imagine the computational burden becomes too much for your backend at some point.

I see that bot analyses and variations are stored within saved games. Do you plan on allowing for these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing.

Author:	Bill Spight [ Tue Mar 10, 2020 3:30 am ]
Post subject:	Re: ZBaduk - LeeLa Zero from your webbrowser
spook wrote: Line colors:
- The blue series: are the evaluations of KataGo.
- A red series: would show the evaluations of Leela Zero.
- The gray series: is an evaluation by ZBaduk, which merges statistics of the other 2 bots.
What may be surprising, is that merging is not just an average. There are some cases where averages give terrible results.
example:
- Move A = 58% according to bot1, but bot2 doesnt consider it. --> average = 58%
- Move B = 60% according to bot1, and for bot2 only 50% --> average = 55%
Both bots prefer move B, but still move A has a higher average. So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision", before averaging them with Leela Zero estimations. Does that make sense ?

Apples and oranges.

1) Bots are trained to win games, not to make accurate winrate estimates. Winrate estimates are never tested by playing positions out to see how often Black or White wins the game. That is why we do not have error estimates for winrates.

2) Winrates measure different things. KataGo's winrates assume that KataGo is playing against KataGo, LZ's winrates assume that LZ is playing against LZ. Apples and oranges. There is no such thing as an objective winrate except 0% or 100%. All winrate estimates make assumptions, and different bots make different assumptions. They are calculating different things.

3) Rollouts matter. We have greater confidence in winrates with more rollouts (whether visits or playouts are better indicators, I cannot say). However, because of how MCTS works, better plays tend to get more rollouts, so they are not simply an indicator of confidence, but also of how good a play is. In the Elf commentaries, Elf does not even report a winrate estimate based upon fewer than 500 rollouts.

In your example, bot1 assigns move A a winrate but bot2 does not. Therefore there is no average winrate for move A. Since bot2 gives move A 0 rollouts, it apparently does not think highly of move A. We might, therefore, assign move A a winrate estimate of 0 for bot2, but we know that in their search for the best move bots may not even consider some good moves, and when they are forced to consider them, they give them high winrate estimates. It would be unreasonable to assign it a winrate estimate of 0 for bot2, given the high estimate of bot1. But assigning move A a winrate for bot2 which is the same as the winrate for bot1 is also unreasonable.

The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.
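To put numbers on the two fill-in choices discussed above, here is a small worked sketch using the figures from spook's example (move A at 58% for bot1, move B at 60% and 50%). The helper `average_with_fill` is hypothetical and only illustrates the arithmetic; neither option is something either bot actually reports.

```python
bot1_a = 58.0                 # bot1's winrate for move A, in percent
avg_b = (60.0 + 50.0) / 2     # move B, evaluated by both bots


def average_with_fill(known, fill):
    """Average bot1's estimate with a made-up stand-in for bot2's missing one."""
    return (known + fill) / 2


as_zero = average_with_fill(bot1_a, 0.0)     # treat "not considered" as 0%
as_copy = average_with_fill(bot1_a, bot1_a)  # treat it as agreeing with bot1

print(as_zero, as_copy, avg_b)
```

Filling in 0% drops move A to 29%, far below move B at 55%, while copying bot1's value leaves it at 58%, above move B. The ranking of A against B is decided entirely by the invented stand-in value.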

Author:	spook [ Tue Mar 10, 2020 5:26 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Bill Spight wrote: Apples and oranges. The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.

Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something.
Perhaps we need to train an AI for it.

Because there is no silver bullet solution, ZBaduk tries to keep things transparent:
- By also showing the initial raw data,
- and by marking the best moves of each individual bot in bold.

I could make the behavior more configurable. But really, I would just be shifting the responsibility to the user without offering a real solution.

---

By the way, do you have a source for that "500 rollout" limit of ELF?
ZBaduk has a threshold, but it's only at 10 visits (which matches the visits limit that KataGo uses for LCB calculation).
Perhaps a threshold which is relative to the total number of visits makes more sense though (e.g. if there are 10M playouts, then a 2000 visit limit seems more reasonable).
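A threshold that scales with the total could look roughly like the sketch below. It assumes a linear rule calibrated on the single data point mentioned above (2000 visits per 10M), with the current 10-visit limit kept as a floor, and it assumes a move passes when it has at least the limit. The function names and the sample visit counts are hypothetical; this is not ZBaduk's code.

```python
def min_visits(total_visits, floor=10, per_total=10_000_000, per_visits=2000):
    """Visit limit that grows with the total, never dropping below `floor`."""
    return max(floor, total_visits * per_visits // per_total)


def reliable_moves(visits_by_move):
    """Keep only the moves whose visit count reaches the relative limit."""
    limit = min_visits(sum(visits_by_move.values()))
    return {move: v for move, v in visits_by_move.items() if v >= limit}


example = {"D4": 9_000_000, "Q16": 998_500, "K10": 1_500}  # 10M visits in total

print(min_visits(1_000), min_visits(10_000_000))
print(sorted(reliable_moves(example)))
```

With 10M total visits the limit becomes 2000, so the move with 1500 visits is dropped, while a small search of 1000 visits still falls back to the fixed limit of 10.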

Author:	Bill Spight [ Tue Mar 10, 2020 9:07 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
spook wrote:
Bill Spight wrote: Apples and oranges. The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.
Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something. Perhaps we need to train an AI for it.

Not a bad idea. IIUC, a human chess player, using recommendations from more than one chess engine, can beat one of those chess engines. That suggests that combining the recommendations of go bots may be an easier task than playing go.

Quote: By the way, Do you have a source for that "500 rollout" limit of ELF ?

That would be me. To quote myself,

Moi wrote: {In the Elf commentaries on GoGoD games}: If the play in the game was Elf's top choice, they {the Elf team} indicated that, and sometimes added variations. If it was not Elf's top choice, they always included a variation with that choice, along with the Black winrate estimate and the number of playouts. The game play also has a winrate estimate and number of playouts, but the two may not be related. For instance, sometimes the game play was not on Elf's radar, and has 0 playouts. Well, you can't get a good winrate estimate from 0 playouts. Where does that estimate come from? Inspection reveals that it comes from the winrate estimate of Elf's reply to the game move. How confident can we be of that estimate? The number of playouts reflects the confidence we can place in the estimate. There is no general agreement as to how confident we can be with a certain number of playouts, but, for the purpose of analysis, I have my doubts about fewer than 10k playouts. With analysis I am not just interested in finding a good play, but in comparing different plays, a distinct task. With fewer than 100 playouts, Elf seems to take the winrate estimate from Elf's reply, just as it does with 0 playouts. With several hundred playouts Elf takes the estimate from the move itself, not Elf's reply. I do not know the threshold number above which Elf does that.

( https://lifein19x19.com/viewtopic.php?p=248628#p248628 Emphasis added later.)

With more data I was able to find that the threshold was 500 rollouts. With fewer rollouts for a play, Elf inherits the winrate estimate from Elf's reply to that play.

Edit: I found an update to the above. https://lifein19x19.com/viewtopic.php?p=248845#p248845
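The reporting behaviour inferred above can be written down as a small rule. This is a sketch of the inference from the commentary files, not of Elf's source code. The function name is hypothetical, and treating exactly 500 rollouts as enough is an assumption, since the post only says "fewer".

```python
ELF_THRESHOLD = 500  # rollouts; the figure inferred from the commentaries


def reported_winrate(move_winrate, move_rollouts, reply_winrate):
    """Winrate shown for a play, following the inferred inheritance rule.

    At or above the threshold the play's own estimate is used; below it
    the estimate of Elf's reply to the play is used instead.
    """
    if move_rollouts >= ELF_THRESHOLD:
        return move_winrate
    return reply_winrate


# A play that was off Elf's radar (0 rollouts) has no estimate of its own.
print(reported_winrate(None, 0, 41.3))
# A play with enough rollouts keeps its own estimate.
print(reported_winrate(44.0, 800, 41.3))
```

This makes the concern concrete: for plays below the threshold, the rollout count shown next to the winrate says nothing about how well supported that winrate is.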

Author:	Maharani [ Wed Mar 11, 2020 7:55 am ]
Post subject:	Re: ZBaduk - LeeLa Zero from your webbrowser
spook wrote: So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision", before averaging them with Leela Zero estimations. Does that make sense ?

Yes and no: Does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?

Author:	spook [ Wed Mar 11, 2020 4:20 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Maharani wrote: Yes and no: Does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?

That's exactly what it does. (But I think the gray chart shouldn't actually be there if there's just 1 bot. I'll see what I can do about that.)

In the meantime, here's a preview of a column selection wizard to remove unwanted columns from the tables. That should also make it cleaner on smaller devices. If all goes well, it will be available by Friday.

Attachment: column selection.jpg [ 79.41 KiB | Viewed 9201 times ]

Author:	Maharani [ Wed Mar 11, 2020 4:36 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Awesome!! Thanks for the continued updates!

Is there possibly any way to delete white passes at the start of games saved before the bug fix without deleting the entire tree?

Author:	Jæja [ Thu Mar 12, 2020 3:23 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing for these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing

@spook: I'm sorry for repeating myself, but I was wondering if you could share your thoughts about this?

All times are UTC - 8 hours [ DST ]

Author:	spook [ Fri Mar 13, 2020 4:20 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing for these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing

Not only do I think this is a great idea, I think it would help promote ZBaduk a bit as well.

It would then also make sense to allow guests (i.e. visitors without accounts) to use the review tools (perhaps in a read-only mode).

It's on my top-priority list.

Author:	xela [ Fri Mar 13, 2020 5:18 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Bill Spight wrote: The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.

spook wrote: Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something. Perhaps we need to train an AI for it.

OK, so a simple average isn't quite the right thing. But don't give up! Combining different models can often outperform the individual models.

Author:	Jæja [ Sun Mar 15, 2020 9:12 am ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
spook wrote:
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing for these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing
Not only do I think this is a great idea. I think it would help promote zbaduk a bit as well. It would then also make sense to allow guests (i.e. visitors without accounts) to use the review tools. (perhaps in a read-only mode) It's in my top prio list.

Sooooo cool. Thanks again for all your hard work and good luck with the project!

Author:	Maharani [ Mon Mar 16, 2020 11:13 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
I've finally figured out that it is possible to use "stop analyzer" to show the server that you're "just idle" rather than "completely idle". I guess I had simply assumed that stopping the analyzer would reset the analysis like the ownership tool does? Very happy that this is not actually the case. It's more of a "pause analyzer".

Unrelatedly, sometimes (maybe 1 out of 20?) the analysis will be running merrily, but then I click its favourite move onto the board and that causes it to disconnect, showing no statistics chart for 10 - 15 seconds before starting analysis of the new move at 0.

EDIT: As of a few hours ago, ZBaduk does not display "winrate" in the chart anymore, or only "decision"...?
EDIT 2: Nevermind, this just means you've already made it customizable! Awesome, thanks!!

Author:	spook [ Thu Mar 19, 2020 7:25 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
I just installed an update.

Attachment: screen1.jpg [ 176.49 KiB | Viewed 8952 times ]
Attachment: screen2.jpg [ 230.06 KiB | Viewed 8952 times ]
Attachment: screen3.jpg [ 152.3 KiB | Viewed 8952 times ]

Enjoy!
(PS: @Maharani, this should also fix the confusing gray chart issue.)

Author:	spook [ Thu Mar 19, 2020 7:34 pm ]
Post subject:	Re: ZBaduk - LeeLa Zero and KataGo from your webbrowser
Maharani wrote: EDIT: As of a few hours ago, ZBaduk does not display "winrate" in the chart anymore, or only "decision"...? EDIT 2: Nevermind, this just means you've already made it customizable! Awesome, thanks!!

Exactly, that was actually part of a previous release (was it last weekend?).

One detail about the column selection: the selection of columns is stored in your web browser, not on your account. The reason for this is that you can then select a different set of columns for your mobile phone or tablet. But it also means that if you use both Chrome and Firefox on your computer, you may have to configure it twice.

---

(As for the game properties window, the selection of ruleset is displayed, but intentionally disabled.)