Thanksy501 wrote: That looks amazing!
On a minor note, it looks like handicap bot games are broken at the moment.

Sorry for the inconvenience; I'll look into this this evening.
Maharani wrote: Yet another question... sorry. How come the winrates displayed on the board for 7 komi are different (lower) than the winrates given in the chart? I don't remember this happening for 7.5 komi reviews.

It doesn't seem to be a "hidden Leela". It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?
EDIT: I think the reason is that the displayed-on-the-board values are an average of KataGo 7 komi New Zealand rules and Leela Zero 7.5 komi Chinese rules, except that the Leela Zero values are hidden in the chart?
Jæja wrote: @spook: what networks are being used by ZBaduk at the moment? I'd like to make a comparison of the number of playouts between ZBaduk and my GPU-less MacBook.

1: It updates the Leela Zero network several times per day, and always uses the latest.
Maharani wrote: It closely parallels the values given in the chart, but diverges more as the values range away from 50%. Any idea?

As for why there is a deviation between the 2 charts: this is "by design".
spook wrote: I don't think anybody was using that, right?

Not me!
spook wrote: 2: For KataGo it currently uses g104-b20c256-s447913472-d241840887.zip

Do you plan to upgrade to networks with a larger block size, e.g. g170-b40c256x2-s1349368064-d524332537.zip? I can imagine the computational burden will become too much for your backend at some point.
spook wrote: Line colors:
- The blue series shows the evaluations of KataGo.
- A red series would show the evaluations of Leela Zero.
- The gray series is an evaluation by ZBaduk, which merges the statistics of the other 2 bots.
What may be surprising is that merging is not just an average. There are some cases where averages give terrible results.
Example:
- Move A = 58% according to bot1, but bot2 doesn't consider it --> average = 58%
- Move B = 60% according to bot1, and only 50% according to bot2 --> average = 55%
Both bots prefer move B, but move A still has the higher average.
So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision" before averaging them with Leela Zero's estimations. Does that make sense?

Apples and oranges.
Bill Spight wrote: Apples and oranges. The fact that sometimes averages give terrible results is a big clue. Maybe averaging is not a good idea. Even when it appears to be.

Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something.
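The move A vs. move B pitfall from spook's example can be sketched in a few lines of Python. This is a hypothetical illustration; `naive_merge` and the data layout are assumptions for the sketch, not ZBaduk's actual code.

```python
def naive_merge(bot1, bot2):
    """Average winrates per move; a move missing from one bot simply
    keeps the other bot's estimate unchanged."""
    merged = {}
    for move in set(bot1) | set(bot2):
        values = [bot[move] for bot in (bot1, bot2) if move in bot]
        merged[move] = sum(values) / len(values)
    return merged

# spook's example: bot2 never considered move A at all.
bot1 = {"A": 58.0, "B": 60.0}
bot2 = {"B": 50.0}

merged = naive_merge(bot1, bot2)
# Move A averages to 58.0 and move B to 55.0, so the merged ranking
# prefers A even though each bot individually prefers B.
print(max(merged, key=merged.get))  # A
```

This is exactly why some normalization of one bot's values before averaging (as spook describes) is needed: a move one bot never searched gets an unearned advantage under a plain average.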
spook wrote: Merging statistics of 2 different bots is as difficult as making 2 dictators agree on something.

Not a bad idea. Perhaps we need to train an AI for it.
By the way, do you have a source for that "500 rollout" limit of ELF?

That would be me. To quote myself (viewtopic.php?p=248628#p248628; emphasis added later):

Moi wrote: {In the Elf commentaries on GoGoD games}: If the play in the game was Elf's top choice, they {the Elf team} indicated that, and sometimes added variations. If it was not Elf's top choice, they always included a variation with that choice, along with the Black winrate estimate and the number of playouts.

The game play also has a winrate estimate and number of playouts, but the two may not be related. For instance, sometimes the game play was not on Elf's radar and has 0 playouts. Well, you can't get a good winrate estimate from 0 playouts. Where does that estimate come from? Inspection reveals that it comes from the winrate estimate of Elf's reply to the game move.

How confident can we be of that estimate? The number of playouts reflects the confidence we can place in the estimate. There is no general agreement as to how confident we can be with a certain number of playouts, but, for the purpose of analysis, I have my doubts about fewer than 10k playouts. With analysis I am not just interested in finding a good play, but in comparing different plays, a distinct task. With fewer than 100 playouts, Elf seems to take the winrate estimate from Elf's reply, just as it does with 0 playouts. With several hundred playouts Elf takes the estimate from the move itself, not Elf's reply. I do not know the threshold number above which Elf does that.
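The selection rule inferred above could be written out as a small sketch. The threshold constant and function name are illustrative assumptions; the quote itself says the actual cutoff somewhere above a few hundred playouts is unknown.

```python
# Sketch of the rule Bill infers from the Elf commentaries: below some
# playout threshold, the winrate attached to a game move appears to be
# taken from Elf's reply rather than from the move itself.
PLAYOUT_THRESHOLD = 100  # assumption; the real cutoff is not known

def displayed_winrate(move_playouts, move_winrate, reply_winrate):
    """Return the winrate the commentary would attach to a game move."""
    if move_playouts < PLAYOUT_THRESHOLD:
        # Too few playouts to trust the move's own estimate:
        # fall back to the winrate of Elf's reply to this move.
        return reply_winrate
    return move_winrate

print(displayed_winrate(0, None, 43.2))     # 43.2 (from the reply)
print(displayed_winrate(5000, 57.8, 41.0))  # 57.8 (the move's own estimate)
```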
spook wrote: So, the "all-bots decision" value slightly normalizes the values of "KataGo's decision", before averaging them with Leela Zero estimations. Does that make sense?

Yes and no: does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?
Maharani wrote: Does this mean that when only KataGo is used, the gray line is "slightly normalized" even though it is not averaged with Leela Zero?

That's exactly what it does.
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing!

@spook: I'm sorry for repeating myself, but I was wondering if you could share your thoughts about this?
Jæja wrote: I see that bot analyses and variations are stored within saved games. Do you plan on allowing these to be made public, e.g. by sharing a URL, like a Google Drive document? This would be absolutely amazing!

Not only do I think this is a great idea; I think it would help promote ZBaduk a bit as well.