I checked with a truncated sgf; the results are essentially (but not exactly, see Herman's point below) the same. The Leela interface (or my ineptitude with it) makes it difficult to input both black and white moves into an unfinished sgf and have Leela only offer analysis rather than actually playing moves of the opposite colour in reply.

Dmytro wrote: I do not know much about the Leela interface. But, logically, your way of doing game analysis looks good. Still, I would prefer to use a truncated sgf to be 100% sure that there is no influence from the later moves.

Uberdude wrote: Although I load the whole game sgf into Leela, when I ask it what it wants to play at move X I haven't done any analysis for moves after X (I used a separate sgf replayer to see what the human played), so I don't think the information the sgf contains about later moves is used by Leela, but I will check with a truncated sgf. (It's a manual position-by-position analysis rather than a bulk analysis of the game like go review partner does.) If you go forward from X and do analysis, then those simulations of the game tree are reused if you move back to X and continue analysing.
Definitely possible, though my feeling from the analysis I've done so far is that it would be rare for a #1 choice to drop so far. Shuffling around of #2/#3/#4, and the win% crossing the 5% mark, are more common.

HermanHiddema wrote: So, given that Leela's preferred moves are non-deterministic like this, it is possible that the same move might on one run be Leela's top choice, and on another be outside the top 3 or outside the 5% margin?
Too much work for me to do manually though! As a little test, here's a pic of 3 runs (50k, 50k, 150k) on the same position with the full sgf and 3 with a snipped one, to also test Dmytro's point. In this position Leela has a strong preference for the #1 move of d15 and didn't put much effort into analysing the other choices. In other positions I've seen a much flatter distribution of the effort, so I'd expect more variance between runs there (and also with #nodes). In all 6 runs d15 is #1 and has by far the most simulations. d14 and d16 always take the next two places, but d14 is #2 in 4 of the 6, and is always within 5% of #1, even when in 3rd. In 2 of the 4 runs where d16 is 3rd it is more than 5% worse than #1. The order of moves outside the top 3 changes a bit, but with so few simulations that is basically noise.

HermanHiddema wrote: Given one of your test games, for every position between moves 50-150, let Leela analyse the position five times, independently (i.e. close and reopen the position between runs). Then record whether the human move played was ever Leela's top choice.
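The tallying part of this test, once the per-run results are collected, is mechanical. Here is a minimal sketch of it; the run data below is entirely hypothetical (made-up win rates in the shape of the d15/d14/d16 example above), and in practice each run's list would come from re-running Leela on the position with the engine closed and reopened in between.

```python
# Each run: list of (move, win_rate_percent), sorted best-first by simulations.
# These numbers are invented for illustration, not real Leela output.
runs = [
    [("d15", 52.1), ("d14", 50.3), ("d16", 46.0)],
    [("d15", 51.8), ("d16", 49.9), ("d14", 48.7)],
    [("d15", 52.4), ("d14", 51.0), ("d16", 45.9)],
]

MARGIN = 5.0  # win-rate gap (percentage points) treated as "within the margin"

def within_margin(run, move, margin=MARGIN):
    """True if `move` appears in `run` within `margin` of the top move's win rate."""
    rates = dict(run)
    top_move = run[0][0]
    return move in rates and rates[top_move] - rates[move] <= margin

# How often was each move #1 across the runs?
top_counts = {}
for run in runs:
    top = run[0][0]
    top_counts[top] = top_counts.get(top, 0) + 1

print(top_counts)                                   # e.g. {'d15': 3}
print(all(within_margin(r, "d14") for r in runs))   # d14 always within 5% of #1
```

With real data, repeating this over every position between moves 50 and 150 would show directly how stable "top choice" and "within 5%" verdicts are between independent runs.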
I don't think that's a good idea, unless you can first ensure voters understand the evidence. Otherwise a lot of naive people will think "98% similarity to Leela => 98% is big, almost 100% => he cheated with Leela". If you could only vote after reading a detailed report on the evidence, doing a mini-course in statistics, reading an essay from Bill on Bayesianism, etc., and passing a mini-exam on them, then I'd be happier with a vote. Then again, we let uninformed people vote on much more important matters.

John Fairbairn wrote: Maybe we could try an electronic vote here, too.
I think "absolute" is too strong; "beyond reasonable doubt" is good enough for me in this case (but I have oodles of doubt). For less important things, like regular KGS games, even less strong evidence is OK.

Kirby wrote: * I don't think punitive measures can fairly be taken without absolute proof of cheating.
Edit: skim-reading some of drmwc's links from the bridge case, I see "comfortable satisfaction" used as an intermediate standard of proof between "balance of probabilities" and "beyond reasonable doubt".