I checked with a truncated sgf; the results are essentially (but not exactly, see Herman's point below) the same. The Leela interface (or my ineptitude with it) makes it difficult to input both black and white moves into an unfinished sgf and have Leela only offer analysis rather than actually playing moves of the opposite colour in reply.

Dmytro wrote: I do not know much about the Leela interface. But, logically, your way of doing game analysis looks good. Still, I would prefer to use a truncated sgf to be 100% sure that there is no influence from the later moves.

Uberdude wrote: Although I load the whole game sgf into Leela, when I ask it what it wants to play at move X I haven't done any analysis for moves after X (I used a separate sgf replayer to see what the human played), so I don't think the information the sgf contains about later moves is used by Leela, but I will check with a truncated sgf. (It's a manual position-by-position analysis rather than a bulk analysis of the game like go review partner does.) If you go forward from X and do analysis, then those simulations of the game tree are reused if you move back to X and continue analysing.
Definitely possible, though my feeling from the analysis I've done so far is that it would be rare for a #1 choice to drop so far. Shuffling around of #2/#3/#4, and the win% crossing the 5% mark, are more common.

HermanHiddema wrote: So, given that Leela's preferred moves are non-deterministic like this, it is possible that the same move might on one run be Leela's top choice, and on another be outside the top 3 or outside the 5% margin?
Too much work for me to do manually though! As a little test, here's a pic of 3 runs (50k, 50k, 150k) on the same position with the full sgf and 3 with a snipped one, to also test Dmytro's point. In this position Leela has a strong preference for the #1 move of d15 and didn't put much effort into analysing the other choices. In other positions I've seen a much flatter distribution of the effort, so I'd expect more variance between runs there (and also with #nodes). In all 6 runs d15 is #1 and has by far the most simulations. d14 and d16 always take the next two places, but d14 is #2 in 4 of the 6, and is always within 5% of #1, even when in 3rd. In 2 of the 4 runs where d16 is 3rd it is more than 5% worse than #1. The order of moves outside the top 3 changes a bit, but with so few simulations that is basically noise.

HermanHiddema wrote: Given one of your test games, for every position between moves 50-150, let Leela analyse the position five times, independently (i.e. close and reopen the position between runs). Then record whether the human move played was ever Leela's top choice.
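The tallying part of this test, once the per-run results are collected, is mechanical. Here is a minimal sketch of it; the run data below is entirely hypothetical (made-up win rates in the shape of the d15/d14/d16 example above), and in practice each run's list would come from re-running Leela on the position with the engine closed and reopened in between.

```python
# Each run: list of (move, win_rate_percent), sorted best-first by simulations.
# These numbers are invented for illustration, not real Leela output.
runs = [
    [("d15", 52.1), ("d14", 50.3), ("d16", 46.0)],
    [("d15", 51.8), ("d16", 49.9), ("d14", 48.7)],
    [("d15", 52.4), ("d14", 51.0), ("d16", 45.9)],
]

MARGIN = 5.0  # win-rate gap (percentage points) treated as "within the margin"

def within_margin(run, move, margin=MARGIN):
    """True if `move` appears in `run` within `margin` of the top move's win rate."""
    rates = dict(run)
    top_move = run[0][0]
    return move in rates and rates[top_move] - rates[move] <= margin

# How often was each move #1 across the runs?
top_counts = {}
for run in runs:
    top = run[0][0]
    top_counts[top] = top_counts.get(top, 0) + 1

print(top_counts)                                   # e.g. {'d15': 3}
print(all(within_margin(r, "d14") for r in runs))   # d14 always within 5% of #1
```

With real data, repeating this over every position between moves 50 and 150 would show directly how stable "top choice" and "within 5%" verdicts are between independent runs.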
I don't think that's a good idea, unless you can first ensure voters understand the evidence. Otherwise a lot of naive people will think "98% similarity to Leela => 98% is big, almost 100% => he cheated with Leela". If you could only vote after reading a detailed report on the evidence, doing a mini-course in statistics, reading an essay from Bill on Bayesianism, etc., and passing a mini-exam on them, then I'd be happier with a vote. Then again, we let uninformed people vote on much more important matters.

John Fairbairn wrote: Maybe we could try an electronic vote here, too.
I think "absolute" is too strong; "beyond reasonable doubt" is good enough for me in this case (but I have oodles of doubt). For less important things, like regular KGS games, even less strong evidence is OK.

Kirby wrote: * I don't think punitive measures can fairly be taken without absolute proof of cheating.
Edit: skim-reading some of drmwc's links from the bridge case, I see "comfortable satisfaction" used as an intermediate standard of proof between "balance of probabilities" and "beyond reasonable doubt".