“Decision: case of using computer assistance in League A”

Bonobo · Post by **Bonobo** » Sun Mar 25, 2018 12:10 pm

http://pandanet-igs.com/communities/euroteamchamps/403

Decision: case of using computer assistance in League A
Based on a protest against the PGETC League A Round 4 match Italy vs. Israel Board 3 game Carlo Metta vs Reem Ben David, and after an extensive study and consultation, the League A referee has decided that Carlo Metta used Leela computer software to assist him in the game. This violates the rule of fair play (§3.2).

The penalty is that all games by Carlo in the 8th PGETC League A are forfeited and the player is banned from the 8th and 9th PGETC play.

Lorenz Trippel of EGF gave some extra information on Facebook:

Lorenz Trippel wrote:After Israel's protest some extensive research on the games was done and finally it was concluded that 98% similarity to Leela Go playing software is too much.

And from a comment in that same FB thread:

Alessandro Boh Pace wrote: I have known Carlo for a very long time and we have played many, many games in tournaments... In the last 2 years he didn't play online and instead played only against Leela, also studying and reviewing with it. I was at first against this idea, but when I saw him in live tournaments I had to change my mind: he improved like hell and his style was even more solid than usual. Now if we can see similarities between him and Leela I am not surprised at all, but those small differences can actually be extremely big; one mistake is enough to lose a game. In some games of the league he was in fact losing and won because of an opponent's blunder (that is definitely not Leela style ^^').
I think, even analyzing that indicted game, almost all the moves were nothing particular and even the special ones were totally "Carlo style" as I know him; the fact that many were all top-5 moves of Leela in that game just means Carlo is not a kyu player (and that also holds for his opponent in that game ^^' crazy)...

Beyond all the technical stuff (which cannot really prove anything), the Italian team was very excited about the miracle of reaching League A; we knew that the level was higher than ours, but for us it was a big honor and an occasion to play with strong players and learn (I lost all my games and I am still very grateful for the experience). But the others managed to fight very well, especially the last boards, especially Carlo.
I would bet my life he didn't cheat because I know him and he is exactly like all other go players: he wants to play and learn.
I am sorry for the referees because it is a difficult situation and even after a long study they made a big blunder, but in their place I doubt I could do better.
But I am even more sorry for Carlo, who after giving so much to studying go and working so hard for the Italian go federation as an organizer is now ashamed and feeling bad about this crazy story, and is also losing motivation just when he was finally having very nice results...

All the Italian team (pretty sure) and I will support Carlo with all we have, as he is the victim of a new, difficult-to-deal-with system and doesn't deserve it at all.

The situation actually disgusts me, but I can understand everyone... please just try to be respectful to a great, polite and honest go player.
Thx

And here’s a link to the SGF file of that game, posted in the German DGoB forum.

________________

I for one think that it should say “assumed case of using computer assistance” because I think it cannot be proven 100%.

Javaness2 · Post by **Javaness2** » Sun Mar 25, 2018 12:45 pm

Has the decision been appealed?

The thing I find slightly odd is that the entire game (well 98% of) is said to be within Leela's top choices.
I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

Uberdude · Post by **Uberdude** » Sun Mar 25, 2018 1:52 pm

Out of interest in the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. I have no reason to believe my opp used Leela (and I didn't) and all his moves seemed plausible 3-4d moves (and I won), so my tentative conclusion from this small test is that if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment. Of the 38 moves I looked at in my game I classified 13 of them as "only moves", as in a dan player like myself can say it's the only move in less than a second e.g. capture after atari (when no other sensible choice like connect). This classification is somewhat subjective, but excluding these from the count would give a higher quality metric.
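The idea of leaving "only moves" out of the count can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MoveRecord` type, the `similarity` function and the sample data are all made up here, and the "only move" flag would have to be assigned by hand, which is exactly the subjective step mentioned above.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:
    matches_engine: bool  # the played move was among the engine's accepted choices
    only_move: bool       # hand-labelled: a forced move a dan player finds instantly


def similarity(moves, exclude_only_moves=False):
    """Fraction of moves that match the engine, optionally skipping forced moves."""
    counted = [m for m in moves if not (exclude_only_moves and m.only_move)]
    if not counted:
        raise ValueError("no moves left to score")
    return sum(m.matches_engine for m in counted) / len(counted)


# Invented sample: 4 forced moves (all matching) plus 6 free choices (3 matching).
sample = (
    [MoveRecord(matches_engine=True, only_move=True)] * 4
    + [MoveRecord(matches_engine=True, only_move=False)] * 3
    + [MoveRecord(matches_engine=False, only_move=False)] * 3
)

print(similarity(sample))                           # 0.7
print(similarity(sample, exclude_only_moves=True))  # 0.5
```

In this invented sample the forced moves lift the raw score from 50% to 70%, which shows how much a stretch of obvious captures and connections can inflate a similarity figure.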

dfan · Post by **dfan** » Sun Mar 25, 2018 2:02 pm

Javaness2 wrote:The thing I find slightly odd is that the entire game (well 98% of) is said to be within Leela's top choices.
I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

In chess the answer has been shown to be an unequivocal yes. Perhaps Go players are smarter, though.

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 2:15 pm

Bonobo wrote: Lorenz Trippel of EGF gave some extra information on Facebook:

Lorenz Trippel wrote:After Israel's protest some extensive research on the games was done and finally it was concluded that 98% similarity to Leela Go playing software is too much.

My inner scientist says that's not the right question. Confirmatory evidence (he plays like Leela) is weak. That could be the result, as mentioned by Alessandro Boh Pace, of training using Leela for two years. The main question is this: how different was his play in these games from his play in recent FTF tournaments? (If those game records are unavailable, there are other ways of testing how close his go decisions accord with Leela's.)

I took a look at the linked game record. Nothing seemed particularly distinctive until Black 109. Black's play from that point up to Black 153 does, however. Would Leela have played that whole sequence as Black? If so, that's pretty good evidence, but still confirmatory.

----

As I have said for quite a while now, humans can learn to imitate the strategy of the top bots, even though the bots cannot explain it. But the decision to play Black 109 rests upon fairly specific tactics, as well. The play is not particularly urgent on its face. That is why it seemed distinctive to me.

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 2:23 pm

Uberdude wrote:Out of interest in the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. I have no reason to believe my opp used Leela (and I didn't) and all his moves seemed plausible 3-4d moves (and I won), so my tentative conclusion from this small test is that if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.

Minor point: it's 19 moves, not 38.

But yes, the fact that Carlo made exactly the same moves as Leela for 15 moves straight may be suspicious, but that's all. And, as I said, it is confirmatory evidence, which is weak, weak, weak. (Emphasis because most people overvalue confirmatory evidence.)

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 2:31 pm

dfan wrote:
Javaness2 wrote:The thing I find slightly odd is that the entire game (well 98% of) is said to be within Leela's top choices.
I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?
In chess the answer has been shown to be an unequivocal yes. Perhaps Go players are smarter, though.

Except perhaps for AlphaZero, IIUC, the main difference between humans and chess engines lies in the realm of tactics and calculation of variations. Those skills are hard to imitate. But the top AI go bots use neural nets to come up with different evaluations and strategies. These can be imitated, especially if you devote a couple of years to trying to understand them.

HermanHiddema · Post by **HermanHiddema** » Sun Mar 25, 2018 2:35 pm

Bill Spight wrote:My inner scientist says that's not the right question. Confirmatory evidence (he plays like Leela) is weak. That could be the result, as mentioned by Alessandro Boh Pace, of training using Leela for two years. The main question is this: how different was his play in these games from his play in recent FTF tournaments? (If those game records are unavailable, there are other ways of testing how close his go decisions accord with Leela's.)

As a comment on the linked facebook thread, Jonas Egeberg wrote:

Jonas Egeberg wrote: As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.
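The criterion quoted above (within Leela's top 3 moves and no more than 5% from its top move) can be written down as a small Python check. This is a sketch, not the referees' actual tooling: the function name, the `(move, winrate)` input format and the sample numbers are assumptions, and it ranks candidates by winrate, whereas the engine's own ordering may differ (for example by visit count).

```python
def is_similar(played, candidates, top_n=3, max_gap=0.05):
    """Return True if `played` is among the top_n candidates and its
    winrate is at most max_gap below the best candidate's winrate.

    candidates: list of (move, winrate) pairs, winrate as a fraction in [0, 1].
    """
    if not candidates:
        raise ValueError("engine returned no candidate moves")
    # Assumption: rank candidates by winrate, best first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_winrate = ranked[0][1]
    for move, winrate in ranked[:top_n]:
        if move == played:
            return best_winrate - winrate <= max_gap
    return False


# Hypothetical engine output for one position (coordinates and numbers invented).
candidates = [("Q16", 0.520), ("D4", 0.515), ("C3", 0.513), ("R14", 0.512)]
print(is_similar("D4", candidates))   # True: second choice, 0.5 points behind
print(is_similar("R14", candidates))  # False: fourth choice, although only 0.8 points behind
```

The second call mirrors the one non-matching move described in the quote: a fourth choice counts as different under this rule even when its winrate is within 1% of the top move.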

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 2:38 pm

Javaness2 wrote:Has the decision been appealed?

The thing I find slightly odd is that the entire game (well 98% of) is said to be within Leela's top choices.
I mean, if you were cheating, would you really be so bad at it that you would make it so obvious?

The decision rests upon one game?????

That may be enough to require a replay or throw the result out. But confirmatory evidence is weak. For disciplinary action I would want evidence from at least 10 games. And further tests, besides. For instance, have a monitor for a game or two and see how many of Leela's moves Carlo makes.

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 2:42 pm

HermanHiddema wrote: As a comment on the linked facebook thread, Jonas Egeberg wrote:

Jonas Egeberg wrote: As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.

Many thanks, Herman.

I don't have truck with Facebook, for mostly obvious reasons.

hyperpape · Post by **hyperpape** » Sun Mar 25, 2018 3:06 pm

The go world will probably have to spend some time looking over Ken Regan's work in Chess. I can't vouch for it, though what I have read by him impresses me. Key things I can say: he claims that it is well within the realm of possibility to substantiate cheating allegations from a single game, but you usually have to go well beyond "how many of the engine's preferred moves did the human play?" to do so.

Here's one page where he summarizes some of his work: https://www.cse.buffalo.edu//~regan/chess/fidelity/
Here's a longform article about his work: http://www.uschess.org/content/view/12677/763/
Here's a blog he coauthors where I first read about his work: https://rjlipton.wordpress.com/

Gomoto · Post by **Gomoto** » Sun Mar 25, 2018 3:33 pm

98% on 100 moves is good enough to decide.

If you want to continue meaningful online tournaments, a deep learning approach to detecting cheaters is necessary nowadays.

What frightens me more: No more bathrooms at real life tournaments (the only kind I take part in, and my bad results show clearly I am not cheating). I have to cut my coffee drinking routine at tournaments.

hyperpape · Post by **hyperpape** » Sun Mar 25, 2018 4:23 pm

Gomoto wrote:98% on 100 moves is good enough to decide.

It really isn't, at least not on an ongoing basis. I can understand making a preliminary decision based on this, as it is one of the first (or is the first?) major accusations of cheating in Western play, but we have to do better in the long term. Accusing a player of cheating will cast a cloud over that player forever. With a few thousand games played each year, we need the chance of a false positive to be minuscule. 1% false positives won't cut it.
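To put a rough number on why 1% is too high, here is a back-of-the-envelope Python sketch. The figure of 3000 games is just a stand-in for "a few thousand", and the sketch assumes every game is screened by the same test and that games are independent; both are illustrative assumptions.

```python
def expected_false_positives(games, false_positive_rate):
    """Expected number of innocent games flagged when every game is screened."""
    return games * false_positive_rate


def prob_at_least_one(games, false_positive_rate):
    """Chance that at least one innocent game is flagged (games independent)."""
    return 1 - (1 - false_positive_rate) ** games


games_per_year = 3000  # assumed stand-in for "a few thousand"

for rate in (0.01, 0.0001):
    print(
        f"rate {rate:.2%}: "
        f"expected {expected_false_positives(games_per_year, rate):.1f} false flags, "
        f"P(at least one) = {prob_at_least_one(games_per_year, rate):.3f}"
    )
```

Under these assumptions a 1% rate flags about 30 innocent games a year, and even a 0.01% rate leaves a noticeable chance of one wrong accusation per season.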

By the judges' own statement, in several offline games played with no suspicion of cheating, this player was playing 35-40 moves that Leela picked out of a 50-move sequence. In this game, he played 49. That stinks, and if you force me to bet, my money is on cheating.

The shift from 40 moves to 49 is highly suggestive. But to make it stick, you need to know what range of similarity different players show. To accurately estimate that value, you have to look at a lot of different players, over a lot of different games (without a great number of players, you run the risk that some players are much more similar to a particular bot). You also run the risk that certain positions lend themselves to much higher scores.
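For a feel of how unusual 49 out of 50 would be, here is a naive binomial calculation in Python. It assumes each move independently matches Leela with probability 40/50, the top of the offline range; that independence assumption is mine and is certainly too simple, since forced sequences make neighbouring moves correlated and some positions are easier to match than others.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


baseline_rate = 40 / 50  # assumed per-move match probability from the offline games
tail = binomial_tail(50, 49, baseline_rate)
print(f"P(at least 49 of 50 matches) = {tail:.2e}")
```

Under this naive model the probability comes out at roughly 2 in 10,000. The real figure depends on the spread between players and positions, which is why a baseline over many players and many games is needed before such a number can be trusted.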

While the league manager seems to have worked hard to come to a reliable decision, I do not think that the approach is good enough for the long term.

Gomoto · Post by **Gomoto** » Sun Mar 25, 2018 4:41 pm

there is a chance for error if 98 of 100 moves are the same, but I am sure it is not 1%

my first estimate for false positive is < 0.01%

hyperpape · Post by **hyperpape** » Sun Mar 25, 2018 4:56 pm

I’m not saying it is 1%. I don’t have the faintest idea what the number is. 1% was just to illustrate that we need to be very certain.

I’m saying we should know what the number is, rather than relying on numbers someone pulled out of their butt.

P.S. I believe they said 98% of his moves between move 50 and 150 of the game were similar (one of Leela’s top 3 choices). That’s 49 out of 50, not 98 out of 100, and that makes a difference.
