“Decision: case of using computer assistance in League A”

Javaness2 · Post by **Javaness2** » Mon Mar 26, 2018 2:22 am

This is what they did:

For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.

Leela move 4: does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal: #1 Appeal to the Referee, #2 Appeal to the Tournament Appeals Committee, #3 Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?

RobertJasiek · Post by **RobertJasiek** » Mon Mar 26, 2018 2:52 am

The game consists almost entirely of ordinary moves. This alone makes it possible for two different players or programs to find mostly the same moves, especially if they have a similar playing style. That the player is said to have studied with the program for 2 years makes it all the more likely that the matching moves are neither coincidence nor cheating, but a direct consequence of adopting a playing style and "knowledge" / "experience" from training with the program in the years before the game. I sincerely hope that the judgement is overturned by higher instances.

HermanHiddema · Post by **HermanHiddema** » Mon Mar 26, 2018 2:54 am

Javaness2 wrote: This is what they did:
For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.
Leela move 4: does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal: #1 Appeal to the Referee, #2 Appeal to the Tournament Appeals Committee, #3 Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?

Leela move 4 means the 4th best move in that position according to Leela's analysis.

Of the 50 moves considered, 49 were in the top 3 best moves according to Leela's analysis (with the additional constraint that the move should not be more than 5% worse than Leela's top choice).

This is where the 98% number comes from: it is 49/50.

The only move out of 50 not in Leela's top 3 was Leela's 4th choice in that position, and was less than 1% worse than Leela's top choice.
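For readers who want to reproduce the metric, here is a minimal sketch of the rule as described above. It assumes each analysed move has already been reduced to a pair (rank of the played move among Leela's candidates, winrate gap to Leela's top move in percentage points). The function names, the input layout, and the treatment of "exactly 5%" as still similar are my assumptions, not details from the committee's statement.

```python
from typing import Sequence, Tuple

def is_similar(rank: int, gap_pct: float, top_k: int = 3, max_gap_pct: float = 5.0) -> bool:
    """True if the played move is among the engine's top_k candidates
    and no more than max_gap_pct percentage points below the top move."""
    return rank <= top_k and gap_pct <= max_gap_pct

def similarity(moves: Sequence[Tuple[int, float]]) -> float:
    """Fraction of analysed moves that count as similar."""
    hits = sum(is_similar(rank, gap) for rank, gap in moves)
    return hits / len(moves)

# Illustrative input shaped like the case described in the thread:
# 49 moves inside the top 3, one move ranked 4th but under 1% worse.
example = [(1, 0.0)] * 49 + [(4, 0.8)]
print(f"{similarity(example):.0%}")  # 49/50
```

Note that the 4th-choice move fails the rank test even though its winrate gap is tiny, which is why the count is 49 and not 50.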

Javaness2 · Post by **Javaness2** » Mon Mar 26, 2018 4:03 am

Ah, you are right, that seems like the best way to interpret that statement. So did they then only look at a section of the game or at the whole game? For me it would be kind of strange if they didn't look at the whole game. I suppose that a script already exists to show the comparison data per ply for the game.

jlt · Post by **jlt** » Mon Mar 26, 2018 4:18 am

Uberdude wrote:Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.

Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability to find Leela's move is p=0.89. Then for n=50 moves, the probability to find correctly exactly 49 moves is $np^{n-1}(1-p)$, which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.
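The figure can be checked with a few lines of Python. This is only a sketch of the simplified model in the post: every move is treated as an independent trial with the same match probability, so the count of matches is binomial and "exactly 49 of 50" is the k = n - 1 term. The function name is mine.

```python
from math import comb

def prob_exact_matches(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k matches in n independent moves,
    each matching with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.89
p49 = prob_exact_matches(n, 49, p)  # equals n * p**(n-1) * (1-p)
print(f"P(exactly 49 of {n}) = {p49:.4f}")
print(f"expected number among 60 games = {60 * p49:.2f}")
```

Under these assumptions the probability comes out a little under 2%, and the expected count over 60 games is close to one.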

Uberdude · Post by **Uberdude** » Mon Mar 26, 2018 4:44 am

jlt, a good point to illustrate, but even that is placing too much significance on this test. That 2% is the chance of a randomly chosen game being at 98% Leela (and you should include 100% too). But when the game you choose to investigate is selected because someone else noticed it was similar to Leela, it's like putting the black spot in the golf analogy on the player who hit the hole-in-one after, and because, he did so. The selection is not independent of the result, so the simple probabilities are not appropriate.
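On the "include 100% too" point, a small sketch of the tail probability under the same simplified independent-moves model (p = 0.89, n = 50; the function name is mine). It only covers the arithmetic. The selection effect described above is not something this calculation can correct for.

```python
from math import comb

def prob_at_least(n: int, k_min: int, p: float) -> float:
    """P(at least k_min matches in n independent moves with match probability p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

n, p = 50, 0.89
tail = prob_at_least(n, 49, p)  # P(49 matches) + P(50 matches)
print(f"P(49 or 50 of {n}) = {tail:.4f}")
```

Adding the 50-of-50 term raises the figure slightly, to a bit above 2%.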

Javaness2 · Post by **Javaness2** » Mon Mar 26, 2018 4:49 am

Finally I saw a confirmation, an appeal is planned, so the decision is not final. Judging by the last appeal I saw, it could take a year to figure this out.

Russia

CzechRepublic

Romania

Hungary

Serbia

Ukraine

ez4u · Post by **ez4u** » Mon Mar 26, 2018 5:11 am

jlt wrote:
Uberdude wrote:Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.
Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability to find Leela's move is p=0.89. Then for n=50 moves, the probability to find correctly exactly 49 moves is $np^{n-1}(1-p)$, which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.

This calculation was why I was trying to get something more 'common sense' from Bill.

If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = [edit --> wrong! 0.0000065% or one time in 15 million] 0.018% or one time in about 5,600. And if you take 70% (the lower figure), you get [edit --> wrong!! 5.0E-16 or about one time in 2 quadrillion] 3.8E-7 or about one time in 2.5 million.

jlt · Post by **jlt** » Mon Mar 26, 2018 5:26 am

ez4u wrote: If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.

No you don't. If p=0.8 then $np^{n-1}(1-p)$ is about 1/5600. If p=0.7 then $np^{n-1}(1-p)$ is about 1/(2.5 million).

The orders of magnitude are certainly very different, but I purposely picked p=0.89 to be on the safe side, i.e. I would prefer a few cheaters to go unpunished rather than too many punished innocents (together with the whole team).

And also, p=0.89 may not be unrealistic for that particular game if you assume that most moves were very "ordinary" (dixit Robert Jasiek).
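To see how strongly the result depends on the assumed per-move match probability, here is a short sweep over the three values discussed in this exchange. It is a sketch of the same simplified model (n = 50 independent moves, exactly 49 matches). The helper name is mine.

```python
def prob_49_of_50(p: float, n: int = 50) -> float:
    """n * p^(n-1) * (1-p): exactly one miss in n independent moves."""
    return n * p ** (n - 1) * (1 - p)

for p in (0.89, 0.80, 0.70):
    prob = prob_49_of_50(p)
    print(f"p = {p:.2f}: probability {prob:.3g}, about one game in {1 / prob:,.0f}")
```

The three values of p produce results spanning several orders of magnitude, which is why the choice of baseline matters so much here.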

HermanHiddema · Post by **HermanHiddema** » Mon Mar 26, 2018 5:28 am

IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.
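If such a sample were collected, the outlier check itself would be simple. A minimal sketch, assuming one similarity score per analysed game. The scores below are PLACEHOLDER values chosen only to exercise the function. They are not measurements from any real games.

```python
from typing import Sequence

def fraction_at_or_above(scores: Sequence[float], threshold: float) -> float:
    """Empirical share of games whose similarity is at least `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)

# PLACEHOLDER data, not real measurements.
placeholder_scores = [0.62, 0.70, 0.74, 0.78, 0.80, 0.84, 0.89, 0.90]
share = fraction_at_or_above(placeholder_scores, 0.98)
print(f"share of sample at or above 98%: {share:.1%}")
```

With a small sample, a share of zero says little. The sample would need to be large enough that a rate of one in a few hundred could actually show up.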

lightvector · Post by **lightvector** » Mon Mar 26, 2018 5:54 am

jlt wrote:
ez4u wrote: If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.
No you don't. If p=0.8 then $np^{n-1}(1-p)$ is about 1/5600. If p=0.7 then $np^{n-1}(1-p)$ is about 1/(2.5 million).

There's a reasonable chance the actual probability is nearer the larger of these estimates. If there is any correlation at all between a match on one move and a match on subsequent moves, then the moves within a game are not independent, and we should expect a probability comparable to the larger estimates. (E.g. if there are kinds of games/fights/sequences where each move matches with 85% probability and kinds where each move matches with 75%, and each kind of game is equally likely, then the overall probability of reaching 98% is much higher than if every move in every game matched uniformly at 80%.)
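The 85%/75% example can be worked through numerically. This is a sketch under the stated toy assumptions only: two equally likely kinds of game, moves independent within a game, 49 matches out of 50. The function name is mine.

```python
from math import comb

def prob_exact(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k matches in n independent moves."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 50, 49
uniform = prob_exact(n, k, 0.80)
# Half of all games match at 85% per move, the other half at 75%.
mixture = 0.5 * prob_exact(n, k, 0.85) + 0.5 * prob_exact(n, k, 0.75)

print(f"uniform 80%:       {uniform:.3g}")
print(f"50/50 mix 85%/75%: {mixture:.3g}")
print(f"ratio:             {mixture / uniform:.1f}x")
```

The average match rate is 80% in both cases, but the mixed population reaches 49/50 several times more often, because almost all of that probability comes from the 85% games.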

HermanHiddema wrote:IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.

This.

Uberdude · Post by **Uberdude** » Mon Mar 26, 2018 5:58 am

HermanHiddema wrote:IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.

Even if that shows 98% is a significant enough outlier, that's not enough to convict IMO (of course if it's not an outlier then that's easy, case closed, less work, unless everyone is using Leela!). To do this well you need to analyse a decent number of games of people who have studied a lot with Leela and see how much of an outlier 98% is relative to that. It may be hard to find enough such people/games around 4d level. Carlo's offline games are a good start (if recent enough), but I want to know how many were analysed, not just a plural and a result of 70-80%.

Update: Rather than going for a walk in the park at lunch I finished analysing moves 50-149 of my PGETC game. I scored 80% similarity and my opp* scored 90%. So we have the report that at least 2 of Carlo's offline games are in the 70-80% range, and my data of [80,90]. 98 is looking less conclusive. I attach the spreadsheet I used for interest/verification. I note the sequence from move 93 to 136 would have all counted as copies under this metric were it not for 2 timesujis I played in the middle (most people don't get into byo yomi as early as I do). Next I might look at the stricter "top 1" instead of "top 3" metric.

* for ez4u, now ranked 3d, but a former 5d.

ez4u · Post by **ez4u** » Mon Mar 26, 2018 6:09 am

jlt wrote:
ez4u wrote: If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.
No you don't. If p=0.8 then $np^{n-1}(1-p)$ is about 1/5600. If p=0.7 then $np^{n-1}(1-p)$ is about 1/(2.5 million).

The orders of magnitude are certainly very different, but I purposely picked p=0.89 to be on the safe side, i.e. I would prefer a few cheaters to go unpunished rather than too many punished innocents (together with the whole team).

And also, p=0.89 may not be unrealistic for that particular game if you assume that most moves were very "ordinary" (dixit Robert Jasiek).

You are right, a spreadsheet error on my part!

OferZ · Post by **OferZ** » Mon Mar 26, 2018 6:14 am

I wonder if we'll get a comment from Metta himself anytime soon...

It's quite a complicated issue...

I really wonder why only this game from the PGETC was checked.
It seems a bit suspicious, considering it was perhaps the least impressive of his wins (he lost only to Csaba Mero and won against several 6d's and 5d's). Perhaps the reason is that it was the only game that someone appealed against (surprisingly, as it didn't change the outcome of the match, unlike other cases).

From Carlo's side, assuming the accusation is correct, I can understand why someone might choose to cheat when playing in a team. He's not playing just for his own sake and he knows that winning can have a great effect on his friends. Staying in or dropping out of a league affects the whole next year.
(Still, no doubt, it's the wrong thing to do.)

Somehow I feel no problem with him being a referee in the EGC.
He'll be doing service to the EGF, and he'll be more obliged to prove himself...

bernds · Post by **bernds** » Mon Mar 26, 2018 6:15 am

jlt wrote: Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability to find Leela's move is p=0.89. Then for n=50 moves, the probability to find correctly exactly 49 moves is $np^{n-1}(1-p)$

How do you arrive at this?

Life In 19x19
