“Decision: case of using computer assistance in League A”

Re: “Decision: case of using computer assistance in League A”

Post by Javaness2 »

This is what they did:
For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.


Leela move 4: does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal: #1 Appeal to the Referee, #2 Appeal to the Tournament Appeals Committee, #3 Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?

Post by RobertJasiek »

The game consists almost entirely of ordinary moves. This alone makes it possible for two different players or programs to find mostly the same moves, especially if they have a similar playing style. That the player is said to have studied with the program for 2 years makes it all the more likely that the matching moves are neither coincidence nor cheating but a direct consequence of adopting a playing style and "knowledge" / "experience" from training with the program during the years before the game. I sincerely hope that the judgement is overturned by a higher instance.

Post by HermanHiddema »

Javaness2 wrote: This is what they did:
For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.


Leela move 4: does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal: #1 Appeal to the Referee, #2 Appeal to the Tournament Appeals Committee, #3 Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?


Leela move 4 means the 4th best move in that position according to Leela's analysis.

Of the 50 moves considered, 49 were in the top 3 best moves according to Leela's analysis (with the additional constraint that the move should not be more than 5% worse than Leela's top choice).

This is where the 98% figure comes from: it is 49/50.

The only move out of 50 not in Leela's top 3 was Leela's 4th choice in that position, and was less than 1% worse than Leela's top choice.
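As a rough illustration of the metric described above, the check could be sketched in Python like this. The `MoveAnalysis` record, the function names and the sample data are all hypothetical; only the two thresholds (top 3, no more than 5% below the top move) come from the quoted statement, and nothing here is known to match the script that was actually used.

```python
from dataclasses import dataclass


@dataclass
class MoveAnalysis:
    """Hypothetical record for one played move, as judged by the engine."""
    rank: int           # 1 = engine's top choice, 2 = second choice, ...
    winrate_gap: float  # percentage points below the engine's top choice


def is_similar(move: MoveAnalysis, max_rank: int = 3, max_gap: float = 5.0) -> bool:
    """A move counts as similar if it is in the top `max_rank` choices
    and no more than `max_gap` points worse than the top choice."""
    return move.rank <= max_rank and move.winrate_gap <= max_gap


def similarity(moves: list[MoveAnalysis]) -> float:
    """Fraction of moves that count as similar (0.0 for an empty list)."""
    if not moves:
        return 0.0
    return sum(is_similar(m) for m in moves) / len(moves)


# Made-up data shaped like the reported case: 49 moves inside the top 3,
# plus one 4th-choice move that is less than 1% worse than the top choice.
sample = [MoveAnalysis(rank=1, winrate_gap=0.0)] * 49 + [MoveAnalysis(rank=4, winrate_gap=0.9)]
print(f"similarity = {similarity(sample):.0%}")
```

Note that the 4th-choice move fails the rank test even though it easily passes the winrate test, which is exactly the situation described for the one non-matching move.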

Post by Javaness2 »

Ah, you are right, that seems like the best way to interpret that statement. So did they then only look at a section of the game or at the whole game? For me it would be kind of strange if they didn't look at the whole game. I suppose that a script already exists to show the comparison data per ply for the game.

Post by jlt »

Uberdude wrote:Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.


Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability of finding Leela's move is p = 0.89. Then for n = 50 moves, the probability of matching exactly 49 of them is $n p^{n-1}(1-p)$, which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.
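For anyone who wants to check the arithmetic, here is a minimal Python sketch under the same simplifying assumption that every move matches independently with the same probability p. The formula is the binomial probability with k = 49: there are 50 ways to choose which single move misses, each occurring with probability p^49 (1-p). The function and variable names are my own.

```python
from math import comb


def prob_exact(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k matches in n independent moves."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 0.89
p49 = prob_exact(n, 49, p)  # same as n * p**(n-1) * (1-p), roughly 0.018

games = 60
expected_hits = games * p49              # expected number of 49/50 games, roughly 1.1
p_at_least_one = 1 - (1 - p49) ** games  # chance that at least one such game occurs

print(f"P(exactly 49 of 50)       = {p49:.4f}")
print(f"expected such games in 60 = {expected_hits:.2f}")
print(f"P(at least one in 60)     = {p_at_least_one:.2f}")
```

The expected count and the chance of at least one occurrence are different quantities: an expected count near one still leaves a substantial chance that no game reaches 49/50.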

Post by Uberdude »

jlt, a good point to illustrate, but even that is placing too much significance on this test. That 2% is the chance of a randomly chosen game being at 98% Leela (and you should include 100% too). But when the game you choose to investigate is selected because someone else noticed it was similar to Leela it's like putting the black spot in the golf analogy on the player who hit the hole-in-one after and because he did so. It's not independent so the simple probabilities are not appropriate.
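The "include 100% too" part is easy to sketch in Python: sum the binomial probabilities for 49 and 50 matches instead of taking 49 alone. This still assumes independent moves with a fixed p = 0.89, so it says nothing about the selection effect, which no per-game probability can capture. The function name is hypothetical.

```python
from math import comb


def prob_at_least(n: int, k_min: int, p: float) -> float:
    """Probability of k_min or more matches in n independent moves."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Chance of a game scoring 98% or 100% under the simple model.
tail = prob_at_least(50, 49, 0.89)
print(f"P(at least 49 of 50) = {tail:.4f}")
```

Including the 100% case only nudges the figure upward slightly, so it stays in the neighbourhood of 2%.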

Post by Javaness2 »

Finally I saw a confirmation: an appeal is planned, so the decision is not final. Judging by the last appeal I saw, it could take a year to figure this out.

Russia

CzechRepublic

Romania

Hungary

Serbia

Ukraine
Last edited by Javaness2 on Mon Mar 26, 2018 5:40 am, edited 2 times in total.

Post by ez4u »

jlt wrote:
Uberdude wrote:Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.


Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability of finding Leela's move is p = 0.89. Then for n = 50 moves, the probability of matching exactly 49 of them is $n p^{n-1}(1-p)$, which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.

This calculation was why I was trying to get something more 'common sense' from Bill. :)
If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = [edit --> wrong! 0.0000065% or one time in 15 million] 0.018% or one time in about 5,600. And if you take 70% (the lower figure), you get [edit --> wrong!! 5.0E-16 or about one time in 2 quadrillion] 3.8E-7 or about one time in 2.5 million.
Last edited by ez4u on Mon Mar 26, 2018 6:26 am, edited 2 times in total.
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

Post by jlt »

ez4u wrote:If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.


No you don't. If p = 0.8 then $n p^{n-1}(1-p)$ is about 1/5600. If p = 0.7 then $n p^{n-1}(1-p)$ is about 1/(2.5 million).

The orders of magnitude are certainly very different, but I purposely picked p=0.89 to be on the safe side, i.e. I would prefer a few cheaters to go unpunished rather than too many punished innocents (together with the whole team).

And also, p=0.89 may not be unrealistic for that particular game if you assume that most moves were very "ordinary" (dixit Robert Jasiek).

Post by HermanHiddema »

IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.

Post by lightvector »

jlt wrote:
ez4u wrote:If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.


No you don't. If p = 0.8 then $n p^{n-1}(1-p)$ is about 1/5600. If p = 0.7 then $n p^{n-1}(1-p)$ is about 1/(2.5 million).


There's a reasonable chance the actual probability is nearer the larger end of these estimates. If a match on one move is correlated at all with a match on subsequent moves, then the moves within a game are not independent, and we should expect something comparable to the larger end. (E.g. if there are kinds of games/fights/sequences where each move matches with 85% probability and kinds where each move matches with 75% probability, and each kind of game is equally likely, then the overall probability of 98% is much higher than if every move in every game matched uniformly at 80%.)
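A quick Python sketch of that example, assuming each game is entirely of one kind (every move at 85% or every move at 75%), the two kinds are equally likely, and moves are independent within a game. These modelling choices and the names below are only for illustration.

```python
from math import comb


def prob_exact(n: int, k: int, p: float) -> float:
    """Binomial probability of exactly k matches in n independent moves."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k = 50, 49

# Every move in every game matches with probability 0.80.
uniform = prob_exact(n, k, 0.80)

# Half of the games are "easy" (0.85 per move), half are "hard" (0.75 per move).
mixture = 0.5 * prob_exact(n, k, 0.85) + 0.5 * prob_exact(n, k, 0.75)

print(f"uniform 80%:     about 1 in {1 / uniform:,.0f}")
print(f"85%/75% mixture: about 1 in {1 / mixture:,.0f}")
print(f"mixture is {mixture / uniform:.1f}x more likely")
```

The average per-move match rate is 80% in both cases, but the mixture is dominated by the easy games, which is why a single average figure understates how often a 98% game turns up.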

HermanHiddema wrote:IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.


This.

Post by Uberdude »

HermanHiddema wrote:IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.


Even if that shows 98% is a significant enough outlier, that's not enough to convict IMO (of course, if it's not an outlier then that's easy: case closed, less work, unless everyone is using Leela!). To do this well you need to analyse a decent number of games of people who have studied a lot with Leela and see how much of an outlier 98% is relative to that. It may be hard to find enough such people/games around 4d level. Carlo's offline games are a good start (if recent enough), but I want to know how many were analysed, not just a plural and a result of 70-80%.

Update: Rather than going for a walk in the park at lunch I finished analysing moves 50-149 of my PGETC game :) . I scored 80% similarity and my opp* scored 90%. So we have the report that at least 2 of Carlo's offline games are in the 70-80% range, and my data of [80,90]. 98 is looking less conclusive. I attach the spreadsheet I used for interest/verification. I note the sequence from move 93 to 136 would all have counted as copies under this metric were it not for 2 timesujis I played in the middle (most people don't get into byo yomi as early as I do). Next I might look at the stricter "top 1" instead of "top 3" metric.

* for ez4u, now ranked 3d, but a former 5d.
Attachments
Leela.xlsx

Post by ez4u »

jlt wrote:
ez4u wrote:If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.


No you don't. If p = 0.8 then $n p^{n-1}(1-p)$ is about 1/5600. If p = 0.7 then $n p^{n-1}(1-p)$ is about 1/(2.5 million).

The orders of magnitude are certainly very different, but I purposely picked p=0.89 to be on the safe side, i.e. I would prefer a few cheaters to go unpunished rather than too many punished innocents (together with the whole team).

And also, p=0.89 may not be unrealistic for that particular game if you assume that most moves were very "ordinary" (dixit Robert Jasiek).

You are right, a spreadsheet error on my part! :sad:

Post by OferZ »

I wonder if we'll get a comment from Metta himself anytime soon...

It's quite a complicated issue...

I really wonder why only this game from the PGETC was checked.
It seems a bit suspicious, considering it was perhaps the least impressive of his wins (he lost only to Csaba Mero and won against several 6d's and 5d's). Perhaps the reason is that it was the only game that someone appealed against (surprisingly, as it didn't change the outcome of the match, unlike other cases).

From Carlo's side, assuming the accusation is correct, I can understand why someone might choose to cheat when playing in a team. He's not playing just for his own sake and he knows that winning can have a great effect on his friends. Staying or going down a league affects the whole next year.
(Still, no doubt, it's the wrong thing to do)...

Somehow I feel no problem with him being a referee in the EGC.
He'll be doing service to the EGF, and he'll be more obliged to prove himself...

Post by bernds »

jlt wrote: Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability of finding Leela's move is p = 0.89. Then for n = 50 moves, the probability of matching exactly 49 of them is $n p^{n-1}(1-p)$
How do you arrive at this?