“Decision: case of using computer assistance in League A”
- Bonobo
- Oza
- Posts: 2223
- Joined: Fri Dec 23, 2011 6:39 pm
- Rank: OGS 9k
- GD Posts: 0
- OGS: trohde
- Universal go server handle: trohde
- Location: Germany
- Has thanked: 8262 times
- Been thanked: 924 times
Re: “Decision: case of using computer assistance in League A
»New evidence for re-opening the Carlo Metta case«:
https://psv4.userapi.com/c848124/u63838 ... idence.pdf
(again, I don’t understand any of this)
“The only difference between me and a madman is that I’m not mad.” — Salvador Dali ★ Play a slooooow correspondence game with me on OGS? 
-
AlesCieply
- Dies in gote
- Posts: 65
- Joined: Mon Sep 10, 2012 5:07 am
- GD Posts: 0
- Has thanked: 31 times
- Been thanked: 55 times
Re: “Decision: case of using computer assistance in League A
Hello, I guess you know I was involved in dealing with the case as a member of the PGETC appeals committee. Since about that time I have also been looking into the matter on my own, trying to devise a better and statistically sounder method to check whether someone used AI in internet games. My analysis is based on comparing the player's performance in internet and live games. While working on it I found what I believe is new evidence, and last Friday I informed the EGF executive and the involved parties. As it seems to be becoming public knowledge, you may as well have it directly from me; the supplied document is here:
https://drive.google.com/file/d/1NaWwHx ... sp=sharing
The analysis itself has evolved a bit since then (in particular, the Kulkov-Metta PGETC game was added); the current version is here:
https://docs.google.com/spreadsheets/d/ ... sp=sharing
I would very much appreciate it if it were reviewed and checked by others, and I am also open to any criticism on how to improve it or any mistakes you find in it. I still do not consider it an end product. I would like to add a few more games and make a comparison with games analyzed for different players. I would also like to repeat the analysis for some sequences where Leela might not be sufficiently precise or consistent. However, the work on it is rather slow and tedious. Feel free to contribute to it.
On the first sheet you will also find a Pearson's chi-square test comparing the compatibility of the two histograms for Carlo's internet and regular games. I am not an expert on statistics, but I was told that the p-value represents the probability that the two sets could be results for the same population. In our case it means that the probability of both sets of games being played by one person is now about 0.0001 (=0.01%).
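A two-histogram chi-square comparison of this kind can be sketched as follows. This is a minimal sketch, not the spreadsheet's implementation: it assumes each histogram is a list of counts over the same bins (what the bins stand for, e.g. how a played move ranked among Leela's candidates, is an assumption here), computes Pearson's statistic for the resulting 2×K table, and gets the upper-tail p-value from the exact closed forms for integer degrees of freedom. Strictly, a p-value is the probability of seeing a difference at least this large if both histograms came from the same distribution; it is not directly the probability that one person played both sets. The test also assumes independent observations and reasonably large expected counts, and moves within one game are not independent, so the result is approximate at best.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer k >= 1 dof."""
    if k < 1:
        raise ValueError("need at least one degree of freedom")
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(k-1)/2} x^(i-1/2) / (1*3*...*(2i-1))
    term, total = math.sqrt(x), 0.0
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi_square_two_histograms(hist_a, hist_b):
    """Pearson's chi-square test of homogeneity for two histograms over the same bins.

    Returns (statistic, degrees_of_freedom, p_value). Bins that are empty in
    both histograms are skipped, since they carry no information.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same bins")
    rows = (hist_a, hist_b)
    row_totals = [sum(r) for r in rows]
    if min(row_totals) == 0:
        raise ValueError("each histogram needs at least one count")
    grand_total = sum(row_totals)
    stat, used_bins = 0.0, 0
    for j in range(len(hist_a)):
        col_total = hist_a[j] + hist_b[j]
        if col_total == 0:
            continue
        used_bins += 1
        for i, row in enumerate(rows):
            # Expected count if both histograms share one distribution.
            expected = row_totals[i] * col_total / grand_total
            stat += (row[j] - expected) ** 2 / expected
    dof = used_bins - 1  # (2 rows - 1) * (bins - 1)
    return stat, dof, chi2_sf(stat, dof)


if __name__ == "__main__":
    # Hypothetical counts per bin, NOT Carlo's data.
    stat, dof, p = chi_square_two_histograms([30, 12, 8], [18, 15, 17])
    print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p:.4g}")
```

Where SciPy is available, `scipy.stats.chi2_contingency` performs the same test on the 2×K table and is the safer choice for real use.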
Finally, I would very much appreciate it if Carlo Metta came out and explained why he presented an apparently fabricated game record to the league manager. I do believe he is in principle an honest man who has done a lot for the go community and can continue to do so. I just think he made a mistake in using AI in his internet games and is now afraid of admitting it.
EDIT: Here I refer to a game record of the Shakhov-Metta game that Carlo himself supplied (among several other records), claiming it was played at a regular tournament and also contained many moves "similar to Leela". In fact, the game was played on KGS and the record was edited to look as if it had been played "live"; see the report for more details.
Last edited by AlesCieply on Tue Jun 05, 2018 6:25 am, edited 1 time in total.
-
Javaness2
- Gosei
- Posts: 1545
- Joined: Tue Jul 19, 2011 10:48 am
- GD Posts: 0
- Has thanked: 111 times
- Been thanked: 322 times
Re: “Decision: case of using computer assistance in League A
I think that the first point in your summary is quite debatable right now.
Has it really not yet been the subject of analysis to look at every internet tournament he played and show his performance rating there, relative to his offline performances? Especially in light of this third point, I would say that Carlo's overall performance in the KGS event would be interesting: I see it is http://www.europeangodatabase.eu/EGD/To ... n=16762284 (but there is also http://www.europeangodatabase.eu/EGD/To ... y=T171018A).
Coming back to Bojanic's point, it is hard to believe smart guys are going to deliberately disguise their internet games. So I imagine that there is some explanation there.
From your summary: “Carlo Metta’s performance in the first 7 PGETC league games was so exceptional that such a feat may occur once in about 3000 tournaments.”
The basic idea, that he did very well in this year's PGETC, is of course relevant as an initial starting point. However, the winning percentages from Go Rating are probably not very reliable. Thus the figure you quote (1/3000) is probably best left out. Andrew Simons has already mentioned two players with similar 'super' performances this year.
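A figure like 1/3000 is a probability for some event over a run of games, such as winning at least k of n games. As a minimal sketch, assuming independent games and per-game win probabilities taken from some rating model (the Go Rating percentages behind the report are not reproduced here, and the numbers below are purely hypothetical), that probability can be computed exactly with a short dynamic program over the number of wins. As noted above, the result can be no more reliable than the per-game probabilities fed into it, and which event is counted (at least k wins, a given performance rating, a specific set of opponents) changes the answer.

```python
def prob_at_least(win_probs, k):
    """Exact probability of at least k wins in independent games.

    win_probs: per-game win probabilities (hypothetical inputs, e.g. from a rating model).
    """
    dist = [1.0]  # dist[w] = probability of exactly w wins after the games so far
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            nxt[wins] += prob * (1 - p)  # this game lost
            nxt[wins + 1] += prob * p  # this game won
        dist = nxt
    return sum(dist[k:])


if __name__ == "__main__":
    # Hypothetical per-game win probabilities for a 7-game run, NOT the report's figures.
    probs = [0.3, 0.35, 0.3, 0.5, 0.3, 0.6, 0.15]
    p = prob_at_least(probs, 5)
    print(f"P(at least 5 wins) = {p:.4g}, i.e. about once in {1 / p:.0f} runs")
```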
Last edited by Javaness2 on Tue Jun 05, 2018 4:46 am, edited 1 time in total.
-
Tryss
- Lives in gote
- Posts: 502
- Joined: Tue May 24, 2011 1:07 pm
- Rank: KGS 2k
- GD Posts: 100
- KGS: Tryss
- Has thanked: 1 time
- Been thanked: 153 times
Re: “Decision: case of using computer assistance in League A
A possible source of bias is that you're comparing games he won with games he mostly lost.
The game against Vasquez is closer to the online games than the other regular games, but that is also the one he won.
- Charlie
- Lives in gote
- Posts: 310
- Joined: Mon Feb 06, 2012 2:19 am
- Rank: EGF 4 kyu
- GD Posts: 0
- Location: Deutschland
- Has thanked: 272 times
- Been thanked: 126 times
Re: “Decision: case of using computer assistance in League A
AlesCieply wrote: “Finally, I would very much appreciate it if Carlo Metta came out and explained why he presented an apparently fabricated game record to the league manager.”
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.
-
AlesCieply
- Dies in gote
- Posts: 65
- Joined: Mon Sep 10, 2012 5:07 am
- GD Posts: 0
- Has thanked: 31 times
- Been thanked: 55 times
Re: “Decision: case of using computer assistance in League A
Javaness2 wrote: “The basic idea, that he did very well in this year's PGETC, is of course relevant as an initial starting point. However, the winning percentages from Go Rating are probably not very reliable. Thus the figure you quote (1/3000) is probably best left out. Andrew Simons has already mentioned two players with similar 'super' performances this year.”
Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree the figure of 3000 tournaments is approximate, though even if it were 1000 ...
Charlie wrote: “Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.”
I am quite specific about it in the report; I did not feel like copy/pasting from the report when people can read it.
Tryss wrote: “A possible source of bias is that you're comparing games he won with games he mostly lost.”
I am definitely aware of it. The problem is that there are not that many regular games Carlo won recently with records available.
- Charlie
- Lives in gote
- Posts: 310
- Joined: Mon Feb 06, 2012 2:19 am
- Rank: EGF 4 kyu
- GD Posts: 0
- Location: Deutschland
- Has thanked: 272 times
- Been thanked: 126 times
Re: “Decision: case of using computer assistance in League A
AlesCieply wrote (replying to Charlie): “I am quite specific about it in the report; I did not feel like copy/pasting from the report when people can read it.”
Do not be arrogant. Many people who read this thread will not go and download your PDF and read it in great detail.
You are not only accusing someone of cheating but also accusing them of fraudulently fabricating evidence! The very least you could do is exercise some care and diligence in doing so!
-
Javaness2
- Gosei
- Posts: 1545
- Joined: Tue Jul 19, 2011 10:48 am
- GD Posts: 0
- Has thanked: 111 times
- Been thanked: 322 times
Re: “Decision: case of using computer assistance in League A
AlesCieply wrote: “Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree the figure of 3000 tournaments is approximate, though even if it were 1000”
I am actually surprised you don't already have this data, because this tournament is so obvious to check.
Just sort this list of performances http://www.europeangodatabase.eu/EGD/To ... y=T160920A
Hidden in that list is the raw gain (at 50%), but of course the TPR should be bigger.
-
Javaness2
- Gosei
- Posts: 1545
- Joined: Tue Jul 19, 2011 10:48 am
- GD Posts: 0
- Has thanked: 111 times
- Been thanked: 322 times
Re: “Decision: case of using computer assistance in League A
Charlie wrote: “Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.”
Let us say that the word 'fabricated' was not a good choice here. I would have gone for 'modified', especially in a paper like this.
-
bernds
- Lives with ko
- Posts: 259
- Joined: Sun Apr 30, 2017 11:18 pm
- Rank: 2d
- GD Posts: 0
- Has thanked: 46 times
- Been thanked: 116 times
Re: “Decision: case of using computer assistance in League A
Javaness2 wrote (replying to Charlie): “Let us say that the word 'fabricated' was not a good choice here. I would have gone for 'modified', especially in a paper like this.”
The report, as I understand it, says Carlo submitted it as an example of an over-the-board tournament game, and it turned out to be a KGS record instead. The word "fabrication" is entirely appropriate if that is indeed correct, and IMO, if this is indeed what happened, it justifies any penalty. If you lie to the court, you deserve whatever you get.
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: “Decision: case of using computer assistance in League A
AlesCieply wrote (replying to Javaness2): “Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree the figure of 3000 tournaments is approximate, though even if it were 1000 ...”
From earlier in thread:
Just to go back to Carlo, I thought I'd work out his performance rating for this season's PGETC. He had great results for a 4d:
- beat Andrey Kulkov 6d (Russia) by 1.5
- beat Ondrej Kruml 5d (Czechia) by 2.5
- beat Dragos Bajenaru 6d (Romania) by resign
- beat Reem Ben David 4d (Israel) by resign *** the famous 98% game
- lost to Mero Csaba 6d (Hungary) by 2.5
- beat Mijodrag Stankovic "5d" 3d by resign
- lost to Andrij Kravets 7d/1p by 7.5
At the start of the season (1st September) Carlo's rating was 2381 [very similar to me], after picking up 50 points at the EGC. Of course his true strength could have been higher than that and grown since then too, but his rating lagged. His performance rating (using the EGD GoR calculator), using current ratings of opponents, is 2629, or +248.
How does that compare to other good performances?
Forum regulars may remember I beat Victor Chow 7d from South Africa a few years ago. UK were in league C for the 2014/15 season and my initial rating was 2361. My results were:
- beat Petrauskas 3d (Lithuania) by resign
- beat Chow 6/7d (South Africa) by 0.5
- beat Ganeyev 3k (Kazakhstan) by resign.
As I had no losses, my performance rating with the "adjust until input = output" method is infinite; anchoring with a loss to 2700 gives 2666, and anchoring with a loss to 2800 gives 2719. So +300-ish, with big uncertainty since there were no losses and few games: the only useful information is that I beat a 2616 in one game, and how flukey was that?
Last season Daniel on the UK team had no losses, this season he had just 1:
- beat Rasmusson 4d (Denmark)
- beat Karadaban 5d (Turkey)
- beat Welticke 6d (Germany)
- lost to Lin 6d (Austria)
Initial rating was 2402. Performance rating 2616 (+214).
If you include the wins (including some 5ds) from the previous season (for which his initial rating was 2262, though he probably wasn't much weaker than he is now), you get a performance rating of 2677 (+415).
Update: Chris this season:
- beat Isaksen 2d (Denmark)
- beat Schlattner 2d (Switzerland)
- beat Kuntay 2d (Turkey)
- beat Palant "5d" 4d (Germany) [quotes is his stated grade, no quotes is GoR where 4d is 2351->2450]
- beat Laatikainen "5d" 4d (Finland)
- beat Unger "3d" 4d (Austria)
- beat Hanevik 3d (Norway)
- beat Groenen "6d" 5d (Netherlands)
- beat Ouchterlony "4d" 3d (Sweden)
- lost to Metta 4d (Italy)
Initial rating 2284. Performance rating 2568 (+284). And if, like Lukan, you believe Carlo was using LeelaZero (I estimate EGF GoR ~2900) in the last game, he gets 2781 (+497).
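The "adjust until input = output" performance rating used above can be sketched as a root search: find the rating R at which the expected total score against the actual opponents equals the actual score. This is a minimal sketch assuming a plain logistic win-expectancy curve with a fixed, hypothetical scale of 100 points; the EGD GoR calculator uses its own formula, so this code will not reproduce the 2629, 2616 or 2568 figures. With no losses (or no wins) the equation has no finite solution, which is why an anchoring loss is added.

```python
import math

# HYPOTHETICAL scale: a plain logistic curve with a fixed width. The EGD GoR
# system uses its own formula, so results will not match the EGD calculator.
SCALE = 100.0


def expected_score(rating, opponent, scale=SCALE):
    """Win probability of `rating` against `opponent` under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(rating - opponent) / scale))


def performance_rating(results, scale=SCALE, lo=0.0, hi=4000.0, tol=1e-6):
    """Find R whose expected total score equals the actual score.

    results: list of (opponent_rating, score), score 1 for a win and 0 for a loss.
    Returns None when all games were won (or all lost): no finite R exists.
    """
    actual = sum(score for _, score in results)
    if actual == 0 or actual == len(results):
        return None

    def gap(r):
        return sum(expected_score(r, opp, scale) for opp, _ in results) - actual

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("root not bracketed; widen lo/hi")
    # gap() increases with r, so bisection converges to the unique root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `performance_rating([(2300, 1), (2600, 1)])` returns `None`, while appending a hypothetical anchoring loss such as `(2700, 0)` gives a finite rating; the opponent ratings here are illustrative only.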
-
Javaness2
- Gosei
- Posts: 1545
- Joined: Tue Jul 19, 2011 10:48 am
- GD Posts: 0
- Has thanked: 111 times
- Been thanked: 322 times
Re: “Decision: case of using computer assistance in League A
bernds wrote: “The report, as I understand it, says Carlo submitted it as an example of an over-the-board tournament game, and it turned out to be a KGS record instead. The word "fabrication" is entirely appropriate if that is indeed correct, and IMO, if this is indeed what happened, it justifies any penalty. If you lie to the court, you deserve whatever you get.”
If you don't want your evidence to seem neutral, go ahead and choose 'fabrication'. There are some other f-words (foolish) you can put in there while you are at it.
-
AlesCieply
- Dies in gote
- Posts: 65
- Joined: Mon Sep 10, 2012 5:07 am
- GD Posts: 0
- Has thanked: 31 times
- Been thanked: 55 times
Re: “Decision: case of using computer assistance in League A
Uberdude wrote:
“Last season Daniel on the UK team had no losses, this season he had just 1:
- beat Rasmusson 4d (Denmark)
- beat Karadaban 5d (Turkey)
- beat Welticke 6d (Germany)
- lost to Lin 6d (Austria)
Initial rating was 2402. Performance rating 2616 (+214).
If you include the wins (including some 5ds) from the previous season (for which his initial rating was 2262, though he probably wasn't much weaker than he is now), you get a performance rating of 2677 (+415).”
This one really stands out, I admit. Thanks for providing the reference. Such performances are still quite rare, and I would not consider one on its own as proof of anyone cheating. I hope that is also clear from what I say in the report. Do also note that Daniel's strength/rating is still improving and does not look as settled as Carlo Metta's.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: “Decision: case of using computer assistance in League A
AlesCieply wrote: “Finally, I would very much appreciate it if Carlo Metta came out and explained why he presented an apparently fabricated game record to the league manager. I do believe he is in principle an honest man who has done a lot for the go community and can continue to do so. I just think he made a mistake in using AI in his internet games and is now afraid of admitting it.
EDIT: Here I refer to a game record of the Shakhov-Metta game that Carlo himself supplied (among several other records), claiming it was played at a regular tournament and also contained many moves "similar to Leela". In fact, the game was played on KGS and the record was edited to look as if it had been played "live"; see the report for more details.”
To me, this behavioral evidence of doctoring and submitting a game record is the strongest evidence of cheating so far. (Assuming that it holds up, of course.)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Bojanic
- Lives with ko
- Posts: 142
- Joined: Fri May 06, 2011 1:35 pm
- Rank: 5 dan
- GD Posts: 0
- Has thanked: 27 times
- Been thanked: 89 times
Re: “Decision: case of using computer assistance in League A
The rating analysis Ales made is, IMHO, just a signal for a warning lamp to light up,
the same as when a weaker player wins, or when someone plays stronger online.
Another alarm signal is the deviations diagram in GRP: when I notice that it goes up for one side and keeps rising, that is rather suspicious.
But there should be a next step in the analysis, going move by move, because some things can be deceiving.
Today I analyzed a game from the PGETC (none of those mentioned here) where basically every move by one player was one of Leela's suggestions.
Basically, 90% of the moves were Leela's A and B suggestions, and only one move was not suggested by Leela (although it looks nice).
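Figures like "90% of moves were A and B suggestions" are simple tallies over the per-move analysis. As a minimal sketch, assuming A and B denote the engine's first and second candidates and that the rank of each played move has already been extracted from the analysis output (that extraction step and the input format are hypothetical here), the tally could look like this. The rate alone says nothing about how forced the moves were, which is why move-by-move review still matters.

```python
from collections import Counter


def match_histogram(ranks, top_n=2):
    """Share of one player's moves by where they ranked among the engine's candidates.

    ranks: one entry per move; the 1-based rank of the played move in the engine's
    candidate list (1 = first choice, i.e. "A"), or None if the engine did not
    suggest it. HYPOTHETICAL input format; extracting it from GRP/Leela is not shown.
    """
    if not ranks:
        raise ValueError("no moves given")
    counts = Counter()
    for rank in ranks:
        if rank is None:
            counts["not suggested"] += 1
        elif rank <= top_n:
            counts[f"choice {rank}"] += 1
        else:
            counts["lower choice"] += 1
    return {bucket: n / len(ranks) for bucket, n in counts.items()}


def top_n_match_rate(ranks, top_n=2):
    """Fraction of moves that were among the engine's top_n suggestions."""
    if not ranks:
        raise ValueError("no moves given")
    return sum(1 for r in ranks if r is not None and r <= top_n) / len(ranks)
```

Keeping the raw counts per bucket (rather than fractions) gives exactly the kind of histogram needed to compare two sets of games.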