AlesCieply wrote:
frmor wrote:
3) If you are not using a measurable quantity, but expert opinion, you should use more than one expert. You should let the experts analyze the games without them knowing which ones are from live games and which ones are from online games, and without them knowing whether Carlo played white or black. Moreover, you should also secretly give them some random live games by other players of similar level as a control group.
I cannot agree more with you on this. In fact, this is what I suggested when talking to some people: that Carlo's games should be sent for expert (high-level pro) review:
- 3 experts are found; each of them is provided with 3 sets of games, played by 3 players
- the players would be anonymized as Player A (Carlo in the PGETC), Player B (Carlo at regular tournaments), and Player C (any EGF pro player, perhaps one less known to the experts)
- the experts would be asked to estimate how strong the players are (just ordering them according to their strength would do) and whether they feel 2 of the sets were played by the same player

I would consider the experts' view as proof of cheating (or of innocence) if all of them agreed on both questions and thought that Player A was different from (or the same as) Player B.
(Emphasis mine.)
First, "proof" is a bit too strong; "evidence" is better.
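To put a rough number on that, here is a minimal sketch in Python (my own back-of-the-envelope figures, not anything from the proposal) of how often three experts would reach unanimity on the same-player question by guessing alone:

[code]
from itertools import product

# Sketch: if each of 3 experts answered "is Player A the same as
# Player B?" by coin flip, how often would all three agree?
# Assumed setup for illustration only.
n_experts = 3
verdicts = list(product(["same", "different"], repeat=n_experts))

p_unanimous = sum(len(set(v)) == 1 for v in verdicts) / len(verdicts)
p_all_different = (1 / 2) ** n_experts

print(f"P(all three agree, either verdict) = {p_unanimous:.3f}")     # 0.250
print(f"P(all three say 'different')       = {p_all_different:.3f}") # 0.125
[/code]

A one-in-eight chance of unanimity under pure guessing is suggestive when it happens, but it is nowhere near what one would want for proof.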
But all this, to me, is maddeningly indirect. It substitutes the question of whether Carlo A plays like Carlo B for the question of whether Carlo A cheated. OC, the questions are related, but they are not the same, and answering the first does not necessarily require expertise at go. Besides, the answer to the other question, about the level of play, is already indicated by the choice of games: we know that Carlo played better in that tournament than in his other games. The question before us is why.
To utilize the expertise of the judges, I would like them to examine the game records to look for evidence of cheating (or of its absence!). Let me give a couple of examples. I do not claim to be an expert, but I took a look at the fifty plays in question in the Metta-Reem game. It seemed to me that all but eleven were either obvious plays that a kyu player might well find, or parts of one-way streets: consistent sequences in which a play, even if not obvious on its own, only made sense given the earlier plays in the sequence. I did not judge whether any of those eleven plays were evidence of cheating, but I attempted to eliminate the other plays from consideration.
In chess, we have an example of play by a known cheater. See the link, https://www.chess.com/news/view/life-ti ... r-cheating , which sorin posted here; it is the first example of play in the article. The cheater, who had the option of simplifying an obviously won game by trading queens, a line of play which, human vs. human, would probably have led to a quick resignation, instead chose a lengthy combination in which he sacrificed three pawns but ended with a mate in three, at which point his opponent resigned. The evidence of cheating in that game is not statistical but behavioral. That is why I said, let the case be made, by Bojanic and/or others examining the game records. The case, it seems to me, would rest upon behavioral evidence, not statistical evidence.
But an important statistical question arises: can expert go players reliably evaluate game records for evidence of cheating? To answer it, we can run a test such as Javaness2 suggests, in which players cheat in some of the games. To make the test sensitive, and to some extent to simulate the situation where a player is already suspected of cheating, have half the games be ones with cheating and half be ones without, and let each expert divide the games into the two groups accordingly. For instance, you could have half the games played by 6 dans without assistance and half played by 4 dans cheating with Leela 11. (My suspicion is that, at this point in time, even go pros could easily fail such a test.)
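To make the scoring of such a test concrete, here is a minimal sketch in Python; the numbers (20 games, the sample scores) are hypothetical, and treating each call as an independent coin flip is a simplification if the experts are told the exact split:

[code]
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of labeling at
    least k of n games correctly by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical test: 20 games, 10 with Leela-assisted play, 10 without.
# How surprising would various scores be if an expert were just guessing?
n_games = 20
for correct in (10, 13, 15, 17):
    print(f"{correct}/{n_games} correct: P(by chance) = "
          f"{binomial_tail(correct, n_games):.3f}")
[/code]

Under this simple null, an expert needs about 15 of 20 games right before chance becomes an implausible explanation, which gives a feel for how many games such a test would need in order to be informative.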