Every assumption that we make about a theory or about its evidence weakens the support for the theory. Now to make a case for a theory, in lawyerly fashion, may mean making assumptions. That is not an indictment of making a case. But the standpoint of a scientist or detective is different from that of a lawyer or advocate. Not that scientists and detectives don't make cases for their conclusions, but their process, at least ideally, is largely as the Sherlock Holmes quote suggests, to eliminate theories.
The case against Metta began, IIUC, with the observation of a member of the Israeli team who was watching his game vs. Ben David, that a surprising number of Metta's plays matched the plays suggested by Leela 11. OC, that fact raises the question of cheating, and, IMO, justified filing a complaint. One way of cheating in online chess is to make the plays suggested by a superhuman engine. Analysis reveals that the suspected player, who has a modest rating, makes neither blunder nor mistake, but only makes a few of what chess players term inaccuracies. Such play is superhuman. In addition, there is often behavioral evidence, such as the suspected player belittling his opponent.
Now, Simba strongly believes that Metta cheated in their game. He assumes that Metta used Leela Zero to do so. OC, that may be so, but it is an assumption. If Metta was cheating, why did he blunder away a number of stones early in the game? Simba assumes that Metta did not cheat early in the game, because Leela Zero is so strong that Metta could use it to win if he got behind. Now that is a plausible theory of cheating, but it is also ad hoc, tailored to fit the evidence.
Then there is the curious case, advanced by the anonymous accuser, of the now infamous move 156, where Metta did not pick the play recommended by Leela 11, which was a mistake, but picked the play recommended by Leela Zero. Why that might be relevant is a puzzle, given the theory that Metta used Leela Zero to cheat, anyway. To get there the anonymous accuser had to assume, as he stated on reddit, that once he was comfortably ahead Carlo started using both Leela 11 and Leela Zero, side by side, to cheat. Not only is that an additional assumption, it is implausible on its face.
Now, making assumptions weakens a case, but when the assumptions are not pointed out, doing so can appear to strengthen a case. The anonymous accuser's gratuitous assumption performs that function brilliantly. On move 156 Leela 11's play, which Metta did not choose, is a significant error, while Leela Zero's play, which Metta did choose, is not. Such a large discrepancy in evaluation between Leela 11 and Leela Zero is unusual, and Anonymous Accuser uses that fact as proof that Metta was cheating.
OC, the discrepancy is relevant only given the assumption that Metta was using both Leela 11 and Leela Zero simultaneously. And the so-called proof requires the hidden assumption that Metta would not have found move 156 if he were not cheating. In fact, it is an obvious candidate move.
The original case against Metta, made by the Israeli team, also makes assumptions that make the case appear stronger than it is while actually weakening it. One known way of cheating online is to use a superhuman program to choose your plays. Here the obvious suspect is Leela, especially since Metta says that he used it for training. One problem with that theory is that many of his moves do not match Leela's choice. One possibility, OC, is that Metta cheated in a different way. When asked how they might cheat using a superstrong bot, players on this forum suggested that they might use it to avoid blunders. To test that theory would involve looking at individual plays for evidence of errors. I have actually done that, as have others.
Another possibility is that, because Leela is non-deterministic, plays that it suggested to Metta might not match all of its suggestions when the program is run to check for matches. It is, in fact, very likely that the checking run of the program will not find all of the plays where Metta made the same play as Leela suggested. However, it works the other way, as well. There will be plays that match the checking run that did not match the run that Metta used, if he used Leela at all. This behavior of Leela is something that a scientist or detective would examine. In fact, because of the phenomenon called regression to the mean, a run chosen because it has a high number of matches would likely have a higher number of matches than a random run. In any event, the possibility that Metta made plays suggested by Leela that the checking run of Leela does not match does not justify matching second or third choices of that run to Metta's plays. The likely result of doing so would be to grossly overestimate the number of matches, when it is plausible that the number of matches to Leela's top choice alone is already an overestimate, assuming that Metta picked Leela's top choice for several plays.
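To illustrate the regression point, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name simulate_runs, the 50 plays, the 65% per-play match probability, and the 20 checking runs are hypothetical and are not measurements from Metta's game or from Leela. It only shows that when every run has the same underlying match rate, the run picked for having the most matches will sit at or above the average run.

```python
import random

def simulate_runs(n_moves, p_match, n_runs, seed=0):
    """Return the match count of each of n_runs independent checking runs.

    Hypothetical model: every play matches the engine's top choice
    independently with probability p_match, in every run.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_match for _ in range(n_moves))
        for _ in range(n_runs)
    ]

# Illustrative numbers only, not taken from the actual game.
counts = simulate_runs(n_moves=50, p_match=0.65, n_runs=20)
best = max(counts)                    # the run someone might choose to report
average = sum(counts) / len(counts)   # what a typical run shows

print("best run:", best, "average run:", round(average, 1))
```

If only the best run is reported, a fresh checking run would be expected to land nearer the average, which is why a single high-matching run should be read with caution.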
This is not a criticism of the Israeli team. Their job was to present a case, not to find a verdict. But by matching a range of Metta's plays to Leela's top three choices they gave the appearance, echoed by a claim (by whom, I forget, but it does not really matter), that the probability that Metta cheated in that game is greater than 90%. (The Israeli team found a match of 98% for Metta's plays in the range of moves 50-150. Other runs found, as we might have expected, lower matching rates, around 93%.) Matches to Leela's top choice were 72%; other runs may have found matches in the mid-60% range. Adding matches to the second and third choices made a big difference in the impression that the evidence gave. 98% matches? A guilty verdict appeared to be a slam dunk.
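A small sketch in Python of why counting second and third choices inflates the picture. The function match_rate and the toy plays and engine rankings below are hypothetical, invented only for illustration; they are not Metta's plays or Leela's output. The point is purely arithmetical: a top-three match rate can never be lower than the top-choice rate for the same data, and it is usually a good deal higher.

```python
def match_rate(played, engine_choices, k):
    """Fraction of played moves found among the engine's top-k choices."""
    hits = sum(
        move in choices[:k]
        for move, choices in zip(played, engine_choices)
    )
    return hits / len(played)

# Toy data, invented for illustration only.
played = ["D4", "Q16", "C3", "R14"]
engine_choices = [
    ["D4", "C3", "E3"],     # top choice matches
    ["R16", "Q16", "Q17"],  # second choice matches
    ["C6", "F3", "C3"],     # third choice matches
    ["K10", "R10", "C10"],  # no match
]

top1 = match_rate(played, engine_choices, 1)  # 0.25
top3 = match_rate(played, engine_choices, 3)  # 0.75
print(top1, top3)
```

The same plays look three times as suspicious in this toy case simply because the net was widened, with no new evidence added.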
To paint such an impressive picture required the assumption that Metta would sometimes pick Leela's second choice and the assumption that he would sometimes pick Leela's third choice. The assumption that he would sometimes pick Leela's fourth choice was not necessary.
The additional assumption was made that he would not pick an obviously bad play. The assumption was also made that Metta would pick a second or third choice in order to avoid being detected as cheating. This assumption seems rather implausible, given that Leela reveals its second and third choices, possibly among others. How does picking one of them avoid detection?
The choice of the range of 50 of Metta's plays is also suspicious. The arguments that later plays in the endgame might be unreliable and that earlier plays in the opening might provide too many matches because of joseki are plausible. But I cannot avoid the nagging suspicion that a wider range might have been less impressive.
(In particular, Metta's move 37 is problematic for the cheating hypothesis, as we shall see.)
One thing my undergraduate research methods professor stressed was this: "Do not throw away any data."
OC, you may have data that are questionable, or outliers that you ignore in reaching your final conclusion. But you have to address those data and make your arguments. You don't just make some plausible assertions and then ignore data without even considering it. The human world is full of plausible assertions.
A lot of people assume that the opening is not a good place to look for evidence of cheating in go by using a super strong bot. I disagree. That may make sense in chess, where players memorize openings and chess engines use opening books. But the opening is more fluid in go, and, perhaps more importantly, super strong go bots excel in strategy, which is paramount in the opening. That's where you can use a bot to advantage to take an early lead. And humans are imitating bots in the opening already. They are making early 3-3 invasions, playing some new AI joseki, and making attachments and diagonal contact plays that humans used to avoid in the opening. Some New Fuseki style plays have made a comeback, as well. Using a bot in the opening is not a dead giveaway. (OC, using a bot in a semeai may be a giveaway when the bot makes a mistake.)
While I have raised questions about the accusations against Metta in this note, my main point is to show the deleterious effect of assumptions. Not only do they weaken a case, they can make it look better than it is.