Bill Spight wrote: Depends upon what you mean by statistical.
Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?
A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon the knowledge that the earth rotates on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.)
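For concreteness, here is the sunrise calculation in a minimal Python sketch, using Laplace's rule of succession. The day count is an assumption of mine (roughly 6,000 years of daily observations), not a figure from the original argument.

```python
from fractions import Fraction

# Laplace's rule of succession: after s successes in s trials, with a
# uniform (Laplacian) prior, P(success on the next trial) = (s + 1) / (s + 2).
days = 6000 * 365                  # ~6,000 years of recorded sunrises (assumed figure)
p_sunrise = Fraction(days + 1, days + 2)

print(float(p_sunrise))            # ~0.9999995
print(float(1 - p_sunrise))        # chance of no sunrise tomorrow: ~4.6e-07
```

Near certainty, but still a different kind of answer than the one astronomy gives.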

As for cheating, I do not consider Regan's physical and behavioral evidence to be statistical.
BlindGroup wrote: I think we may be trying to make slightly different points. If I understand you correctly, what you are saying is that you prefer to distinguish between two types of evidence: evidence that is easily quantifiable and evidence that, while relevant, does not lend itself to mathematical treatment.
The social sciences distinguish between quantitative and qualitative evidence, and today a good bit of research involves "triangulation", i.e., a combination of both. The current replication crisis in the social sciences comes in part from a realization that in the past too much weight was given to statistical evidence alone. Rejecting a null hypothesis disconfirms it, but that is only weakly confirmatory of any particular alternative hypothesis. It is hardly surprising that results based upon weak evidence fail to replicate.
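To make that concrete, here is a back-of-the-envelope Bayesian sketch of why "significant" results so often fail to replicate. All three numbers are illustrative assumptions of mine, not figures from any particular study.

```python
# Posterior probability that a specific hypothesis is true, given a
# "significant" result. Standard replication-crisis arithmetic with
# assumed, illustrative numbers.
prior = 0.10   # prior probability that the specific hypothesis is true
power = 0.50   # P(significant result | hypothesis true)
alpha = 0.05   # P(significant result | hypothesis false), the false positive rate

posterior = (power * prior) / (power * prior + alpha * (1 - prior))
print(round(posterior, 3))   # ~0.526: barely better than a coin flip
```

Under those assumptions, a result that clears p < 0.05 still has nearly even odds of being a false positive, which is roughly what the replication record shows.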
A good example of that -- not, I repeat, not an example of social science research -- comes from a Science and Consciousness talk I went to back in the 1990s at the University of California in San Francisco. A mathematician had made a study of a psychokinesis experiment at Princeton and found, among other things, that the data were very close to a normal distribution (p << 0.001), which he took to be indicative of ESP. One physicist stood up and roundly criticized the mathematician's conclusions on the basis of physical theory. As a Bayesian, I was not terribly concerned about the fact that the guy had obviously gone looking for a low p value that had not been specified beforehand. He had found a good one.

However, I did not take it as evidence for ESP, but as evidence that the data had been faked.
BlindGroup wrote: My point, though, is a bit different. Acknowledging that there are both types of evidence, there is a tendency to say: because we can't quantify everything, let's ignore statistics.
My experience is the opposite, at least among those trying to do science. Maybe we run with different crowds.
BlindGroup wrote: I'm arguing that is a mistake.
I agree.
BlindGroup wrote: Statistics has more to offer than just a quantification tool. Even when it is not possible to calculate actual probabilities using statistical formulas, the mathematical properties can still guide how we evaluate evidence and set up decision rules, even for non-statistical evidence.
I agree, as well.
But confirmatory statistics about 50 possible matches in one game are not good statistical evidence. They may be good enough to raise suspicions and invite the collection of further evidence, but that's all.
Uberdude did go looking for further evidence, including the matches to Leela's choices in other games that Carlo won in the same tournament. Those games were against stronger players than Carlo's opponent in the game in question, and they had fewer matches than that game. To me, those results cast further doubt upon the assertion that Carlo had been cheating.
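A rough way to see why a raw match count in a single game is weak evidence on its own is a simple binomial model. The match rates and game length below are my own illustrative assumptions, not Uberdude's actual counts.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k matches in n moves, i.i.d. with match rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k, n, p):
    """P(X >= k): chance of at least k matches from honest play alone."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Illustrative numbers only: a strong amateur might independently match an
# engine's choices fairly often, so 50 matches in ~100 moves need not be
# surprising under plausible honest match rates.
n_moves, matches = 100, 50
for honest_rate in (0.40, 0.45, 0.50):
    print(honest_rate, round(binom_tail(matches, n_moves, honest_rate), 3))
```

With an honest match rate anywhere near 50%, hitting 50 matches out of 100 is unremarkable; the informative evidence is the comparison across games, which here points the other way.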
Let me go back to the ESP research. The mathematician had no theory as to why a close fit of the data to a normal distribution would indicate ESP. It just did. I, OTOH, had a good theory as to why that close fit would indicate faking the data. It is well known that large amounts of data usually conform to a normal distribution, so if you are faking it, you want the fake data to conform, as well. The question of too good a fit was not a concern to the faker or fakers, because who -- except maybe a crank mathematician -- would test that goodness of fit?
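The "too good a fit" idea can be sketched with a chi-square goodness-of-fit test. The counts below are hypothetical, invented for illustration; they are not the Princeton data. The usual test looks at the right tail of the chi-square distribution; cooked data betray themselves in the left tail.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed vs. expected bin counts (NOT the Princeton data):
# the observed counts hug the expected ones far more tightly than
# ordinary sampling noise would allow.
expected = np.array([100, 400, 1000, 400, 100], dtype=float)
observed = np.array([101, 399, 1001, 400, 99], dtype=float)

stat = ((observed - expected) ** 2 / expected).sum()
df = len(expected) - 1

# Right tail: the ordinary "does it fit?" question.
p_lack_of_fit = chi2.sf(stat, df)
# Left tail: "is the fit *too* good?" -- a tiny value here suggests the
# data were cooked to match the expected distribution.
p_too_good = chi2.cdf(stat, df)

print(round(stat, 4), round(p_lack_of_fit, 4), p_too_good)
```

For these invented counts the left-tail probability comes out around 10^-5: a fit that close to expectation essentially never happens by chance, which is exactly the crank-mathematician test.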
Based upon what is known about online cheating at chess (outside of tournaments), it seems that a lot of it involves using a chess engine to pick the moves. Because an engine's top choices fluctuate as it calculates, and because different engines may differ slightly in their choices, playing any one of the top 3 choices, as long as it is not too bad, will produce nearly a 100% match. Perhaps that is where the idea of measuring matches against the top 3 choices comes from.
Suppose we accept that theory. Then Carlo's moves in the games against the stronger players should also show nearly a 100% match. They don't. So what do we say about that? That Carlo chose to cheat against a 4 dan, but not against 6 dans?
There is an analogy to Rasch testing here. In Rasch testing, if a test taker does better on harder questions than on easier ones, it may be that some of those questions mean something different to that person than to others. Games against 6 dans are like harder questions; a game against a 4 dan is like an easier question. If any one theory is to explain the matching results, how can it be that Carlo cheated by playing Leela's choices against the 4 dan but not against the 6 dans? OC, an explanation may be possible, but one has not been given.
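For readers who have not met it, the Rasch model makes the analogy precise: the probability of success depends only on the gap between the test taker's ability and the item's difficulty, so success rates should fall as difficulty rises. A minimal sketch, with made-up ability and difficulty values:

```python
from math import exp

def rasch_p(theta, b):
    """Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + exp(b - theta))

theta = 0.0                      # hypothetical test taker's ability
difficulties = [-2.0, 0.0, 2.0]  # easy, medium, hard items

for b in difficulties:
    print(b, round(rasch_p(theta, b), 3))
# Expected pattern: ~0.881, 0.5, ~0.119 -- success should fall with
# difficulty. A taker who beats the hard item but misses the easy one is
# a misfit under the model, just as matching Leela against the 4 dan but
# not against the 6 dans needs its own explanation.
```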