“Decision: case of using computer assistance in League A”

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 5:02 pm

hyperpape wrote:The go world will probably have to spend some time looking over Ken Regan's work in Chess. I can't vouch for it, though what I have read by him impresses me. Key things I can say: he claims that it is well within the realm of possibility to substantiate cheating allegations from a single game, but you usually have to go well beyond "how many of the engine's preferred moves did the human play?" to do so.

Thanks for the links to Regan's writing.

He provides another link to the Parable of the Golfers ( https://www.cse.buffalo.edu//~regan/che ... lfers.html ), in which he states what is close to my position.

Ken Regan wrote:the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.

I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlos's play in future tournaments monitored.

Kirby · Post by **Kirby** » Sun Mar 25, 2018 7:02 pm

As I understand it, AlphaGo Lee was trained on high dan-level games to create a policy network that, for a given board position, gives the probability that a pro would play each move. Couldn't one use a similar process to train "policy networks" for various dan levels? Then, given a board position, you could see the probability that a 3d player would play move X, which may be different from the probability that a 2d player would play X, etc.

If you had a reliable policy network for each level of play, then you could calculate the overall probability that a 2d player would play the sequence of moves that he played in the game.

At least at that point, we could say with some confidence that, according to this policy network, there's a 1 in <some-number> chance that this player with this rank played this game.
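A minimal sketch of that calculation, assuming rank-specific policy networks existed. The per-move probabilities below are made-up placeholders, not the output of any real network, and `sequence_log_likelihood` is a hypothetical helper name.

```python
import math

def sequence_log_likelihood(move_probs):
    """Sum of the log-probabilities a policy assigns to the moves actually played."""
    return sum(math.log(p) for p in move_probs)

# Hypothetical probabilities that each policy assigns to the same five played moves.
probs_2d = [0.30, 0.10, 0.05, 0.20, 0.08]
probs_5d = [0.35, 0.25, 0.20, 0.30, 0.22]

ll_2d = sequence_log_likelihood(probs_2d)
ll_5d = sequence_log_likelihood(probs_5d)

# Likelihood ratio: how much better the 5d policy explains the game than the 2d one.
likelihood_ratio = math.exp(ll_5d - ll_2d)

print(f"1 in {1 / math.exp(ll_2d):,.0f} under the 2d policy")
print(f"5d policy explains the game {likelihood_ratio:.1f}x better")
```

The raw probability of any specific move sequence is tiny for every rank, so the ratio between ranks is likely to be more informative than the "1 in N" figure on its own.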

mhlepore · Post by **mhlepore** » Sun Mar 25, 2018 7:05 pm

One can envision future iterations of software where you can calibrate moves that are good, but just under a predetermined detection threshold.

I haven't played on IGS in ages, but if I recall, KGS tracks *when* moves were made. If IGS does the same, couldn't one see how quickly he is clicking in the game where cheating is alleged, and compare to other league games? Seems if he's using the computer there will be a lag between certain moves that won't be there without relying on Leela.

ez4u · Post by **ez4u** » Sun Mar 25, 2018 8:15 pm

mhlepore wrote:One can envision future iterations of software where you can calibrate moves that are good, but just under a predetermined detection threshold.

I haven't played on IGS in ages, but if I recall, KGS tracks *when* moves were made. If IGS does the same, couldn't one see how quickly he is clicking in the game where cheating is alleged, and compare to other league games? Seems if he's using the computer there will be a lag between certain moves that won't be there without relying on Leela.

Based on the idea that your typical kyu player does not think?

ez4u · Post by **ez4u** » Sun Mar 25, 2018 8:28 pm

Bill Spight wrote:...

Thanks for the links to Regan's writing. He provides another link to the Parable of the Golfers ( https://www.cse.buffalo.edu//~regan/che ... lfers.html ), in which he states what is close to my position.
Ken Regan wrote:the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.
I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlos's play in future tournaments monitored.

In this case shouldn't the Parable of the Golfers be modified to ask how many golfers we need in order to observe someone sink their drive on all 4 par 3's on a typical course?

As a Bayesian, if you start with an expectation of 80% accuracy (the high end of what was observed in FTF games), how do you interpret 98 out of 100 (or 49 out of 50?)? [This is not argumentative; I simply would like to know!]

pnprog · Post by **pnprog** » Sun Mar 25, 2018 9:21 pm

Jonas Egeberg wrote: As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.

I am not that good at statistics, but if we consider that a player, on average, plays 70-80% of his moves offline in a way that is similar to Leela (let's take 80%), then what is the probability that he plays 49 or 50 similar moves out of 50 online?

In my simulation, the probability is below 0.02%
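The same figure can be checked without simulation. This is a sketch that assumes each of the 50 moves matches Leela independently with a fixed probability of 0.8, which is a strong simplification; `binomial_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Probability of 49 or 50 matching moves out of 50 at an 80% match rate.
p_tail = binomial_tail(50, 49, 0.8)
print(f"P(at least 49 of 50 | p = 0.8) = {p_tail:.6f}")
```

Under these assumptions the exact tail probability is about 0.019%, in line with the simulated value.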

Of course, one has to check carefully that the measurement is consistent in both cases and backed by enough data for the offline measure, and we make the assumption that offline performance can be translated into online performance, and that it is consistent across different opponents, and so on, and so on, and so on...

Besides, this is only mildly related to the topic, but some time ago I stumbled upon this article https://www.chess.com/article/view/bett ... m#comments that considers a replacement for Elo. In my understanding, it measures the similarity of play between chess players and chess bots as a measure of the player's strength. I was considering adding that sort of calculation to GoReviewPartner, but just for fun.

Bill Spight · Post by **Bill Spight** » Sun Mar 25, 2018 10:24 pm

ez4u wrote:
Bill Spight wrote:...

Thanks for the links to Regan's writing. He provides another link to the Parable of the Golfers ( https://www.cse.buffalo.edu//~regan/che ... lfers.html ), in which he states what is close to my position.
Ken Regan wrote:the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.
I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlos's play in future tournaments monitored.
In this case shouldn't the Parable of the Golfers be modified to ask how many golfers we need in order to observe someone sink their drive on all 4 par 3's on a typical course?

First, let me say something about being a Bayesian. Before the 20th century, everybody was a Bayesian, but by then people were aware of its problems. To be a Bayesian is to believe that hypotheses have, or may have, probabilities. Frequentists believe that only events have probabilities. Frequentists won the field in the early 20th century, as ideas such as hypothesis testing were developed without assigning probabilities to hypotheses. In the mid 20th century there were still a few famous Bayesians, such as L. J. Savage and I. J. Good. I was fortunate enough to spend an afternoon with Savage, who showed me that I was a closet Bayesian.

Later in the 20th century Bayesianism had a rebirth, perhaps because of computer programmers who were able to write programs that made sense in Bayesian terms, and those programs worked.

Bayesianism, when I came across it in the mid 20th century, was largely subjectivist, because how you alter your probabilities in the face of evidence depends upon your prior beliefs. This subjective property, OC, is a major problem from a scientific point of view. I. J. Good talked about how to interrogate yourself to find out what your probabilistic beliefs were. The question you ask is a good example. What would be convincing statistical evidence in itself? How about acing all four par threes on the course?

As a Bayesian, if you start with an expectation of 80% accuracy (the high end of what was observed in FTF games), how do you interpret 98 out of 100 (or 49 out of 50?)? [This is not argumentative; I simply would like to know!]

IIRC, Uberdude observed 89% in his opponent, who was of comparable strength but had not undergone Carlos's training regime. Anyway, as a Bayesian, you have to have a non-zero belief in any probability that you may possibly come to assign to a hypothesis, or you will never assign that probability. IOW, your beliefs have a distribution over a range of probabilities. So you can't just focus on, say, 75% or 80% in itself.
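One way to sketch "a distribution over a range of probabilities" is a Beta prior on the match rate, which turns the binomial into a beta-binomial. The Beta(16, 4) prior below (mean 0.8) is an arbitrary assumption chosen for illustration, not something fitted to the actual games.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(X = k) when X ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

# Hypothetical prior belief about the match rate: mean 0.8, moderately spread out.
a, b = 16.0, 4.0

# Probability of 49 or 50 matches out of 50 once the match rate itself is uncertain.
tail = sum(beta_binomial_pmf(k, 50, a, b) for k in (49, 50))
print(f"P(at least 49 of 50) with a Beta({a:g}, {b:g}) prior = {tail:.4f}")
```

With this particular prior the tail probability comes out far larger than under a fixed 80% match rate, which is why the spread of the prior matters as much as its mean.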

I know in a way I am dodging the question, but let me say that first, I would want to know more about the games and what kind of moves he made which were like Leela's and what kind were not, and so on. Second, like just about everybody else, I think that there is enough evidence to think that in this one game he played unusually like Leela. But there could be any number of reasons for that. Cheating is not the only hypothesis to consider. It may be the one with the highest prior degree of belief, and the one that is most bolstered by the evidence, but that still may not give it a high posterior degree of belief. It may be the most supported hypothesis, but that is a different question.

Another question that Bayesians must ask is how do we know what we know? An example from the early scientific literature has to do with the probability that the sun will rise tomorrow. Suppose we start by assigning equal weight to each non-zero probability that the sun would rise the next day, starting from day 1, which in those days was supposed to be some 6,000 years in the past. As we update our beliefs with the sun rising every day for 6,000 years, the probability that it would rise tomorrow becomes very, very close to 1. (Yay!) But someone, I forget who and where, published a paper pointing out that on the same evidence the probability that the sun would rise 5,000 years from today was only ⅔. That result was not appealing.
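A sketch of the calculation behind this example, assuming the uniform prior described above (Laplace's rule of succession) and reading the long-horizon claim as "the sun rises every day for the next 5,000 years". Both the 365-day year and that reading are assumptions; `prob_next_m_successes` is a hypothetical helper name.

```python
from fractions import Fraction

def prob_next_m_successes(n, m):
    """Uniform prior on the success rate, n successes observed in n trials:
    the probability that the next m trials all succeed is (n + 1) / (n + m + 1)."""
    return Fraction(n + 1, n + m + 1)

days_observed = 6000 * 365  # roughly 6,000 years of sunrises

p_tomorrow = prob_next_m_successes(days_observed, 1)
p_5000_years = prob_next_m_successes(days_observed, 5000 * 365)

print(f"sun rises tomorrow: {float(p_tomorrow):.7f}")
print(f"sun rises every day for 5,000 more years: {float(p_5000_years):.3f}")
```

Under these assumptions the long-horizon figure is close to 6/11, in the same ballpark as the ⅔ recalled above; the exact value depends on the horizon chosen.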

That result illustrates the problematic nature of Bayesianism for science. Speaking for myself, my belief about the sun rising has to do with the rotation of the earth. Given that knowledge, whether the sun rose yesterday is irrelevant to whether it will rise tomorrow.

In the parable of the golfers Regan states that the chance of a scratch golfer making a hole in one on a par three is about 1/5000. If we gathered 10,000 such golfers and had them each hit a ball from the tee on a par three hole, we would expect 2 of them to make a hole in one. He goes on to say that suppose that we slipped a piece of paper with a black dot on it into the pocket of 10 of those 10,000, and one of them made a hole in one. The probability that (at least) one of them would do so is around 1/500, i.e., (1 - (4999/5000)^10). The statistical evidence is very strong that this is not just a chance event, that there is a reason for it. Good enough, Regan suggests, to win a civil case in court.
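The arithmetic in the parable can be checked directly. This sketch assumes independent shots and uses Regan's 1/5000 figure; the last two lines work out ez4u's variant of acing all four par threes.

```python
p_ace = 1 / 5000   # chance of a hole in one on a par three (Regan's figure)
golfers = 10_000
marked = 10        # golfers carrying the black dot

# Expected number of holes in one among all golfers.
expected_aces = golfers * p_ace

# Probability that at least one of the marked golfers makes a hole in one.
p_marked_ace = 1 - (1 - p_ace) ** marked

# ez4u's variant: one golfer acing all four par threes on a course.
p_four_aces = p_ace ** 4
golfers_for_one_expected = 1 / p_four_aces

print(f"expected aces: {expected_aces:.1f}")
print(f"P(at least one marked golfer aces) = {p_marked_ace:.6f}")
print(f"golfers needed to expect one four-ace round: {golfers_for_one_expected:.3g}")
```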

Something that he does not go on to say, but I think that he should have said, is that once we know about the black dots, we should chalk up that hole in one to chance. Having a black dot in your pocket is irrelevant to your golfing ability. But in the parable he has the black dot stand for "physical or observational evidence of cheating". I.e., given such evidence, then the hole in one (statistically very rare occurrence) may offer support. In this case, all we have is the statistical evidence. IMO, this evidence supports looking for physical or observational evidence of cheating. For example, by monitoring Carlos's play in the future.

Frequentists agree, BTW. I still remember the prof announcing, "Statistics proves nothing."

RobertJasiek · Post by **RobertJasiek** » Sun Mar 25, 2018 10:57 pm

Humans can learn from programs and apply what they have learnt. The result can be very similar games. Therefore, probability is not evidence. What we do need in important online games is referees locally supervising the players to prevent almost all cheating of all forms.

Javaness2 · Post by **Javaness2** » Sun Mar 25, 2018 11:06 pm

The game in question has 166 moves; the allegation is presumably that after move 4 black starts using Leela. In other words, he turned on Leela after opting for a double 3-3 fuseki. After that every single move chosen is a 'top' Leela move. The game doesn't look that remarkable to me. The mathematics behind the decision has, especially in the case of an appeal, to be the subject of some debate if online games are still to be taken into account.

Nobody has answered my question yet, though: is there any appeal of the decision?

Uberdude · Post by **Uberdude** » Mon Mar 26, 2018 1:01 am

On the issue of the (mis)use of statistics in (in)justice:
- https://en.wikipedia.org/wiki/Roy_Meadow
- https://en.wikipedia.org/wiki/Lucia_de_Berk
- https://en.wikipedia.org/wiki/Birmingham_Six (many other problems here, but part of the evidence was an expert witness saying he was 99% sure a chemical test proved the accused had been handling explosives; it turned out that handling playing cards could produce a false positive (from their coating, which is similar to nitroglycerine), and they had indeed been playing cards)

Javaness2 · Post by **Javaness2** » Mon Mar 26, 2018 1:28 am

Uberdude wrote:On the issue of the (mis)use of statistics in justice:
- https://en.wikipedia.org/wiki/Roy_Meadow
- https://en.wikipedia.org/wiki/Lucia_de_Berk
- https://en.wikipedia.org/wiki/Birmingham_Six

I wasn't aware that the Birmingham Six was an example of misuse of statistics; wasn't it mostly the simple fabrication of evidence? Mind you, the TOS probably forbids me discussing that here, so here is the game from November instead.

aitkensam · Post by **aitkensam** » Mon Mar 26, 2018 2:14 am

Is there a reason why the other games played in this year's league did not form part of the investigation? They would presumably provide useful data for determining whether the game against Israel was out of the ordinary for this player. Or whether all online games show a different pattern to all offline games etc.

zermelo · Post by **zermelo** » Mon Mar 26, 2018 2:15 am

Some people have here mentioned 98% out of 100 moves, but I suppose it is actually 50 moves of the player, within moves 50-150 of the game.

I think it is very important to know why exactly moves 50-150 were studied. Was this decided beforehand? Why aren't numbers given for the whole game?

goer · Post by **goer** » Mon Mar 26, 2018 2:18 am

This guy Metta is the chief referee of the coming EGF Congress in Pisa. What happens then, will the EGF allow him to referee?

goer · Post by **goer** » Mon Mar 26, 2018 2:19 am

aitkensam wrote:Is there a reason why the other games played in this year's league did not form part of the investigation? They would presumably provide useful data for determining whether the game against Israel was out of the ordinary for this player. Or whether all online games show a different pattern to all offline games etc.

That's a good point.
