
All times are UTC - 8 hours [ DST ]




 Post subject: Re: “Decision: case of using computer assistance in League A
Post #21 Posted: Sun Mar 25, 2018 9:21 pm 
Lives with ko

Posts: 284
Liked others: 94
Was liked: 153
Rank: OGS 7 kyu
Jonas Egeberg wrote:
As the manager of League A in PGETC I have been in charge of dealing with this matter. I of course had help from other strong, non-biased players in analyzing the games etc. For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.


I am not that good at statistics, but if we consider that a player, on average, plays 70-80% of his offline moves similarly to Leela (let's take 80%), then what is the probability that he plays 49 or 50 similar moves out of 50 online?

In my simulation, the probability is below 0.02%
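For anyone who wants to check that figure without a simulation, here is a minimal sketch of the exact calculation. It assumes each of the 50 moves is an independent match with probability 0.80 (a plain binomial model, which is my simplification and not something the league's analysis states); the function names are just for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k matches in n independent moves."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_tail(k_min: int, n: int, p: float) -> float:
    """Probability of at least k_min matches in n independent moves."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

n, p = 50, 0.80  # 50 examined moves, assumed 80% per-move match rate
tail = binom_tail(49, n, p)
print(f"P(49 or 50 similar moves out of {n}) = {tail:.4%}")
```

The result lands just under 0.02%, in line with the simulation.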

Of course, one has to check carefully that the measurement is consistent in both cases and backed by enough data for the offline measure. We also make the assumption that offline performance translates into online performance, that it is consistent across different opponents, and so on, and so on, and so on...

Besides, this is only mildly related to the topic, but some time ago I stumbled upon this article https://www.chess.com/article/view/better-than-ratings-chess-com-s-new-caps-system#comments that considers a replacement for Elo. In my understanding, it measures the similarity of play between chess players and chess bots as a measure of the player's strength. I was considering adding that sort of calculation to GoReviewPartner, but just for fun :mrgreen:

_________________
I am the author of GoReviewPartner, a small software aimed at assisting reviewing a game of Go. Give it a try!

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #22 Posted: Sun Mar 25, 2018 10:24 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
ez4u wrote:
Bill Spight wrote:
...

Thanks for the links to Regan's writing. :) He provides another link to the Parable of the Golfers ( https://www.cse.buffalo.edu//~regan/che ... lfers.html ), in which he states what is close to my position.
Ken Regan wrote:
the statistical analysis can only be supporting evidence of cheating, in cases that have some other concrete distinguishing mark.


I am enough of a Bayesian to accept very strong statistical evidence by itself, especially when the question is one of throwing out a result. In that article he refers to the civil court level of evidence. For disciplinary action I think that we should require much more. IMO there is enough evidence to have Carlos's play in future tournaments monitored.


In this case shouldn't the Parable of the Golfers be modified to ask how many golfers we need in order to observe someone sink their drive on all 4 par 3's on a typical course?


First, let me say something about being a Bayesian. Before the 20th century, everybody was a Bayesian, but by then people were aware of its problems. To be a Bayesian is to believe that hypotheses have, or may have, probabilities. Frequentists believe that only events have probabilities. Frequentists won the field in the early 20th century, as ideas such as hypothesis testing were developed without assigning probabilities to hypotheses. In the mid 20th century there were still a few famous Bayesians, such as L. J. Savage and I. J. Good. I was fortunate enough to spend an afternoon with Savage, who showed me that I was a closet Bayesian. ;) Later in the 20th century Bayesianism had a rebirth, perhaps because of computer programmers who were able to write programs that made sense in Bayesian terms, and those programs worked.

Bayesianism, when I came across it in the mid 20th century, was largely subjectivist, because how you alter your probabilities in the face of evidence depends upon your prior beliefs. This subjective property, OC, is a major problem from a scientific point of view. I. J. Good talked about how to interrogate yourself to find out what your probabilistic beliefs were. The question you ask is a good example. What would be convincing statistical evidence in itself? How about acing all four par threes on the course?

Quote:
As a Bayesian, if you start with an expectation of 80% accuracy (the high end of what was observed in FTF games), how do you interpret 98 out of 100 (or 49 out of 50?)? [This is not argumentative; I simply would like to know!]


IIRC, Uberdude observed 89% in his opponent, who was of comparable strength but had not undergone Carlos's training regime. Anyway, as a Bayesian, you have to have a non-zero belief in any probability that you may possibly come to assign to a hypothesis, or you will never assign that probability. IOW, your beliefs have a distribution over a range of probabilities. So you can't just focus on, say, 75% or 80% in itself.

I know in a way I am dodging the question, but let me say that first, I would want to know more about the games and what kind of moves he made which were like Leela's and what kind were not, and so on. Second, like just about everybody else, I think that there is enough evidence to think that in this one game he played unusually like Leela. But there could be any number of reasons for that. Cheating is not the only hypothesis to consider. It may be the one with the highest prior degree of belief, and the one that is most bolstered by the evidence, but that still may not give it a high posterior degree of belief. It may be the most supported hypothesis, but that is a different question.

Another question that Bayesians must ask is how do we know what we know? An example from the early scientific literature has to do with the probability that the sun will rise tomorrow. Suppose we start by assigning equal weight to each non-zero probability that the sun would rise the next day, starting from day 1, which in those days was supposed to be some 6,000 years in the past. As we update our beliefs with the sun rising every day for 6,000 years, the probability that it would rise tomorrow becomes very, very, close to 1. (Yay!) But someone, I forget who and where, published a paper pointing out that on the same evidence the probability that the sun would rise 5,000 years from today was only ⅔. That result was not appealing. :lol: That result illustrates the problematic nature of Bayesianism for science. Speaking for myself, my belief about the sun rising has to do with the rotation of the earth. Given that knowledge, whether the sun rose yesterday is irrelevant to whether it will rise tomorrow.

In the parable of the golfers Regan states that the chance of a scratch golfer making a hole in one on a par three is about 1/5000. If we gathered 10,000 such golfers and had them each hit a ball from the tee on a par three hole, we would expect 2 of them to make a hole in one. He goes on to say that suppose that we slipped a piece of paper with a black dot on it into the pocket of 10 of those 10,000, and one of them made a hole in one. The probability that (at least) one of them would do so is around 1/500, i.e., (1 - (4999/5000)^10). The statistical evidence is very strong that this is not just a chance event, that there is a reason for it. Good enough, Regan suggests, to win a civil case in court.
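The arithmetic in the parable is easy to reproduce. A minimal sketch, using only the numbers quoted above (1/5000 chance per shot, 10,000 golfers, 10 black dots) and assuming the shots are independent; the variable names are made up for illustration:

```python
p_ace = 1 / 5000   # chance of a hole in one on a par three, per the parable
golfers = 10_000   # scratch golfers each hitting one tee shot
marked = 10        # golfers carrying a black dot

# Expected number of holes in one among all golfers.
expected_aces = golfers * p_ace

# Chance that at least one of the marked golfers makes a hole in one.
p_marked_ace = 1 - (1 - p_ace) ** marked

print(f"expected aces among {golfers} golfers: {expected_aces:.1f}")
print(f"P(at least one marked golfer aces): {p_marked_ace:.5f} "
      f"(about 1 in {1 / p_marked_ace:.0f})")
```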

Something that he does not go on to say, but I think that he should have said, is that once we know about the black dots, we should chalk up that hole in one to chance. Having a black dot in your pocket is irrelevant to your golfing ability. But in the parable he has the black dot stand for "physical or observational evidence of cheating". I.e., given such evidence, then the hole in one (statistically very rare occurrence) may offer support. In this case, all we have is the statistical evidence. IMO, this evidence supports looking for physical or observational evidence of cheating. For example, by monitoring Carlos's play in the future.

Frequentists agree, BTW. I still remember the prof announcing, "Statistics proves nothing." ;)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: mhlepore
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #23 Posted: Sun Mar 25, 2018 10:57 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Humans can learn from programs and apply what they have learnt. The result can be very similar games. Therefore, probability is not evidence. What we do need in important online games is referees locally supervising the players to prevent almost all cheating of all forms.


This post by RobertJasiek was liked by 2 people: goTony, joellercoaster
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #24 Posted: Sun Mar 25, 2018 11:06 pm 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
The game in question has 166 moves; the allegation is presumably that after move 4 Black starts using Leela. In other words, he turned on Leela after opting for a double 3-3 fuseki. After that, every single move chosen is a 'top' Leela move. The game doesn't look that remarkable to me. The mathematics behind the decision has to be the subject of some debate, especially in the case of an appeal, if online games are still to be taken into account.

Nobody has answered my question yet, though: is there any appeal of the decision?

_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #25 Posted: Mon Mar 26, 2018 1:01 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
On the issue of the (mis)use of statistics in (in)justice:
- https://en.wikipedia.org/wiki/Roy_Meadow
- https://en.wikipedia.org/wiki/Lucia_de_Berk
- https://en.wikipedia.org/wiki/Birmingham_Six (many other problems here, but part of the evidence was an expert witness saying he was 99% sure a chemical test proved the accused had been handling explosives; it turned out that handling playing cards could produce a false positive, because their coating is similar to nitroglycerine, and they had indeed been playing cards)


This post by Uberdude was liked by: sybob
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #26 Posted: Mon Mar 26, 2018 1:28 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
Uberdude wrote:


I wasn't aware that the Birmingham Six was an example of mis-use of statistics; wasn't it mostly the simple fabrication of evidence? Mind you the TOS probably forbids me discussing that here, so here is the game from November instead.


_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #27 Posted: Mon Mar 26, 2018 2:14 am 
Beginner

Posts: 4
Liked others: 0
Was liked: 0
Rank: EGF 4d
KGS: aitkensam
Is there a reason why the other games played in this year's league did not form part of the investigation? They would presumably provide useful data for determining whether the game against Israel was out of the ordinary for this player. Or whether all online games show a different pattern to all offline games etc.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #28 Posted: Mon Mar 26, 2018 2:15 am 
Dies in gote

Posts: 46
Liked others: 109
Was liked: 34
Rank: Euro 1 dan
GD Posts: 7
Some people here have mentioned 98% of 100 moves, but I suppose it is actually 50 moves by the player, within moves 50-150 of the game.

I think it is very important to know why exactly moves 50-150 were studied. Was this decided beforehand? Why are numbers not given for the whole game?

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #29 Posted: Mon Mar 26, 2018 2:18 am 
Beginner

Posts: 3
Liked others: 0
Was liked: 0
This guy Metta is the chief referee of the coming EGF Congress in Pisa. What happens then, will the EGF allow him to referee?


Last edited by goer on Mon Mar 26, 2018 2:19 am, edited 1 time in total.
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #30 Posted: Mon Mar 26, 2018 2:19 am 
Beginner

Posts: 3
Liked others: 0
Was liked: 0
aitkensam wrote:
Is there a reason why the other games played in this year's league did not form part of the investigation? They would presumably provide useful data for determining whether the game against Israel was out of the ordinary for this player. Or whether all online games show a different pattern to all offline games etc.


That's a good point.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #31 Posted: Mon Mar 26, 2018 2:22 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
This is what they did
Quote:
For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.


Leela move 4 - does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal. #1.Appeal to the Referee, #2.Appeal to the Tournament Appeals Committee, #3.Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?

_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #32 Posted: Mon Mar 26, 2018 2:52 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
The game consists almost entirely of ordinary moves. This alone makes it possible that two different players or programs find mostly the same moves, especially if they have a similar playing style. That the player is said to have studied with the program for 2 years makes it all the more likely that the matching moves are neither coincidence nor cheating but a direct consequence of adopting a playing style and "knowledge" / "experience" from training with the program during the years before the game. I sincerely hope that the judgement is overturned by higher instances.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #33 Posted: Mon Mar 26, 2018 2:54 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Javaness2 wrote:
This is what they did
Quote:
For those asking, what we did is that we checked several of his offline games from recent tournaments, and we also verified with his opponents that they were the actual games played. We checked moves 50-150 and noted the moves as similar if they were within Leela's top 3 moves, and no further than 5% away from its top move. In those games the similarity was 70-80%. We then went back to the PGETC games and checked the game against Israel. In that game we found that the similarity was 98%, where the only move that was different was Leela's move number 4, but still within 1% winrate of its top move.


Leela move 4 - does that correspond to move 7 of the game, or does it mean move 57? I guess we will see a technical report soon to clear up this confusion. In any EGF event you have 3 stages of appeal. #1.Appeal to the Referee, #2.Appeal to the Tournament Appeals Committee, #3.Appeal to the EGF version of #2. Given that this happened in November and has only become known now, which steps were taken?


Leela move 4 means the 4th best move in that position according to Leela's analysis.

Of the 50 moves considered, 49 were in the top 3 best moves according to Leela's analysis (with the additional constraint that the move should not be more than 5% worse than Leela's top choice).

This is where the 98% number comes from, it is 49/50.

The only move out of 50 not in Leela's top 3 was Leela's 4th choice in that position, and was less than 1% worse than Leela's top choice.
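To make the rule concrete, here is a rough sketch of how such a similarity score could be computed once you have Leela's ranking and winrate for each played move. The `MoveAnalysis` record, its field names and the sample data are all hypothetical (nothing here comes from the league's actual tooling), and treating "exactly 5% away" as still similar is my assumption.

```python
from dataclasses import dataclass

@dataclass
class MoveAnalysis:
    rank: int           # 1 = the engine's top choice in that position
    winrate_gap: float  # top move's winrate minus the played move's, in points

def is_similar(move: MoveAnalysis, max_rank: int = 3, max_gap: float = 5.0) -> bool:
    """Similar = within the engine's top 3 and at most 5 points worse than its top move."""
    return move.rank <= max_rank and move.winrate_gap <= max_gap

def similarity(moves: list[MoveAnalysis]) -> float:
    """Fraction of the examined moves that count as similar."""
    return sum(is_similar(m) for m in moves) / len(moves)

# Invented data shaped like the reported result: 49 top choices plus one
# 4th choice that is less than 1 point worse than the top move.
moves = [MoveAnalysis(1, 0.0)] * 49 + [MoveAnalysis(4, 0.8)]
print(similarity(moves))  # 0.98
```

Note that the 4th-choice move fails the test even though it is within 1% of the top move, which is exactly the one miss described above.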

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #34 Posted: Mon Mar 26, 2018 4:03 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
Ah, you are right, that seems like the best way to interpret that statement. So did they then only look at a section of the game or at the whole game? For me it would be kind of strange if they didn't look at the whole game. I suppose that a script already exists to show the comparison data per ply for the game.

_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #35 Posted: Mon Mar 26, 2018 4:18 am 
Gosei

Posts: 1753
Liked others: 177
Was liked: 491
Uberdude wrote:
Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.


Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability of finding Leela's move is p=0.89. Then for n=50 moves, the probability of matching exactly 49 moves is n p^(n-1) (1-p), which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.
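Here is the same estimate as a few lines of Python, in case anyone wants to plug in other values. It is only a sketch of the simplified model (independent moves, the same p for every move), and counting one examined player per game is my assumption:

```python
n = 50      # moves examined per game
p = 0.89    # assumed per-move probability of matching Leela
games = 60  # games played in rounds 1-3

# Probability of matching exactly 49 of the 50 moves.
p_exactly_49 = n * p ** (n - 1) * (1 - p)

# Expected number of games reaching that score by chance,
# counting one examined player per game (an assumption).
expected_hits = games * p_exactly_49

print(f"P(exactly 49 of {n}) = {p_exactly_49:.2%}")
print(f"expected such games out of {games}: {expected_hits:.2f}")
```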


This post by jlt was liked by: Javaness2
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #36 Posted: Mon Mar 26, 2018 4:44 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
jlt, a good point to illustrate, but even that is placing too much significance on this test. That 2% is the chance of a randomly chosen game being at 98% Leela (and you should include 100% too). But when the game you choose to investigate is selected because someone else noticed it was similar to Leela it's like putting the black spot in the golf analogy on the player who hit the hole-in-one after and because he did so. It's not independent so the simple probabilities are not appropriate.
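A rough sketch of what the selection effect does to the numbers, staying inside jlt's simplified model: every move is an independent match with p=0.89, 50 moves are examined per game, and one player per game is examined across 60 games. All of those are illustrative assumptions, not data from the investigation.

```python
from math import comb

def binom_tail(k_min: int, n: int, p: float) -> float:
    """Probability of at least k_min matches in n independent moves."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

n, p, games = 50, 0.89, 60

# Chance that one given game scores 98% or 100% (49 or 50 matches).
p_flag = binom_tail(49, n, p)

# Chance that at least one of the games does so, which is the relevant
# figure when the game was picked because it looked like Leela.
p_any = 1 - (1 - p_flag) ** games

print(f"P(one given game at 98% or more) = {p_flag:.2%}")
print(f"P(at least one of {games} games at 98% or more) = {p_any:.0%}")
```

Under these assumptions a single pre-chosen game hits 98% or more about 2% of the time, but some game among the 60 does so more often than not.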

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #37 Posted: Mon Mar 26, 2018 4:49 am 
Gosei

Posts: 1494
Liked others: 111
Was liked: 315
Finally I saw a confirmation: an appeal is planned, so the decision is not final. Judging by the last appeal I saw, it could take a year to figure this out.

Russia

CzechRepublic

Romania

Hungary

Serbia

Ukraine

_________________
North Lecale


Last edited by Javaness2 on Mon Mar 26, 2018 5:40 am, edited 2 times in total.

This post by Javaness2 was liked by: Bonobo
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #38 Posted: Mon Mar 26, 2018 5:11 am 
Oza

Posts: 2401
Location: Tokyo, Japan
Liked others: 2338
Was liked: 1332
Rank: Jp 6 dan
KGS: ez4u
jlt wrote:
Uberdude wrote:
Out of interest of the quality of the similarity metric used, I downloaded Leela 0.11 (to my crappy laptop, about 10 seconds to get 30k nodes, I don't know how strong it is) and analysed moves 50-80 of Carlo's game vs Israel, and my last PGETC game moves 50-88. In that small section of Carlo's game he got 100% similar, Israel 67%. In my game I got 74% and my opp 89%. (...) if it's possible for an innocent to get 89% on 38 moves then 98% on 100 moves when you've been studying with Leela is suspicious but not good enough proof for punishment.


Let's pick the highest percentage, i.e. 89%. Suppose for simplicity that for each move, the probability of finding Leela's move is p=0.89. Then for n=50 moves, the probability of matching exactly 49 moves is n p^(n-1) (1-p), which is about 2%. During rounds 1-3 of the Pandanet EGC, 60 games were played, so you would expect at least one false positive.

This calculation was why I was trying to get something more 'common sense' from Bill. :)
If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = [edit --> wrong! 0.0000065% or one time in 15 million] 0.018% or one time in about 5,600. And if you take 70% (the lower figure), you get [edit --> wrong!! 5.0E-16 or about one time in 2 quadrillion] 3.8E-7 or about one time in 2.5 million.

_________________
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21


Last edited by ez4u on Mon Mar 26, 2018 6:26 am, edited 2 times in total.
 
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #39 Posted: Mon Mar 26, 2018 5:26 am 
Gosei

Posts: 1753
Liked others: 177
Was liked: 491
ez4u wrote:
If you take 89% (Uberdude's 4d opponent), you get 1.8% or about one time in 54. However, if you plug in 80% (upper figure for what was observed for the player under discussion) you get a very different result = 0.0000065% or one time in 15 million. And if you take 70% (the lower figure), you get 5.0E-16 or about one time in 2 quadrillion.


No you don't. If p=0.8 then n p^(n-1) (1-p) is about 1/5600. If p=0.7 then n p^(n-1) (1-p) is about 1/(2.5 million).
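For reference, a short sketch that evaluates the formula for the three values of p discussed in this thread (n=50, exactly 49 matches, moves assumed independent; the function name is just for illustration):

```python
from math import comb

def p_exactly(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k matches in n independent moves."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 50
results = {p: p_exactly(49, n, p) for p in (0.70, 0.80, 0.89)}

for p, prob in results.items():
    print(f"p = {p:.2f}: {prob:.3g}, about 1 in {1 / prob:,.0f}")
```

The output shows how strongly the answer depends on the assumed baseline: a few points of difference in p move the result by orders of magnitude.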

The orders of magnitude are certainly very different, but I purposely picked p=0.89 to be on the safe side, i.e. I would prefer a few cheaters to go unpunished rather than too many punished innocents (together with the whole team).

And also, p=0.89 may not be unrealistic for that particular game if you assume that most moves were very "ordinary" (dixit Robert Jasiek).

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #40 Posted: Mon Mar 26, 2018 5:28 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
IMO, for the appeal, they should analyse a large sample of PGETC games to see how much of an outlier 98% is.
