 Post subject: Re: “Decision: case of using computer assistance in League A
Post #141 Posted: Wed Apr 04, 2018 4:38 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
sybob wrote:
Bill Spight wrote:
Still, drawing conclusions from one game is absurd.

Huh?
The issue is: did he cheat IN THIS GAME.
That's what he is accused of. It does not matter how/what other games are.


Why do you think that they are irrelevant? If he played well enough without cheating, as evidenced by other games in which he beat stronger players than his opponent in that game, why would he cheat in that game? Yes, there is evidence that he played like Leela in that game, but that is not the same as cheating.

Edit: And, indeed, they did conclude that Carlo probably cheated in the other games, based upon playing like Leela in that one game. They threw those other results out, as well. If his play in that game is relevant to his play in the other games, isn't his play in those games relevant to his play in that game?

Also, if all we are going by is similarity to Leela's play, we need a lot more evidence than we can get in one game. If we have behavioral or physical evidence of cheating, that's a different matter. But we do not.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #142 Posted: Thu Apr 05, 2018 12:23 am 
Gosei

Posts: 1492
Liked others: 111
Was liked: 314
Regarding the EGF matter, the report will not be released until the appeals process is finished.
Regarding the CIT ... ?

_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #143 Posted: Thu Apr 05, 2018 12:33 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
sybob wrote:
Bill Spight wrote:
Still, drawing conclusions from one game is absurd.

Huh?
The issue is: did he cheat IN THIS GAME.
That's what he is accused of. It does not matter how/what other games are.


I wouldn't go quite as far as calling it "absurd", but rather "requiring far stronger evidence that anything significant happened than when looking at multiple games, so that it is unlikely you can reject the null hypothesis and justifiably convict".

If you really want to look only at this one game from Carlo in isolation then you could, but you need to be careful with the stats (a human analysis of the plausibility of the plays, like Stanislaw did, is much better). Setting aside the much-discussed problems of looking at matching rates against a bot, once you've got the 98% top 3 match figure you still need to know what a typical value for it is, which you need to get from looking at other games (to account for playing style it would be best if they were Carlo's games, but others' would do too). "98% is a big number" is not good enough. To be flippant, Carlo played 100% of his moves on the intersections of the board, just like Leela did too.

I only have 10 data points, but fitting them to a normal distribution (dubious: too small sample, could be different shape, plus 100 is a hard max) I get a mean of 80 and standard deviation of 8. So then you might say 98 is 2.2 sds from the mean, what's the chance of that? Look up your normal distribution probability tables and you get 1.2%. That's small, an inept statistician would say, less than the oft used 0.05 significance level, he must be guilty! But that's the chance a randomly selected game has that value (based on the false assumption the metric is normally distributed with those parameters). But this game was not randomly selected, it was chosen to be examined precisely because it has a high similarity. So such a probability is invalid. As Feynman eloquently said:

Quote:
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
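
For concreteness, the naive calculation sketched a couple of paragraphs up looks something like this (a rough sketch only; the ten match percentages are made-up stand-ins chosen to give about the same mean of 80 and sd of 8, not the actual measured values):

Code:
from statistics import NormalDist, mean, stdev

# Hypothetical game-level "top 3 match %" values standing in for the real sample.
matches = [67, 70, 74, 77, 80, 82, 84, 86, 88, 92]

mu = mean(matches)        # ~80
sigma = stdev(matches)    # ~8
z = (98 - mu) / sigma     # ~2.25 standard deviations above the mean

# One-sided tail probability under the (dubious) normal assumption: P(X >= 98).
p = 1 - NormalDist().cdf(z)
print(f"mean={mu:.1f} sd={sigma:.1f} z={z:.2f} P(X>=98)={p:.2%}")   # ~1.2%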


This post by Uberdude was liked by: Akura
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #144 Posted: Thu Apr 05, 2018 1:09 am 
Gosei

Posts: 1492
Liked others: 111
Was liked: 314
I think you're supposed to expect values up to 3 sd from the mean in any distribution model?

_________________
North Lecale

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #145 Posted: Thu Apr 05, 2018 1:23 am 
Oza

Posts: 3644
Liked others: 20
Was liked: 4620
Quote:
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357.


As a Londoner I can point to something even more amazing - yesterday I actually saw an empty parking space!!!!!

But more seriously, I remember the registration number of my father's first car from 60 years ago, and I can't even remember which day it is now. UK registration plates are area-based, but last year I saw that same plate on a new car here, 300 miles away. That sort of coincidence reminds me of the work on coincidences of an Austrian mathematician whose name I've forgotten but I think begins with Ka- (and for some reason my brain also associates frogs with him). I'd like to be reminded of his name, but the point is he showed that coincidences are normal, and even fourth-order coincidences are not extraordinary. I read that as a student and I've never believed in conspiracy theories since.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #146 Posted: Thu Apr 05, 2018 1:27 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
One other requirement for null hypothesis testing is that the data be independent, but the moves in a single go game are far from independent.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #147 Posted: Thu Apr 05, 2018 2:09 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill Spight wrote:
One other requirement for null hypothesis testing is that the data be independent, but the moves in a single go game are far from independent.

Each datum in the situation mentioned, though, is the matching percentage for a whole game (or rather one player's moves in the move 50-149 chunk), so the lack of independence of individual moves doesn't matter and is subsumed into that single value. The relevant question then is: is each game's matching % independent of the others'? I should think so, though there will certainly be correlations with properties like player strength, player style, time limits, or seriousness of event, so you need to make sure your sample is from a relevant population.

The lack of independence of moves though would, I suspect, cause these data to be less tightly clustered around the mean than otherwise. So less like a normal distribution with a nice tight peak and more of a pancake. So you can't just blindly slap a normal distribution on it and do your P(X>mean+f*sd) test.
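
As a rough illustration of why the shape matters (a sketch only, using scipy; the Student-t below is just a stand-in for some heavier-tailed "pancake" distribution, not a claim about the real data):

Code:
from scipy.stats import norm, t

z = 2.25   # roughly (98 - 80) / 8, as in the earlier post

# Upper-tail probability of the same z-score under a normal distribution
# versus a heavier-tailed Student-t with few degrees of freedom.
print(f"normal tail:   {norm.sf(z):.2%}")      # ~1.2%
print(f"t(5 dof) tail: {t.sf(z, df=5):.2%}")   # roughly three times larger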

P.S. Analysed an Ilya vs Artem game, updated table above.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #148 Posted: Thu Apr 05, 2018 5:04 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
Bill Spight wrote:
One other requirement for null hypothesis testing is that the data be independent, but the moves in a single go game are far from independent.

Each datum in the situation mentioned, though, is the matching percentage for a whole game (or rather one player's moves in the move 50-149 chunk), so the lack of independence of individual moves doesn't matter and is subsumed into that single value. The relevant question then is: is each game's matching % independent of the others'? I should think so, though there will certainly be correlations with properties like player strength, player style, time limits, or seriousness of event, so you need to make sure your sample is from a relevant population.

The lack of independence of moves though would, I suspect, cause these data to be less tightly clustered around the mean than otherwise. So less like a normal distribution with a nice tight peak and more of a pancake. So you can't just blindly slap a normal distribution on it and do your P(X>mean+f*sd) test.

P.S. Analysed an Ilya vs Artem game, updated table above.


Point well taken about the independence of data across games. That can reasonably be assumed. I was concerned about the application of the standard deviation, but you raise that issue, as well. More on that problem below.

The lack of independence between moves in a single game raises the question of what you count. Go players regard the hane-and-connect as a unit. Why count it as two matches instead of one? Semeai may not be one-lane roads, because the order of play can vary, but they produce sequences of play where the rate of matches to the top three options for individual plays is higher than normal. Now, over a large sample of games the average number of obvious responses to forcing moves, joseki sequences, one-lane roads, etc., evens out, so that counting single-move matches is an OK proxy for a better matching metric. But that does not apply when you are looking only at one game. For instance, if the 100-move sequence in a game included a 20-move one-lane road, that would push up the single-move match percentage. A long ko fight would generate a large number of obvious responses to forcing moves, and that would increase the single-move match percentage as well. You can't just rely on testing a single game using single-move match criteria.
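
A toy calculation shows the size of the effect (all numbers invented purely for illustration):

Code:
# Suppose a player matches Leela's top 3 on 75% of "free" moves, while the
# 20 moves of a one-lane road or forced ko sequence match essentially 100%.
free_moves, forced_moves = 80, 20
free_rate, forced_rate = 0.75, 1.00

overall = (free_moves * free_rate + forced_moves * forced_rate) / (free_moves + forced_moves)
print(f"overall single-move match rate: {overall:.0%}")   # 80%, inflated from the underlying 75%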

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


This post by Bill Spight was liked by: Charlie
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #149 Posted: Thu Apr 05, 2018 6:59 am 
Lives in gote

Posts: 388
Liked others: 295
Was liked: 64
IGS: 4k
Universal go server handle: BlindGroup
Uberdude wrote:
I only have 10 data points, but fitting them to a normal distribution (dubious: too small sample, could be different shape, plus 100 is a hard max) I get a mean of 80 and standard deviation of 8. So then you might say 98 is 2.2 sds from the mean, what's the chance of that? Look up your normal distribution probability tables and you get 1.2%. That's small, an inept statistician would say, less than the oft used 0.05 significance level, he must be guilty! But that's the chance a randomly selected game has that value (based on the false assumption the metric is normally distributed with those parameters). But this game was not randomly selected, it was chosen to be examined precisely because it has a high similarity. So such a probability is invalid. As Feynman eloquently said:

Quote:
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!


Uberdude, your taking the time to go through even these 10 games seems to be more than we've seen anyone else doing to systematically assess these decisions. A few thoughts to contribute:

1. As you note, a sample size of 10 data points is VERY small. I think even "inept statisticians" would be uncomfortable moving forward with only these data. That said, this is not meant to criticize your efforts, but rather to argue that you are on the right track and that your efforts should be extended by some organization with significantly greater access to computational resources.

2. I think you have the logic of the hypothesis testing framework slightly twisted, and it affects the interpretation of the 1.2 percent error rate (the "Type I" rate). You are right, we chose the game with the 98 percent top-3 match rate deliberately -- it was the game in question. The 1.2 percent that you have estimated gives you the odds of getting a match rate of 98 percent or more given "normal go play". Said differently, if you set up a decision framework that classifies any match rate of 98 percent or more as cheating, you will falsely classify 1.2 percent of all normal (non-cheating) games as cheating. From a research perspective, this is well within the accepted probabilities of error for most disciplines. But is it small enough for the purposes of identifying cheating in go? Probably not. This rate would mean that, on average, at least one person would be convicted of cheating at every tournament with 100 people. That seems like an uncomfortably high level of false convictions to me.
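
To put a number on that worry (a sketch under stated assumptions: one screened game per player and the 1.2 percent false-positive rate, both purely illustrative):

Code:
p_false, games = 0.012, 100

expected_false_flags = p_false * games        # expected number of innocent games flagged
p_at_least_one = 1 - (1 - p_false) ** games   # chance that at least one innocent game is flagged
print(f"expected false flags: {expected_false_flags:.1f}")   # ~1.2
print(f"P(at least one):      {p_at_least_one:.0%}")         # ~70%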

Relative to the Feynman quote, the 1.2 percent tells us how likely it would be to observe the ARW 357 license plate through random variation. The question is whether or not to assign significance to this occurrence (e.g. declare it to be unusual and worthy of further investigation) or to let it go. If we set up a decision process that assigns significance to it when the probability of observation is 1.2 percent or less, then even when there is no true significance to it, we will be wasting our time investigating it 1.2 percent of the time. The point of the quote is that we experience rare events more often than most people realize and so need to be careful about using the rareness of the event alone as justification for further investigation. In this case, someone has run into the classroom and told Feynman that they saw a car with the license plate ARW 357 in the parking lot just before a local bank was robbed. We have to decide whether or not that observation warrants following up on the owner of that car or letting the lead go.

3. There are statistical techniques for handling data with unknown distributions, but they are very "data hungry" in that they require very large data sets. The same goes for dealing with data that is not "independent and identically distributed". Your and Bill's comments are on point, but given a reasonable amount of data, these issues are easily addressed.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #150 Posted: Thu Apr 05, 2018 7:25 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
BlindGroup wrote:
Uberdude, your taking the time to go through even these 10 games seems to be more than we've seen anyone else doing to systematically assess these decisions. A few thoughts to contribute:

1. As you note, a sample size of 10 data points is VERY small. I think even "inept statisticians" would be uncomfortable moving forward with only these data. That said, this is not meant to criticize your efforts, but rather to argue that you are on the right track and that your efforts should be extended by some organization with significantly greater access to computational resources.


Let me second that. :) And also add the necessity to apply the Adkins Principle (named, not by me, after my late wife): At some point, doesn't thinking have to go on?

Quote:
3. There are statistical techniques for handling data with unknown distributions, but they are very "data hungry" in that they require very large data sets. The same goes for dealing with data that is not "independent and identically distributed". Your and Bill's comments are on point, but given a reasonable amount of data, these issues are easily addressed.


The main point is, we do not yet have a reasonable amount of data regarding either single move matches or cheating or their possible relationship.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #151 Posted: Thu Apr 05, 2018 7:42 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
BTW, last night I found an interesting approach to detecting online cheaters at chess, by Brendan Norman, at https://www.youtube.com/watch?v=RTfH5gntsug . He notes that run-of-the-mill online chess cheaters mainly do it to boost their egos and to put down their opponents. He uses the lichess computer analysis tool on suspect games, and typically finds that the suspected cheaters make 0 blunders, 0 mistakes, and few or no inaccuracies. (These categories apparently are not based upon matches to any single chess engine, nor upon matches per se. Norman does not go into the workings of the analysis tool.) Grandmasters do not play that well.

One thing that I find interesting is that Norman will play games against his suspected cheaters and play poorly on purpose, so that his opponent does not have to cheat to win the game. ;) Then his opponent plays picture perfect chess, anyway. :lol: Another fish caught in the net. :D

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #152 Posted: Thu Apr 05, 2018 8:54 am 
Lives in gote

Posts: 422
Liked others: 269
Was liked: 129
KGS: captslow
Online playing schedule: irregular and by appointment
jeromie wrote:
That’s only true if you consider the likelihood of cheating in one game to be independent of cheating in other games AND you think there is nothing to learn from a player’s performance in other games. But that’s probably not true.

At the very least, a person’s general level of play adds some important data. If I were to suddenly start beating dan level players on KGS after a long period of stable play as a 3 kyu, you’d have good grounds to be suspicious of my improvement.

That's the difficulty others have already pointed out: the difference between a legal view (what's the accusation, what's the evidence, how hard is the evidence), and the probability/statistical view (what are the chances).
These two views are very hard to reconcile.

Others are better placed than I am to express views on the statistical side. But statistics have great difficulty providing convincing evidence in individual cases. And yes, from a statistical point of view, you want more data and comparisons. But even so, they may not provide conclusive evidence from a legal point of view.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #153 Posted: Thu Apr 05, 2018 9:52 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
sybob wrote:
jeromie wrote:
That’s only true if you consider the likelihood of cheating in one game to be independent of cheating in other games AND you think there is nothing to learn from a player’s performance in other games. But that’s probably not true.

At the very least, a person’s general level of play adds some important data. If I were to suddenly start beating dan level players on KGS after a long period of stable play as a 3 kyu, you’d have good grounds to be suspicious of my improvement.

That's the difficulty others have already pointed out: the difference between a legal view (what's the accusation, what's the evidence, how hard is the evidence), and the probability/statistical view (what are the chances).
These two views are very hard to reconcile.


That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #154 Posted: Thu Apr 05, 2018 11:00 am 
Lives in gote

Posts: 388
Liked others: 295
Was liked: 64
IGS: 4k
Universal go server handle: BlindGroup
Bill Spight wrote:
That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.


I disagree, and it depends a bit on what you mean by "did he cheat?". If by that you mean "Can we know with certainty through some sort of fact finding process that he used Leela?", I argue that this is an unanswerable question and not a useful way to frame the question. To wit, under any fact pattern in a legal setting, there will always be some grounds for doubting he cheated. They may not be "reasonable" doubts, but they will exist. It's impossible to answer this question with certainty.

I think the answerable question is "Under what circumstances (i.e. under what evidence, patterns of play, outcomes of statistical inferences) are we comfortable concluding that he cheated and operating under that assumption to deliver a punishment." From this perspective the statistical and legal questions are logically isomorphic -- the structure of the decision problem is the same.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #155 Posted: Thu Apr 05, 2018 12:08 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
BlindGroup wrote:
Bill Spight wrote:
That's not the only difference. The "legal" question is did he cheat? The statistical question that was posed is did he play like Leela? Common sense tells us that if he cheated he probably did so using a bot, so the questions are related. But they are still different questions.


I disagree, and it depends a bit on what you mean by "did he cheat?". If by that you mean "Can we know with certainty through some sort of fact finding process that he used Leela?", I argue that this is an unanswerable question and not a useful way to frame the question. To wit, under any fact pattern in a legal setting, there will always be some grounds for doubting he cheated. They may not be "reasonable" doubts, but they will exist. It's impossible to answer this question with certainty.


Agreed. :) Edit: To be clear, I stick by what I said, but I agree that we cannot answer that question with certainty.

Quote:
I think the answerable question is "Under what circumstances (i.e. under what evidence, patterns of play, outcomes of statistical inferences) are we comfortable concluding that he cheated and operating under that assumption to deliver a punishment." From this perspective the statistical and legal questions are logically isomorphic -- the structure of the decision problem is the same.


Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth revolves on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.
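
For the record, the Laplace calculation alluded to is the rule of succession; a minimal sketch, with the 6,000-year figure taken from the paragraph above:

Code:
# Rule of succession: after n successes in n trials with a uniform (Laplacian) prior,
# P(success on the next trial) = (n + 1) / (n + 2).
n = 6000 * 365   # roughly 6,000 years of observed sunrises
p_sunrise_tomorrow = (n + 1) / (n + 2)
print(f"{p_sunrise_tomorrow:.8f}")   # ~0.99999954 -- near certainty, but not 1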

Edit: Also, it is important to distinguish between the question of cheating and the question of whether Carlo played like Leela. As Regan points out, the key statistical question of cheating is whether Carlo played better than his non-cheating self. In this tournament Carlo played less like Leela when he beat stronger players than the one he beat in the game in question. That certainly raises questions about the particular statistical question asked, a certain way of matching Leela's choices, and the question of cheating. If you just say that the statistical and legal questions are isomorphic, you can't ask those questions.

I meant to add, examination of the game record is also important to the question of cheating. In the CIT case I did so and came to the opposite conclusion from the one indicated by the statistical evidence alone. In an example of suspected cheating in chess (sorry, I don't have a link right now) examination of a lost game offered a clue. This was not online cheating, but FTF. It was suspected that the player was signaled somehow to make the moves recommended by a chess engine. Then there was this loss after a stupid blunder. We count that as negative evidence of cheating, statistically. However, if the chess piece had been placed on an adjacent square to the one played, he could have won brilliantly. If we assume that he either was sent the wrong signal, or misinterpreted it, the move is forensic evidence of cheating. (Also, how could a player that good make a blunder that bad?) This is reminiscent of the 90 ft. tall man paradox. Good has an example with some biological range and a border. (Are there butterflies on the other side of a political border, as well as on this side? Something like that.)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #156 Posted: Thu Apr 05, 2018 1:19 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
BlindGroup wrote:
Uberdude, your taking the time to go through even these 10 games seems to be more than we've seen anyone else doing to systematically assess these decisions. ...
1. As you note, a sample size of 10 data points is VERY small. I think even "inept statisticians" would be uncomfortable moving forward with only these data.

My worry is that the 10 data points (from 5 games*) is more than the referees looked at, and that they are being inept / lacking rigour (Hanlon's razor applies)! I hope I am wrong, and there are plenty of mathsy analytical types in the Go population who presumably got involved in the investigation, which should avoid that, but I fear that I may be right. This fear is not assuaged by the fact that my question as to whether there was a control group was ignored (rather than cheerily answered, "Of course! With 100 games. We've done stats 101."), as were other concerned comments on the facebook thread. We also now learn the report of the investigation "won't be published as long as not all parties agree on it.". I see a few plausible explanations:
1) They (as in league organisers/EGF officials) don't read facebook/L19/reddit so are unaware of the large amount of discussion/opinion/concern. Or they are off on Easter holidays.
2) They read it but don't care, and so ignore us (the chattering classes, not directly involved; though since many of us are league participants I think we are).
3) There was no control group (or an absurdly small one), so they stay silent to avoid admitting they messed up.
4) There was a good-sized control group, but it showed 98% was not significant. They stay silent as above, or for some other reason, e.g. not communicating on unofficial platforms, discussing amongst themselves first, or thinking silent justice is better than engaging with a raucous community.
5) There was a good-sized control group, and it showed 98% was a significant outlier. But they stay silent for the reasons above, even though releasing the info would placate the community. This would mean my results with high %s are a fluke (which I could believe if a large study was released and could be verified, but I'm doubtful).

The Machiavellian streak in me thinks I should accuse my opponent with the 88% match (or whoever I next find with a higher %) of Leela cheating. Even better if I find a 98% from an old season before Leela existed! ;-) The problem of cheating using bots, either real cases or spurious accusations, is unfortunately here to stay and we need to form robust processes for dealing with it. Schayan Hamrah (an Austrian 5d who plays in the league) pointed out that the existing rules of the tournament have too little detail on dealing with bot cheating and this needs to be rectified and agreed by EGF members. I believe this is best done with an open and frank approach, not hiding from scrutiny.

* 20 data points from 10 games now! Just did my game vs Victor Chow. And Cornel vs breakfast. And Daniel vs crazy Jonas. And my first pro game, Lee vs Park.


This post by Uberdude was liked by: BlindGroup
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #157 Posted: Fri Apr 06, 2018 12:56 pm 
Lives in gote

Posts: 388
Liked others: 295
Was liked: 64
IGS: 4k
Universal go server handle: BlindGroup
Bill Spight wrote:
Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth revolves on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.


I think we may be trying to make slightly different points. If I understand you correctly, what you are saying is that you prefer to distinguish between two types of evidence: evidence that is easily quantifiable and evidence that while relevant does not lend itself to mathematical treatment. In our current context, the former would be the kind of analysis that Uberdude is pushing and the latter would be something like finding out that a player had a network connection in their private lavatory or had recently visited sites entitled "How to Cheat at Go". I agree with that. I do not believe in forcing things into mathematical frameworks when it seems unnatural.

My point though is a bit different. Acknowledging that there are both types of evidence, there is a tendency to say, because we can't quantify everything, let's ignore statistics. I'm arguing that is a mistake. Statistics has more to offer than just a quantification tool. Even if it is not possible to calculate actual probabilities for things using statistical formulas, the mathematical properties can still guide us in how to evaluate evidence and set up decision rules even when considering non-statistical evidence. These are things like the inherent trade-off between false convictions and failure to convict the guilty, Uberdude's point that unlikely events do happen, and that we even have to consider whether observed evidence is really rare. (For the latter, if the webpage "How to Cheat at Go" caused a stir, it's possible that many people in the profession may have visited the site just to see it. It is then harder to argue that this suggests cheating.) This is what I meant by the processes being isomorphic -- that the relationships from the hypothesis testing framework can provide a useful guide in reasoning through these issues even if one cannot quantify the data to do formal statistical analysis.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #158 Posted: Sun Apr 08, 2018 12:42 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
BlindGroup wrote:
[evidence that is easily quantifiable] ... would be the kind of analysis that Uberdude is pushing.

Just to be clear, I agree with Regan/Bill that this kind of statistical evidence (particularly of the rather broad and dumb "count how many moves were in Leela's top 3" rather than e.g. looking for moves which have low policy-network probability but which Leela likes and which he played, or other ideas similar to Regan's for chess) is unsatisfactory if it is used to convict on its own (but as this was an online tournament, physical evidence is harder to come by). It could be useful as an automated screening process for all games in an event/server to flag suspicious games for further investigation. The 98% was the only piece of evidence that was publicly released with the announcement of his conviction/punishment. So I want to know how significant an outlier it is. Even if it is significant (at whatever level we choose), I think the suspicious game should be examined further, as Stanislaw did, to see how plausible a skilled human thinks the play is, were there moves Leela liked that the human didn't play, did he make big mistakes according to Leela etc. Comparison to other games played by the accused player is also useful as "this player has been consistently performing above his expected level so we think he is cheating in many of his games, so will find suspicious behaviour in many of them" is an easier proposition to prove beyond reasonable doubt than "this player has been consistently performing above his expected level, but we think he was cheating in only one of them" [and that wasn't a fluke of the comparison statistic: 1 game being a 1 in 100 chance is not surprising in a tournament with over 100 games, but 4 games each of 1 in 100 chance by the same player is much harder to explain innocently].
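
A back-of-the-envelope check of that bracketed point (assuming, purely for illustration, independent games and an exact 1-in-100 chance per game):

Code:
p, games_in_event = 0.01, 100

# Chance that *some* game in a 100-game event clears a 1-in-100 threshold ...
p_any_game = 1 - (1 - p) ** games_in_event   # ~63%: unremarkable
# ... versus one named player clearing it in four specified games.
p_same_player_four_games = p ** 4            # 1e-08: much harder to explain innocently
print(f"{p_any_game:.0%}  vs  {p_same_player_four_games:.0e}")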

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #159 Posted: Sun Apr 08, 2018 1:50 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
And here's a histogram of the top 3 similarity metric with 24 data points.
[Attachment: Leela similarity histogram.png]


Interestingly, the top 1 match % has a flatter distribution, here they are together.
[Attachment: Leela similarity histogram top 1 and 3.png]


This post by Uberdude was liked by 3 people: Bill Spight, BlindGroup, Bonobo
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #160 Posted: Sun Apr 08, 2018 3:27 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Bill Spight wrote:
Depends upon what you mean by statistical.

Take the question, will the sun rise tomorrow? If we do not know about celestial mechanics, that is iffy. Will Phoebus Apollo have a hangover tomorrow and sleep in?

A statistical answer was attempted, based upon historical (i.e., biblical) evidence that the sun had risen every day for 6,000 years or so. Using a Laplacian prior, the probability is near certainty. But based upon knowledge that the earth revolves on its axis, the probability is even closer to 1. Keynes and Good would have been happy to combine astronomical knowledge with statistical knowledge in terms of Bayesian probability. (Maybe not in this particular instance, but generally, utilizing non-statistical knowledge in the prior. Keynes's priors were not necessarily numerical.) Moi, I distinguish between the types of evidence. (As I did in these sentences.) :) For cheating, Regan's physical and behavioral evidence I do not consider to be statistical.


BlindGroup wrote:
I think we may be trying to make slightly different points. If I understand you correctly, what you are saying is that you prefer to distinguish between two types of evidence: evidence that is easily quantifiable and evidence that while relevant does not lend itself to mathematical treatment.


The social sciences distinguish between quantitative and qualitative evidence, and today a good bit of research involves "triangulation", i.e., a combination of both. The current replication crisis in the social sciences comes in part from a realization that in the past too much weight was given to statistical evidence alone. Rejecting a null hypothesis is disconfirmatory of the null, but by itself it is only weakly confirmatory of any particular alternative hypothesis. It is hardly surprising that results based upon weak evidence are not replicated.

A good example of that (not, I repeat, not an example of social science research) comes from a Science and Consciousness talk I went to back in the 1990s at the University of California in San Francisco. A mathematician had made a study of a psychokinesis experiment at Princeton ( :shock: ) and found that the data were very close to a normal distribution (p << 0.001), among other findings, which he took to be indicative of ESP. One physicist stood up and roundly criticized the mathematician's conclusions on the basis of physical theory. As a Bayesian, I was not terribly concerned about the fact that the guy had obviously gone looking for a low p value which had not been specified beforehand. He had found a good one. :mrgreen: However, I did not take it as evidence for ESP, but as evidence that the data had been faked. :D

Quote:
My point though is a bit different. Acknowledging that there are both types of evidence, there is a tendency to say, because we can't quantify everything, let's ignore statistics.


My experience is the opposite, at least among those trying to do science. Maybe we run with different crowds. :)

Quote:
I'm arguing that is a mistake.


I agree. :D

Quote:
Statistics has more to offer than just a quantification tool. Even if it is not possible to calculate actual probabilities for things using statistical formulas, the mathematical properties can still guide us in how to evaluate evidence and set up decision rules even when considering non-statistical evidence.


I agree, as well. :)

But confirmatory statistics about 50 possible matches in one game is not good statistical evidence. It may be good enough to raise suspicions and invite the collection of further evidence, but that's all.

Uberdude did go looking for further evidence, including the matches to Leela's choices in other games that Carlo won in the same tournament. Those games were against stronger players than Carlo's opponent in the game in question and had lower numbers of matches than that game. To me, those results cast further doubt upon the assertion that Carlo had been cheating.

Let me go back to the ESP research. The mathematician had no theory as to why a close fit of the data to a normal distribution would indicate ESP. It just did. I, OTOH, had a good theory as to why that close fit would indicate faking the data. It is well known that a large amount of data usually conform to a normal distribution, so if you are faking it, you want the fake data to conform, as well. The question of too good a fit was not a concern to the faker or fakers, because who -- except maybe a crank mathematician -- would test that goodness of fit? :lol:

Based upon online cheating at chess (outside of tournaments), it seems like a lot of it involves using a chess engine to pick the plays. Because the top plays fluctuate as the engine does its calculations, and because different engines might differ slightly in their choice of plays, counting a match as any one of the 3 top choices, as long as the move is not too bad, will produce nearly a 100% match. Perhaps that is where the idea of using a match to the 3 top choices comes from.

Suppose we accept that theory. Then Carlo's moves in the games against the stronger players should also show a nearly 100% match. They don't. So what do we say about that? Carlo chose to cheat against a 4 dan, but not to cheat against 6 dans?

There is an analogy to Rasch testing here. In Rasch testing if a test taker does better on harder questions than easier questions, it may be that the meaning of some of those questions is different for that person than for others. Games against 6 dans are like harder questions, a game against a 4 dan is like an easier question. If any theory explains the matching results, how can it be to cheat by playing Leela's choices against the 4 dan but not against the 6 dans? OC, an explanation may be possible, but one has not been given.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Sun Apr 08, 2018 4:01 pm, edited 2 times in total.