 Post subject: Re: “Decision: case of using computer assistance in League A
Post #161 Posted: Sun Apr 08, 2018 3:32 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Uberdude wrote:
And here's a histogram of the top 3 similarity metric with 24 data points.


What are the data points? Thanks.

Edit: You also show a histogram of matching Leela's top choice.

Now, with 24 data points, we can, even without knowing the underlying distribution, take the game and player with the highest number of matches and say that in this game the player played like Leela on moves 50 - 149. Both the highest top choice count and the highest top 3 count are for Metta vs. Reem. If we do this for every tournament, we can take a look at the top matching 5% of games or so. That may be a reasonable thing to do, but concluding without further evidence that the player cheated is not reasonable.

Edited for accuracy. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.


Last edited by Bill Spight on Sun Apr 08, 2018 10:40 pm, edited 1 time in total.
 
Post #162 Posted: Sun Apr 08, 2018 10:32 pm 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill Spight wrote:
What are the data points? Thanks.


Code:
+-----------------+------+----------------+------+---------+---------+---------+---------+
|      Black      | Rank |     White      | Rank | B top 3 | W top 3 | B top 1 | W top 1 |
+-----------------+------+----------------+------+---------+---------+---------+---------+
| [Carlo Metta]   |  4d  | Reem Ben David |  4d  |    * 98 |      80 |    * 72 |      54 |   http://pandanet-igs.com/system/sgfs/6374/original/WWIWTFDSGS.sgf
| Andrey Kulkov   |  6d  | [Carlo Metta]  |  4d  |      80 |    * 86 |      68 |    * 62 |   http://pandanet-igs.com/system/sgfs/6314/original/AMTRMFSDAB.sgf
| Dragos Bajenaru |  6d  | [Carlo Metta]  |  4d  |      74 |    * 78 |      50 |    * 60 |   http://pandanet-igs.com/system/sgfs/6354/original/JRZPCWSANY.sgf
| [Andrew Simons] |  4d  | Jostein Flood  |  3d  |      80 |      88 |      54 |      62 |   http://pandanet-igs.com/system/sgfs/6612/original/XSJUGZZTOX.sgf
| Geert Groenen   |  5d  | [Daniel Hu]    |  4d  |      74 |      66 |      40 |      46 |   http://britgo.org/files/pandanet2016/mathmo-GGroenen-2017-01-10.sgf
| [Ilya Shikshin] |  1p  | Artem Kachan.  |  1p  |      56 |      76 |      38 |      60 |   http://pandanet-igs.com/system/sgfs/6384/original/RYSGTEGMXT.sgf
| [Andrew Simons] |  4d  | Victor Chow    |  7d  |      84 |      76 |      44 |      44 |   http://britgo.org/files/pandanet2014/RoseDuke-Egmump-2015-01-13.sgf
| Cornel Burzo    |  6d  | [A. Dinerstein]|  3p  |      74 |      66 |      40 |      48 |   http://pandanet-igs.com/system/sgfs/6349/original/SCNSFSJXTI.sgf
| Jonas Welticke  |  6d  | [Daniel Hu]    |  4d  |      54 |      64 |      34 |      42 |   http://britgo.org/files/pandanet2017/mathmo-iryumika-2017-12-12.sgf
| [Park Junghwan] |  9p  | Lee Sedol      |  9p  |      74 |      64 |      64 |      38 |   http://www.go4go.net/go/games/sgfview/68053
| Lothar Spiegel  |  5d  | [Daniel Hu]    |  4d  |      66 |      58 |      48 |      42 |   http://britgo.org/files/pandanet2016/mathmo-Mekanik-2017-04-25.sgf
| Gilles v.Eeden  |  6d  | [Viktor Lin]   |  6d  |      82 |      70 |      56 |      46 |   http://pandanet-igs.com/system/sgfs/6616/original/FMKVQBHBBV.sgf
+-----------------+------+----------------+------+---------+---------+---------+---------+


Some notes on recent games.
- Ilya Shikshin 1p vs Artem Kachanovskyi 1p. These players are quite possibly stronger than Leela 0.11 on 50k nodes. So not matching could mean they are playing better rather than worse moves than Leela. As expected the more territorial and orthodox Artem was more similar than creative fighter Ilya. This was also, I think, the first game I analysed to feature a ko (which makes a lot of obvious matches for taking the ko, but also threats can differ).
- My game vs Victor Chow 7d from a few years ago, as another example of a weaker player with a solid style scoring an upset against a stronger one. I played well in the opening and middlegame and got a good lead (but only won by half a point when he turned on super endgame and I was under time pressure, after move 150). For over 50 moves of the game Leela really wanted me to invade the left side at c7, which I was aware of, but as I was leading against a 7d I knew to be a strong fighter, I didn't invade there, to avoid complications I could well mess up. This was responsible for a lot of my failed matches with Leela's top 1 (often still top 3, but a few times not), plus of course some straight out mistakes from both of us.
- Cornel Burzo 6d vs Alexander Dinerstein 3p. Cornel has an elegant honte style, whilst Dinerstein is territorial and led the whole time with a territorial lead and ways into Cornel's flaky centre. As with the Kulkov and Groenen games, the player with the highest top 3 match wasn't the same as the one with the highest top 1 match.
- Daniel vs Jonas Welticke. Jonas is known for crazy openings and weird style, which he did here opening on the sides, only 25% win after 50 moves. As expected his wacky moves didn't match much. Daniel played solidly and matched a lot, except Leela got confused by a simple semeai so wanted to be stupid. Also despite having won the semeai already, in calm positions Leela wanted to keep playing the semeai rather than some profitable move elsewhere (but Daniel was winning so much maybe he could essentially pass and still win).
- First (non EGF) pro game. My expectation was pros might score lower matching against Leela than us mid-high amateur dans as they are much stronger and could be playing unexpected better moves. I chose Park Junghwan and Lee Sedol's last game at some festival. Park is a fairly conventional player, whilst Lee is more creative, so I expected Park to match more. Park did match more, but they were both similar to us amateurs. Maybe Leela is stronger than I realised. Leela did not expect the moves which made me feel "Wow, cool pro moves" (often tenuki), but she did better than I did (with brief thinking) in predicting the contact fighting.
- Another of Daniel's from last year, vs Lothar Spiegel 5d from Austria, who is a fairly sensible player. Lots of matching during long but joseki-ish middle game invasions, but also misses from mistakes, and both players overlooked an important sente exchange for a while (f11/g10).
- Gilles van Eeden 6d (classic good shape Dutch 6d) vs Viktor Lin 6d. Most mismatches were due to a ko fight, and a few disagreements in early yose. Going into yose Leela gave Gilles 77% win, but this looks like a misunderstanding of his dead group at top left: if I played out a few more moves to make it clearly dead then the win% collapsed to 57%. In the end he lost by 2.5.


Post #163 Posted: Sun Apr 08, 2018 10:47 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Thanks, Uberdude. So can we say that these are informally chosen recent games?

Post #164 Posted: Mon Apr 09, 2018 12:13 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Bill Spight wrote:
Thanks, Uberdude. So can we say that these are informally chosen recent games?

Yes, not randomly, but with criteria such as
- Daniel's 2 games (Groenen, Spiegel) from last season during his Leela period (thought they might have high match, but didn't)
- mine vs Victor for lower ranked upset (and I wanted to see Leela's opinion), few years ago, remainder of games are from this season of the league
- vs Jonas because crazy style
- Ilya vs Artem for classic match up of top Europeans with contrasting styles
- Cornel vs Dinerstein another top 2 Europeans
- van Eeden vs Lin recent 6d game in league B
- Park vs Lee for top pros, chose their most recent game

I checked the games didn't end in an early resign before analysing (discarded le Calve vs Bajenaru for this reason). I think I should do some 5d games next.

Edit: I attached the spreadsheet so Javaness can make the chart hot pink or whatever is his favourite colour.


Attachments:
Leela similarity analysis.xlsx [120 KiB]
 
Post #165 Posted: Mon Apr 09, 2018 12:16 am 
Gosei

Posts: 1492
Liked others: 111
Was liked: 314
Might not a metric such as "Average distance from Leela's Goodness Value for its first choice move" be a more interesting metric?
Also, can the chart please use a different colour than blue.

_________________
North Lecale

Post #166 Posted: Mon Apr 09, 2018 5:48 am 
Lives with ko

Posts: 259
Liked others: 46
Was liked: 116
Rank: 2d
Uberdude wrote:
I checked the games didn't end in an early resign before analysing (discarded le Calve vs Bajenaru for this reason). I think I should do some 5d games next.

If you want some of mine (3D OGS/IGS) to help fill the data set, I probably still have a bunch of Leela .rsgf files from analyzing my OGS correspondence games.

Post #167 Posted: Mon Apr 09, 2018 10:03 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Javaness2 wrote:
Might not a metric such as "Average distance from Leela's Goodness Value for its first choice move" be a more interesting metric?

Yes, but these win% actually can't be trusted much, even as Leela's evaluation of a position, when there aren't many simulations. The protocol I have been using is to load the game for analysis into Leela, kick off the analysis until it reaches around 50k nodes, and then use the analysis window (which is sorted by # simulations, not win %) to check if the move played was in the top 3. See the example screenshot below. At this point Leela wants to play d15 with 51.3% win; the move played in the game was d14, which is #3 at 47.6%, so still within the 5% band, but it has few simulations, so that number is not so reliable. This counts as a match; if d15 fluctuated up another 1.3%, the difference would be over 5% and this wouldn't count as a match.
Attachment:
d15#1.PNG [ 251.82 KiB | Viewed 8375 times ]


Go forward one move and let Leela analyse the position with d14 on the board for 50k nodes, then go back to the position before, and you get the below: it now thinks d14 is the best move with the highest win%! This seems to indicate Leela's algorithm isn't tuned to be exploratory enough. Another analysis protocol would be to ensure the actual move played also gets 50k (or whatever) nodes of exploration; this could lead to higher match rates if Leela changes its mind and decides the real game move is better than it anticipated when focusing on its #1 move. I chose my method as it's how I imagine a Leela cheater wondering what to play next would operate.
Attachment:
d14#1.PNG [ 250.22 KiB | Viewed 8375 times ]
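The matching rule described in this post can be sketched roughly as follows. This is only an illustration of the rule as stated here (move in the top 3 by simulations, win% within the 5% band of the #1 move), not the actual spreadsheet logic. The `Candidate` type, the function names, the visit counts and the middle `c3` entry are all made up for the example; only the d15 and d14 win rates come from the screenshot described above. How a gap of exactly 5% is treated is also an assumption.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    move: str        # board coordinate, e.g. "d15"
    visits: int      # simulations Leela spent on this move
    winrate: float   # Leela's win estimate, in percent


def is_match(candidates: Sequence[Candidate], played: str,
             top_n: int = 3, band: float = 5.0) -> bool:
    """True if `played` is among the top_n most-simulated moves and its
    win rate is within `band` points of the most-simulated move's."""
    ranked = sorted(candidates, key=lambda c: c.visits, reverse=True)[:top_n]
    if not ranked:
        return False
    best = ranked[0]
    return any(c.move == played and best.winrate - c.winrate <= band
               for c in ranked)


def match_percentage(results: Sequence[bool]) -> float:
    """Share of matched moves, as a percentage."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Win rates for d15 and d14 are from the post; everything else is invented.
position = [
    Candidate("d15", 30000, 51.3),
    Candidate("c3", 12000, 49.0),
    Candidate("d14", 4000, 47.6),
]
print(is_match(position, "d14"))
```

Feeding one `is_match` result per analysed move into `match_percentage` gives a per-player figure of the kind shown in the table earlier in the thread.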


Post #168 Posted: Mon Apr 09, 2018 5:47 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
BlindGroup wrote:
Uberdude wrote:
I only have 10 data points, but fitting them to a normal distribution (dubious: too small sample, could be different shape, plus 100 is a hard max) I get a mean of 80 and standard deviation of 8. So then you might say 98 is 2.2 sds from the mean, what's the chance of that? Look up your normal distribution probability tables and you get 1.2%. That's small, an inept statistician would say, less than the oft used 0.05 significance level, he must be guilty! But that's the chance a randomly selected game has that value (based on the false assumption the metric is normally distributed with those parameters). But this game was not randomly selected, it was chosen to be examined precisely because it has a high similarity. So such a probability is invalid. As Feynman eloquently said:

Quote:
You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!


Uberdude, your taking the time to go through even these 10 games seems to be more than we've seen anyone else doing to systematically assess these decisions. A few thoughts to contribute:

1. As you note, a sample size of 10 data points is VERY small. I think even "inept statisticians" would be uncomfortable moving forward with only these data. That said, this is not meant to criticize your efforts, but rather to argue that you are on the right track and that your efforts should be extended significantly by some organization with significantly greater access to computational resources.

2. I think you have the logic of the hypothesis testing framework slightly twisted and it affects the interpretation of the 1.2 percent error rate (the "Type I" rate). You are right, we chose the game with the 98 percent top-3 match rate deliberately -- it was the game under question.


Having looked at a number of videos about cheating at online chess, I think I'll second Uberdude here. It appears that a lot of online chess cheating is by playing every move chosen by a superhuman chess engine. The match to the top three plays, except for a mistake or blunder, is because the engine is unknown, as is how long the engine ran, and on what hardware. That metric seems to have been chosen to give almost 100% matches. Almost 100% matches to a superhuman chess engine, then, seems to be part of the theory of online cheating at chess. It looks like this game came into question because of the near 100% matches with Leela. (A 4 dan losing to a 4 dan is not enough to question the game.) If so, a Fisherian cannot use that game to prove cheating, because it was not randomly chosen. A Bayesian can, but, speaking as one, I think that all the game does is to raise suspicion. Confirmatory evidence is weak.
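To put rough numbers on the selection point, here is a small sketch. It reuses the mean of 80 and standard deviation of 8 from the quoted fit, which the quote itself calls dubious, and it assumes games are independent. The pool sizes are hypothetical and the function names are made up for illustration.

```python
import math


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X >= x) for X ~ Normal(mean, sd)."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def chance_of_at_least_one(p: float, n_games: int) -> float:
    """Probability that at least one of n independent games reaches the threshold."""
    return 1.0 - (1.0 - p) ** n_games


# Tail probability for a single randomly chosen game scoring 98 or more.
p_single = normal_upper_tail(98, mean=80, sd=8)
print(f"one randomly chosen game: {p_single:.3%}")

# The same threshold when the highest-scoring game is picked out of a pool.
for n in (10, 50, 100, 500):   # hypothetical pool sizes
    print(f"best of {n:3d} games: {chance_of_at_least_one(p_single, n):.1%}")
```

The single-game figure is the one from the quoted probability tables. The loop shows how quickly that figure stops being surprising once the game is selected because it has the highest match rate in a larger pool.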

Post #169 Posted: Tue Apr 10, 2018 12:51 am 
Oza

Posts: 3644
Liked others: 20
Was liked: 4620
Quote:
A Fisherian cannot use that game to prove cheating, because it was not randomly chosen. A Bayesian can, but, speaking as one, I think that all the game does is to raise suspicion. Confirmatory evidence is weak.


Even without understanding the statistical nuances I can easily agree that what has happened to Carlo has been highly unsatisfactory, and the bulk of opinion in this thread seems to be of like mind.

But I think it is also important to try to see it all from the point of view of the organisers and referees.

Kasparov championed a form of chess in which a pro played with a machine to help him (just like the alleged involvement of Leela here). Despite his fame it didn't get any traction. There have been a tiny handful of such games in go (mainly in Taiwan) and they sank without trace as far as I could see. I never even saw a report on how often the pro had recourse to the machine. That, plus the amount of comments on chess cheating I've seen, leads me to believe that people see cheating in chess/go, just as people see drugs in athletics, very much in black-and-white terms. Halfway houses and discreet averting of the eyes are not tolerated by the vast majority. Cheating must be stamped out - even if some people suffer wrongly, it seems.

Now, given that this can only be done on the basis of some sort of probabilistic assumptions, is it possible to lend support to organisers and referees (and through them to the overwhelming majority of players) by using statistics in the same way that seems to be accepted elsewhere? What I have in mind is something like the "significance" factor which is often mentioned in connection with 95% probability. How can such a metric be devised and accepted?

Acceptance doesn't seem to be a problem to me because humans are used to running their entire lives on the basis of probability. But it could perhaps be made easier to accept if the first "punishment" was not so swingeing. E.g. a player could be put on notice that he is suspected of cheating. The arbiters could also indicate what measures need to be taken to satisfy them in future (e.g. a player could video himself while playing an important on-line game and use that to show that he is not consulting a machine).


Post #170 Posted: Tue Apr 10, 2018 4:36 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
John Fairbairn wrote:
But I think it is also important to try to see it all from the point of view of the organisers and referees.

{snip}

people see cheating in chess/go, just as people see drugs in athletics, very much in black-and-white terms. Halfway houses and discreet averting of the eyes are not tolerated by the vast majority. Cheating must be stamped out - even if some people suffer wrongly, it seems.


There was a cheating scandal in an IGS tournament in the 1990s in which Sprint, a strong Chinese amateur, was discovered to have gotten help from a Chinese pro. That may have made Pandanet sensitive to accusations of cheating in their tournaments. AFAICT, nothing written here criticizing the treatment of the evidence in this case, or the CIT case, condones cheating.

Quote:
Now, given that this can only be done on the basis of some sort of probabilistic assumptions,


That may be so in these cases, but not in general, as Regan has pointed out. (Unless you are a Bayesian. ;)) Even in the case of casual online chess cheating, the cheaters typically put down their opponents, a form of behavioral evidence. (In itself weak, OC, but not just statistical.)

Quote:
is it possible to lend support to organisers and referees (and through them to the overwhelming majority of players) by using statistics in the same way that seems to be accepted elsewhere.


That was not done in this case. I have argued in Bayesian terms, first, because I am a Bayesian, and second, because Bayesians, like most of the public, and like the organizers and referees, believe in confirmatory evidence. But, unlike most of the public, we know that it is very, very weak. The use of confirmatory evidence is not generally accepted statistical practice.

Quote:
What I have in mind is something like the "significance" factor which is often mentioned in connection with 95% probability. How can such a metric be devised and accepted?


Regan addresses that in chess, not with the question of whether a player plays like Houdini or another top engine (confirmatory evidence), but whether the player plays better than he does without cheating (disconfirmatory evidence). Regan can make use of individual moves, because he is able to rate them. Thus, an obvious play, even though every engine would play it, does not count against the player because it is what he would play without cheating. In go, we are not able to do that yet; give us a few years. What we have to do instead is to rely upon the judgement of strong players. For instance, in the Reem vs. Metta game, consider the sequence Black 87 - White 96, where Black secures the bottom right corner. Black has options for Black 87, but given that play and White's responses, the four plays Black 89 - Black 95 would be played not only by Carlo Metta, but also by weaker dan players who were not cheating. Even in cases of suspected online cheating at chess, accusers look at the plays of suspected cheaters and point out plays that are unlike human plays, or human plays of the level of the suspect. That is, the accusers look for disconfirmatory evidence, not confirmatory evidence, or not just confirmatory evidence. The four Black plays, Black 89 - Black 95, are confirmatory evidence of the proposition, "He plays like Leela", but are not evidence of cheating. The question is not just a reliance upon statistical evidence alone, but a reliance upon the wrong statistical evidence.

Now, if one is using a bot to cheat, then one's play will resemble that of the bot, to some extent. Therefore, as Blindgroup points out, given enough games, the number of plays that are matches to Leela's choices but not because of cheating should even out, on average. But that is not the case for a single game. You need to look at a number of games in which Carlo is suspected of cheating, such as all of his games in this tournament, and compare them with other games in which he is not suspected of cheating. That is, we must look for disconfirmatory evidence: Carlo plays differently in one set of games from how he plays in the other set of games. If you suspect him of cheating in all games, then you compare his play against the play of other players of similar ability. OC, in that case the similarity of his play to Leela's may simply be evidence, not of cheating, but of intensive training with Leela for a couple of years.

So you could, if you gloss over the question of randomization, set up a significance test using some metric of similarity to Leela's play. But doing so would involve the use of a large number of games, and any statistically significant result would not be a 98% match in a single game.

Quote:
Acceptance doesn't seem to be a problem to me because humans are used to running their entire lives on the basis of probability. But it could perhaps be made easier to accept if the first "punishment" was not so swingeing. E.g. a player could be put on notice that he is suspected of cheating. The arbiters could also indicate what measures need to be taken to satisfy them in future (e.g. a player could video himself while playing an important on-line game and use that to show that he is not consulting a machine).


As I have said, the evidence in that one game is enough to raise suspicion. And that would justify the organizers in treating Carlo like Caesar's wife, requiring him to be above suspicion, and in requiring that his future games be monitored. It would also justify looking at the plays in the questioned game to see whether the result might be voided. It might even be possible to find further evidence of cheating by analyzing that game.

Post #171 Posted: Tue Apr 10, 2018 6:29 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Here is one way to set up a significance test, but it does not require a lot of games. :)

Recruit a panel of three 6 dans who are unfamiliar with Carlo's games in this tournament. (To give Carlo the benefit of the doubt with regard to his level, since he did beat 6 dans in this tournament.) Have each of them, without consultation, record what they would play where Carlo had the move in the range, move 51 - 100, for each game he played. Then eliminate from comparison moves where Carlo matched both one of Leela's three choices and one of the panel's choices, as not being evidence of cheating. This gives you disconfirmatory evidence: not like the panel, who we know not to be cheating. :) The remaining plays are potentially cheating plays. Those that do not match Leela's choices are, by presumption, not cheating plays; those that do match are still potentially cheating plays. But we still do not have a null hypothesis.

However, we know that the judges on the panel are not cheating. Treat their plays in like manner. For each of them, put Carlo on their panel and eliminate those plays where their play matches both Leela and one of the other three players. This process yields a 2x4 matrix, where we have cells labeled Carlo-like-Leela, Carlo-unlike-Leela, Judge1-like-Leela, Judge1-unlike-Leela, etc. Is Carlo's play significantly different from that of the Judges? (The null hypothesis is that they are not different.)

:D
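A minimal sketch of the elimination step and the resulting 2x4 table, assuming each player's choices and Leela's three choices have already been recorded position by position. The function name, the data layout and the tiny example data are invented purely for illustration.

```python
from typing import Dict, List, Set, Tuple


def like_unlike_table(
    plays: Dict[str, List[str]],      # player -> move chosen at each position
    leela_top3: List[Set[str]],       # Leela's three choices at each position
) -> Dict[str, Tuple[int, int]]:
    """For each player return (like_leela, unlike_leela) counts after dropping
    positions where the player's move matches both Leela and another player."""
    table = {}
    for player, moves in plays.items():
        like = unlike = 0
        for i, move in enumerate(moves):
            in_leela = move in leela_top3[i]
            shared = any(other[i] == move
                         for name, other in plays.items() if name != player)
            if in_leela and shared:
                continue          # ordinary human move: eliminated from comparison
            if in_leela:
                like += 1
            else:
                unlike += 1
        table[player] = (like, unlike)
    return table


# Made-up data for four positions, only to show the shape of the result.
plays = {
    "Carlo":  ["q4", "c3", "k10", "r16"],
    "Judge1": ["q4", "d3", "k10", "r17"],
    "Judge2": ["q4", "c3", "j10", "r17"],
    "Judge3": ["p4", "d3", "k10", "q16"],
}
leela_top3 = [
    {"q4", "p4", "r4"},
    {"c3", "c4", "d4"},
    {"j10", "k11", "l10"},
    {"r16", "q16", "r17"},
]
for player, (like, unlike) in like_unlike_table(plays, leela_top3).items():
    print(f"{player}: like Leela {like}, unlike Leela {unlike}")
```

Each player is treated the same way, with the other three acting as that player's panel, so the judges' rows are built by exactly the rule applied to Carlo's row.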

Post #172 Posted: Tue Apr 10, 2018 5:30 pm 
Lives in gote

Posts: 389
Liked others: 81
Was liked: 128
KGS: lepore
Bill Spight wrote:
...This process yields a 2x4 matrix, where we have cells labeled Carlo-like-Leela, Carlo-unlike-Leela, Judge1-like-Leela, Judge1-unlike-Leela, etc. Is Carlo's play significantly different from that of the Judges? (The null hypothesis is that they are not different.)

:D


This null hypothesis could be rejected if one of the judges plays differently than Carlo and the other two judges. The null is rejected, but we couldn't draw a negative inference about Carlo.

The quest for a statistical test that answers what we all want to know seems truly impossible. We are always ending up with a few moves that "could be called suspicious" but we can't agree on much more than that. I bet the same could be said of most randomly selected games between two 5 dans.

If, as Uberdude points out, he were using Leela Zero to cheat, then it would be an easier problem to solve, because he would be playing highly effective but unintuitive moves.

Not guilty because there is massive reasonable doubt. Time to move on.

Post #173 Posted: Tue Apr 10, 2018 9:36 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
mhlepore wrote:
Bill Spight wrote:
...This process yields a 2x4 matrix, where we have cells labeled Carlo-like-Leela, Carlo-unlike-Leela, Judge1-like-Leela, Judge1-unlike-Leela, etc. Is Carlo's play significantly different from that of the Judges? (The null hypothesis is that they are not different.)

:D


This null hypothesis could be rejected if one of the judges plays differently than Carlo and the other two judges. The null is rejected, but we couldn't draw a negative inference about Carlo.


I think that you are thinking of the null that all of the players play alike. That's not the same thing. For starters, Carlo's play has to be closer to Leela's than each of the Judges' play. We can reject the hypothesis that all of the players play alike, but if any of them plays more like Leela than Carlo does, then we cannot reject the null hypothesis that Carlo's play is no more like Leela's than the Judges' play. His play is within the fold. (Yes, I did not quite express the null correctly, or precisely.)

Quote:
The quest for a statistical test that answers what we all want to know seems truly impossible.


We can't use, say, a simple Chi-Squared test, but there are statistical tests that handle this kind of situation, with multiple comparisons. :)

As Regan points out, without other kinds of evidence besides similarity to Leela's play, we need very strong evidence to convict someone of cheating. So with this test we might compare Carlo's play with each of the judges' and require a p value less than 0.66% for each comparison.
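One way such pairwise comparisons could be sketched is below. The choice of a one-sided Fisher exact test is an assumption made for illustration; the post does not name a specific test. The function names and the counts are hypothetical, and the 0.66% threshold is the one suggested above.

```python
from math import comb
from typing import Sequence, Tuple


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    probability of a top-left count >= a with all margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, row1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1)) / total


def carlo_stands_out(carlo: Tuple[int, int],
                     judges: Sequence[Tuple[int, int]],
                     alpha: float = 0.0066) -> bool:
    """True only if Carlo is significantly more Leela-like than every judge.
    Each tuple is (like_leela, unlike_leela)."""
    return all(fisher_one_sided(carlo[0], carlo[1], j[0], j[1]) < alpha
               for j in judges)


# Hypothetical counts, only to show the calling convention.
print(carlo_stands_out((12, 3), [(4, 11), (10, 5), (5, 10)]))
```

Requiring every comparison to pass reflects the point made above: if any judge plays at least as much like Leela as Carlo does, his play is within the fold.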

Post #174 Posted: Wed Apr 11, 2018 3:15 am 
Beginner

Posts: 2
Liked others: 0
Was liked: 0
Uberdude wrote:
Yes, but these win% actually can't be trusted much, even as Leela's evaluation of a position, when there aren't many simulations. The protocol I have been using is to load the game for analysis into Leela, kick off the analysis until it reaches around 50k nodes, and then use the analysis window (which is sorted by # simulations, not win %) to check if the move played was in the top 3.

Did you check whether Leela evaluates the same moves when the game is loaded for analysis and when the game is rebuilt from scratch (so the engine doesn't know the next move)?

Post #175 Posted: Wed Apr 11, 2018 4:32 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
Dmytro wrote:
Uberdude wrote:
Yes, but these win% actually can't be trusted much, even as Leela's evaluation of a position, when there aren't many simulations. The protocol I have been using is to load the game for analysis into Leela, kick off the analysis until it reaches around 50k nodes, and then use the analysis window (which is sorted by # simulations, not win %) to check if the move played was in the top 3.

Did you check whether Leela evaluates the same moves when the game is loaded for analysis and when the game is rebuilt from scratch (so the engine doesn't know the next move)?


Although I load the whole game sgf into Leela, when I ask it for what it wants to play for move X I haven't done any analysis for moves after X (I used a separate sgf replayer to know what the human played) so I don't think the fact the sgf contains that information is used by Leela, but I will check with a truncated sgf. (It's a manual position-by-position analysis rather than bulk analysis of the game like go review partner does). If you go forward from X and do analysis then these simulations of the game tree are used if you move back to X and continue analysis.

Post #176 Posted: Wed Apr 11, 2018 1:54 pm 
Beginner

Posts: 2
Liked others: 0
Was liked: 0
Uberdude wrote:
Although I load the whole game sgf into Leela, when I ask it for what it wants to play for move X I haven't done any analysis for moves after X (I used a separate sgf replayer to know what the human played) so I don't think the fact the sgf contains that information is used by Leela, but I will check with a truncated sgf. (It's a manual position-by-position analysis rather than bulk analysis of the game like go review partner does). If you go forward from X and do analysis then these simulations of the game tree are used if you move back to X and continue analysis.

I do not know much about the Leela interface. But, logically, your method of game analysis looks good. Still, I would prefer to use a truncated sgf to be 100% sure that there is no influence from the following moves.

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #177 Posted: Thu Apr 12, 2018 12:59 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Uberdude wrote:
Yes, but these win% figures can't actually be trusted much, even as Leela's evaluation of a position, when there aren't many simulations. The protocol I have been using is to load the game for analysis into Leela, run the analysis until it reaches around 50k nodes, and then use the analysis window (which is sorted by # simulations, not win %) to check whether the move played was in the top 3. See example screenshot below. At this point Leela wants to play d15 with a 51.3% win rate; the move played in the game was d14, which is #3 at 47.6%, so still within the 5% band, though with so few simulations that number is not very reliable. This counts as a match; if d15 fluctuated up another 1.3%, the difference would be over 5% and this wouldn't count as a match.

[[snip image]]

Go forward one move and let Leela analyse the position with d14 on the board for 50k nodes, then go back to the previous position and you get the result below: it now thinks d14 is the best move, with the highest win%! This seems to indicate that Leela's algorithm isn't tuned to be exploratory enough. Another analysis protocol would be to ensure the actual move played also gets 50k (or whatever) nodes of exploration; this could lead to higher match rates if Leela changes its mind and decides the real game move is better than it anticipated when focusing on its #1 move. I chose my method as it's how I imagine a Leela cheater wondering what to play next would operate.

[[snip image]]
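The matching rule described in the quoted post can be sketched roughly as follows. This is an illustration, not Uberdude's actual tooling (the analysis was done by hand in the Leela GUI). The names `Candidate` and `is_match` are hypothetical, the simulation counts and the c14 entry are invented for the example, and whether a difference of exactly 5% counts as a match is an assumption.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    move: str       # board coordinate, e.g. "d15"
    sims: int       # number of simulations spent on this move
    win_pct: float  # Leela's win rate for this move, in percent


def is_match(cands: List[Candidate], played: str,
             top_n: int = 3, band: float = 5.0) -> bool:
    """True if `played` is among the top_n moves by simulation count and
    its win rate is within `band` points of the most-simulated move."""
    ranked = sorted(cands, key=lambda c: c.sims, reverse=True)
    if not ranked:
        return False
    best = ranked[0]
    for c in ranked[:top_n]:
        if c.move == played:
            # Assumption: a gap of exactly `band` still counts as a match.
            return best.win_pct - c.win_pct <= band
    return False


# Win rates for d15 and d14 are from the post; everything else is made up.
position = [
    Candidate("d15", 30000, 51.3),
    Candidate("c14", 12000, 50.0),
    Candidate("d14", 5000, 47.6),
]
print(is_match(position, "d14"))  # gap is 3.7 points, inside the band
```

With d15 a couple of points higher the same position would stop counting as a match, which is why the low simulation count on d14 matters.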


So, given that Leela's preferred moves are non-deterministic like this, is it possible that the same move might on one run be Leela's top choice, and on another be outside the top 3 or outside the 5% margin?

If so, I wonder what the following test would yield:

Given one of your test games, for every position between moves 50-150, let Leela analyse the position five times, independently (i.e. close and reopen the position between runs). Then record if the human move played was ever Leela's top choice.
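A rough sketch of how the results of such a repeated test could be tallied, assuming the ranked move lists from each independent run have already been written down by hand. The function names and the three sample positions are invented for illustration; no real game data is used.

```python
from typing import List, Sequence, Tuple

# One position: the move the human played, plus Leela's ranked move list
# from each independent run (most-preferred move first).
Position = Tuple[str, Sequence[Sequence[str]]]


def ever_top_choice(played: str, runs: Sequence[Sequence[str]]) -> bool:
    """True if `played` was Leela's first choice in at least one run."""
    return any(len(run) > 0 and run[0] == played for run in runs)


def match_counts(positions: List[Position]) -> Tuple[int, int]:
    """Return (matches using only the first run, matches using any run)."""
    first_run = 0
    any_run = 0
    for played, runs in positions:
        if runs and len(runs[0]) > 0 and runs[0][0] == played:
            first_run += 1
        if ever_top_choice(played, runs):
            any_run += 1
    return first_run, any_run


# Hypothetical data: three positions, three runs each.
positions = [
    ("d14", [["d15", "c14", "d14"], ["d14", "d15", "c14"], ["d15", "d14", "c14"]]),
    ("q10", [["q10", "r10", "p10"], ["q10", "p10", "r10"], ["q10", "r10", "p10"]]),
    ("c3", [["d3", "c4", "e3"], ["d3", "e3", "c4"], ["c4", "d3", "e3"]]),
]
print(match_counts(positions))
```

The gap between the two counts gives a feel for how much run-to-run variation alone could move a player's match rate.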

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #178 Posted: Thu Apr 12, 2018 1:42 am 
Oza

Posts: 3644
Liked others: 20
Was liked: 4620
There is a fascinating parallel to this case that has just been revived. A play, Who Wants To be A Millionaire, with the same basic theme has just opened in London. It is based on a true event in which a contestant on a major UK tv quiz programme won a million pounds and the organisers later challenged that win because they alleged he had help from his wife in the audience. He had to choose one answer from three read out and they alleged she coughed just after what she believed to be the correct answer (as I understand it, it was only her opinion - she did not have access to the answers and as this was 2001 she could not easily look the answers up online, and certainly not away from the gaze of other audience members).

Both denied cheating but the contestant ended up in court and was convicted by a jury. Winning a million pounds was not unusual - it was the coughing that was allegedly unusual.

The play changes the personae a little and does not follow the true ending (in real life the convicted contestant was given a suspended gaol sentence) but instead requires the audience to vote electronically on whether or not cheating took place.

Maybe we could try an electronic vote here, too.

The fuller story (and maybe corrections for anything I've mis-stated) can be found at http://www.bbc.co.uk/news/entertainment-arts-43700097. Two comments from me: (1) I can't see anything in it that would make this a typically British crime; (2) the show's presenter apparently thought the contestant was as "guilty as sin." That presenter was a hugely popular figure at the time - did his celebrity influence how the jury voted?

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #179 Posted: Thu Apr 12, 2018 2:26 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
My view on this whole thing...

* Cheating in online tournaments can't be prevented. Even before Leela, people could use online resources, etc., to cheat.
* Requiring a webcam, for example, could mitigate (but not solve) the issue.
* I don't think punitive measures can fairly be taken without absolute proof of cheating.
* For important tournaments, sponsors should realize the potential for cheating, and try to reduce the risk of cheating as much as possible. For example, it's probably harder to cheat in an in-person tournament.

It's mathematically interesting to analyze probabilities of moves and all that stuff, but at the end of the day, I think you can't fairly punish someone just because their moves seem like a computer's.

_________________
be immersed

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #180 Posted: Thu Apr 12, 2018 3:34 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
John Fairbairn wrote:
There is a fascinating parallel to this case that has just been revived. A play, Who Wants To be A Millionaire, with the same basic theme has just opened in London. It is based on a true event in which a contestant on a major UK tv quiz programme won a million pounds and the organisers later challenged that win because they alleged he had help from his wife in the audience. He had to choose one answer from three read out and they alleged she coughed just after what she believed to be the correct answer (as I understand it, it was only her opinion - she did not have access to the answers and as this was 2001 she could not easily look the answers up online, and certainly not away from the gaze of other audience members).

Both denied cheating but the contestant ended up in court and was convicted by a jury. Winning a million pounds was not unusual - it was the coughing that was allegedly unusual.


I wonder if they were bridge players. ;) To quote myself from earlier in this thread:
Quote:
When I was in high school a couple of little old ladies told me about some ways to cheat at bridge. At that time some people would open One Club with fewer than four cards in the suit. (Actually, a lot of people played that system.) The cheaters would politely cough before bidding One Club with only three cards in the suit. ;) OC, everybody at the table was in on the secret, so it was not exactly cheating.

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
