It is currently Wed Nov 20, 2019 12:14 am

All times are UTC - 8 hours [ DST ]




Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #261 Posted: Tue Jun 05, 2018 5:00 am 
Dies in gote

Posts: 65
Liked others: 31
Was liked: 55
Quote:
The basic idea, that he did very well in this year's PGETC, is of course relevant as an initial starting point. However, the winning percentages from Go Rating are probably not very reliable. Thus the figure you quote (1/3000) is probably best left out. Andrew Simon has already mentioned two players with similar 'super' performances this year.


Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree, the figure of 3000 tournaments is approximate, though even if it were 1000 ...

Quote:
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.


I am quite specific about it in the report; I did not feel like copy/pasting from the report when people can read it themselves.

Quote:
A possible source of bias is that you're comparing games he won with games he mostly lost.


I am definitely aware of it. The problem is that there are not many regular games Carlo won recently for which records are available.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #262 Posted: Tue Jun 05, 2018 5:16 am 
Lives in gote
User avatar

Posts: 306
Location: Deutschland
Liked others: 264
Was liked: 125
Rank: EGF 4 kyu
AlesCieply wrote:
Quote:
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.


I am quite specific about it in the report; I did not feel like copy/pasting from the report when people can read it themselves.


Do not be arrogant. Many people who read this thread will not go and download your PDF and read it in great detail.

You are not only accusing someone of cheating but also accusing them of fraudulently fabricating evidence! The very least you could do is exercise some care and diligence in doing so!

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #263 Posted: Tue Jun 05, 2018 5:21 am 
Lives in sente

Posts: 1257
Liked others: 102
Was liked: 265
AlesCieply wrote:
Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree, the figure of 3000 tournaments is approximate, though even if it were 1000


I am actually surprised you don't already have this data, because this tournament is so obvious to check.
Just sort this list of performances http://www.europeangodatabase.eu/EGD/To ... y=T160920A


Hidden is the raw gain (at 50%) but of course TPR should be bigger
Code:
GoR after tournament: 2051.345    52.544    Chris Bryant (UK)
GoR after tournament: 2171.653    47.167          a 1d
GoR after tournament: 2309.134    46.753   Daniel Hu (UK)
GoR after tournament: 2259.935    35.214    a 2d
GoR after tournament: 2305.714    30.255    Carlo Metta
GoR after tournament: 2092.218    30.121
GoR after tournament: 2397.215    24.673
GoR after tournament: 2043.823    21.05
GoR after tournament: 2433.031    18.133
GoR after tournament: 2121.004    17.008
GoR after tournament: 2188.365    16.505
GoR after tournament: 2311.131    14.747
GoR after tournament: 2310.588    13.407
GoR after tournament: 2126.191    13.002
GoR after tournament: 1658.588    12.903
GoR after tournament: 2083.033    11.992
GoR after tournament: 2327.441    11.614
GoR after tournament: 2270.889    11.504
GoR after tournament: 2252.379    11.152
GoR after tournament: 2338.201    10.159
GoR after tournament: 2148.369    10.044
GoR after tournament: 2032.917    10.041
GoR after tournament: 2345.868    9.099
GoR after tournament: 2062.001    9.088
GoR after tournament: 2554.068    9.065
GoR after tournament: 2102.409    7.755
GoR after tournament: 2188.786    6.338
GoR after tournament: 2082.72    6.08
GoR after tournament: 2231.591    4.753
GoR after tournament: 1784.558    4.605
GoR after tournament: 2435.44    4.306
GoR after tournament: 2121.256    4.192
GoR after tournament: 2226.391    4.011
GoR after tournament: 2262.345    3.901
GoR after tournament: 2415.592    3.895
GoR after tournament: 2708.218    3.89
GoR after tournament: 2145.097    3.855
GoR after tournament: 2503.11    3.139
GoR after tournament: 2739.956    2.731
GoR after tournament: 2347.463    2.379
GoR after tournament: 2118.761    2.275
GoR after tournament: 2160.829    2.194
GoR after tournament: 2174.745    2.171
GoR after tournament: 2297.122    1.151
GoR after tournament: 1959.172    1.076
GoR after tournament: 2333.719    0.693
GoR after tournament: 2065.674    0.663
GoR after tournament: 2327.001    0.086
GoR after tournament: 1833.388    -0.314
GoR after tournament: 1702.06    -0.344
GoR after tournament: 1799.57    -0.369
GoR after tournament: 1989.18    -0.4
GoR after tournament: 1687.072    -0.405
GoR after tournament: 2191.142    -1.31
GoR after tournament: 2148.85    -1.474
GoR after tournament: 1949.142    -1.593
GoR after tournament: 1857.958    -1.651
GoR after tournament: 2350.28    -1.728
GoR after tournament: 2502.07    -1.741
GoR after tournament: 2351.544    -2.286
GoR after tournament: 2168.554    -2.343
GoR after tournament: 2329.714    -2.393
GoR after tournament: 2150.505    -2.815
GoR after tournament: 2191.358    -2.824
GoR after tournament: 2329.81    -3.071
GoR after tournament: 2396.55    -3.084
GoR after tournament: 2228.872    -3.798
GoR after tournament: 1985.133    -3.924
GoR after tournament: 2045.834    -4.042
GoR after tournament: 1977.848    -4.236
GoR after tournament: 2382.181    -4.275
GoR after tournament: 2283.186    -4.295
GoR after tournament: 2269.634    -4.33
GoR after tournament: 2158.263    -4.514
GoR after tournament: 2485.878    -4.538
GoR after tournament: 2441.816    -5.301
GoR after tournament: 2337.461    -5.596
GoR after tournament: 2242.681    -5.609
GoR after tournament: 2607.648    -5.901
GoR after tournament: 2332.817    -7.041
GoR after tournament: 2369.122    -7.446
GoR after tournament: 2206.372    -7.641
GoR after tournament: 2169.903    -8.211
GoR after tournament: 2246.102    -8.296
GoR after tournament: 2131.445    -8.419
GoR after tournament: 2212.346    -8.452
GoR after tournament: 2295.239    -8.911
GoR after tournament: 2618.884    -9.866
GoR after tournament: 2153.442    -10.663
GoR after tournament: 2441.498    -10.776
GoR after tournament: 2387.235    -11.255
GoR after tournament: 2447.258    -11.847
GoR after tournament: 1972.993    -11.971
GoR after tournament: 2362.567    -12.1
GoR after tournament: 2306.047    -12.167
GoR after tournament: 2480.258    -13.333
GoR after tournament: 2485.259    -13.444
GoR after tournament: 2502.231    -13.591
GoR after tournament: 2331.652    -13.996
GoR after tournament: 2370.293    -14.179
GoR after tournament: 2256.515    -15.038
GoR after tournament: 2429.096    -16.269
GoR after tournament: 2077.376    -22.624
GoR after tournament: 2160.054    -24.437
GoR after tournament: 1922.164    -31.362
GoR after tournament: 2249.452    -32.76
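For reference, a listing like the one above can be generated by sorting rows by the gain column. A minimal sketch, assuming lines have already been scraped into the `GoR after tournament: <rating> <gain> [name]` shape shown:

```python
def sort_by_gain(lines):
    """Sort rows of the form
    'GoR after tournament: <rating>    <gain>    [name...]'
    by the gain column, largest first."""
    def gain(line):
        # The gain is the 5th whitespace-separated field.
        return float(line.split()[4])
    return sorted(lines, key=gain, reverse=True)
```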

_________________
North Lecale


This post by Javaness2 was liked by: Uberdude
Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #264 Posted: Tue Jun 05, 2018 5:22 am 
Lives in sente

Posts: 1257
Liked others: 102
Was liked: 265
Quote:
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.


Let us say that the word 'fabricated' was not a good choice here. I would have gone for 'modified'. Especially in a paper like this.

_________________
North Lecale

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #265 Posted: Tue Jun 05, 2018 5:36 am 
Lives with ko

Posts: 183
Liked others: 25
Was liked: 60
Rank: 2d
Javaness2 wrote:
Quote:
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.


Let us say that the word 'fabricated' was not a good choice here. I would have gone for 'modified'. Especially in a paper like this.

The report, as I understand it, says Carlo submitted it as an example of an over-the-board tournament game, and it turned out to be a KGS record instead. The word "fabrication" is entirely appropriate if that is indeed correct, and IMO if this is indeed what happened, it justifies any penalty. If you lie to the court, you deserve whatever you get.

Online
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #266 Posted: Tue Jun 05, 2018 5:41 am 
Judan

Posts: 6190
Location: Cambridge, UK
Liked others: 354
Was liked: 3340
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
AlesCieply wrote:
Quote:
The basic idea, that he did very well in this year's PGETC, is of course relevant as an initial starting point. However, the winning percentages from Go Rating are probably not very reliable. Thus the figure you quote (1/3000) is probably best left out. Andrew Simon has already mentioned two players with similar 'super' performances this year.


Could you provide a reference? I do not recall any 4d (and not fast-improving!) player performing like that. Of course, there are fast-improving 1d players who regularly perform as 3d at tournaments. I agree, the figure of 3000 tournaments is approximate, though even if it were 1000 ...

From earlier in thread:

Just to go back to Carlo, I thought I'd work out his performance rating for this season's PGETC. He had great results for a 4d:
- beat Andrey Kulkov 6d (Russia) by 1.5
- beat Ondrej Kruml 5d (Czechia) by 2.5
- beat Dragos Bajenaru 6d (Romania) by resign
- beat Reem Ben David 4d (Israel) by resign *** the famous 98% game
- lost to Mero Csaba 6d (Hungary) by 2.5
- beat Mijodrag Stankovic "5d" 3d by resign
- lost to Andrij Kravets 7d/1p by 7.5

At the start of the season in (1st) September Carlo's rating was 2381 [very similar to me], this was after picking up 50 points at the EGC. Of course his true strength could have been more than that and grown since then too but his rating lagged. His performance rating (using EGD GoR calculator), using current ratings of opponents is 2629, or +248.

How does that compare to other good performances?

Forum regulars may remember I beat Victor Chow 7d from South Africa a few years ago. UK were in league C for the 2014/15 season and my initial rating was 2361. My results were:
- beat Petrauskas 3d (Lithuania) by resign
- beat Chow 6/7d (South Africa) by 0.5
- beat Ganeyev 3k (Kazakhstan) by resign.
As I had no losses my performance rating with the "adjust until input = output" method is infinite, anchoring with a loss to 2700 gives 2666, anchoring with loss to 2800 gives 2719. So +300 ish with big uncertainty as no losses and few games, the only useful information is I beat a 2616 in one game, how flukey was that?

Last season Daniel on the UK team had no losses, this season he had just 1:
- beat Rasmusson 4d (Denmark)
- beat Karadaban 5d (Turkey)
- beat Welticke 6d (Germany)
- lost to Lin 6d (Austria)
Initial rating was 2402. Performance rating 2616 (+214).
If you include the wins (included some 5ds) from the previous season (for which his initial rating was 2262 but he probably wasn't much weaker than he is now) as well then you get performance rating of 2677 (+415).

Update: Chris this season:
- beat Isaksen 2d (Denmark)
- beat Schlattner 2d (Switzerland)
- beat Kuntay 2d (Turkey)
- beat Palant "5d" 4d (Germany) [quotes is his stated grade, no quotes is GoR where 4d is 2351->2450]
- beat Laatikainen "5d" 4d (Finland)
- beat Unger "3d" 4d (Austria)
- beat Hanevik 3d (Norway)
- beat Groenen "6d" 5d (Netherlands)
- beat Ouchterlony "4d" 3d (Sweden)
- lost to Metta 4d (Italy)
Initial rating 2284. Performance rating 2568 (+284). And if like Lukan you believe Carlo was using LeelaZero (I estimate EGF GoR ~2900) in the last game he gets 2781 (+497) :)
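The "adjust until input = output" performance-rating idea used above can be sketched numerically. This is a minimal sketch, not the EGD calculator: it assumes a simple Elo-style logistic win-probability curve in place of the real GoR formula, and bisects for the rating whose expected score equals the actual score:

```python
def win_prob(r_player, r_opp, scale=400.0):
    # Elo-style logistic win probability -- an assumed stand-in for the
    # actual EGD GoR expectation formula, which differs in detail.
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_player) / scale))

def performance_rating(opp_ratings, results, lo=1000.0, hi=3500.0):
    """Bisect for the rating whose expected score against the given
    opponents equals the actual score ("adjust until input = output").
    `results` holds 1 for a win, 0 for a loss."""
    score = sum(results)
    if score == 0 or score == len(results):
        raise ValueError("all wins or all losses: no finite performance rating")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(win_prob(mid, r) for r in opp_ratings) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With no losses the expected score never reaches the actual score at any finite rating, which is exactly why anchoring with a hypothetical loss, as described above, is needed.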


This post by Uberdude was liked by 2 people: Bill Spight, Javaness2
Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #267 Posted: Tue Jun 05, 2018 6:06 am 
Lives in sente

Posts: 1257
Liked others: 102
Was liked: 265
bernds wrote:
The report, as I understand it, says Carlo submitted it as an example of an over-the-board tournament game, and it turned out to be a KGS record instead. The word "fabrication" is entirely appropriate if that is indeed correct, and IMO if this is indeed what happened, it justifies any penalty. If you lie to the court, you deserve whatever you get.


If you don't want your evidence to seem neutral, go ahead and choose 'fabrication'. There are some other f-words (foolish) you can throw in there while you are at it.

_________________
North Lecale

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #268 Posted: Tue Jun 05, 2018 6:13 am 
Dies in gote

Posts: 65
Liked others: 31
Was liked: 55
Uberdude wrote:
Last season Daniel on the UK team had no losses, this season he had just 1:
- beat Rasmusson 4d (Denmark)
- beat Karadaban 5d (Turkey)
- beat Welticke 6d (Germany)
- lost to Lin 6d (Austria)
Initial rating was 2402. Performance rating 2616 (+214).
If you include the wins (included some 5ds) from the previous season (for which his initial rating was 2262 but he probably wasn't much weaker than he is now) as well then you get performance rating of 2677 (+415).


This one really stands out, I admit. Thanks for providing the reference. Such performances are still quite rare, and I would not consider one proof of cheating on its own. I hope that is also clear from what I say in the report. Do also note that Daniel's strength/rating is still improving and does not look as settled as Carlo Metta's.


This post by AlesCieply was liked by: Javaness2
Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #269 Posted: Tue Jun 05, 2018 7:52 am 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
AlesCieply wrote:
Finally, I would very much appreciate it if Carlo Metta came out and explained why he presented an apparently fabricated game record to the league manager. I do believe he is in principle an honest man who has done a lot for the go community and can continue to do so. I just think he made a mistake in using AI in his internet games and is now afraid of admitting it.
EDIT: Here I refer to a game record from the Shakhov-Metta game that Carlo himself supplied (among several other records), claiming it was played at a regular tournament and also contained many moves "similar to Leela". In fact, the game was played on KGS and the record was edited to look as if played "live"; see the report for more details.


To me, this behavioral evidence of doctoring and submitting a game record is the strongest evidence of cheating so far. (Assuming that it holds up, OC. :)) As is so often the case, it is the coverup that gets you.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

Everything with love.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #270 Posted: Tue Jun 05, 2018 8:18 am 
Lives with ko

Posts: 141
Liked others: 26
Was liked: 89
Rank: 5 dan
The rating analysis Ales made is, imho, just a signal for a warning lamp to go on.
The same as when a weaker player wins, or when someone plays stronger online.

Another alarm signal: when I look at the deviations diagram in GRP and notice that it goes up for one side and continues to rise, that is rather suspicious.
But there should be a next step in the analysis, going move by move, because some things can be deceiving.

Today I analyzed a game from the PGETC (none of the ones mentioned here) where basically every move by one player is Leela's suggestion.
Basically, 90% of the moves were the A and B suggestions, and only one move was not suggested by Leela (although it looks nice).
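A match-rate figure like the 90% quoted above reduces to a top-k comparison against the engine's candidate lists. A minimal sketch, with a hypothetical data format (one ranked list of engine suggestions per position):

```python
def match_rate(moves, suggestions, k=2):
    """Fraction of the player's moves found among the engine's top-k
    suggestions for the same position.  `suggestions` is assumed to be
    one ranked list of candidate moves per position (hypothetical format)."""
    hits = sum(move in ranked[:k] for move, ranked in zip(moves, suggestions))
    return hits / len(moves)
```

With k=2 this counts the "A and B suggestions" mentioned above; raising k loosens the criterion.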

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #271 Posted: Tue Jun 05, 2018 8:28 am 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
AlesCieply wrote:
Quote:
Just to be clear: the allegedly fabricated game record is *not* the kifu from the game that raised accusations of cheating but another game, between Carlo Metta and Kim Shakhov. You really should be specific in this instance.


I am quite specific about it in the report; I did not feel like copy/pasting from the report when people can read it themselves.


I understood that it was a different game, even without looking at either the game record or the report.

AlesCieply wrote:
Quote:
A possible source of bias is that you're comparing games he won with games he mostly lost.


I am definitely aware of it. The problem is that there are not many regular games Carlo won recently for which records are available.

Emphasis mine.

Given the lack of comparative data (game records) and the uncertainties of evaluation (something I will address in another note), I doubt if a strong statistical case can currently be made against Metta. As Regan points out, a purely statistical case can rarely be made. You need physical or behavioral evidence. Game records can include behavioral evidence. IMO, the record of the game vs. Reem points away from cheating. And the behavioral evidence of doctoring a game record points the other way.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

Everything with love.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #272 Posted: Tue Jun 05, 2018 8:39 am 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
Bojanic wrote:
It would be surprising if someone used a program for the entire game, which would be idiotic to say the least.


Using a program for the entire game seems to be a way of cheating at chess on the internet, at least in non-tournament games, where there is less scrutiny. The main indicator seems to be that the player makes no mistakes or blunders, only "inaccuracies". The program (chess engine) the cheater is using is unknown, but his moves match the top three choices of any given strong engine.

This may be the source of the one-of-the-top-three indicator used in Metta's case, but no theory of cheating has been offered for it. Your theory of cheating is a good one, but would not produce the 98% matches in the Metta-Reem game.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

Everything with love.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #273 Posted: Tue Jun 05, 2018 9:49 am 
Lives in gote
User avatar

Posts: 602
Liked others: 50
Was liked: 211
AlesCieply wrote:

On page 3 you talk about three points:

(1) Carlo Metta performed unusually well at PGETC;
(2) During PGETC he made more good moves and fewer bad moves than usual;
(3) He modified an internet game and presented it as a regular tournament game record.

IMO these are only two points, not three. If you make more good moves and fewer bad moves than usual (point 2), you often beat stronger opponents than usual (point 1).

(But (2) is more precise than (1), as it gives clues about the manner a game was won.)

I also don't understand how you get your statement that such a feat would occur in 1 out of 3000 tournaments. He had 4 victories against 6d, 5d, 6d and 4d, then 1 loss, a victory against 3d and 1 loss. If you only take into account the first four matches, then according to http://www.europeangodatabase.eu/EGD/winning_stats.php such a winning streak occurs with a probability 0.1762x0.3x0.5, which is about 0.5%, i.e. one person accomplishes such a feat once every 200 tournaments; or if you prefer, during a tournament with 200 participants, you can expect one such performance.

Of course the calculation is very rough, but my point is that (1) is not so unusual, as Uberdude also pointed out by giving concrete examples.

Concerning point (2): to determine whether the percentage of good or bad moves was unusually high or low for a 4d player, it would be necessary to analyse a large number of games (say 100) played by 4d players and determine the average percentage of good and bad moves, as well as the standard deviation. In your document, the number of analysed games was much too low to allow any significant statistical analysis.

On the other hand, if (3) is true and if Carlo Metta cannot provide a convincing explanation, the conjunction of (1) and (3) does cast some serious doubts.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #274 Posted: Tue Jun 05, 2018 10:19 am 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
AlesCieply wrote:
Hello, I guess you know I was involved in dealing with the case as a member of the PGETC appeals committee. Since about that time I have also been looking into the matter on my own, trying to devise a better and statistically sounder method to check whether or not someone used AI in internet games. My analysis is based on comparing the player's performance in internet and live games.


IMO this is the right tack, to look for differences instead of similarities. :) Matching data (confirmatory evidence) is weak.

I have looked at your report but not studied it. A few comments, quoting you from it.

AlesCieply wrote:
I decided to try another approach inspired by Ken Regan's works (see e.g. [3]) on convicting chess players of using AI help in their games. The method is based on a statistical analysis determining how often players make mistakes of a given magnitude. The stronger the players are, the fewer (or smaller) mistakes they make, and the distribution of the mistakes made by a particular player forms a pattern characteristic of that player and his strength.

Emphasis mine.

Two questions that arise are what is a mistake, and how large is it?

First, nobody believes that Leela plays perfectly, or even as well as other current AI bots. So deviations from Leela's play cannot be considered mistakes, even if they probably are. Second, it is improper to confound making a mistake with failing to match Leela's top choice of plays when matching Leela per se is taken as evidence of cheating. AlphaGo Zero is not available for testing plays -- although DeepMind might be persuaded to make it available to go organizations for the purpose of detecting cheating --, but the Facebook network is, I understand, as it has been incorporated into Leela Zero. Not only is it better than Leela, it is different. Use it instead of Leela.

Third, unlike in chess, I do not think that we have a proven method of evaluating plays. That sounds silly, since all the top go bots evaluate plays. However, that is in context of playing a whole game well, not of evaluating specific plays. A general evaluation method that is good enough to play well is not the same thing. Just as top humans can make mistaken evaluations of single plays, even while playing well, so can top bots. Top chess engines seem to be able to distinguish three categories of single errors: inaccuracies, mistakes, and blunders. Go bots have not been shown to achieve that level of evaluation of plays.

One problem is the evaluation of plays in terms of win rates. In the Monte Carlo Tree Search (MCTS) era, win rates were found to be better than score estimates in producing good overall play. However, win rates are not as well defined as score estimates. (You need another parameter similar to komi to use score estimates, anyway. ;)) This lack of definition is indicated by the lack of error estimates for the win rates. Win rates other than 0 or 1 depend upon mistakes. But what level of mistakes, what frequency, and what kind? Who knows?

I do not have enough experience with Leela to say, but MCTS bots were known to make strange plays and win rate estimates in the endgame, unless the game was close. This suggests that the win rate estimates were of a different nature than the win rate estimates earlier in the game. A bot which was behind might make a play that a human dan player might immediately dismiss as a mistake. (Programmers dismissed these human evaluations by saying that humans don't understand win rates. :roll: :roll: :roll: ) One possibility is that the bot's play left open the possibility of a horrendous blunder by the opponent, one that the human player would dismiss out of hand, judging it to be impossible for a player as strong as the opponent. Another possibility is that the randomness of Monte Carlo playouts in such situations simply made the win rate estimates unreliable. In either of these cases, the win rate estimates would be qualitatively different from those earlier in the game. The choice of plays would be less likely to be good, and the size of deviations from the top choice would be less indicative of the size of a mistake (if any).

Now, a good evaluation function for individual plays can surely be developed. For instance, if a particular bot played out the game with White playing first from a certain position 10,000 times you might get a win rate estimate and error estimate for Black of x% ± y%. Suppose that a play from a second position yielded estimates of v% ± w%, and v + w < x and v < x - y. Then we might regard a Black play to the second position instead of the first to be a mistake. :)
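The "10,000 playouts giving x% ± y%" idea above can be sketched with a binomial standard error, under the optimistic assumption that playouts are independent trials (real MCTS playouts are correlated, so the true uncertainty is larger):

```python
import math

def winrate_with_error(wins, playouts, z=1.96):
    """Win-rate estimate with a ~95% normal-approximation half-width,
    treating playouts as independent Bernoulli trials (a simplifying
    assumption for illustration)."""
    p = wins / playouts
    half_width = z * math.sqrt(p * (1.0 - p) / playouts)
    return p, half_width
```

For example, 5500 wins in 10,000 playouts gives roughly 55% ± 1%, which is the kind of "v% ± w%" comparison the paragraph above proposes for judging whether one play is a genuine mistake relative to another.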

AlesCieply wrote:
the work on it is rather slow and tedious.


I suppose by "it" you mean the game analyses and report. I am afraid that a lot of tedious work needs to be done to reach a point where a program can reliably evaluate individual plays to the degree that current chess engines can. Statistical evidence should be based upon making fewer and smaller mistakes, given the opportunity to cheat than otherwise.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

Everything with love.

Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #275 Posted: Tue Jun 05, 2018 10:39 am 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
jlt wrote:
AlesCieply wrote:

On page 3 you talk about three points:

(1) Carlo Metta performed unusually well at PGETC;
(2) During PGETC he made more good moves and fewer bad moves than usual;
(3) He modified an internet game and presented it as a regular tournament game record.

IMO these are only two points, not three. If you make more good moves and fewer bad moves than usual (point 2), you often beat stronger opponents than usual (point 1).

(But (2) is more precise than (1), as it gives clues about the manner a game was won.)

I also don't understand how you get your statement that such a feat would occur in 1 out of 3000 tournaments. He had 4 victories against 6d, 5d, 6d and 4d, then 1 loss, a victory against 3d and 1 loss. If you only take into account the first four matches, then according to http://www.europeangodatabase.eu/EGD/winning_stats.php such a winning streak occurs with a probability 0.1762x0.3x0.5, which is about 0.5%, i.e. one person accomplishes such a feat once every 200 tournaments; or if you prefer, during a tournament with 200 participants, you can expect one such performance.

Of course the calculation is very rough, but my point is that (1) is not so unusual, as Uberdude also pointed out by giving concrete examples.

Concerning point (2): to determine whether the percentage of good or bad moves was unusually high or low for a 4d player, it would be necessary to analyse a large number of games (say 100) played by 4d players and determine the average percentage of good and bad moves, as well as the standard deviation. In your document, the number of analysed games was much too low to allow any significant statistical analysis.


There are a few problems with (2), as implemented. First, Leela's top choices are confounded with good moves. (It's a better comparison than matching one of three, but of the same kind.) Another way of determining good moves and mistakes should be used. Second, a lack of fit could be the result, not only of Metta playing differently, but of his having different opponents. (Which he obviously did.) Third, the way in which Metta plays poorly may be quite different from how he plays well. As a counterexample, in a recent game against one of the top bots Haylee gradually lost ground; that is, she played pretty much the same as when she wins. Another player, such as an amateur, might blunder. Another pro might sense losing ground and embark on risky maneuvers. Such specific differences between winning play and losing play could produce poor fits which have nothing to do with cheating. (Again, the lack of a theory of cheating reveals itself.)

Edit: There is another problem with the Chi Square test, using sparse groupings. There should be only four categories for the test, combining the less frequent categories.
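Pooling the less frequent categories before running a Chi Square test, as suggested above, can be sketched like this (pure Python, statistic only, no p-value; the "expected count of at least 5 per cell" rule of thumb is the standard one for the chi-square approximation):

```python
def pool_sparse(observed, expected, min_expected=5.0):
    """Pool categories whose expected count falls below min_expected
    into a single combined category, as the chi-square approximation
    requires reasonably large expected counts in every cell."""
    obs_out, exp_out = [], []
    obs_pool = exp_pool = 0.0
    for o, e in zip(observed, expected):
        if e < min_expected:
            obs_pool += o
            exp_pool += e
        else:
            obs_out.append(float(o))
            exp_out.append(float(e))
    if exp_pool > 0:
        obs_out.append(obs_pool)
        exp_out.append(exp_pool)
    return obs_out, exp_out

def chi_square_stat(observed, expected):
    # Pearson's chi-square statistic over the (pooled) categories.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```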

Edit: And another problem. Suppose that Leela's top choice is an obvious one, like replying to a sente. A cheater does not need to copy Leela to play an obvious move, so it is irrelevant to the question of cheating and should not be used in the test. The same goes for sufficiently easy plays, such as those a 2 kyu would play. They have to be hard enough that the suspected cheater might miss them without cheating.

_________________
The Adkins Principle:

At some point, doesn't thinking have to go on?

— Winona Adkins

Everything with love.


Last edited by Bill Spight on Tue Jun 05, 2018 12:37 pm, edited 3 times in total.
Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #276 Posted: Tue Jun 05, 2018 11:05 am 
Lives in gote
User avatar

Posts: 602
Liked others: 50
Was liked: 211
Bill Spight wrote:
There are a few problems with (2), as implemented.


But here, we are dealing with a player suspected of cheating with Leela, and not with Elf, Zen, or a 9p giving hints.

If you don't like the terms "good" and "bad", let's use "nice" and "ugly" instead. By definition, a player using Leela to cheat will produce more nice and fewer ugly moves than during normal play, so if one can prove that C.M. played a very unusually high number of nice moves and a very unusually low number of ugly ones during PGETC, whatever the definition of "nice" and "ugly" is, then this will raise suspicions of cheating using Leela.


Last edited by jlt on Tue Jun 05, 2018 1:25 pm, edited 1 time in total.
Offline
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #277 Posted: Tue Jun 05, 2018 12:01 pm 
Honinbo

Posts: 9040
Liked others: 2754
Was liked: 3073
jlt wrote:
Bill Spight wrote:
There are a few problems with (2), as implemented.


But here, we are dealing with a player suspected of cheating with Leela, and not with Elf, Zen, or a 9p giving hints.

If you don't like the terms "good" and "bad", let's use "nice" and "ugly" instead. By definition, a player using Leela to cheat will produce more nice and fewer ugly moves than during normal play, so if one can show that C.M. played an unusually high number of nice moves and an unusually low number of ugly ones during the PGETC, whatever the definition of "nice" and "ugly", then this will raise suspicions of cheating with Leela.


There is, IMO, enough evidence to indicate that if Metta cheated, he did so by copying Leela. That being the case, trying to prove cheating by comparing his plays to Leela's confounds the question of how he might have cheated with the question of whether he cheated. They are separate questions, and if we can address the question of whether Metta cheated without using Leela, we should do so.

Since we now have at least one other way of measuring the quality of individual plays, with the combination of LeelaZero and the Facebook neural network, we can use that, or perhaps something else. The theory of cheating, as Regan points out, is that the cheater played significantly better, given the opportunity to cheat, than without that opportunity. Use some other bot to evaluate the difficulty of individual plays and the margin of error. Besides, LeelaZero, with or without the Facebook neural net, is better able to rate plays than Leela is, so using it we are better able to compare the quality of Metta's plays.
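Regan's approach can be sketched roughly as follows: estimate, for each non-trivial position, the probability that a player of the suspect's strength matches the reference engine, then compare the observed match count to its expectation via a z-score. The per-move probabilities below are invented for illustration:

```python
from math import sqrt

def match_zscore(match_probs, matches):
    """Aggregate z-score for an observed engine-match count, treating each
    position as an independent Bernoulli trial with its own match probability."""
    mean = sum(match_probs)                       # expected number of matches
    var = sum(p * (1 - p) for p in match_probs)   # variance of the match count
    return (matches - mean) / sqrt(var)

# Hypothetical per-position match probabilities for 8 non-trivial positions,
# and an observed 7 engine matches.
probs = [0.6, 0.4, 0.7, 0.3, 0.5, 0.55, 0.45, 0.6]
z = match_zscore(probs, 7)
print(round(z, 2))
```

A large positive z-score means the player matched the engine far more often than a player of his strength should; in practice the per-position probabilities must themselves be calibrated on thousands of games, which is exactly the hard part of Regan's method.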


 Post subject: Re: “Decision: case of using computer assistance in League A
Post #278 Posted: Tue Jun 05, 2018 1:11 pm 
Lives with ko

Posts: 183
Liked others: 25
Was liked: 60
Rank: 2d
Lukan wrote:
At the end of his post, I would also like to reveal something strange. On 31st May, a post apparently written by some Italian player appeared in this discussion, but it disappeared about 10 minutes later for an unknown reason... (see the attached screenshot)
That's quite a serious accusation as well, so I went digging through games from the "metta" account (I don't know for certain that it belongs to the player accused of cheating, but it seems plausible). I found one where he marked opponent stones in seki as dead, but it was around 20k so I'm inclined to disregard it. More interesting is a game from Nov 2007, poporo [3k] vs metta [4k], which ends with Black getting rekt, and he leaves the game with the words:
Quote:
metta [4k]: i'm sorry but i'm used to play go not this disgusting game, please when you'll learn to play this game inform me so i can take you off of my censor list
So that seems to be partial confirmation for the claims in the previous message.

edit: same month, after losing some stones:
Quote:
metta [4k]: thanks for your unfairness
metta [4k]: insert in my censor list
metta [4k]: i inform admins immediately
metta [4k]: please don't play with me again
metta [4k]: at your level it's needed that you grow up a little
metta [4k]: bye little boy
konstantyn [4k]: read my terms...no undo!

edit2:
Quote:
duyen [3k]: it's not a misclick
metta [4k]: congratulations!!! you are the first name in my censor list
metta [4k]: thanks for your unfairness, please don't play with again, in 2 minutes all the english game room will know about your unfairness
duyen [3k]: it's not a misclick, so you can't undo


Last edited by bernds on Tue Jun 05, 2018 1:22 pm, edited 1 time in total.

This post by bernds was liked by: Hidoshito
 Post subject: Re: “Decision: case of using computer assistance in League A
Post #279 Posted: Tue Jun 05, 2018 1:43 pm 
Lives in gote

Posts: 402
Liked others: 80
Was liked: 123
Rank: igs 4d
bernds wrote:
(...) I went digging through games from the "metta" account (I don't know for certain that it belongs to the player accused of cheating, but it seems plausible). I found one where he marked opponent stones in seki as dead, but it was around 20k so I'm inclined to disregard it. More interesting is a game from Nov 2007, poporo [3k] vs metta [4k], which ends with Black getting rekt, and he leaves the game with the words: (...)

It is also the account of a very frequent escaper. Not that this says much about the cheating case, but if this is/was indeed Carlo Metta's account, the idea that such a player could be a referee during the EGC is disturbing/laughable...

 Post subject: Re: “Decision: case of using computer assistance in League A
Post #280 Posted: Tue Jun 05, 2018 2:47 pm 
Beginner

Posts: 6
Liked others: 1
Was liked: 6
Rank: 1 kyu
AlesCieply wrote:
Finally, I would very much appreciate it if Carlo Metta came out and explained why he presented an apparently fabricated game record to the league manager. I do believe he is in principle an honest man who has done a lot for the go community and can continue to do so. I just think he made a mistake in using AI in his internet games and is now afraid of admitting it.
EDIT: Here I refer to a game record of the Shakhov-Metta game that Carlo himself supplied (among several other records), claiming it was played at a regular tournament and also contained many moves "similar to Leela". In fact, the game was played on KGS and the record was edited to look as if it had been played "live"; see the report for more details.


Dear cieply,

I'm Maurizio Parton, one of the authors of the appeal document. Mirco Fanti asked me to answer your messages here on the forum, because he has already lost a lot of time answering your emails and he has an important tournament to organize. I have an EGC to organize myself, so I will try to be brief and clear.

Carlo agreed with the referee to share some SGFs in order to clarify his style. Carlo looked among his files and indeed made a mistake: he attributed one of his SGFs to a live game, while it was in fact an online game.

But why on earth would Carlo have done this on purpose? What would have been the malignant objective of this manipulation? This game has a low 'similarity' with Leela: why would Carlo have lied to his own disadvantage?!

The other question: why is the record slightly different from the actual game, as if it was not downloaded from KGS but handwritten? Well, because it *is* handwritten. Every week we meet at our Go club in Pisa, and quite often we ask Carlo to show us a game: he then writes the game down while he comments on it. After that, the game is on Carlo's laptop.

As for the new 'analysis' that you, a member of the appeal commission, made and used instrumentally to raise new accusations against Carlo, I am not going to address it, for several reasons.

The first reason is in the same appeal document that the appeal commission accepted as a proof that the accusations moved against Carlo were flawed:

"The methodology was chosen by people who were not blind to the moves (...) this carries the risk of involuntarily picking a methodology exactly because it confirmed the accusations"

This flawed activity is called 'cherry picking', and *voilà*, you could have bet with 98% probability: this is exactly what is happening! 'Cherries' everywhere! I warmly invite you to read the appeal document.

The second reason why I am not going to address the new round of analysis is in the introduction on Regan's work that you cite yourself:

"His [KEN REGAN] work began on September 29, 2006, during the Topalov-Kramnik World Championship match. Vladimir Kramnik had just forfeited game five in protest to the Topalov team's accusation that Kramnik was consulting a chess engine during trips to his private bathroom. (...) Topalov's team published a controversial press release trying to prove their previous allegations. Topalov's manager, Silvio Danailov, wrote in the release, '... we would like to present to your attention coincidence statistics of the moves of GM Kramnik with recommendations of chess program Fritz 9.' (...) An online battle commenced between pundits who took Danailov's 'proof' seriously versus others, like Regan, who insisted that valid statistical methods to detect computer assistance did not yet exist. (...) In just a few weeks, the greatest existential threat to chess had gone from a combination of bad politics and a lack of financial support to something potentially more sinister: scientific ignorance. In Regan's mind, this threat seemed too imminent to ignore. 'I care about chess,' he says. 'I felt called to do the work at a time when it really did seem like the chess world was going to break apart.'"

This is exactly what is happening now: the Go world is breaking apart. And I'm sorry to say that in this analogy you cast yourself as Regan, but in fact you act like Danailov.

The third reason is that, as is apparent from Regan's work, creating a solid methodology requires analyzing thousands of games. This is not something that can be done in a few days or weeks, nor by somebody who repeatedly claims not to be an expert in statistics.

To be constructive: I think we should focus on creating a solid method, as Regan did, based on science and data, to be applied in future tournaments, because, as explained above, trying to create methods to confirm one's opinion is flawed from the start. Let's start this process together: I warmly invite you to send your proposal to the AGM, and/or make proposals on this forum.

Finally: apologies to everybody if I sounded rude. Let's close this sad chapter in the history of Go, and let's start working together, not against each other.

Best regards, Maurizio


This post by figgitaly was liked by 5 people: Bill Spight, Charlie, frmor, theoldway, Uberdude