This topic should serve as an easy-to-link reference to my analysis of player performance as measured by the available bots, currently mostly Leela 0.11. I started working on it in relation to the PGETC case, in which the Italian player Carlo Metta was accused of using Leela in his internet games. After an initial analysis based on matching the played moves to Leela's top 3 suggestions proved inconclusive, I decided to try a more detailed analysis: comparing the accused player's performance (mistake histograms) in his games played on the internet and at regular (real-life) tournaments. The analysis is inspired by Ken Regan's work on measuring in-game player ratings and catching those who cheat with AI in chess; see e.g. a review article.
The idea is to look at how often a player makes mistakes of a given size, i.e. moves that lower the probability of winning the game by, for example, 1-2%, 3-4%, etc. These frequencies form a histogram (or pattern) reflecting the player's performance. If a player makes significantly fewer mistakes in his internet games than in his games at regular tournaments, this could indicate that he used outside help.
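The post does not say how "significantly fewer mistakes" is judged. One standard option is a one-sided two-proportion z-test on the share of bad moves in the two sets of games; the sketch below is my own illustration of that choice, not the method behind the spreadsheets. It treats moves as independent, which is only an approximation, and the counts in the example are made up.

```python
import math

def two_proportion_z(bad_a, n_a, bad_b, n_b):
    """One-sided two-proportion z-test (illustrative choice of test).

    Tests whether the bad-move rate in sample A (e.g. internet games) is
    lower than in sample B (e.g. regular tournament games). Returns the
    z statistic and the one-sided p-value P(Z <= z).
    """
    pooled = (bad_a + bad_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("test undefined when all or no moves are bad")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (bad_a / n_a - bad_b / n_b) / se
    # Standard normal CDF expressed through the complementary error function.
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_value

# Made-up counts: 2 bad moves out of 400 online vs 30 out of 400 offline.
z, p = two_proportion_z(2, 400, 30, 400)
print(f"z = {z:.2f}, one-sided p = {p:.2g}")
```

A small p-value only says the two mistake rates differ by more than chance would easily explain; it does not by itself say why.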
The analysis is presented in the form of spreadsheet files, with each sheet containing the analysis of one game. For each move, the bot is used to estimate the probability of winning the game (winrate) before and after the move is played. The difference, delta (set to 0 by definition if the bot's top choice is played), gives the size of the mistake the player made. For each game, the results are summarized in the histogram tables at the top right of the corresponding sheet. The tables show, separately for Black and White, how many moves were played with delta falling into each interval. They also show the percentages of good moves (the played move had a winrate within 1% of the top move suggested by the AI, or even better than the top move the AI found) and bad moves (moves that dropped the winrate by at least 10%).
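The per-move bookkeeping described above can be sketched as follows. This is a minimal illustration, not the spreadsheet formulas themselves: the bin edges are hypothetical, both winrates are assumed to be from the point of view of the player who moved, and the example winrates are made up.

```python
import bisect

# Hypothetical bin edges in winrate percentage points; the intervals in the
# spreadsheets may differ. Bins: [0,1), [1,2), [2,5), [5,10), [10, 100].
EDGES = [1, 2, 5, 10]

def move_delta(winrate_before, winrate_after, played_top_choice):
    """Winrate (in percentage points) lost by a move.

    Both winrates are assumed to be from the point of view of the player who
    moved. delta is 0 by definition when the bot's top choice is played, and
    a move the bot rates above its own top choice is clamped to 0 as well.
    """
    if played_top_choice:
        return 0.0
    return max(0.0, winrate_before - winrate_after)

def summarize(deltas):
    """Return histogram counts and the percentages of good and bad moves."""
    counts = [0] * (len(EDGES) + 1)
    for d in deltas:
        counts[bisect.bisect_right(EDGES, d)] += 1
    n = len(deltas)
    good = 100.0 * sum(d <= 1.0 for d in deltas) / n  # within 1% of top move
    bad = 100.0 * sum(d >= 10.0 for d in deltas) / n  # drop of at least 10%
    return counts, good, bad

# Made-up winrates for five moves of one player: (before, after, top choice?).
moves = [(52.0, 52.0, True), (55.0, 54.5, False), (50.0, 47.0, False),
         (48.0, 36.0, False), (40.0, 41.0, False)]
deltas = [move_delta(*m) for m in moves]
counts, good, bad = summarize(deltas)
print(deltas)             # [0.0, 0.5, 3.0, 12.0, 0.0]
print(counts, good, bad)  # [3, 0, 1, 0, 1] 60.0 20.0
```

The last move in the example is rated better than the bot's own top choice, so it is clamped to delta 0 and counted as a good move, matching the definition above.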
The original analysis included 4 internet games by Carlo Metta and 4 of his games played at regular tournaments.
The current analysis of his internet games is far more extensive. It includes four PGETC games and two games from the Italian Championship Online, all played by Carlo Metta before he was accused of cheating. For comparison, three more PGETC games that Carlo played after the accusation are included. Finally, the analysis of the Bryant-Metta game played in the PGETC qualification match has been added as well.
The analysis of Carlo's regular games was also updated to include four games played at the WAGC and two games played in the Italian Championship Final.
Some notes on the internet games played by Carlo before the accusation:
The data shown in the current analysis come from new runs, so the results differ slightly from those in the earlier analysis (e.g. Carlo had 68% good moves in the old analysis of the Kulkov-Metta game; now it is 64%). The new runs are more consistent, as they come from automated runs of Go Review Partner, while a good fraction of the original analysis relied on hand-transcribed winrates. The differences between the original and new delta histograms are relatively small and reflect the variation between independent runs of Leela.
Carlo makes almost no big mistakes (marked in red) in his internet games, in contrast to his regular games. A player can occasionally make only 1-2 (or even 0) big mistakes in a game, but not this consistently, as my preliminary results for another player show (I am still finalizing that analysis and hope to make it public soon).
The percentage of good moves in Carlo's internet games is rather consistent, unlike in his regular games. It drops sharply to 50% in a game against Csaba Mero played after the accusation.
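One simple way to put a number on "consistent" is the mean and sample standard deviation of the per-game good-move percentages in each group of games. The sketch below is only an illustration of that idea; the percentages in it are made up and are not taken from the spreadsheets.

```python
from statistics import mean, stdev

def spread(percentages):
    """Mean and sample standard deviation of per-game good-move percentages."""
    return mean(percentages), stdev(percentages)

# Made-up per-game good-move percentages, for illustration only.
internet_games = [64.0, 66.0, 63.0, 65.0]
regular_games = [48.0, 61.0, 52.0, 70.0]

m_int, s_int = spread(internet_games)
m_reg, s_reg = spread(regular_games)
print(f"internet: mean {m_int:.1f}%, std dev {s_int:.1f}")
print(f"regular:  mean {m_reg:.1f}%, std dev {s_reg:.1f}")
```

With only a handful of games per group, the standard deviation is itself quite uncertain, so it is best read as a rough description rather than a test.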
The Bryant-Metta game from the UK-IT qualification match is also interesting. It is the only game analyzed so far in which Leela 0.11 has trouble "understanding the game" and providing stable winrates. It has been suggested that Carlo used Leela Zero here, though there is no real proof of this.
Two of these games were also analyzed with the AQ bot. Unfortunately, the winrates estimated by AQ are not as stable as those provided by Leela.
I intend to edit this message heavily to provide more information and updates on the analysis. Expect more later.