On handling online cheating with AI

NordicGoDojo · Post by **NordicGoDojo** » Wed Jun 10, 2020 5:56 pm

FinrodFelagund wrote:A first solution might be to simply use the AI to characterize the skill of players and the skill level of specific games.

1.) Find the average winrate change (i.e. move 1 deviates from the AI's top choice by some percentage, called the move's delta for players) in a specific rank by analyzing a high number of games.

2.) Create a histogram for all ranks, from 25k to pro level play. This might mean that the average pro level move is only -5% (an example, I have no idea), and that the average 1d move is -15%.

You can use this data to identify suspicious players. The statistics of any player could be displayed in a public way on the server, so that instead of forcing the admins to decide if a player is cheating, the decision would be up to the suspicious player's opponents. This would work even for high dan players, where it would be obvious if their average deviation put them on par with or above top pros. Matchmaking systems and challenges could also be customized, so that you can set your own limits on the suspicious-ness of your opponents.

Eventually, you could use this kind of data to replace traditional ranking systems. Instead of a 1dan being some arbitrary elo rating, you could assign the rank to an average move delta.

If the cheater was clever, and didn't always pick the best move, they would end up playing at some reasonable level. It would be more difficult to detect these cheaters, but they would also do less harm, since they wouldn't be playing above the strength of the account and so would not on average be frustrating for honest players.

The only really difficult part of this is choosing the number of playouts and the specific engine. My guess is that even a relatively modest level of playouts with any strong engine would work, because the most important thing is that you use the same engine with the same settings for everyone, like a measuring stick.
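A minimal sketch of steps 1.) and 2.) from the quoted proposal, assuming each analysed move has already been reduced to a (rank, delta) pair by some engine. The function names and the sample numbers are hypothetical and only illustrate the bookkeeping; they are not measured values.

```python
from collections import defaultdict
from statistics import mean

def average_delta_by_rank(moves):
    """moves: iterable of (rank, delta) pairs, where delta is the winrate
    change of a move relative to the engine's top choice (0 or negative)."""
    by_rank = defaultdict(list)
    for rank, delta in moves:
        by_rank[rank].append(delta)
    return {rank: mean(deltas) for rank, deltas in by_rank.items()}

def closest_rank(player_deltas, baseline):
    """Return the rank whose average delta is nearest to the player's."""
    avg = mean(player_deltas)
    return min(baseline, key=lambda rank: abs(baseline[rank] - avg))

# Made-up sample data, echoing the -5% / -15% example in the proposal.
sample = [("1d", -14.0), ("1d", -16.0), ("pro", -4.0), ("pro", -6.0)]
baseline = average_delta_by_rank(sample)
print(baseline)
print(closest_rank([-5.5, -4.5], baseline))
```

An account registered as 1d whose deltas land closest to the pro baseline is the kind of outlier the proposal would surface. Whether the baseline is stable enough for that is the question the rest of the thread discusses.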

I have looked into this, and unfortunately it does not look feasible.

I have analysed a considerable number of games with KataGo looking at the size of a player's 'average mistake' – not in terms of winrate-%, but points, as KataGo is able to do. I think this method is more robust than looking at the winrate, because even a very small mistake can cause a big winrate change when the game is close.

After analysing a few players, Ke Jie's average mistake seemed to be in the range of -0.5 points per move. For my own games, I got around -0.8 points; then for Shūsaku I got around -1.2 points, at which point I started to get suspicious. Then I checked a few European 6d players, who came to around -1.5 points; and then I came upon a game by Lukas Podpera 7d and Tanguy Le Calvé, which had an average mistake of only -0.3 points per move for both players.

Investigating further, I realised that the size of the average mistake depends on the 'nature' of the game: fighting-oriented games inevitably lead to higher average mistakes and peaceful games lead to lower average mistakes. This is why Ke Jie's -0.5 points per move is impressive. Even if we analysed winrate-% rather than KataGo-points, I believe we would get the same conclusion.

In order to find a way to rank players by the size of their average mistake, it seems we need a way to quantify how 'complex', or 'error-prone', a particular game is. So far I have not thought of a way to accomplish this.
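For reference, a rough sketch of how an 'average mistake' in points could be computed from a sequence of engine score estimates. It assumes an even game with Black moving first, strictly alternating moves, and a list holding the engine's estimate of Black's lead after each move; the function name and the clipping of apparent gains to zero are assumptions, not a description of how KataGo or any review tool reports the figure.

```python
def average_mistake(black_leads):
    """black_leads[i] is the engine's estimate of Black's lead in points
    after i moves (index 0 is the empty board)."""
    losses = {"B": [], "W": []}
    for i in range(1, len(black_leads)):
        change = black_leads[i] - black_leads[i - 1]
        player = "B" if i % 2 == 1 else "W"
        if player == "W":
            change = -change  # view the change from White's side
        # Assumption: apparent gains are evaluation noise, so clip them to 0.
        losses[player].append(min(0.0, change))
    return {p: (sum(v) / len(v) if v else 0.0) for p, v in losses.items()}

# Made-up estimates for a four-move fragment.
result = average_mistake([0.0, -1.0, 1.0, 1.0, 1.0])
print(result)
```

The output is a per-game number for each player, which is why it mixes the player's skill with how sharp that particular game was.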

Yakago · Post by **Yakago** » Thu Jun 11, 2020 2:11 am

FinrodFelagund wrote:A first solution might be to simply use the AI to characterize the skill of players and the skill level of specific games.

1.) Find the average winrate change (i.e. move 1 deviates from the AI's top choice by some percentage, called the move's delta for players) in a specific rank by analyzing a high number of games.

2.) Create a histogram for all ranks, from 25k to pro level play. This might mean that the average pro level move is only -5% (an example, I have no idea), and that the average 1d move is -15%.

You can use this data to identify suspicious players. The statistics of any player could be displayed in a public way on the server, so that instead of forcing the admins to decide if a player is cheating, the decision would be up to the suspicious player's opponents. This would work even for high dan players, where it would be obvious if their average deviation put them on par with or above top pros. Matchmaking systems and challenges could also be customized, so that you can set your own limits on the suspicious-ness of your opponents.

Eventually, you could use this kind of data to replace traditional ranking systems. Instead of a 1dan being some arbitrary elo rating, you could assign the rank to an average move delta.
...

Before you make this kind of system, you have to be very certain that 'average move delta' correlates very clearly with rank.

For instance if we have 2 1d players:
One player studies with AI a lot, and is for most of the moves able to play moves with low delta, but he is weaker in fighting and is prone to making big decisive blunders.

The other player employs a tricky creative style, and is stronger in fighting. He often has higher delta because he plays unorthodox moves / overplays, but overall he is more consistent due to his fighting ability.

Will these be measured similarly by the system?

And if it's only a 'suspiciousness' measure - I don't want to be judged by other players because of the style that I chose to play, before the game starts.

Harleqin · Post by **Harleqin** » Thu Jun 11, 2020 11:02 am

Sorry, but I am quite unconvinced by Antti's analysis, especially because all moves that are counter-examples are disregarded as “then he didn't cheat for just those moves”.

The way to verify such a method would be to get a large enough sample of known good/bad games, then do a double-blind application of the method and determine the false/true positive/negative rates. This would have to be done at different levels, too.

Obvious cases are easy to find, and easy to adjudicate. 100% bot choice: no problem. The difficulty is finding the threshold of obviousness. Where do we cross the line where there's just “strong suspicion”? How do we protect against bias of the judge?
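The verification described above amounts to building a confusion matrix at each candidate threshold. A minimal sketch, assuming every game in the labelled sample has a single numeric suspicion score (for example, the fraction of moves matching the engine's top choice); the scores, labels and function names below are made up for illustration.

```python
def confusion_counts(labels, flags):
    """labels: True if the game is known to involve cheating.
    flags: True if the method flagged the game."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for cheated, flagged in zip(labels, flags):
        if flagged:
            counts["tp" if cheated else "fp"] += 1
        else:
            counts["fn" if cheated else "tn"] += 1
    return counts

def evaluate(scores, labels, threshold):
    """Flag every game whose score reaches the threshold and report rates."""
    flags = [s >= threshold for s in scores]
    c = confusion_counts(labels, flags)
    positives = c["tp"] + c["fn"]
    negatives = c["fp"] + c["tn"]
    return {
        **c,
        "true_positive_rate": c["tp"] / positives if positives else None,
        "false_positive_rate": c["fp"] / negatives if negatives else None,
    }

# Made-up sample: four games with known status.
scores = [0.95, 0.60, 0.85, 0.40]
labels = [True, False, False, True]
report = evaluate(scores, labels, threshold=0.8)
print(report)
```

Running `evaluate` over a range of thresholds, separately for each rank band, would show where the false positive rate becomes unacceptable. The labels have to be fixed before anyone sees the scores, otherwise the judge's bias leaks into the result.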

jlt · Post by **jlt** » Thu Jun 11, 2020 11:51 am

Antti's argument is something like:

1) Many moves are blue, and some surprising moves are considered by the AI, so Black is a strong dan player.

2) However a few moves like

are kyu-level mistakes.

It would be interesting to find an algorithm that determines if a mistake is DDK level, or 10k, or 5k level without relying on a human judgment but just on a bot analysis, but that doesn't seem an easy task.

Knotwilg · Post by **Knotwilg** » Thu Jun 11, 2020 12:44 pm

Yakago wrote: For instance if we have 2 1d players:
One player studies with AI a lot, and is for most of the moves able to play moves with low delta, but he is weaker in fighting and is prone to making big decisive blunders.

There's an assumption here that "being weaker in fighting" would still allow you to stay close to AI play most of the time. That's true in the opening but not in the middle game, where the choice for a move depends on the understanding of a group's status and relationships between those positions. Although the AI choice is often on the territorial side, their "understanding" of fighting is supreme.

We must discard the opening from an investigation on cheating because indeed it's much harder to distinguish between learning from (in the mimicking sense) AI and using it. This is similar to amateurs who are strong in the opening because they mimic pro play. Then when the stones get into contact, the positions and the players' images crumble.

If you have learned so much that you can stay close to AI choices throughout the whole game, then you've reached professional strength. To do that in a very short time span is suspicious. To play mostly at AI level and make a few kyu level blunders, is also suspicious.
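As a sketch of the pattern described above (opening ignored, mostly engine-level moves, plus a few large blunders), assuming per-move point losses are available for one player. All three constants and the function name are hypothetical and would need calibrating against real games before they meant anything.

```python
OPENING_MOVES = 40    # assumed cut-off; moves up to this number are ignored
NEAR_AI_LOSS = 0.5    # assumed: losses up to this many points count as AI-level
BLUNDER_LOSS = 10.0   # assumed: losses of at least this many points are blunders

def mixed_level_pattern(moves, min_near_ai_fraction=0.8):
    """moves: list of (move_number, point_loss) for one player, point_loss >= 0.

    Returns True when, outside the opening, most moves are close to the
    engine's choice and at least one move is a large blunder.
    """
    later = [loss for number, loss in moves if number > OPENING_MOVES]
    if not later:
        return False
    near_ai = sum(1 for loss in later if loss <= NEAR_AI_LOSS)
    blunders = sum(1 for loss in later if loss >= BLUNDER_LOSS)
    return near_ai / len(later) >= min_near_ai_fraction and blunders >= 1

# Made-up game: 100 perfect moves followed by one 15-point blunder.
suspicious = [(i, 0.0) for i in range(1, 101)] + [(101, 15.0)]
print(mixed_level_pattern(suspicious))
```

A True result is only a prompt for a human to look at the game, not a verdict.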

Bill Spight · Post by **Bill Spight** » Thu Jun 11, 2020 3:31 pm

NordicGoDojo wrote:I have analysed a considerable number of games with KataGo looking at the size of a player's 'average mistake' – not in terms of winrate-%, but points, as KataGo is able to do. I think this method is more robust than looking at the winrate, because even a very small mistake can cause a big winrate change when the game is close.

Good point. In addition, while evaluation in terms of points only is theoretically problematic, because it ignores the value of having the move (sente), it is much closer to how humans evaluate positions than winrate estimates are.

After analysing a few players, Ke Jie's average mistake seemed to be in the range of -0.5 points per move. For my own games, I got around -0.8 points; then for Shūsaku I got around -1.2 points, at which point I started to get suspicious. Then I checked a few European 6d players, who came to around -1.5 points; and then I came upon a game by Lukas Podpera 7d and Tanguy Le Calvé, which had an average mistake of only -0.3 points per move for both players.

Investigating further, I realised that the size of the average mistake depends on the 'nature' of the game: fighting-oriented games inevitably lead to higher average mistakes and peaceful games lead to lower average mistakes. This is why Ke Jie's -0.5 points per move is impressive. Even if we analysed winrate-% rather than KataGo-points, I believe we would get the same conclusion.

Interesting. I think for this kind of approach we need to profile a large number of players, not just in terms of average or median mistakes, but of the whole distribution of errors. Furthermore, since the value of sente changes during the game, we should look at profiles of errors at different stages of the game, which might also reflect different styles of play.

Ideally, we would be able to come up with a finite number of profiles of honest play. Then not fitting any of those honest profiles would be evidence of possible cheating. I.e., you got some 'splainin' to do.
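One possible shape for such a profile, as a sketch: bucket a player's point losses separately for each stage of the game, then compare the resulting distributions. The stage boundaries, bucket edges, function names and the use of total-variation distance are all assumptions chosen for illustration. Losses are given as magnitudes (0 or positive).

```python
from bisect import bisect_right

# Assumed stage boundaries by move number and assumed point-loss bucket edges.
STAGES = (("opening", 1, 50), ("middle", 51, 150), ("endgame", 151, 10_000))
BUCKET_EDGES = (0.5, 2.0, 5.0)

def error_profile(moves):
    """moves: list of (move_number, point_loss) for one player.
    Returns, per stage, the fraction of moves in each loss bucket,
    or None for a stage with no moves."""
    profile = {}
    for name, first, last in STAGES:
        counts = [0] * (len(BUCKET_EDGES) + 1)
        for move_number, loss in moves:
            if first <= move_number <= last:
                counts[bisect_right(BUCKET_EDGES, loss)] += 1
        total = sum(counts)
        profile[name] = [c / total for c in counts] if total else None
    return profile

def profile_distance(p, q):
    """Mean total-variation distance over the stages present in both profiles."""
    dists = [
        0.5 * sum(abs(a - b) for a, b in zip(p[name], q[name]))
        for name in p
        if p[name] is not None and q.get(name) is not None
    ]
    return sum(dists) / len(dists) if dists else None

# Made-up losses for four moves.
example = error_profile([(10, 0.2), (20, 3.0), (60, 0.0), (70, 6.0)])
print(example)
```

A player whose profile is far from every reference profile of honest play, by a measure like `profile_distance`, would be the one with some explaining to do.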

NordicGoDojo · Post by **NordicGoDojo** » Thu Jun 11, 2020 6:11 pm

Bill Spight wrote:
Good point. In addition, while evaluation in terms of points only is theoretically problematic, because it ignores the value of having the move (sente), it is much closer to how humans evaluate positions than winrate estimates are.

What do you mean? E.g., if you pass at the start of the game, KataGo thinks you made a loss of roughly 13 points.

Bill Spight · Post by **Bill Spight** » Thu Jun 11, 2020 8:30 pm

NordicGoDojo wrote:
Bill Spight wrote:
Good point. In addition, while evaluation in terms of points only is theoretically problematic, because it ignores the value of having the move (sente), it is much closer to how humans evaluate positions than winrate estimates are.

What do you mean? E.g., if you pass at the start of the game, KataGo thinks you made a loss of roughly 13 points.

Sorry, I misspoke. It's human evaluation in terms of points that does not take the value of the move into account.

The problem with evaluation by points alone is that the payoff is not in terms of points.

Edit: But the value of sente still affects the error function for points.

Yakago · Post by **Yakago** » Fri Jun 12, 2020 12:04 am

Knotwilg wrote: If you have learned so much that you can stay close to AI choices throughout the whole game, then you've reached professional strength. To do that in a very short time span is suspicious. To play mostly at AI level and make a few kyu level blunders, is also suspicious.

Now that is an assumption

Either way, it is a bit controversial to what extent one can act on 'suspicion', as has been discussed in this thread.

Perhaps we should not look at 'punishing' suspicious players with 'flags' and what not - just give them an untimely promotion.

Then we can focus our attention on cheaters at the highest ranks, where it should be easier to find 'superhuman play'.

(yes, this of course also has its downsides..)

jlt · Post by **jlt** » Fri Jun 12, 2020 1:40 am

Is cheating really a big problem at levels other than high dan ranks? Do some of you frequently encounter players that you suspect of using AI?

(I am not talking about tournaments, only about usual games on a server.)

gennan · Post by **gennan** » Fri Jun 12, 2020 4:24 am

jlt wrote:Is cheating really a big problem at levels other than high dan ranks? Do some of you frequently encounter players that you suspect of using AI?

(I am not talking about tournaments, only about usual games on a server.)

I play about 10-20 (casual) games a week online and I've never encountered an opponent who I suspected of cheating. This may depend on the server though.

Shenoute · Post by **Shenoute** » Mon Aug 03, 2020 9:07 am

jlt wrote:Do some of you frequently encounter players that you suspect of using AI?

Here is the graph of a player I saw on IGS. After more than a year of being 4d and having a 50% win/loss ratio, (s)he is now up to 7d and 80% wins since mid-July 2020. His/her results in the last 20 games, played over 4 days against opponents ranked 5d+ to 8d, are 19 wins - 1 loss (on time).
I checked randomly a dozen of his/her games with Leelazero: almost all followed the same pattern, with the player at hand taking an early lead and never relinquishing it.

But I'm sure someone will come forward and say that it is possible to improve so much and so fast because he knows the cousin of a friend of his sister-in-law who did, and that we can do absolutely nothing against cheating online because we can never know for sure.
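To put a number on the 19 wins in 20 games mentioned above, here is a back-of-the-envelope binomial calculation. It assumes the games are independent and that the player wins each one with a fixed probability; the 50% figure is borrowed from the earlier win/loss ratio and is an assumption, arguably a generous one against 5d+ to 8d opponents. The function name is hypothetical.

```python
from math import comb

def prob_at_least(wins, games, p):
    """Probability of at least `wins` successes in `games` independent
    trials, each succeeding with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

p_streak = prob_at_least(19, 20, 0.5)
print(p_streak)
```

Under those assumptions the chance is 21 in 1,048,576, roughly 2 in 100,000. That does not prove anything about a particular account, since genuine improvement changes the win probability, but it shows why such a run stands out.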

Life In 19x19
