Measuring player mistakes versus bots

Bojanic
Lives with ko
Posts: 142
Joined: Fri May 06, 2011 1:35 pm
Rank: 5 dan
GD Posts: 0
Has thanked: 27 times
Been thanked: 89 times

Re: Measuring player mistakes versus bots

Post by Bojanic »

Pnprog,

first, I would like to thank you for GRP; it is excellent software, great work!

----

On the topic: you cannot simply count the player's moves that correspond to GnuGo's choices.
You can have ataris, peeps, joseki - and all those moves would probably be answered with the best choice by any player.

It is necessary to focus on important moves, move sequences, etc.
Simple statistics are not good enough.
pnprog
Lives with ko
Posts: 286
Joined: Thu Oct 20, 2016 7:21 am
Rank: OGS 7 kyu
GD Posts: 0
Has thanked: 94 times
Been thanked: 153 times

Re: Measuring player mistakes versus bots

Post by pnprog »

Bojanic wrote: On the topic: you cannot simply count the player's moves that correspond to GnuGo's choices.
You can have ataris, peeps, joseki - and all those moves would probably be answered with the best choice by any player.

It is necessary to focus on important moves, move sequences, etc.
Simple statistics are not good enough.
Haha, I am just some guy who makes tools that could be useful for testing your hypotheses or performing analysis :)

So I am trying to stay "neutral" on the existing PGETC case, and I won't embark on developing a method to solve future cases.

But if you have some ideas that you want to apply to a large set of data, and it's too much work (and error-prone) to do by hand, then I would be happy to help :salute:

The above was just a proof of concept of the sort of data that could be extracted from GnuGo, as mentioned by Uberdude. If some of you believe it could be a useful tool in itself, then I will release it in a form that is easy for you to use.
Bojanic wrote: You can have ataris, peeps, joseki - and all those moves would probably be answered with the best choice by any player
On this specific question, one way to differentiate between an important move and an urgent (forced) move would be, with Leela:
  • Check whether Leela proposes only one move: this is a strong indicator of a do-or-die move.
  • Check the drop in win rate between the first and second top moves. If the first top move has a 51% win rate and the second only 15%, this also indicates a forced move.
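The two checks above can be sketched in a few lines of Python. The candidate-list format and the 25% gap threshold are my own assumptions for illustration, not anything Leela actually outputs:

```python
# Hypothetical sketch: flagging "forced" positions from a bot's candidate list.
# Each candidate is a (move, winrate) pair, already sorted best-first.

def is_forced(candidates, winrate_gap=0.25):
    """A position is treated as forced if the bot offers a single move,
    or if the winrate drops sharply from the 1st to the 2nd choice."""
    if len(candidates) == 1:
        return True
    best, second = candidates[0][1], candidates[1][1]
    return (best - second) >= winrate_gap

# Example: first move 51% winrate, second only 15% -> forced
print(is_forced([("Q16", 0.51), ("C3", 0.15)]))   # True
print(is_forced([("Q16", 0.51), ("R16", 0.48)]))  # False
```

Moves flagged this way could then be excluded before computing any match-rate statistics, so that forced answers to ataris and peeps don't inflate the correlation with a bot.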
I am the author of GoReviewPartner, a small piece of software aimed at helping review Go games. Give it a try!
Bojanic
Lives with ko
Posts: 142
Joined: Fri May 06, 2011 1:35 pm
Rank: 5 dan
GD Posts: 0
Has thanked: 27 times
Been thanked: 89 times

Re: Measuring player mistakes versus bots

Post by Bojanic »

pnprog wrote: On this specific question, one way to differentiate between an important move and an urgent (forced) move would be, with Leela:
  • Check whether Leela proposes only one move: this is a strong indicator of a do-or-die move.
  • Check the drop in win rate between the first and second top moves. If the first top move has a 51% win rate and the second only 15%, this also indicates a forced move.
It could be helpful, but some analysis would be needed.
E.g., in one game I have seen a forced move with two answers, both good.
In other cases, someone might choose not to answer a peep, or to play another move nearby.
pnprog
Lives with ko
Posts: 286
Joined: Thu Oct 20, 2016 7:21 am
Rank: OGS 7 kyu
GD Posts: 0
Has thanked: 94 times
Been thanked: 153 times

Re: Measuring player mistakes versus bots

Post by pnprog »

Hi!

In some other thread, you mentioned that the PGETC games also have time records. This is also something that could be extracted together with the other information, in its own column.
I am the author of GoReviewPartner, a small piece of software aimed at helping review Go games. Give it a try!
pnprog
Lives with ko
Posts: 286
Joined: Thu Oct 20, 2016 7:21 am
Rank: OGS 7 kyu
GD Posts: 0
Has thanked: 94 times
Been thanked: 153 times

Re: Measuring player mistakes versus bots

Post by pnprog »

That's me again!

I was thinking about something that might work, but would be a lot of work to implement:

Basically, it would consist of training a set of policy networks, each one corresponding to a specific level of play (3k, 2k, 1k, 1d, 2d, 3d...).

<Edit> To be clear, I am not proposing to train a bot, only a policy network. Not something that can play Go: no playouts, no tree search, no Monte Carlo rollouts, no value network... </Edit>

A policy network, as I understand it, was developed by DeepMind for the first version of AlphaGo by showing it games of strong amateur players downloaded from the internet. This policy network was used to indicate, for a given position, which moves a strong amateur would play. It reduced the number of moves AlphaGo had to evaluate (evaluation being done with the value network and Monte Carlo rollouts). Later, they used AlphaGo vs AlphaGo games to further improve the policy network.

So we could try to train one policy network on ~2k players' games, then another on ~1k players' games, then another on ~1d players' games, and so on.

Note that we don't really care what level a policy network is labelled with (1k, or 3d); we only need them to be in increasing order of strength, and ideally at regular intervals. We could classify them using Elo, or simply A, B, C...

With such a set of policy networks, we could evaluate how well the moves of a player in a game correlate with each of our policy networks, and draw a chart. One could expect this chart to peak at the policy network closest to that player's level.

Then, by comparing those charts across different games, we could tell that in a particular game a player did not play at his usual level.
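The correlation step might look like this in code. The `nets` interface (a function returning the probability a network assigns to the move actually played) is entirely hypothetical, as are the toy "networks" in the demo:

```python
import math

def estimate_level(game, nets):
    """Score each policy network by the average log-likelihood it assigns
    to the moves actually played; the peak suggests the player's level.
    `game` is a list of (position, move_played); `nets` maps a level label
    to a function prob(position, move) -> probability."""
    scores = {}
    for label, prob in nets.items():
        total = 0.0
        for position, move in game:
            total += math.log(max(prob(position, move), 1e-12))
        scores[label] = total / len(game)
    best = max(scores, key=scores.get)
    return best, scores  # chart `scores` to see where it peaks

# Toy demo with two fake "networks" that weight move "A" differently.
fake_game = [("pos1", "A"), ("pos2", "A")]
nets = {
    "1k": lambda pos, mv: 0.6 if mv == "A" else 0.1,
    "1d": lambda pos, mv: 0.3 if mv == "A" else 0.2,
}
print(estimate_level(fake_game, nets)[0])  # "1k"
```

Averaging log-likelihood rather than raw match counts means near-misses (moves the network rates highly but not first) still contribute, which should make the per-level chart smoother.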

The difficult part would be to gather enough games for training: games from players with a stable level, and classified by level...

One way to do that could be to work with Go servers, more specifically with the players they use as rating anchors.
Now, they probably won't want to disclose publicly which players are used as anchors, but maybe this could be done under a non-disclosure agreement. Or maybe they could disclose the information once an anchor is removed; we could then download his games from the period when he was an anchor.
Or maybe we could collaborate with Go servers to get statistics on which players have a very high rating confidence.

Once we got enough games to train our policy networks, it would also open up all sorts of possibilities regarding the rating of players or their games (for example, one could finally learn the equivalence of ranks among Go servers).
Last edited by pnprog on Mon Jun 18, 2018 4:55 am, edited 2 times in total.
I am the author of GoReviewPartner, a small piece of software aimed at helping review Go games. Give it a try!
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: Measuring player mistakes versus bots

Post by moha »

I see two problems with training a 1k network, for example. First, to get 1k-level play you have to disable search (otherwise you get a much higher level: someone did this with 1d games and the results were comparable to full bot strength - the policy was only used for pruning the search, and a good search with 1d pruning is VERY strong). On the other hand, a no-search policy net will have specific NN-related oversights, atypical of and different from a human 1k.

Second, even if you get an artificial 1k player, comparing to it doesn't seem much better than comparing to other humans of similar strength. And even two 1k's can have quite different playstyles and error distributions.

The stronger approach seems to be to compare against a "perfect" player, collect detailed error statistics (the exact size of the errors, in points dropped, in various phases of the game), and then compare those DISTRIBUTIONS to known reference distributions. But even with this approach one should start by studying typical human error distributions, and see how similar or different two humans can be. Those errors may depend heavily on playing style, for example.
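As a toy illustration of comparing distributions, here is a hand-rolled two-sample Kolmogorov-Smirnov statistic on invented per-move point losses; the reference profile and the suspect game values are made up:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Maximum gap between the two empirical CDFs (two-sample KS statistic)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def cdf(sorted_s, x):
        # Fraction of values <= x
        return bisect.bisect_right(sorted_s, x) / len(sorted_s)

    points = sorted(set(a + b))
    return max(abs(cdf(a, x) - cdf(b, x)) for x in points)

# Invented data: points dropped per move against a "perfect" reference.
reference_1k = [0, 0, 1, 2, 2, 3, 5, 8]    # a typical 1k error profile (made up)
suspect_game = [0, 0, 0, 0, 1, 1, 0, 12]   # near-perfect play plus one big blunder
print(ks_statistic(reference_1k, suspect_game))  # 0.5
```

A large statistic flags a game whose error distribution is unlike the reference, which captures exactly the "mostly perfect moves plus a deliberate blunder" pattern a blunder-checking cheater would leave; real work would use a proper test (e.g. scipy's `ks_2samp`) with far more moves.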

But if you only want NN aid in detecting cheaters, you could train a net specifically for that. By showing it a lot of bot games and a lot of human games of different strengths (maybe even human+bot games), you have a direct training target: whether the player was human (maybe subdivided by strength level). But since a cheater may not use the bot for all moves (only for blunder checking), such direct approaches don't seem viable.

A detailed study of error statistics seems to be the only promising way - whatever a player does will leave SOME mark on his distribution.
pnprog
Lives with ko
Posts: 286
Joined: Thu Oct 20, 2016 7:21 am
Rank: OGS 7 kyu
GD Posts: 0
Has thanked: 94 times
Been thanked: 153 times

Re: Measuring player mistakes versus bots

Post by pnprog »

moha wrote:... Second, even if you get an artificial 1k player ...
No no no, you got me wrong!

I am not proposing to train a bot, I am just proposing to train a policy network :)
I am the author of GoReviewPartner, a small piece of software aimed at helping review Go games. Give it a try!
moha
Lives in gote
Posts: 311
Joined: Wed May 31, 2017 6:49 am
Rank: 2d
GD Posts: 0
Been thanked: 45 times

Re: Measuring player mistakes versus bots

Post by moha »

pnprog wrote:I am not proposing to train a bot, I am just proposing to train a policy network :)
Ok, but then:
moha wrote:a no-search policy net will have specific NN related oversights, atypical and different to a human 1k.
There are some things a raw net often gets wrong, because of the lack of tactical understanding that is inevitable without search (and because of the fuzzy, approximative nature of NNs). These can be quite different from human mistakes.

EDIT: Back to the original suggestion, even assuming these policies form worthwhile comparison points: suppose you find a game where the player played better than usual (the correlation peak above shifted). This would correspond to his error distribution being shifted/scaled a bit. How do you judge whether he was lucky, had a good day, or cheated, without a closer look at the details of his distribution?
jlt
Gosei
Posts: 1786
Joined: Wed Dec 14, 2016 3:59 am
GD Posts: 0
Has thanked: 185 times
Been thanked: 495 times

Re: Measuring player mistakes versus bots

Post by jlt »

I think it's very hard to detect the difficulty of a move using a neural net. The levels of the problems on https://neuralnetgoproblems.com/ are far from accurate: some 1d problems are quite easy (common joseki moves, for instance), while some 10k problems look much harder than 10k. In addition, the strength of a player depends on
  • knowledge
  • reading.
Knowledge corresponds roughly to the neural network, and reading to simulations. Some players don't have a lot of knowledge but are good at reading, and conversely. Also, you can be (relatively) strong because you make many good moves but regular blunders, or because you make mostly small mistakes.

Maybe the following approach could work:
  • Choose a database of at least a few hundred games.
  • Choose a strong bot, like a recent version of LeelaZero.
  • Say that a position is "relevant" when the game is between moves 30 and 150, LeelaZero evaluates the winrate as between 30% and 80%, and the move it suggests differs from the move suggested by GnuGo.
  • Define the "winrate loss" of a human move as the difference between the winrate before the move and the winrate after the move. It can be negative when the human finds a better move than LeelaZero.
  • Using the database, determine the parameters a and b such that exactly 10% of moves made by 1d players at relevant positions have a winrate loss less than a, and 10% have a winrate loss more than b.
  • Define a "good move" as a move, made at a relevant position, with winrate loss less than a.
  • Define a "bad move" as a move, made at a relevant position, with winrate loss more than b.
  • By definition, a "good move" is then a move that fewer than 10% of 1d players would find, and a "bad move" is a mistake that fewer than 10% of 1d players would make.
  • Using the database, for each grade g (g = ..., 2k, 1k, 1d, 2d, ...), define a_g as the percentage of good moves and b_g as the percentage of bad moves made by players of grade g. The point M_g = (a_g, b_g) in the plane represents the average play of grade-g players.
  • We say a person played at level g during a game if the proportions of good and bad moves he made during that game are closest to the point M_g.
  • One can then check, using the database, how often a 6d player plays at level 4d, or conversely.
Of course, I have no idea whether the above approach works at all. The reference to 1d is arbitrary, as are the 10% proportions. The approach could also be refined by classifying moves as "very good", "good", "average", "bad", "blunder". The notion of "relevant position" is also arbitrary and could be refined, but as given above it is easy to check on a computer.
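The counting and nearest-point steps above can be sketched as follows. The thresholds a and b, the grade points M_g, and the sample winrate losses are all invented numbers; in practice they would be calibrated from the game database as described:

```python
# Sketch of the good/bad-move classification. All numeric values below are
# hypothetical placeholders for database-calibrated parameters.

def classify_game(winrate_losses, a, b, grade_points):
    """Compute the game's (good%, bad%) point from per-move winrate losses
    at relevant positions, then return the grade g whose point M_g is
    closest (squared Euclidean distance)."""
    n = len(winrate_losses)
    good = sum(1 for w in winrate_losses if w < a) / n
    bad = sum(1 for w in winrate_losses if w > b) / n

    def dist(mg):
        return (good - mg[0]) ** 2 + (bad - mg[1]) ** 2

    return min(grade_points, key=lambda g: dist(grade_points[g]))

# Made-up calibration: 10% tails for 1d players, three grade points.
a, b = -0.01, 0.08
grade_points = {"1k": (0.08, 0.14), "1d": (0.10, 0.10), "2d": (0.13, 0.07)}
losses = [0.00, -0.02, 0.03, 0.01, -0.03, 0.02, 0.09, 0.00, 0.01, 0.02]
print(classify_game(losses, a, b, grade_points))  # "2d"
```

A real run would only feed in losses from "relevant" positions (moves 30-150, winrate 30-80%, LeelaZero and GnuGo disagreeing), and a few hundred games per grade would be needed before the M_g points mean anything.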