Measuring player mistakes versus bots
-
Kirby
- Honinbo
- Posts: 9553
- Joined: Wed Feb 24, 2010 6:04 pm
- GD Posts: 0
- KGS: Kirby
- Tygem: 커비라고해
- Has thanked: 1583 times
- Been thanked: 1707 times
Re: Measuring player mistakes versus bots
Uberdude wrote: I'm also thinking that we should also analyse the games with GnuGo, and any move which GnuGo agrees with the human and the strong bot be discarded from the analysis as an obvious move with little information. This should help mitigate the "this was a simple game with many obvious forced moves so will be more similar to the bot" problem.
I'm skeptical that similarity (or lack of similarity) with GnuGo will provide useful information in regard to the analysis.
be immersed
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: Measuring player mistakes versus bots
dfan wrote: To be clearer, what I meant with that phrase was "if you assume that the win rate accurately represents the probability that a human would win a game against another human of equal ability starting from the position in question".
This assumption also seems false. The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained, but this can be significantly different from human winrate due to different playstyles. In fact, a drop of 2% (bot) winrate may even be a 1% (human) winrate gain.
This is another reason to go for expected scores instead of winrates, although it is also possible to train a net specifically for predicting the human winrate (maybe with a strength parameter).
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Measuring player mistakes versus bots
dfan wrote: To be clearer, what I meant with that phrase was "if you assume that the win rate accurately represents the probability that a human would win a game against another human of equal ability starting from the position in question".
moha wrote: This assumption also seems false. The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained,
Are you sure about that? In that case it would be easy to produce margin of error statistics, which, IIUC, are not given. Another reason I suspect that the winrates were not calculated that way is that doing so would take a lot of time, and would not be necessary to improve the ability of the bot. Another reason is that moves are chosen based upon number of visits, not winrates, or not only on winrates.
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins
Visualize whirled peas.
Everything with love. Stay safe.
-
Tryss
- Lives in gote
- Posts: 502
- Joined: Tue May 24, 2011 1:07 pm
- Rank: KGS 2k
- GD Posts: 100
- KGS: Tryss
- Has thanked: 1 time
- Been thanked: 153 times
Re: Measuring player mistakes versus bots
dfan wrote: To be clearer, what I meant with that phrase was "if you assume that the win rate accurately represents the probability that a human would win a game against another human of equal ability starting from the position in question".
moha wrote: This assumption also seems false. The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained,
Bill Spight wrote: Are you sure about that? In that case it would be easy to produce margin of error statistics, which, IIUC, are not given.
No, it's not easy, because the given winrate is mostly based on winrate given by the evaluation of the network. And there is no easy way to get the margin of error of these numbers.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: Measuring player mistakes versus bots
moha wrote: The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained,
Bill Spight wrote: Are you sure about that? In that case it would be easy to produce margin of error statistics, which, IIUC, are not given.
Consider the training method: from zillions of positions taken from zillions of selfplay games, the value head is trained with a loss function that is the difference between its current output and the actual outcome (1/-1). I'm not sure about error statistics. I agree those could be produced; maybe nobody was interested enough to collect them?
It would not be that easy though, since it would need a different test game set (the loss IS decreasing/disappearing on the training set of course, but that doesn't necessarily mean better predictions on a different set, as the danger of overfitting is higher for the value head than for the policy).
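As a minimal sketch of the loss described above: the post only says "difference", so the squared error used here is an assumption (it is the form used in the AlphaGo Zero paper), and the function names are made up for illustration. Computing the same quantity on positions the net was not trained on is what the separate test set would be for.

```python
def value_loss(predictions, outcomes):
    """Mean squared error between value-head outputs and game results.

    predictions: value-head outputs in [-1, 1], one per sampled position
    outcomes:    actual self-play results, +1 (win) or -1 (loss)
    Squared error is an assumption; the post only says "difference".
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predictions:
        raise ValueError("at least one position is required")
    squared_errors = [(p - z) ** 2 for p, z in zip(predictions, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def to_winrate(value):
    """Map a value in [-1, 1] to a winrate in [0, 1]."""
    return (value + 1.0) / 2.0
```

Comparing `value_loss` on the training positions with `value_loss` on held-out positions would show how much of the apparent accuracy is overfitting.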
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Measuring player mistakes versus bots
dfan wrote: To be clearer, what I meant with that phrase was "if you assume that the win rate accurately represents the probability that a human would win a game against another human of equal ability starting from the position in question".
moha wrote: This assumption also seems false. The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained,
Bill Spight wrote: Are you sure about that? In that case it would be easy to produce margin of error statistics, which, IIUC, are not given.
Tryss wrote: No, it's not easy, because the given winrate is mostly based on winrate given by the evaluation of the network. And there is no easy way to get the margin of error of these numbers.
That's my point.
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Measuring player mistakes versus bots
moha wrote: The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained,
Bill Spight wrote: Are you sure about that? In that case it would be easy to produce margin of error statistics, which, IIUC, are not given.
moha wrote: Consider the training method: from zillions of positions taken from zillions of selfplay games the value head is trained with a loss function that is the difference of its current output and the actual outcome (1/-1).
Isn't that a form of reinforcement learning? You don't need accurate winrates for that to work.
-
dfan
- Gosei
- Posts: 1598
- Joined: Wed Apr 21, 2010 8:49 am
- Rank: AGA 2k Fox 3d
- GD Posts: 61
- KGS: dfan
- Has thanked: 891 times
- Been thanked: 534 times
- Contact:
Re: Measuring player mistakes versus bots
dfan wrote: To be clearer, what I meant with that phrase was "if you assume that the win rate accurately represents the probability that a human would win a game against another human of equal ability starting from the position in question".
moha wrote: This assumption also seems false. The winrate approximates the probability of the given bot winning against itself starting from the position. This is how it was trained, but this can be significantly different from human winrate due to different playstyles. In fact, a drop of 2% (bot) winrate may even be an 1% (human) winrate gain.
OK. This is all incidental to the actual point I was trying to make anyway, which has now gotten lost in the noise, so I'm just going to drop it.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: Measuring player mistakes versus bots
moha wrote: Consider the training method: from zillions of positions taken from zillions of selfplay games the value head is trained with a loss function that is the difference of its current output and the actual outcome (1/-1).
Bill Spight wrote: Isn't that a form of reinforcement learning? You don't need accurate winrates for that to work.
It's closer to supervised than to "real" reinforcement learning (the selfplay cycle makes it a bit different, net->selfplay->newnet). And the winrates will be pretty "accurate" in a sense, since the network is trained until the loss diminishes; at that point it will output reasonable values, at least in the positions it was trained on. Hence the need for a different test set if you are interested in its real accuracy.
Or one could actually run hundreds of selfplays from hundreds of chosen test positions. To go back to dfan's original assumption: you could also do the same with human games starting from chosen test positions and collect the accuracy statistics.
Edit: I somehow missed your comment about move selection / number of visits. What I wrote applies to the value net only. When strengthened with search, it will most often use an average of the value evaluations at the leaves of the lines starting with the move candidate. And selecting on number of visits will converge to selection on average value, since the higher valued candidates will get more future visits (either reducing the average if a refutation is found, or increasing the visit counts).
It's true this would work even with inaccurate values/winrates, provided at least their ordering is reasonably good. But the above sampling tests still seem possible. And by the way, if the nets were much faster, then policy-net-based rollouts (almost real winrates) would be used for the evaluation.
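One rough way to attach a margin of error to such a sampling test is to treat each self-play game from a fixed test position as an independent win/loss trial and put a confidence interval around the observed win frequency. The sketch below uses the Wilson score interval, which is a standard choice for proportions and not something the bots report themselves; the function name and the independence assumption are mine.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for the true win probability.

    wins:  number of self-play games won from the test position
    games: total number of self-play games played from that position
    z:     normal quantile (1.96 corresponds to roughly 95% confidence)
    Assumes the games are independent trials with the same win probability.
    """
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need games > 0 and 0 <= wins <= games")
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2.0 * games)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / games + z * z / (4.0 * games * games)) / denom
    return centre - half_width, centre + half_width
```

If the winrate reported by the bot for the position falls well outside the interval obtained from a few hundred games, that position is evidence that the reported value is not a well-calibrated self-play win probability.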
-
Bill Spight
- Honinbo
- Posts: 10905
- Joined: Wed Apr 21, 2010 1:24 pm
- Has thanked: 3651 times
- Been thanked: 3373 times
Re: Measuring player mistakes versus bots
Anyway, we can test the winrates by bot vs. bot self play ourselves. 
-
Bojanic
- Lives with ko
- Posts: 142
- Joined: Fri May 06, 2011 1:35 pm
- Rank: 5 dan
- GD Posts: 0
- Has thanked: 27 times
- Been thanked: 89 times
Re: Measuring player mistakes versus bots
Go Review Partner can analyze an entire game, using a selection of bots.
After analysis, it can produce a histogram which shows deviations from the bot's play.
It is not direct proof of similarity. Of course josekis would be similar, as would the opening and even close fighting.
But if a player has a long game similar to Leela, that is cause for further examination.
Here is a histogram of one game between European pros.
Red bars are deviations from Leela's move (it considers them bad), and green are better moves.
- Attachments
- QIQJWEPNSE.png (22.23 KiB)
-
Uberdude
- Judan
- Posts: 6727
- Joined: Thu Nov 24, 2011 11:35 am
- Rank: UK 4 dan
- GD Posts: 0
- KGS: Uberdude 4d
- OGS: Uberdude 7d
- Location: Cambridge, UK
- Has thanked: 436 times
- Been thanked: 3718 times
Re: Measuring player mistakes versus bots
It would be interesting to compare the same game with a LeelaZero analysis. When I was reviewing one of Ilya Shikshin's games with Leela 0.11, it often didn't like or expect his moves. As a 4d, I thought it was sometimes right that they were bad, but sometimes I think his moves were actually better (and indeed sometimes Leela would then like them when shown them, a point pnprog recently explained). As LZ is more strongly opinionated I would expect more red overall, but maybe some of those bars would be relatively smaller. Of course sometimes even the Euro pros do just play pretty badly.
-
moha
- Lives in gote
- Posts: 311
- Joined: Wed May 31, 2017 6:49 am
- Rank: 2d
- GD Posts: 0
- Been thanked: 45 times
Re: Measuring player mistakes versus bots
Bill Spight wrote: Anyway, we can test the winrates by bot vs. bot self play ourselves.
This is what I was suggesting. And for their accuracy in human games you may not even need the mentioned hundreds of special games from chosen positions: just take a large human database, get bot prediction (both raw net and search result) in a chosen sample of positions, then calculate the overall correlation to outcomes. You may even do this separately for opening-middlegame-endgame positions (or for various winrate ranges).
Uberdude wrote: It would be interesting to compare the same game with a LeelaZero analysis
My first thought was taking a game between two different bots (like an LZ vs. Golaxy game from earlier) and analyzing it with a third bot (Leela?).
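The database check suggested in this post (bot predictions on sampled positions versus actual game outcomes, split by winrate range) could be sketched roughly as follows. The input format, a list of (predicted winrate, result) pairs, and the function name are assumptions for illustration; how the predictions are obtained from the bot is left out.

```python
def calibration_table(samples, bins=10):
    """Group positions by predicted winrate and compare with real outcomes.

    samples: iterable of (predicted_winrate, won) pairs, where
             predicted_winrate is in [0, 1] for the side to move and
             won is truthy if that side actually won the game
    Returns one tuple per non-empty bin:
        (bin_low, bin_high, positions, mean_prediction, observed_win_frequency)
    """
    counts = [0] * bins
    wins = [0] * bins
    prediction_sums = [0.0] * bins
    for predicted, won in samples:
        if not 0.0 <= predicted <= 1.0:
            raise ValueError("predicted winrate must be in [0, 1]")
        index = min(int(predicted * bins), bins - 1)  # 1.0 goes into the last bin
        counts[index] += 1
        wins[index] += 1 if won else 0
        prediction_sums[index] += predicted
    table = []
    for i in range(bins):
        if counts[i] == 0:
            continue
        table.append((i / bins, (i + 1) / bins, counts[i],
                      prediction_sums[i] / counts[i], wins[i] / counts[i]))
    return table
```

If the bot winrate were a good estimate of the human win probability, the last two columns would stay close to each other in every bin. Running it separately on opening, middlegame and endgame samples gives the per-phase comparison.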
-
pnprog
- Lives with ko
- Posts: 286
- Joined: Thu Oct 20, 2016 7:21 am
- Rank: OGS 7 kyu
- GD Posts: 0
- Has thanked: 94 times
- Been thanked: 153 times
Re: Measuring player mistakes versus bots
Hi!
Uberdude wrote: This info is basically the raw data behind the win rate delta graph, so if you could somehow dump out the data for the whole game as text/file somewhere that'd be super useful, e.g. a CSV (I added a few bonus columns) like
Move number,Colour,Bot move,Bot winrate,Game move,Game winrate,Bot choice,Policy prob
20,W,h17,54.23,j17,53.5,2,5.12
21,B,h18,46.5,h18,46.5,1,45.32
So I prepared an "analysis kit" pretty similar to what I prepared for Ales already:
- http://yuntingdian.com/goreviewpartner/ ... indows.zip
- http://yuntingdian.com/goreviewpartner/ ... _linux.zip
So inside, there are:
- A Python file rsgf2csv.py that is used to extract the data from Leela's RSGF files into a CSV file. If you run it directly, it will have you select an RSGF file on your computer, and then create the CSV. For example: mygame.rsgf => mygame.rsgf.csv
- A minimalist version of GRP that can only be used to perform analysis with Leela. It has been configured to use Leela with these parameters: Leela0110GTP.exe --gtp --noponder --playouts 150000 --nobook and a thinking time of 1000 seconds per move. In fact, Leela does not follow --playouts very strictly, and tends to use many more playouts when she is not sure. But 150000 playouts seems to be her minimum in that case.
- An empty folder games_to_be_analysed where you can place the SGF files you want to analyse.
- Two batch files (bash scripts for Linux) that can be run to perform the batch analysis of all SGF files in the games_to_be_analysed folder: one for Leela CPU (batch_analysis_CPU), and one for Leela GPU (batch_analysis_GPU). On Windows, the batch file first has to detect where Python is located on the computer to run the analysis. It's working on my Windows computer, but I am not so confident it will work on other Windows computers, so let me know.
Code: Select all
[Leela]
slowcommand = Leela0110GTP.exe
slowparameters = --gtp --noponder --playouts 150000 --nobook
slowtimepermove = 1000
fastcommand = Leela0110GTP_OpenCL.exe
fastparameters = --gtp --noponder --playouts 150000 --nobook
fasttimepermove = 1000
slowparameters is for the CPU analysis, and fastparameters for the GPU analysis.
If you want to perform the analysis only on a subset of moves, you can modify batch_analysis_CPU/GPU to change the GRP command line by adding the --range parameter. For example:
Code: Select all
rem Find the file type associated with .py files (drops the leading ".py=")
for /f "delims=" %%i in ('Assoc .py') do set filetype=%%i
set filetype=%filetype:~4%
echo filetype for .py files: %filetype%
rem Extract the path of the Python interpreter from the Ftype output
for /f "delims=" %%i in ('Ftype %filetype%') do set pythonexe=%%i
set pythonexe=%pythonexe:~12,-7%
echo path to python interpreter: %pythonexe%
rem Analyse every SGF file, restricted to moves 30-1000
for %%f in (games_to_be_analysed/*.sgf) do (
%pythonexe% leela_analysis.py --profil=slow --range="30-1000" "games_to_be_analysed/%%~nf.sgf"
)
rem Convert every resulting RSGF file to CSV
for %%f in (games_to_be_analysed/*.rsgf) do (
%pythonexe% rsgf2csv.py "games_to_be_analysed/%%~nf.rsgf"
)
echo ==================
echo Analysis completed
pause
In the above example, %pythonexe% leela_analysis.py --profil=slow --range="30-1000" "games_to_be_analysed/%%~nf.sgf" will make Leela skip the analysis of moves before 30 and after 1000, so the opening won't be analysed.
At the moment, the main drawback is that it requires Python 2.7 to be installed on the computer. For Mac users, I think the Linux version can be used, but the Leela executables need to be replaced by macOS executables, and the names of the executables have to be updated in config.ini.
Please have a try and let me know if it works, or can be improved.
Edit: in that "kit", I also set GRP to save up to 361 variations. This way, one can be sure no information is discarded. The --nobook parameter prevents Leela from using her joseki dictionary to play the opening, so she is forced to think about all moves, including during the opening. I deliver all this together in a zip to help make this analysis repeatable: if more people want to help analyse a big volume of data by sharing their computer power, it's easy to just distribute this zip file so everybody is analysing in conditions as similar as possible to everybody else.
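Once a CSV exists, a first pass over it could look like the sketch below. It assumes the file uses exactly the column names from Uberdude's example (Colour, Bot move, Bot winrate, Game move, Game winrate); the real output of rsgf2csv.py may differ, and the function name and embedded sample are only for illustration.

```python
import csv
import io

# Sample rows copied from the example quoted in this post.
SAMPLE = """Move number,Colour,Bot move,Bot winrate,Game move,Game winrate,Bot choice,Policy prob
20,W,h17,54.23,j17,53.5,2,5.12
21,B,h18,46.5,h18,46.5,1,45.32
"""


def summarise(csv_file):
    """Per colour: number of moves, matches with the bot move, total winrate loss.

    csv_file: an open text file (or file-like object) with a header row.
    Winrate loss is "Bot winrate" minus "Game winrate", in percentage points.
    """
    stats = {}
    for row in csv.DictReader(csv_file):
        entry = stats.setdefault(row["Colour"],
                                 {"moves": 0, "matches": 0, "winrate_loss": 0.0})
        entry["moves"] += 1
        if row["Game move"] == row["Bot move"]:
            entry["matches"] += 1
        entry["winrate_loss"] += float(row["Bot winrate"]) - float(row["Game winrate"])
    return stats
```

For a real game, `summarise(open("mygame.rsgf.csv", newline=""))` would replace the `io.StringIO(SAMPLE)` used for trying it out.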
I am the author of GoReviewPartner, a small software aimed at assisting reviewing a game of Go. Give it a try!
-
pnprog
- Lives with ko
- Posts: 286
- Joined: Thu Oct 20, 2016 7:21 am
- Rank: OGS 7 kyu
- GD Posts: 0
- Has thanked: 94 times
- Been thanked: 153 times
Re: Measuring player mistakes versus bots
Uberdude wrote: I'm also thinking that we should also analyse the games with GnuGo, and any move which GnuGo agrees with the human and the strong bot be discarded from the analysis as an obvious move with little information. This should help mitigate the "this was a simple game with many obvious forced moves so will be more similar to the bot" problem.
This also can be performed with GRP, because Gnugo has a command to produce the 10 preferred moves (maybe one could modify the source code of Gnugo to get more moves). And that is what GRP does when using Gnugo to perform an analysis.
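The filter proposed in the quote could be sketched like this, assuming each move has already been reduced to three coordinates: the move played, the strong bot's first choice and GnuGo's first choice. The dictionary keys and function names are hypothetical, not part of GRP.

```python
def informative_moves(moves):
    """Drop "obvious" moves, i.e. those where human, strong bot and GnuGo all agree.

    moves: list of dicts with the (hypothetical) keys
           "game", "strong_bot" and "weak_bot", each a board coordinate
    """
    return [m for m in moves
            if not (m["game"] == m["strong_bot"] == m["weak_bot"])]


def similarity(moves):
    """Fraction of the remaining moves that match the strong bot's choice."""
    kept = informative_moves(moves)
    if not kept:
        raise ValueError("no informative moves left")
    return sum(1 for m in kept if m["game"] == m["strong_bot"]) / len(kept)
```

The similarity is then computed only over moves where agreement with the strong bot actually carries information.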
I made a quick proof of concept using the controversial game from PGETC. I enclose the CSV file. The column Bot choice indicates the rank of the game move among Gnugo's preferred moves. So a rank of 1 means that GnuGo would have played the same move. When the rank indicates ">10", it means this move is not part of Gnugo's best 10 moves.
I calculated the average rank for both players (using rank=11 when rank>10) and they are both between 6 and 7 on average.
23/83 moves by black correspond to Gnugo first move.
14/82 moves by white correspond to Gnugo first move.
Both players have played exactly 48 moves inside Gnugo top 10 moves.
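For anyone repeating these counts on another game, the calculation can be sketched as below. It takes the Bot choice values of one player as strings and, as in this post, counts ">10" as rank 11; the function name is an assumption.

```python
def rank_summary(ranks, cap=10):
    """Summarise the "Bot choice" column for one player.

    ranks: list of strings such as "1", "7" or ">10"
    cap:   size of the bot's candidate list; ">10" is counted as cap + 1
    """
    if not ranks:
        raise ValueError("at least one move is required")
    numeric = [cap + 1 if r.strip().startswith(">") else int(r) for r in ranks]
    return {
        "moves": len(numeric),
        "first_choice": sum(1 for r in numeric if r == 1),
        "in_top": sum(1 for r in numeric if r <= cap),
        "average_rank": sum(numeric) / len(numeric),
    }
```

Note that capping at 11 makes the average a lower bound on the true average rank, since a move outside the top 10 could be ranked much worse than 11th.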