Bartleby wrote:
although in chess a 98 percent agreement between an engine and a player would be considered very strong evidence (even the best chess players in the world who train with computers all the time don't score nearly as high).

Uberdude wrote:
When you talk of agreement with a chess engine, do you mean playing the top choice of the engine, playing one of the top 3 choices of the engine (as in this case), or some more complex comparison? I am concerned that by choosing the broader top-3 metric, a headline figure of 98% can be quoted (without telling people the typical distribution for non-cheaters), which suggests more guilt to the casual reader than is warranted. When comparing to Leela's top choice, I got 72% agreement*.

* It may be even lower if you allow Leela to analyse more deeply. Leela starts off analysing moves suggested by its policy network, which has been trained on strong human games and so has a very human-like style (unlike -Zero bots). As it analyses more, it may come to prefer moves the policy network didn't like, AlphaGo's move 37 fifth-line shoulder hit being a famous example. To give an example from this game: in my first analysis of move 51, L17 was Leela's #1 choice, and this is what Carlo played. It is also quite likely what I would play. But if I let Leela analyse for longer, L17 becomes the #2 choice and E11 becomes #1. This may well happen with other moves too, so a deeper Leela analysis could see this 72% similarity figure drop even further.

P.S. Another data point: a 1 kyu on Reddit got 64% similarity to Leela's top 3 over moves 1-150 of his correspondence game; it would be lower over the same 50-150 interval, as there is more similarity in the opening. Chart: https://i.imgur.com/jMM4EIM.png. It is unsurprising that a 1k isn't as strong in the middle game, or as similar to Leela there, as mid-dans are.

I think close comparisons with chess are unlikely to be that useful here: there are too many differences (e.g., in chess the average game has far fewer moves, and that shorter length is often truncated even further by deep opening theory; there are generally fewer reasonable candidate moves; there are generally more forced moves, etc.). A 98 percent match rate in chess might allow only one nonmatch in an entire game; that might happen occasionally in games between top chess pros, but I suspect it would be rare.
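To make the difference between a top-1 and a top-3 agreement metric concrete, here is a minimal Python sketch. It assumes each analysed move is stored as the move actually played plus the engine's candidate list ranked best first; the function name and the five example moves are hypothetical and made up for illustration, not taken from the game or from Leela's output format.

```python
"""Hypothetical sketch: engine agreement as a top-k match rate.

Assumption: for every analysed move we have the move played and the
engine's candidate moves ranked best first. The data below is invented.
"""
from typing import Sequence


def agreement(played: Sequence[str], candidates: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of moves where the played move is among the engine's top k choices."""
    if len(played) != len(candidates):
        raise ValueError("need one candidate list per played move")
    if not played:
        raise ValueError("no moves to compare")
    hits = sum(1 for move, cands in zip(played, candidates) if move in cands[:k])
    return hits / len(played)


# Invented example: five moves, engine candidates ranked best first.
played = ["L17", "D4", "Q10", "C3", "R14"]
candidates = [
    ["E11", "L17", "M16"],  # e.g. a deeper search demoting the played move to #2
    ["D4", "C6", "F3"],
    ["O10", "R11", "Q10"],
    ["C3", "D3", "B4"],
    ["P14", "R13", "O17"],
]

print(agreement(played, candidates, k=1))  # 0.4
print(agreement(played, candidates, k=3))  # 0.8
```

The same moves score 40% on the top-1 metric but 80% on the top-3 metric, which is the kind of gap Uberdude is pointing at when he asks which metric a headline figure refers to.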
I don't have any firm opinion on whether there was any cheating in this particular Go game. But surely it is evident that a 64 percent match rate implies 18 times as many nonmatches as a 98 percent match rate (36 percent of moves versus 2 percent). That's a rather substantial difference.
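The arithmetic behind the 18x figure can be checked with a tiny Python sketch. The helper names are hypothetical, and the move counts are illustrative assumptions, not data from either game: 100 moves stands in for a stretch like moves 50-150, and 50 moves for a chess game after the opening book.

```python
"""Hypothetical helpers for comparing non-match counts implied by match rates.

Rates are given in whole percent so the arithmetic stays exact.
"""


def nonmatches(match_pct: int, moves: int) -> float:
    """Expected number of non-matching moves over `moves` analysed moves."""
    return moves * (100 - match_pct) / 100


def nonmatch_ratio(low_pct: int, high_pct: int) -> float:
    """How many times more non-matches the lower match rate implies."""
    return (100 - low_pct) / (100 - high_pct)


print(nonmatch_ratio(64, 98))  # 18.0, i.e. 36 / 2
print(nonmatches(64, 100), nonmatches(98, 100))  # 36.0 2.0 over an assumed 100-move stretch
print(nonmatches(98, 50))  # 1.0 over an assumed 50 analysed chess moves
```

The last line mirrors the earlier chess remark: if a game has only about 50 analysed moves (an assumption), a 98 percent match rate leaves room for roughly one nonmatch.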
Without looking at the game or knowing anything about the players (other than your comment that they are mid-dans), my gut reaction would be that a top-three match rate of 98 percent is extremely high. The Go board is too big: mid-dans are quite good amateurs but will play many suboptimal moves every game, and there is no obvious reason why their suboptimal moves should match, or even fall within, the same small set of engine choices 98 percent of the time. There are probably also many moves per game where there are multiple optimal moves (or moves so nearly optimal as to make no difference to a human player); there is no obvious reason these moves should have a near-perfect match rate either.
I am at a similar level in chess to a mid-dan in Go, or even a bit higher, and if you analyzed all of the games I have played in my life with a top engine, I would be surprised if more than a few of them matched the engine's top three moves 98 percent of the time, and would not be surprised if none of them had such a high match rate. And rough logic suggests to me that the match rate should (a) be lower in Go than in chess; and (b) be lower with a weaker engine than with a stronger one.
The above is not a serious analysis, just my gut reaction. But yes, a 98 percent top three match rate seems very, very high to me.