“Decision: case of using computer assistance in League A”

Javaness2 · Post by **Javaness2** » Thu Jun 28, 2018 3:01 am

jlt wrote: Personally I am not claiming that anyone is right or wrong, I am just waiting for some strong players to play the "4d vs 6d" game. If Bojanic would like to play the game, then there are three possible outcomes:

He guesses right most of the time, and never confuses a 4d with a 6d or vice-versa. This would be a strong argument in favor of the validity of his method of analysis.

He confuses a 6d with a 4d but never a 4d with a 6d. The test is not conclusive.

He confuses a 4d with a 6d. Then, either Bojanic's method is not accurate, or cheating has already occurred in the past as substitution of players (so maybe cheating in PGETC is much more widespread than we previously thought).

Whatever the outcome, I would find the conclusion interesting.

While the test is interesting, I don't think it looks at the right thing.
More correct would be to get somebody like Uberdude to play 5 games with an opponent. In 2 games he will cheat by using Leela. Can you find out in which games he did that? A 5/5 score must be obtained.
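
As a rough check on how demanding the 5/5 requirement is, the chance of a judge passing by pure guessing can be counted directly. This is a minimal sketch, assuming the judge is told that exactly 2 of the 5 games were Leela-assisted and must name that pair; the function name is made up for illustration.

```python
from math import comb


def chance_of_perfect_guess(n_games: int, n_cheated: int) -> float:
    """Probability that a blind guess names exactly the cheated games.

    Assumes the judge knows how many games were cheated (n_cheated)
    and picks one subset of that size uniformly at random.
    """
    return 1 / comb(n_games, n_cheated)


# 5 games, 2 of them played with Leela: 10 possible pairs.
print(chance_of_perfect_guess(5, 2))
```

Under that assumption a blind guess passes 1 time in 10, so a single perfect run is suggestive but far from conclusive.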

AlesCieply · Post by **AlesCieply** » Thu Jun 28, 2018 3:09 am

jlt wrote:It would be nice if Bojanic could play the "4 dan or 6 dan ?" game. I wouldn't expect 100% accuracy, but it would be interesting to check if at least he never confuses a 4 dan with a 6 dan or vice-versa.

Bojanic's analysis is about establishing that Carlo Metta plays quite differently on internet and in his regular games. For the "significant moves" (chosen not to be part of forced sequences) Carlo is much more likely to play in agreement with Leela in his PGETC games than in his regular games. That's what Bojanic's analysis tells us. It has nothing to do with establishing whether he plays as 4d or 6d, though it is understood that Leela is quite a bit stronger than 4d.
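
The comparison described above can be framed as a two-group test on match counts. The following is only a sketch under stated assumptions, not Bojanic's actual procedure: it takes the number of "significant moves" that agree with Leela's choice in each group of games and computes a one-sided Fisher exact p-value. The function name and the choice of Fisher's test are assumptions made for illustration, and no real counts from the paper are used.

```python
from math import comb


def fisher_one_sided(match_a: int, total_a: int, match_b: int, total_b: int) -> float:
    """One-sided Fisher exact p-value.

    Group A: e.g. significant moves from online games.
    Group B: e.g. significant moves from over-the-board games.
    Returns the probability of seeing at least match_a matches in
    group A if both groups shared the same underlying match rate
    (conditioning on the totals, i.e. a hypergeometric tail).
    """
    matches = match_a + match_b
    total = total_a + total_b
    denom = comb(total, total_a)
    upper = min(total_a, matches)
    tail = sum(
        comb(matches, k) * comb(total - matches, total_a - k)
        for k in range(match_a, upper + 1)
    )
    return tail / denom


# Toy numbers only: every move in group A matches, none in group B.
# The p-value here equals 1/252.
print(fisher_one_sided(5, 5, 0, 5))
```

A small p-value would only say that the two match rates differ; it would not by itself say why they differ.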

AlesCieply · Post by **AlesCieply** » Thu Jun 28, 2018 3:13 am

Javaness2 wrote: While the test is interesting, I don't think it looks at the right thing.
More correct would be to get somebody like Uberdude to play 5 games with an opponent. In 2 games he will cheat by using Leela. Can you find out in which games he did that? A 5/5 score must be obtained.

Exactly!

Uberdude · Post by **Uberdude** » Thu Jun 28, 2018 3:30 am

jlt wrote: I am not talking about kyu players here. Bojanic (5d) as well as other strong players say that it is easy to see from a game if a player is 4d or 6d. In addition, Bojanic can use computer tools to make more accurate analyses. Other people like Uberdude (4d) and Robert Jasiek (5d) think that it is not possible to judge from a single game or from a small number of games.

Actually, I think strong players can do a fairly good job in judging the strength of similar players from their moves (and I was actually impressed how well my wife did on the test, with caveats noted there), but indeed suspect they may not be quite as good as they think, particularly in the case of a 4d playing well or a 6d playing badly (note for my test I didn't look at the games before choosing them so have no idea if any of the 4ds played well or 6ds badly), and small samples are always problematic. I like evidence, and will happily update my views on its production. Lukan (7d) said he'd give it a go, I hope he does.

Gobang wrote: (I had the nerve to make a critical comment about it and was slammed for my "negativity").

As the one who "slammed" Gobang, I'd call that more of a counter-throw than a slam of my own initiation, as I said "I disagree, though I think your constant repetition of negativity is a waste of time." in response to "So all this is is a waste of time, just like 99% of the babble around the topic of detecting online cheats." That was said out of frustration at his repeated calls to throw our hands in the air, give up on detecting or preventing cheating (or at least attempting to in an imperfect way) and just cancel the entire PGETC league (plus demeaning the participants as "kids"). This is despite his admitting he is new to serious/tournament Go, and despite the accounts from various people actually involved of how much they value the league (e.g. dsatkas in Greece, quantumf in SA, Simba and me in the UK).

Gobang wrote:It is also questionable to construct this test with online games where there is no way of verifying who was in fact playing.

As for the possibility of the actual player not being the named one: yes, it is non-zero, and in the ~4500 games played in the history of the league I think it likely some may be so, but the chance it happened in the 14 cases I picked is pretty slim, more so because they are from higher leagues, so who exactly would the replacement be? Most of the top players of the countries involved participate, so either it's: 1) another player on the team, who isn't playing at the same time, and they all collude to keep the cheating secret; 2) some secret strong player from their country not in the league or unknown to the Go community (are there many of these?); or 3) some pro or strong player in Asia. Also, by being from the same event they have the same time controls, seriousness etc. (and were easy for me to obtain). If someone else could collect game records from e.g. the EGC and we judge those too it would be interesting: maybe a 4d in a 1 hour PGETC game online is generally weaker / plays worse / is judged lower than a 4d at the 2.5 hour EGC? How about at the WAGC? Or the faster KPMC? Or your average 1 hour game from a 3-a-day McMahon (last game worse from tiredness perhaps?).

Java's proposed test is also an interesting one, I hope someone conducts it (but I'm rather busy atm, and won't be able to work on my extension of Ales's mistake analysis for some time; my hope is that will give typical profiles for 4 and 6 dans, be better at identifying them than humans, and by knowing their variance we can answer questions like how unlikely is it a 4d plays as well as a 6d, or did Dragos play particularly poorly in that game vs Carlo particularly well). A lot of consideration in this thread has been on false positives, false negatives are also important to test for (but perhaps less so if we consider punishing an innocent worse than not punishing a guilty).

Edit: As Ales said, the 4 or 6 dan test isn't really relevant to Bojanic's analysis, it was prompted by the "I'm a strong player and I looked at the game and there's no way that's a 4 dan (even on a good day)" type arguments (which has a side premise of "and we don't think Carlo is really a 6 dan based on the WAGC").

Edit 2:

Gobang wrote: For this 6d or 4d test to make any sense, then it should be done in the context that it was created. A 6d player played an entire serious game with someone who is allegedly 4d, (but most probably just acting a bot for Leela). The 6d player said that it felt nothing like playing against a 4d.

Then someone decided to construct a "test", apparently for the purpose of showing that this 6d may not be a reliable judge of whether his opponent was 4d or stronger. My perception is that someone, with the intention of calling the 6d player's judgement into doubt created this "test".

My prompt wasn't just the Simba vs Carlo game, but also using the views of strong players looking at other past games of Carlo to decide whether he cheated instead of statistical Leela similarity approaches (e.g. suggested by Lukan). Gobang makes the distinction between the actual person playing the game (Simba most recently, we've not heard from Reem\Dragos\Kulkov etc) vs an observer and that they will be better at detecting the opponent's strength or if they cheated. I agree playing is different to watching, but it's not clear to me the player is a better judge: in Javaness's test we could ask the opponent of the sometimes-cheater as well as observers to identify the cheating games. But also if we are trying to distinguish "4d cheating with Leela beating 6d" and "4d not cheating getting that expected 1 in 10 win against a 6d" we'd need lots more than 5 games. Also I would like to cheekily point out the EGF rank of the 6d mentioned is 3d (though I believe he is at least 5d).
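
The "1 in 10" figure makes the sample-size point easy to quantify. A minimal sketch, assuming each game is an independent trial with a fixed 10% win chance for the honest 4d (a simplification; the function name is hypothetical):

```python
from math import comb


def prob_at_least(wins: int, games: int, p_win: float) -> float:
    """Binomial tail: chance of at least `wins` wins in `games` games.

    Assumes independent games with the same win probability p_win.
    """
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(wins, games + 1)
    )


# Honest 4d vs 6d, 10% per game, 5-game series.
print(round(prob_at_least(1, 5, 0.1), 5))  # 0.40951
```

With these assumptions an honest 4d takes at least one game off the 6d in about 41% of 5-game series, so one upset says very little on its own.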

Edit 3:

Gobang wrote:Getting kyu players to decide if a 6d or 4d was playing, just by looking at the games is obviously absurd.

I'd say fun but irrelevant:

Uberdude wrote: What threshold should we take as demonstrating the truth of "It is easy to tell the difference between a 4 dan and 6 dan"? (for sufficiently strong players, weaker players might like to play this game for fun/interest but them being bad at it doesn't show 6 or 7ds couldn't be good at it, I hope some strong players participate).

dfan · Post by **dfan** » Thu Jun 28, 2018 5:18 am

Javaness2 wrote: More correct would be to get somebody like Uberdude to play 5 games with an opponent. In 2 games he will cheat by using Leela. Can you find out in which games he did that? A 5/5 score must be obtained.

This would tell us something if the subject failed, but I'm not sure if it would tell us anything if the subject succeeded (since one explanation would just be "Uberdude is bad at cheating").

By the way, chess grandmasters have made "my opponent played too strongly for his rating, therefore he was cheating" accusations that have been determined to be incorrect before. Of course, chess is not go.

Javaness2 · Post by **Javaness2** » Thu Jun 28, 2018 6:18 am

Yes, I guess we already discussed somewhere here that we have this problem of people genuinely believing somebody has cheated when they haven't cheated. It happens.

In this case, 5 games would probably be a small sample. I suggested it only as a starting point. Probably a 6 player round robin with 1 player assigned as cheater each round would be more interesting. Who is going to have the time for that though? Nobody...

Bill Spight · Post by **Bill Spight** » Thu Jun 28, 2018 6:45 am

Uberdude wrote:
Gobang wrote: For this 6d or 4d test to make any sense, then it should be done in the context that it was created. A 6d player played an entire serious game with someone who is allegedly 4d, (but most probably just acting a bot for Leela). The 6d player said that it felt nothing like playing against a 4d.

Then someone decided to construct a "test", apparently for the purpose of showing that this 6d may not be a reliable judge of whether his opponent was 4d or stronger. My perception is that someone, with the intention of calling the 6d player's judgement into doubt created this "test".
My prompt wasn't just the Simba vs Carlo game, but also using the views of strong players looking at other past games of Carlo to decide whether he cheated instead of statistical Leela similarity approaches (e.g. suggested by Lukan). Gobang makes the distinction between the actual person playing the game (Simba most recently, we've not heard from Reem\Dragos\Kulkov etc) vs an observer and that they will be better at detecting the opponent's strength or if they cheated. I agree playing is different to watching, but it's not clear to me the player is a better judge: in Javaness's test we could ask the opponent of the sometimes-cheater as well as observers to identify the cheating games. But also if we are trying to distinguish "4d cheating with Leela beating 6d" and "4d not cheating getting that expected 1 in 10 win against a 6d" we'd need lots more than 5 games. Also I would like to cheekily point out the EGF rank of the 6d mentioned is 3d (though I believe he is at least 5d).

I am not very good at judging the strength of another player, either from observing them or playing them. That in part has to do with my psychology, in two ways. First, I don't really care. Second, I tend to raise the level of my game to meet a challenge.

That said, I do think that it is easier to tell any difference while playing a game. Perhaps it has to do with really getting into the game and understanding the play as well as you can, perhaps it has to do with the time involved. In Uberdude's test I am not going to take an hour or two for each game to make the effort.

But looking back on the few games in which I have felt outgunned, in each game, aside from that feeling, I could point to at most a few plays, usually only one, that gave me that feeling. So just because someone has a feeling that their opponent played much better than expected, I also want to know which plays gave rise to that feeling. (I know that that is not always possible. Sometimes you suddenly realize that you are losing and you don't know why.)

Bill Spight · Post by **Bill Spight** » Thu Jun 28, 2018 6:48 am

dfan wrote: By the way, chess grandmasters have made "my opponent played too strongly for his rating, therefore he was cheating" accusations that have been determined to be incorrect before. Of course, chess is not go.

It seems to me that at present it is easier to detect cheating at chess than go. Give us ten years, though.

Bill Spight · Post by **Bill Spight** » Thu Jun 28, 2018 6:56 am

dfan wrote:
Javaness2 wrote: More correct would be to get somebody like Uberdude to play 5 games with an opponent. In 2 games he will cheat by using Leela. Can you find out in which games he did that? A 5/5 score must be obtained.
This would tell us something if the subject failed, but I'm not sure if it would tell us anything if the subject succeeded (since one explanation would just be "Uberdude is bad at cheating").

If we are going to make progress at detecting cheating, we need to have verified cases of cheating for our research. The only way to get a large number of verified cases is to have games in which people cheat on purpose for the sake of the research. (OC, the other players must be in on the fact that their opponent may be cheating.)

Uberdude · Post by **Uberdude** » Thu Jun 28, 2018 7:40 am

Bill Spight wrote: That said, I do think that it is easier to tell any difference while playing a game. Perhaps it has to do with really getting into the game and understanding the play as well as you can, perhaps it has to do with the time involved. In Uberdude's test I am not going to take an hour or two for each game to make the effort.

That's a good point about the players getting into the game, they have to take responsibility for their moves so when the casual observer says X should have played so-and-so maybe X did consider that but found a refutation for their opponent further down the line the kibitzer didn't. I remember in my British title match with dhu last year Matthew Macfadyen 6d was commentating and said one of us (let's say me) made a mistake and should have played some move. After the game we reviewed his comments and dhu disagreed as he had a stronger reply which meant the proposed better sequence didn't actually work. He had read this in the game. I hadn't read that far but had come to the same conclusion it was not a promising line for me (good or lucky pruning?). If the judges spent as long as the players (2 to 3 hours) on the game would they read equally thoroughly? If I was to do the test I'd probably spend about 15 minutes per game. But on the flip side, non-players can be more objective and emotionally detached. I often make dumb irrational decisions during the heat of the game, particularly in overtime, which on review afterwards I can easily see were silly.

Bojanic · Post by **Bojanic** » Thu Jun 28, 2018 9:47 am

Fenring wrote: Thanks for the analysis Bogdan.
But maybe it would be better to follow the same process with other European players to have a point of comparison?

My name is Milos Bojanic, not Bogdan, and I already explained here as well as in the paper update: in the preliminary analysis I checked all games from A league and qualifications. Some ten games looked suspicious in the deviation histogram.
After more detailed analysis, some were dismissed, and the two with the most similarities are presented here.

Bojanic · Post by **Bojanic** » Thu Jun 28, 2018 9:54 am

jlt wrote:It would be nice if Bojanic could play the "4 dan or 6 dan ?" game. I wouldn't expect 100% accuracy, but it would be interesting to check if at least he never confuses a 4 dan with a 6 dan or vice-versa.

For what purposes, except for derailment of this research?
We are not discussing the difference between 4d and 6d at all, but the difference from a program that plays very consistently. And that analysis is not performed by me, but by the same program.

bugsti · Post by **bugsti** » Thu Jun 28, 2018 9:58 am

Bojanic wrote: My name is Milos Bojanic, not Bogdan, and I already explained here as well as in the paper update: in the preliminary analysis I checked all games from A league and qualifications. Some ten games looked suspicious in the deviation histogram.
After more detailed analysis, some were dismissed, and the two with the most similarities are presented here.

How much time did you spend analyzing each move? How many nodes? I think we need at least 200k nodes per move in order to obtain a reliable deviation histogram, and there will still be many sources of error (some good moves are found after that limit). That requires something like 45 days of calculation on good hardware, or 1 year on a normal PC.

You need also to produce this histogram for any available strong bot, they are like a dozen right now
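
The cost estimate above is simple arithmetic once a search speed is fixed. A rough sketch follows; the move count and nodes-per-second values are placeholders chosen only to show the calculation, not measurements of Leela or of any particular hardware.

```python
SECONDS_PER_DAY = 86_400


def analysis_days(total_moves: int, nodes_per_move: int, nodes_per_second: float) -> float:
    """Days of compute needed to search every move to a fixed node count.

    All three inputs are assumptions supplied by the caller.
    """
    total_nodes = total_moves * nodes_per_move
    return total_nodes / nodes_per_second / SECONDS_PER_DAY


# Placeholder inputs: 10,000 moves in total, 200k nodes each,
# and a hypothetical engine speed of 500 nodes per second.
print(analysis_days(total_moves=10_000, nodes_per_move=200_000, nodes_per_second=500))
```

The result scales linearly in every input, so repeating the analysis for a dozen bots multiplies the total accordingly.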

Bill Spight · Post by **Bill Spight** » Thu Jun 28, 2018 10:48 am

bugsti wrote:
Bojanic wrote: My name is Milos Bojanic, not Bogdan, and I already explained here as well as in the paper update: in the preliminary analysis I checked all games from A league and qualifications. Some ten games looked suspicious in the deviation histogram.
After more detailed analysis, some were dismissed, and the two with the most similarities are presented here.
How much time did you spend analyzing each move? How many nodes? I think we need at least 200k nodes per move in order to obtain a reliable deviation histogram, and there will still be many sources of error (some good moves are found after that limit). That requires something like 45 days of calculation on good hardware, or 1 year on a normal PC.

You need also to produce this histogram for any available strong bot, they are like a dozen right now

IMO, our current bots are not good enough. They may play at a superhuman level, despite making blunders, but they were optimized for play, not for evaluation and analysis. (Even though they make use of evaluation. That's not all it takes to play well at the time limits in use. Their evaluation just has to be good enough to play well.)

Bojanic · Post by **Bojanic** » Thu Jun 28, 2018 12:23 pm

bugsti wrote: How much time did you spend analyzing each move? How many nodes? I think we need at least 200k nodes per move in order to obtain a reliable deviation histogram, and there will still be many sources of error (some good moves are found after that limit). That requires something like 45 days of calculation on good hardware, or 1 year on a normal PC.

You need also to produce this histogram for any available strong bot, they are like a dozen right now

Ah, you just came up with the 12 Herculean tasks of AI go.
A good idea for how to stop the investigation; too bad it does not work.

Deviation histograms were actually pretty similar for the quick analysis and for 50k and 200k nodes.
The quick analysis is a preliminary screen, and it serves only to select games for further analysis. Games with similar deviations were then analyzed in greater detail, especially tenuki moves.

There is no need to analyze all moves in all games at 1m variations in all programs, some of which did not even exist then.
