“Decision: case of using computer assistance in League A”

Bojanic · Post by **Bojanic** » Wed Jun 13, 2018 4:00 am

maf wrote:I think it's good that you're trying to avoid problems with earlier analysis, but the current execution seems lacking.

First of all, you clearly knew the result before you started. I'm not going to say that disqualifies your work, but if you make little to no effort to show weak points in your analysis, it makes it seem like you were simply following a trail that led to the desired result. What you want to do is a blind study of many players, and you need to determine your method beforehand.

Two internet games I chose were suspicious because their histogram looked too much like Leela.
I made histograms for all games from this year's A league, btw. Several games stood out, and others are being examined.

maf wrote:In particular, you need to compare what you did with other players - maybe the difference between tournament and online games is visible for others, too, even though they did not cheat. Or it could be due to the time format, and that more thought is put into offline tournament games than online games (blitz or slow alike!). That possibility would invalidate your result and is easy to check, so omitting it lessens the perceived quality of your work.

Actually, I did most of it, as described above.

maf wrote:Also, if I did not miscount, all games together yield only about 30 or so data points (tenuki moves). That is not a lot, it can suffer from sheer coincidence. If you're only 99% certain (which is a lot from so few data), then that practically ensures that each year, several honest players would be 'convicted' of cheating. You need to have at least 5 or 6 nines. It's not simple.
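The arithmetic behind this point can be sketched in a few lines. This is only an illustration: the pool of 1000 honest players screened per year is a made-up assumption, not a figure from the thread, and the function name is hypothetical.

```python
def expected_false_convictions(honest_players: int, certainty: float) -> float:
    """Expected number of honest players flagged when each verdict has the
    given certainty, i.e. a false-positive rate of (1 - certainty)."""
    return honest_players * (1.0 - certainty)


# Hypothetical pool size: 1000 honest players screened per year (assumption).
for certainty in (0.99, 0.999, 0.99999, 0.999999):
    flagged = expected_false_convictions(1000, certainty)
    print(f"certainty {certainty}: about {flagged:.3f} honest players flagged")
```

Under that assumed pool size, 99% certainty flags roughly ten honest players a year, while five or six nines brings the expected number well below one.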

So you mean that if a player uses Leela for 1-2 games, he cannot be punished, because there are too few nodes?
And also, are you trying to go back to the percentages discussion?
A weaker player can guess some moves of a stronger one, but most of them? No chance.
I would like to see more of the Metta's live games (without any electronics on him, of course).
Let him try to play another game similar to Leela. Make him prove that he can.

Javaness2 · Post by **Javaness2** » Wed Jun 13, 2018 5:30 am

Uberdude wrote: It would be rather amusing if those who have been vehemently calling Carlo a Leela cheater ended up with stronger evidence that they themselves played more like Leela in the PGETC than in the WAGC or other offline events

Surely that's not so amusing. It would only be amusing if they, at the same time, also had unusually spectacular/good tournament performance ratings.

Uberdude · Post by **Uberdude** » Wed Jun 13, 2018 6:52 am

Javaness2 wrote:
Uberdude wrote: It would be rather amusing if those who have been vehemently calling Carlo a Leela cheater ended up with stronger evidence that they themselves played more like Leela in the PGETC than in the WAGC or other offline events
Surely that's not so amusing. It would only be amusing if they, at the same time, also had unusually spectacular/good tournament performance ratings.

It would show that large variations in Leela similarity / mistake profiles or whatever measurement we are using for the same non-cheating person are actually rather common and so cannot be good evidence that they cheated when they did well.

First guy I checked is 2550 GoR, 2017/18 PGETC performance rating 2674, WAGC performance rating 2491. And no I don't suspect he cheated in PGETC at all (he benefited from being on low board so some easy wins, but also some good ones), and there are other real life tournaments with a better performance rating, something Carlo is indeed lacking. But this just shows a ~200 GoR difference in performance rating between 2 tournaments (with the better one online) is not so unusual/suspicious.

Javaness2 · Post by **Javaness2** » Wed Jun 13, 2018 7:19 am

Uberdude wrote:
It would show that large variations in Leela similarity / mistake profiles or whatever measurement we are using for the same non-cheating person are actually rather common and so cannot be good evidence that they cheated when they did well.

First guy I checked is 2550 GoR, 2017/18 PGETC performance rating 2674, WAGC performance rating 2491. And no I don't think he cheated in PGETC at all (he benefited from being on low board so some easy wins, but also some good ones), and there are other real life tournaments with a better performance rating, something Carlo is indeed lacking. But this just shows a ~200 GoR difference in performance rating between 2 tournaments is not so unusual/suspicious.

Fair enough; but how exactly did you calculate BD-G's tpr in each case? Is the number where entry rating essentially equals exit rating? If so are you iterating the calculation over all players?

theoldway · Post by **theoldway** » Wed Jun 13, 2018 7:29 am

Bojanic wrote: In my opinion, thing is simple - if we have a player who is playing online like Leela, and playing much weaker in live games, then he was using Leela for online games.
No need for complicated statistics, or guessing why he plays much better online.

But you involuntarily used the statistics everywhere in your work, with the aggravating circumstance of having taken very few samples chosen a priori among those already more similar to Leela.

Are you familiar with the term cherry-picking?

https://en.wikipedia.org/wiki/Cherry_picking

You took the games most similar to Leela, among these games you took a dozen moves among those most similar to Leela, and guess what? They are similar to Leela! Utterly unexpected.
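The selection effect described here can be illustrated with a small simulation. It is a sketch under invented assumptions: an honest player whose moves match the engine with a fixed probability of 0.5, 50 games of 60 middle-game moves each, and the two best-matching games picked out afterwards. None of these numbers come from the thread.

```python
import random


def simulate_selection_bias(n_games=50, moves_per_game=60, p_match=0.5,
                            top_k=2, seed=0):
    """Simulate an honest player and compare the average engine-match rate
    over all games with the rate in the top_k most engine-like games."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_games):
        matches = sum(rng.random() < p_match for _ in range(moves_per_game))
        rates.append(matches / moves_per_game)
    overall = sum(rates) / n_games
    best = sorted(rates, reverse=True)[:top_k]  # games chosen after the fact
    return overall, sum(best) / top_k


overall, selected = simulate_selection_bias()
print(f"all games: {overall:.2f}, hand-picked top games: {selected:.2f}")
```

The hand-picked games always look at least as engine-like as the average, even though every simulated game was played without assistance.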

Uberdude · Post by **Uberdude** » Wed Jun 13, 2018 7:33 am

Javaness2 wrote:Is the number where entry rating essentially equals exit rating? If so are you iterating the calculation over all players?

Yes. No. (and just using current ratings not at start/end/middle of event so those could have changed a bit). Plus the Thai 4d he lost to might be a bit stronger than 2400, he also beat the UK's Daniel 4-5d and a Dutch 5d. In fact I think I played him in Thailand a year or two ago when he was 2-3d; I beat him but as he was young and it was clear he had a great desire to win I'm not surprised he's improved.
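A performance rating of the "entry rating equals exit rating" kind can be sketched as a root-finding problem: find the rating at which the expected total score equals the actual score. The sketch below uses a generic Elo-style logistic curve with a scale of 400 as a stand-in; that curve is an assumption for illustration and is not the EGF GoR formula, and the opponent ratings are made up.

```python
def expected_score(rating: float, opponent: float, scale: float = 400.0) -> float:
    """Generic Elo-style logistic expectation (assumption, not the EGF formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def performance_rating(opponents, score, lo=0.0, hi=4000.0, tol=1e-6):
    """Bisect for the rating at which expected total score equals `score`,
    so that the rating would neither rise nor fall from these results."""
    if not 0 < score < len(opponents):
        raise ValueError("undefined for 0% or 100% scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        total = sum(expected_score(mid, opp) for opp in opponents)
        if total < score:
            lo = mid  # rating too low: expected score falls short
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical opponents held at their current ratings (no iteration over players).
print(round(performance_rating([2400, 2500, 2600, 2550], 3)))
```

Opponent ratings are held fixed here, which corresponds to the "Yes. No." answer above: a fixed point for one player, without iterating the calculation over all players.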

Javaness2 · Post by **Javaness2** » Wed Jun 13, 2018 7:37 am

Uberdude wrote:
Javaness2 wrote:Is the number where entry rating essentially equals exit rating? If so are you iterating the calculation over all players?
Yes. No. (and just using current ratings not at start/end/middle of event so those could have changed a bit). Plus the Thai 4d he lost to might be a bit stronger than 2400, he also beat the UK's Daniel 4-5d and a Dutch 5d. In fact I think I played him in Thailand a year or two ago when he was 2-3d; I beat him but as he was young and it was clear he had a great desire to win I'm not surprised he's improved.

My immediate concern would be: can it be valid if you answer "Yes. No."?
Wouldn't you do better to have a script that would essentially submit, then resubmit (with updated initial ratings) the results 3 times?

Bill Spight · Post by **Bill Spight** » Wed Jun 13, 2018 7:39 am

Bojanic wrote:As promised earlier, here is the analysis I made of Carlo Metta's middle game moves in four games - two internet games that a preliminary analysis of deviation histograms showed to be very similar to Leela, and two of his live games from WAGC.
Metta analysis Bojanic.pdf
{snip}

In my opinion, thing is simple - if we have a player who is playing online like Leela, and playing much weaker in live games, then he was using Leela for online games.
No need for complicated statistics, or guessing why he plays much better online.

Thank you very much for your hard work and analysis. I will take a look at the PDF file, at least.

But I disagree with your last paragraphs. I won't go into detail, since that's what this whole discussion has been about. But if you are using those criteria, you do need statistics to evaluate the data.

Bojanic · Post by **Bojanic** » Wed Jun 13, 2018 8:06 am

theoldway wrote:But you involuntarily used the statistics everywhere in your work, with the aggravating circumstance of having taken very few samples chosen a priori among those already more similar to Leela.

Are you familiar with the term cherry-picking?

https://en.wikipedia.org/wiki/Cherry_picking

You took the games most similar to Leela, among these games you took a dozen moves among those most similar to Leela, and guess what? They are similar to Leela! Utterly unexpected.

?

Since you obviously have not read what I previously wrote and posted, here is the entire research procedure again:
- in preliminary research I analyzed all games from the A league. Deviation histograms were analyzed, and the games with the least deviation from Leela were selected for detailed analysis
- since play in the opening and endgame can be very similar to Leela's play, I decided to analyze the middle game only. The opening and endgame also deviate little from Leela, but they can easily give false positive results, since joseki sequences can be learned and endgame moves can be forced or allow little variation.
- the middle game was analyzed in two aspects: middle game tenukis, and the sequence of moves during play
- middle game play in those two games was compared to play in two of the live games (all available recent live games).

Therefore, it is not just handful of moves, it is entire middle game in two online and two live games.

Regarding other live games of Carlo Metta, please note that some of them were played after objection that he used Leela, therefore it would be less likely that he might use Leela again in same manner (entire game), or if at all.

Bojanic · Post by **Bojanic** » Wed Jun 13, 2018 8:15 am

Bill Spight wrote:Thank you very much for your hard work and analysis. I will take a look at the PDF file, at least.

But I disagree with your last paragraphs. I won't go into detail, since that's what this whole discussion has been about. But if you are using those criteria, you do need statistics to evaluate the data.

In that case please take a look at XLS file in the additional data.
In it, you have Metta's moves in 4 games, compared to Leela's suggestions for those moves.
Actually, here is screenshot of it:

[Attachment: Screen Shot 2018-06-13 at 5.10.00 PM.png]

I have also noted which A moves were forced.
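A table like this could be summarised with a short script. This is a sketch only: it assumes "A" denotes Leela's top suggestion, that each row records which suggestion (if any) the played move matched and whether the move was forced, and the sample rows are invented rather than taken from the XLS file.

```python
from collections import Counter


def suggestion_profile(moves):
    """moves: list of (label, forced) pairs. `label` is the letter of the
    engine suggestion the played move matched ('A' = top choice, assumed),
    or None if the move was outside the suggestions. Forced moves are
    excluded, since matching a forced move carries little information."""
    free = [label for label, forced in moves if not forced]
    if not free:
        return {}
    counts = Counter("outside" if label is None else label for label in free)
    return {label: n / len(free) for label, n in counts.items()}


# Invented rows, for illustration only.
sample = [("A", True), ("A", False), ("B", False), (None, False), ("A", False)]
profile = suggestion_profile(sample)
print(profile)
```

Leaving forced moves out of the denominator keeps obvious replies from inflating the share of "A" moves.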

theoldway · Post by **theoldway** » Wed Jun 13, 2018 8:31 am

Bojanic wrote:
Therefore, it is not just handful of moves, it is entire middle game in two online and two live games.

Regarding other live games of Carlo Metta, please note that some of them were played after objection that he used Leela, therefore it would be less likely that he might use Leela again in same manner (entire game), or if at all.

Chess players needed years and more than 200,000 games of thousands of players to develop, test and establish an anti-cheating protocol. You took a bunch of online games of one player (not randomly, but those you've observed as similar to Leela) and 3 live games played from 7 to 12 months later than the online games.

Are we supposed to take you seriously?

Please take your seat in the Salem witch trial, you deserve it.

Bill Spight · Post by **Bill Spight** » Wed Jun 13, 2018 8:42 am

Bojanic wrote:As promised earlier, here is the analysis I made of Carlo Metta's middle game moves in four games - two internet games that a preliminary analysis of deviation histograms showed to be very similar to Leela, and two of his live games from WAGC.
Metta analysis Bojanic.pdf

I confess that I had hoped to see a go analysis of the games, not Leela output histograms. I was glad to see that you had picked out important tenukis. But I was surprised that you only looked at Metta's plays. We really need to compare the variations in his play with the variations in others' plays. Also, output by other programs, such as Zen or LeelaZero/Elf would be good for comparison.

You have discovered an important piece of evidence, Leela's top choice which is a blunder, but which Metta played. To have an amateur player who is playing well make a blunder is not all that unusual, but when that blunder is also the choice of a super strong program, that is unusual. One thing we would like to know is how often Leela makes similar blunders. Here is also where the choices of other strong programs, such as Zen, Golaxy, or Leela/Elf could be helpful to know. If they also choose that blunder, then the fact that Metta did too is not so significant. It would be a blunder that is easy to make, even if you are playing well. The choices of other strong amateurs would also be relevant, by the same token.

Bojanic wrote:
maf wrote: Also, if I did not miscount, all games together yield only about 30 or so data points (tenuki moves). That is not a lot, it can suffer from sheer coincidence. If you're only 99% certain (which is a lot from so few data), then that practically ensures that each year, several honest players would be 'convicted' of cheating. You need to have at least 5 or 6 nines. It's not simple.
So you mean that if a player uses Leela for 1-2 games, he cannot be punished, because there are too few nodes?

Depending upon how he uses Leela, probably not, if the evidence is only statistical. There won't be enough data. The fact that the original verdict was based upon the statistics of only one game was a big red flag for me.

Edit: As Regan points out, without physical or behavioral evidence, cheating is very difficult to prove.

Bojanic · Post by **Bojanic** » Wed Jun 13, 2018 8:44 am

theoldway,

I see that you have registered especially for this topic, and that you want to show that proving online cheating is impossible.
OK, that is certainly your right, but it has to be noted in the topic.

Since your messages are mostly accusations and attacks on me, without much analysis, I would ask you to introduce yourself before further discussion.
I am registered here under my own name; here is my EGD card.
http://www.europeangodatabase.eu/EGD/Pl ... y=10337085

Bojanic · Post by **Bojanic** » Wed Jun 13, 2018 8:54 am

Bill Spight wrote:But I was surprised that you only looked at Metta's plays. We really need to compare the variations in his play with the variations in others' plays. Also, output by other programs, such as Zen or LeelaZero/Elf would be good for comparison.

I have looked at other players' moves also; you can see them in the RSGF files.
They differ much more from Leela's, with a lot of moves outside the suggestions, etc.

And why do we need to analyze other players?
Analysis should be made on play of one player in live and in internet games.

Bill Spight wrote:You have discovered an important piece of evidence, Leela's top choice which is a blunder, but which Metta played. To have an amateur player who is playing well make a blunder is not all that unusual, but when that blunder is also the choice of a super strong program, that is unusual.

In other games of Metta I analyzed, there were life&death moves that were better than Leela's.
E.g. against Stankovic the cut was not in the suggestions; in another game he connected his group on the left side on the first line.
So he can play L&D problems better than Leela.

Bill Spight wrote:One thing we would like to know is how often Leela makes similar blunders.

It did make one in this game, and as I mentioned previously, it failed to see some moves in other games.

Bill Spight wrote:Here is also where the choices of other strong programs, such as Zen, Golaxy, or Leela/Elf could be helpful to know. If they also choose that blunder, then the fact that Metta did too is not so significant. It would be a blunder that is easy to make, even if you are playing well. The choices of other strong amateurs would also be relevant, by the same token.

For those two games, it is very clear that he showed very large similarities to Leela's play, why would you analyze it with other programs?
It could make sense in other games where it is not similar to Leela.

Bill Spight wrote: Depending upon how he uses Leela, probably not, if the evidence is only statistical. There won't be enough data.

If that is the case, we can conclude that any online official games are pointless, since it is impossible to prevent cheating.

Bill Spight · Post by **Bill Spight** » Wed Jun 13, 2018 9:06 am

Bojanic wrote:
Bill Spight wrote:But I was surprised that you only looked at Metta's plays. We really need to compare the variations in his play with the variations in others' plays. Also, output by other programs, such as Zen or LeelaZero/Elf would be good for comparison.
I have looked at other players' moves also; you can see them in the RSGF files.
They differ much more from Leela's, with a lot of moves outside the suggestions, etc.

And why do we need to analyze other players?
Analysis should be made on play of one player in live and in internet games.

{snip}

Bill Spight wrote:Here is also where the choices of other strong programs, such as Zen, Golaxy, or Leela/Elf could be helpful to know. If they also choose that blunder, then the fact that Metta did too is not so significant. It would be a blunder that is easy to make, even if you are playing well. The choices of other strong amateurs would also be relevant, by the same token.
For those two games, it is very clear that he showed very large similarities to Leela's play, why would you analyze it with other programs?

(Emphasis mine.)

The short answer to making comparisons with other players and other programs is this: It's the differences that make a difference.
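One simple way to put that into numbers is to express a player's engine-match rate relative to a reference group of comparable players. The sketch below is illustrative only: the match rates are invented, and a plain z-score is just one possible measure, not a method anyone in this thread has proposed.

```python
from statistics import mean, stdev


def z_score(player_rate: float, reference_rates) -> float:
    """How many sample standard deviations the player's engine-match rate
    lies above the mean of a reference group of comparable players."""
    sigma = stdev(reference_rates)
    if sigma == 0:
        raise ValueError("reference group shows no variation")
    return (player_rate - mean(reference_rates)) / sigma


# Invented match rates, for illustration only.
reference = [0.4, 0.5, 0.6]
print(z_score(0.8, reference))
```

A high match rate means little on its own; it becomes informative only when the reference group's spread shows how unusual it is.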
