Life In 19x19 http://lifein19x19.com/ |
Questions about a game http://lifein19x19.com/viewtopic.php?f=15&t=15836 |
Page 3 of 4 |
Author: | Bill Spight [ Thu Jun 21, 2018 9:15 am ] |
Post subject: | Re: Questions about a game |
bugsti wrote: Bill Spight wrote: Carlo's motivation is a factor in considering his guilt, especially as all we have in the game record are coincidences that may have arisen by chance. But, truth be told, the fact that the team had already lost may have increased his personal motivation, since the risk was his alone. From the PGETC rules: "Penalties: Any cheating results in losing the complete match on all four boards with an additional four MP-penalty. A second cheating offence disqualifies the team for the running season and the next two seasons." Carlo's loss would have been the fourth loss in the match anyway, but there was an additional penalty. Thanks for the info. |
Author: | MircoF [ Thu Jun 21, 2018 9:54 am ] |
Post subject: | Re: Questions about a game |
In addition, the last thing I wanted to say is that any case has to be judged according to its gravity and consequences. If I cheat and win a final game against one of the teams in first place in the tournament, that carries real weight. But cheating by an amateur team playing to remain in the A league, in a game that could not change the situation, is a matter of much less gravity. Indeed, I would forgive the player if I had to judge, even if I had found him guilty. PS: I hope Javaness will forgive me for being off topic for the last time! |
Author: | Jan.van.Rongen [ Thu Jun 21, 2018 2:57 pm ] |
Post subject: | Re: Questions about a game |
Bojanic wrote: ...would you be so kind to forward this information to Ales Cieply? ... This information was provided by Ales himself in the other thread (see viewtopic.php?p=232722#p232722) and I commented extensively on his remarks there. He said it was the same machine his PhD supervisor used to run the tests for the appeal: an Intel Core i7, 2.60 GHz, 16 GB RAM, GPU NVIDIA GeForce GTX 960M, operating system Windows 10. It is very similar to my laptop: same specs, but mine has a slightly newer GPU. It is just the type of laptop you would expect a PhD student in AI to have: best price/performance, and you can build all the textbook neural net models.
Author: | AlesCieply [ Fri Jun 22, 2018 2:12 am ] |
Post subject: | Re: Questions about a game |
Jan.van.Rongen wrote: This information was provided by Ales himself in the other thread (see viewtopic.php?p=232722#p232722) and I commented extensively on his remarks there. He said it was the same machine his PhD supervisor used to run the tests for the appeal: an Intel Core i7, 2.60 GHz, 16 GB RAM, GPU NVIDIA GeForce GTX 960M, operating system Windows 10. Jan, actually, your comments in the other thread were quite misleading, but I did not find it important to answer them at the time. Just to clarify: where did you get the idea that Carlo's PhD supervisor (F. Moradin) did the tests (the games analysis) for the appeal? As far as I know he did not. I believe most of it was done by Carlo himself. I guess MircoF can tell us for sure. What I said there: AlesCieply wrote: On the computer Carlo Metta might have used in his PGETC games. In the Italian appeal they specify what computer they performed their counter-analysis on: an Intel Core i7, 2.60 GHz, 16 GB RAM, GPU NVIDIA GeForce GTX 960M. Operating system - Windows 10. They also say that it analyses about 100k nodes in about 30s. I asked what computer Carlo used in his PGETC games, and the answer I got (from Mirco Fanti, the Italian team captain, as he insisted that questions should not be put to Carlo directly but should go through him) was that it was the one used in the analysis. I conclude from this that most (if not all) of the Italian counter-analysis was done by Carlo himself.
Author: | AlesCieply [ Fri Jun 22, 2018 2:18 am ] |
Post subject: | Re: Questions about a game |
And something to stay on topic: Bill, you may also pick up the suggestions my Leela provided for any "important moves". They are all in the document I put online, https://docs.google.com/spreadsheets/d/ ... =925979564 Just have a look at the sheet MettaBenDavid for the game you started discussing here. |
Author: | Bill Spight [ Fri Jun 22, 2018 12:51 pm ] |
Post subject: | Re: Questions about a game |
AlesCieply wrote: And something to stay on topic: Bill, you may also pick up the suggestions my Leela provided for any "important moves". They are all in the document I put online, https://docs.google.com/spreadsheets/d/ ... =925979564 Just have a look at the sheet MettaBenDavid for the game you started discussing here. Thanks, but I did not find any specific moves on that spreadsheet. |
Author: | AlesCieply [ Fri Jun 22, 2018 1:20 pm ] |
Post subject: | Re: Questions about a game |
This is what I see when I open the sheet: Does it not open properly for you? The #1 column is the top suggestion by Leela; the #n column is the rank (in the list of choices provided by Leela) of the move actually played in the game. EDIT: Sorry for the large picture, I did not figure out how to handle it properly. |
Author: | Bill Spight [ Fri Jun 22, 2018 1:28 pm ] |
Post subject: | Re: Questions about a game |
Thanks. It opened on a different sheet. But I found the one you showed. |
Author: | AlesCieply [ Fri Jun 22, 2018 1:34 pm ] |
Post subject: | Re: Questions about a game |
Fine, you can check all the games I analyzed there and pick the Leela suggestions for any moves you determine to be significant. In fact, I already have some more games analyzed and will upload them to the Google sheet some time next week. |
Author: | Bill Spight [ Sat Jun 23, 2018 12:03 pm ] |
Post subject: | Re: Questions about a game |
Added more positions to note #2 ( viewtopic.php?p=233019#p233019 ), to make 12 in all. No more positions to add. |
Author: | AlesCieply [ Thu Jun 28, 2018 3:32 am ] |
Post subject: | Re: Questions about a game |
I think it is the right direction to select a number of significant moves in a game and then establish whether the player made a mistake there and how large the mistake was. At least that is what Ken Regan does for chess games. My concerns are how reliably we can determine the "significant moves", as well as the value of the mistakes (the lowering of the winning probability) that the played moves carry. Insofar as this is difficult to automate, we have to rely on "experts" to choose the significant moves. However, we would need to do it for a large number of games played by many different players to establish how players of different strengths do in these positions. Regan was able to compare the histograms/graphs of someone's play with those already established for players of varied strength (Elo). I am afraid we cannot ask for "expert opinion" on hundreds (if not thousands) of games. So, I am a bit sceptical about this kind of analysis at the moment. On the other hand, we should still be able to "measure" how Metta's play in internet games differs from his play in regular games. What Milos Bojanic does in his analysis has a qualitative character, but I think his analysis can be improved by making delta-histograms for the significant moves, in a similar way to what I do in my analysis for a whole stretch of the game (moves 31-180). |
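To make the delta-histogram idea concrete, here is a minimal Python sketch, assuming we already have per-move "deltas" (the drop, in percentage points, from the winrate of Leela's top choice to the winrate of the move actually played). The function name, bin width, and sample numbers are all hypothetical; this illustrates the idea, not Ales's actual pipeline.

```python
from collections import Counter

def delta_histogram(deltas, bin_width=1.0):
    """Bucket per-move winrate losses (in % points) into bins of bin_width."""
    hist = Counter()
    for d in deltas:
        d = max(d, 0.0)  # a move Leela rates above its own top choice loses nothing
        hist[int(d // bin_width)] += 1
    return hist

# Hypothetical deltas for moves 31-180 of one game (one value per move):
deltas = [0.0, 0.4, 2.1, 0.0, 5.3, 0.2, 1.7, 0.0, 0.9, 3.8]

for bin_index, count in sorted(delta_histogram(deltas).items()):
    lo = bin_index * 1.0
    print(f"{lo:4.1f}-{lo + 1.0:4.1f}% pts lost: {'#' * count}")
```

Comparing such histograms for a suspect's internet games against histograms compiled from many players of known strength is the Regan-style step that, as noted above, needs far more data than a handful of expert-annotated games.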
Author: | Bill Spight [ Thu Jun 28, 2018 7:32 am ] |
Post subject: | Re: Questions about a game |
AlesCieply wrote: I think it is a right direction to select a number of significant moves in a game and then establish whether the player made a mistake there and how large the mistake was. At least it is what Ken Regan uses for the chess games. My concerns are how reliably we can determine the "significant moves" aa well as the value of mistakes (lowering of the winning probability) the played moves bear. As I have indicated, I think that the consensus of relatively few experts (maybe even five) can distinguish difficult choices in each game, say for the top 5% - 10% of plays, from less difficult choices. They can also agree on the worst plays, which can happen even when the position is not very difficult. For instance, in the Metta-Ben David game even I felt that was a mistake, in agreement with Frejlak and Leela. I also did not like , mainly because I thought that White was already behind because of . Real experts could agree on at least a few errors per game. However, reliably judging the value of mistakes is obviously beyond the ability of Leela 11, as analysis of the 8 Metta games shows. We even see that in the current games of the top AI bots. (Alpha Zero having retired.) But give us a few years, and let us develop bots for the purpose of rating positions and plays. Our current top bots are not optimized for those purposes. |
Author: | Bill Spight [ Sat Jun 30, 2018 11:12 am ] |
Post subject: | Re: Questions about a game |
I have not gotten around to why I started this thread. Busy, busy! Let me start with some considerations that are broadly applicable. In this world coincidences abound. So do spurious correlations. When I was a kid I read about one spurious correlation whereby, for several years, it was possible to predict the US economy from the stork population in one European city (Amsterdam, I think, but it was a long time ago). OC, no matter how good the correlation may have been, nobody thought that it was anything but coincidental. Why not? Because we had no theory connecting the two facts. Nothing that we know about biology and economics provides any connection. Given a correlation, we may search for a theory, an explanation. Often finding any explanation is a challenge. Often, however, we have a number of possible explanations. We may then try to find the best explanation. Generality is desirable, as are brevity and parsimony. The fewer assumptions the explanation requires, the better. Note that the original correlation which we are trying to explain is not a very good test of the explanation. Sure, if we tested the explanation de novo it would be good evidence, but since we have fitted the theory to the data, we could hardly expect otherwise. In science we try to test the explanation or theory with new evidence. In real life "detective" work we may not be able to do that. But in science and detection we do our best to disprove and eliminate possible theories. Especially our pet theories. Sherlock Holmes wrote: When you have eliminated the impossible, whatever remains, however improbable, must be the truth.
Author: | Bill Spight [ Mon Jul 09, 2018 1:45 pm ] |
Post subject: | Re: Questions about a game |
Every assumption that we make about a theory or about its evidence weakens the support for the theory. Now to make a case for a theory, in lawyerly fashion, may mean making assumptions. That is not an indictment of making a case. But the standpoint of a scientist or detective is different from that of a lawyer or advocate. Not that scientists and detectives don't make cases for their conclusions, but their process, at least ideally, is largely as the Sherlock Holmes quote suggests, to eliminate theories. The case against Metta began, IIUC, with the observation of a member of the Israeli team who was watching his game vs. Ben David, that a surprising number of Metta's plays matched the plays suggested by Leela 11. OC, that fact raises the question of cheating, and, IMO, justified filing a complaint. One way of cheating in online chess is to make the plays suggested by a superhuman engine. Analysis reveals that the suspected player, who has a modest rating, makes neither blunder nor mistake, but only makes a few of what chess players term inaccuracies. Such play is superhuman. In addition, there is often behavioral evidence, such as the suspected player belittling his opponent. Now, Simba strongly believes that Metta cheated in their game. He assumes that Metta used Leela Zero to do so. OC, that may be so, but it is an assumption. If Metta was cheating, why did he blunder away a number of stones early in the game? Simba assumes that Metta did not cheat early in the game, because Leela Zero is so strong that Metta could use it to win if he got behind. Now that is a plausible theory of cheating, but it is also ad hoc, tailored to fit the evidence. Then there is the curious case, advanced by the anonymous accuser, of the now infamous move 156, where Metta did not pick the play recommended by Leela 11, which was a mistake, but picked the play recommended by Leela Zero. Why that might be relevant is a puzzle, given the theory that Metta used Leela Zero to cheat, anyway. To get there the anonymous accuser had to assume, as he stated on reddit, that once he was comfortably ahead Carlo started using both Leela 11 and Leela Zero, side by side, to cheat. Not only is that an additional assumption, it is implausible on its face. Now, making assumptions weakens a case, but when the assumptions are not pointed out, doing so can appear to strengthen a case. The anonymous accuser's gratuitous assumption performs that function brilliantly. On move 156 Leela 11's play, which Metta did not choose, is a significant error, while Leela Zero's play, which Metta did choose, is not. Such a large discrepancy in evaluation between Leela 11 and Leela Zero is unusual, and Anonymous Accuser uses that fact as proof that Metta was cheating. OC, the discrepancy is relevant only given the assumption that Metta was using both Leela 11 and Leela Zero simultaneously. And the so-called proof requires the hidden assumption that Metta would not have found move 156 if he were not cheating. In fact, it is an obvious candidate move. The original case against Metta, made by the Israeli team, also makes assumptions that make the case appear stronger than it is while actually weakening it. One known way of cheating online is to use a superhuman program to choose your plays. Here the obvious suspect is Leela, especially since Metta says that he used it for training. One problem with that theory is that many of his moves do not match Leela's choice. One possibility, OC, is that Metta cheated in a different way. 
When asked how they might cheat using a superstrong bot, players on this forum suggested that they might use it to avoid blunders. To test that theory would involve looking at individual plays for evidence of errors. I have actually done that, as have others. Another possibility is that, because Leela is non-deterministic, plays that it suggested to Metta might not match all of its suggestions when the program is run to check for matches. It is, in fact, very likely that the checking run of the program will not find all of the plays where Metta made the same play as Leela suggested. However, it works the other way, as well. There will be plays that match the checking run that did not match the run that Metta used, if he used Leela at all. This behavior of Leela is something that a scientist or detective would examine. In fact, because of the phenomenon called regression to the mean, a run chosen because it has a high number of matches would likely have a higher number of matches than a random run. In any event, the possibility that Metta made plays suggested by Leela that the checking run of Leela does not match does not justify matching second or third choices of that run to Metta's plays. The likely result of doing so would be to grossly overestimate the number of matches, when it is plausible that the number of matches to Leela's top choice alone is already an overestimate, assuming that Metta picked Leela's top choice for several plays. This is not a criticism of the Israeli team. Their job was to present a case, not to find a verdict. But by matching a range of Metta's plays to Leela's top three choices they gave the appearance, echoed by the claim — by whom, I forget, but it does not really matter — that the probability that Metta cheated in that game is greater than 90%. (The Israeli team found a match of 98% for Metta's plays in the range of moves 50 - 150. Other runs found, as we might have expected, lower matching rates around 93%.) Matches to Leela's top choice were 72%; other runs may have found matches in the mid-60% range. Adding matches to the second and third choices made a big difference in the impression that the evidence gave. 98% matches? A guilty verdict appeared to be a slam dunk. To paint such an impressive picture required the assumption that Metta would sometimes pick Leela's second choice and the assumption that he would sometimes pick Leela's third choice. The assumption that he would sometimes pick Leela's fourth choice was not necessary. The additional assumption was made that he would not pick an obviously bad play. The assumption was also made that Metta would pick a second or third choice in order to avoid detection that he was cheating. This assumption seems rather implausible, given that Leela reveals its second and third choices, possibly among others. How does picking one of them avoid detection? The choice of the range of 50 of Metta's plays is also suspicious. The arguments that later plays in the endgame might be unreliable and that earlier plays in the opening might provide too many matches because of joseki are plausible. But I cannot avoid the nagging suspicion that a wider range might have been less impressive. (In particular, Metta's move 37 is problematic for the cheating hypothesis, as we shall see.) One thing my undergraduate research methods professor stressed was this: Quote: Do not throw away any data. OC, you may have data that are questionable, or outliers that you ignore in reaching your final conclusion.
But you have to address those data and make your arguments. You don't just make some plausible assertions and then ignore data without even considering it. The human world is full of plausible assertions. A lot of people assume that the opening is not a good place to look for evidence of cheating in go by using a super strong bot. I disagree. That may make sense in chess, where players memorize openings and chess engines use opening books. But the opening is more fluid in go, and, perhaps more importantly, super strong go bots excel in strategy, which is paramount in the opening. That's where you can use a bot to advantage to take an early lead. And humans are imitating bots in the opening already. They are making early 3-3 invasions, playing some new AI joseki, and making attachments and diagonal contact plays that humans used to avoid in the opening. Some New Fuseki style plays have made a comeback, as well. Using a bot in the opening is not a dead giveaway. (OC, using a bot in a semeai may be a giveaway when the bot makes a mistake. ) While I have raised questions about the accusations against Metta in this note, my main point is to show the deleterious effect of assumptions. Not only do they weaken a case, they can make it look better than it is. |
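Bill's point about top-three matching can be illustrated with a small simulation. This is a toy model for illustration only, not the Israeli team's method: suppose an honest player independently picks Leela's 1st, 2nd, or 3rd choice, or something else, with fixed assumed probabilities, and compare the top-1 and top-3 "match rates" that result.

```python
import random

random.seed(0)

# Assumed probabilities that an honest strong player's move coincides with
# Leela's 1st/2nd/3rd choice, or with none of the top three (rank 4 here).
choice_probs = {1: 0.60, 2: 0.20, 3: 0.12, 4: 0.08}

def match_rates(n_moves=100):
    ranks = random.choices(list(choice_probs),
                           weights=list(choice_probs.values()), k=n_moves)
    top1 = sum(r == 1 for r in ranks) / n_moves
    top3 = sum(r <= 3 for r in ranks) / n_moves
    return top1, top3

top1, top3 = match_rates()
print(f"top-1 match rate: {top1:.0%}, top-3 match rate: {top3:.0%}")
# Under these assumed probabilities an entirely honest player already matches
# Leela's top three around 90% of the time, so a high top-3 figure proves little.
```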
Author: | Bill Spight [ Mon Jul 09, 2018 1:50 pm ] |
Post subject: | Re: Questions about a game |
Back when I was training at go, I made use of Botvinnik's idea to study positions where I had taken a long time, because they were difficult for me. An unstated assumption in this training is that I was more likely to have made a mistake in a position where I took a long time than in positions where I did not, or to make worse mistakes. That runs counter to folklore, where taking a long time enables you to play well. A famous example is the game where Honinbo Shusai took 8 hours to read out an endgame and find the right play. But if taking a long time meant finding the right play, why study those positions afterwards? Bridge great Alfred Sheinwold once quipped that he played quickly because quick errors were less embarrassing than slow ones. So I start with the assumption that plays that take more time for a player are more likely to be mistakes than the others, or are worse on average, because they are more difficult for him. As discussion in this thread has indicated, time taken is at best an imperfect measure, since we do not know why the player took a long time. He might have gone to the kitchen to make a sandwich, for instance. Imperfect though it may be, I take the time taken as a measure of subjective difficulty, and assume that plays where a player takes a long time are on average worse than those where he takes a normal length of time. (Very quick plays may on average be worse, as well, for different reasons.) I also take the difference between Leela's estimated winrate for a play and its estimated winrate for its top choice as a measure of how bad a play may be. (Or how good, in some instances.) As is well known, that is an imperfect measure, for several reasons. But you make do with what you have. I have used Ales Cieply's numbers. I have indicated the relationship between the time taken by Metta in his game with Ben David and the number of % points chucked in the following table.
* 9 longest plays ---- total of 15.6% pts. chucked -- average 1.7%
* 10 next longest plays ---- total of 9.0% pts. chucked -- average 0.9%
* 48 shortest plays ---- total of 23.6% pts. chucked -- average 0.5%

According to Cieply's Leela11, Metta chucked 24.6% pts. in the 19 plays he took the longest on, as opposed to 23.6% pts. chucked in the other 48 plays. He did play worse, by that measure, on his slowest plays. Note that the only other hypothesis advanced in this discussion concerning the length of time Metta took to make a play is that a quick play might not have given him time enough to cheat effectively using Leela. OC, for the plays that took him a long time he had quite enough time to run Leela and cheat. |
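For readers who want to reproduce the bucketing behind this table, here is a minimal sketch; the function and the commented-out sample input are hypothetical stand-ins, since the actual 67 (time, winrate loss) pairs live in Cieply's spreadsheet.

```python
def bucket_report(plays):
    """plays: list of (seconds_taken, pct_points_chucked), one pair per move."""
    ordered = sorted(plays, key=lambda p: p[0], reverse=True)
    buckets = [("longest", ordered[:9]),
               ("next longest", ordered[9:19]),
               ("shortest", ordered[19:])]
    for label, group in buckets:
        if not group:
            continue
        chucked = [c for _, c in group]
        total = sum(chucked)
        print(f"{len(group):2d} {label:12s} total {total:5.1f}% pts  "
              f"average {total / len(chucked):.1f}%")

# Usage with hypothetical data (real input would list all 67 plays):
# bucket_report([(56, 2.1), (49, 0.0), (31, 0.4), (12, 0.9), ...])
```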
Author: | Bill Spight [ Mon Jul 09, 2018 1:58 pm ] |
Post subject: | Re: Questions about a game |
Most of the positions I posted here were ones on which Metta took the longest (49 sec. or longer). Let's review them. Edit: Ales Cieply ran another Leela11 analysis, this time of the whole game, as I suggested, with a setting of 300K+ playouts. This is the most accurate Leela11 analysis that we have. I have edited these positions accordingly. Edit 2: Ales also managed to get analyses by Leela Zero Elf. I am updating the positions in this note and the next one accordingly. (BTW, Elf agrees with me that is not good, losing 9% pts. Leela 11 thinks it's OK, but it was trained originally on human play.) First position. Second position. Position 3. Position 4. Position 5. |
Author: | Bill Spight [ Mon Jul 09, 2018 2:05 pm ] |
Post subject: | Re: Questions about a game |
More positions. Edit: I also applied the result of Cieply's new run to these positions. Edit 2: Added Leela Elf (200k)'s results, using the delta estimate. Position 6. Position 7. Position 8. Position 9. Position 10. Position 11. Aside from the position for , these are the 10 positions on which Carlo Metta took the most time, 49 sec. or more. We do not have Leela11's evaluation for one of them, because it occurred before either Cieply or Bojanic did any evaluations. But it was a joseki choice that stands a good chance of being Cieply's Leela's top choice. Another one was Leela's top choice, and another one Leela considered to be better than its top choice. Of the 67 plays that Cieply's Leela evaluated, it considered 21 of them to be errors. 7 of them were in the 9 plays that took the longest time. 14 of them were in the 58 remaining plays. So when Carlo took a long time, he was more likely to make an error (according to Leela). The average winrate loss per error was about 2.3%, regardless of how long Carlo took. So the main difference has to do with the probability of error. When he took less than 49 sec. to make a play, his error rate was around ¼; when he took 49 sec. or more to make a play, his error rate was around ¾. When queried about how they might use a bot to cheat, many of our members said that they would use it to help prevent blunders. Carlo's time usage data does not fit that theory of cheating. Why would he take a long time on a play, only to then pick what Leela told him was a bad play more often than he usually did? What theory of cheating does this data fit? OC, he might have cleverly taken a long time to pick bad moves, in order to disguise his cheating. But until now, who has suggested that such a strategy might be necessary? |
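Spelling out the arithmetic of the error-rate comparison (the counts are the ones given in the post above; the Fisher exact test at the end is an added check, not something reported in the thread):

```python
from scipy.stats import fisher_exact

errors_slow, plays_slow = 7, 9    # plays on which Carlo took 49 sec. or more
errors_fast, plays_fast = 14, 58  # the other evaluated plays

print(f"slow-play error rate: {errors_slow / plays_slow:.2f}")  # ~0.78, about 3/4
print(f"fast-play error rate: {errors_fast / plays_fast:.2f}")  # ~0.24, about 1/4

# A 2x2 contingency check of whether the difference could plausibly be chance:
table = [[errors_slow, plays_slow - errors_slow],
         [errors_fast, plays_fast - errors_fast]]
odds_ratio, p_value = fisher_exact(table)
print(f"Fisher exact p-value: {p_value:.4f}")
```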
Author: | Bojanic [ Tue Jul 10, 2018 5:09 am ] |
Post subject: | Re: Questions about a game |
Bill Spight wrote: Position 7 Metta played , which looks like aji keshi to Frejlak and me. He took 56 seconds, almost 6 times his average. When I did the analysis of this move, both before the previous move (wN2) was played and for some time after it was played, this move was actually Leela's top choice. Leela found a better move later, but this move was rated as the top choice for a considerable time. Bill Spight wrote: Of the 67 plays that Cieply's Leela evaluated, it considered 21 of them to be errors. 7 of them were in the 9 plays that took the longest time. 14 of them were in the 58 remaining plays. So when Carlo took a long time, he was more likely to make an error (according to Leela). Those 21 moves are not errors! There were better moves, but those moves were still recommended by Leela, and in some cases they were its top choice during some period of time. Even from top European players you can see serious errors, blunders, miscalculations, hallucinations, etc. - but you could not see any of those in two of Carlo's games. Just a long list of Leela's top choices. Bill Spight wrote: The average winrate loss per error was about 2.3%, regardless of how long Carlo took. So the main difference has to do with the probability of error. When he took less than 49 sec. to make a play, his error rate was around ¼; when he took 49 sec. or more to make a play, his error rate was around ¾. When queried about how they might use a bot to cheat, many of our members said that they would use it to help prevent blunders. Carlo's time usage data does not fit that theory of cheating. Why would he take a long time on a play, only to then pick what Leela told him was a bad play more often than he usually did? What theory of cheating does this data fit? There is a problem with the analysis of time spent - we don't know what he was doing during it. |
Author: | Bill Spight [ Tue Jul 10, 2018 7:43 am ] |
Post subject: | Re: Questions about a game |
Bojanic wrote: Bill Spight wrote: Position 7 Metta played , which looks like aji keshi to Frejlak and me. He took 56 seconds, almost 6 times his average. When I did the analysis of this move, both before the previous move (wN2) was played and for some time after it was played, this move was actually Leela's top choice. Leela found a better move later, but this move was rated as the top choice for a considerable time. Bill Spight wrote: Of the 67 plays that Cieply's Leela evaluated, it considered 21 of them to be errors. 7 of them were in the 9 plays that took the longest time. 14 of them were in the 58 remaining plays. So when Carlo took a long time, he was more likely to make an error (according to Leela). Those 21 moves are not errors! According to Leela, they reduced Carlo's probability of winning the game. What do you require of an error? That it loses the game after a sequence of perfect play? Call them chucks if you wish. Leela thought that with those 21 moves Carlo chucked 48.8% pts. Quote: Bill Spight wrote: The average winrate loss per error was about 2.3%, regardless of how long Carlo took. So the main difference has to do with the probability of error. When he took less than 49 sec. to make a play, his error rate was around ¼; when he took 49 sec. or more to make a play, his error rate was around ¾. When queried about how they might use a bot to cheat, many of our members said that they would use it to help prevent blunders. Carlo's time usage data does not fit that theory of cheating. Why would he take a long time on a play, only to then pick what Leela told him was a bad play more often than he usually did? What theory of cheating does this data fit? There is a problem with the analysis of time spent - we don't know what he was doing during it. That question was already addressed. When Carlo took a long time he was more likely to chuck points, despite the fact that Leela could have been running all that time, coming up with good plays as well as winrate estimates. |
Author: | Bill Spight [ Tue Jul 10, 2018 11:38 am ] |
Post subject: | Re: Questions about a game |
Well, I meant for these questions to be about behavioral evidence. Namely: given so much time, during which (assuming that Carlo used a bot to cheat) that bot could have been calculating winrates and picking plays, why would he choose options that chucked percentage points, when he did not do so when he took more normal times to play? However, it seems I have to do some statistics. And remember, I formed my hypothesis before looking at the data: that, on average, the longer a player took to make his play, the more points he would chuck. So I did a regression using the 67 data points from Cieply's analysis. Here is the equation, using phrases for variables: expected_percentage_points_chucked = 0.385 + 0.012 * seconds_taken_to_play. The correlation coefficient is 0.313, and P(tail) = 0.0050 < 0.01. The correlation between time taken and points chucked (according to Leela) is highly significant. |
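For anyone who wants to rerun this kind of fit, here is a sketch using scipy.stats.linregress; the ten (seconds, % points chucked) pairs are hypothetical placeholders, not the actual 67 data points from Cieply's analysis.

```python
from scipy import stats

# Hypothetical stand-ins for the 67 (time taken, winrate loss) observations:
seconds_taken = [5, 8, 12, 20, 31, 49, 56, 75, 90, 120]
points_chucked = [0.0, 0.2, 0.0, 0.9, 0.4, 2.1, 1.8, 0.0, 3.2, 2.6]

res = stats.linregress(seconds_taken, points_chucked)
print(f"expected_percentage_points_chucked = "
      f"{res.intercept:.3f} + {res.slope:.3f} * seconds_taken_to_play")
print(f"correlation coefficient r = {res.rvalue:.3f}")
# linregress reports a two-sided p-value; for the one-tailed hypothesis
# stated above, P(tail) is half of res.pvalue when the slope is positive.
print(f"two-sided p = {res.pvalue:.4f}")
```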