NordicGoDojo wrote:
An interesting feature of the scoremean is that KataGo is reliably able to produce ties against itself (with an integer komi), showing that its calculations are at least consistent to a degree. [...] KataGo also reliably beats Leela Zero, indicating that its understanding of the game should be the better one. While the scoremean values are 'impure' and 'imprecise', unlike human counting, I still think we should give them value.
Sure.
Where I object is over-interpretation of KataGo's skill, e.g., when the paper refers to "a player's skill".
Quote:
RobertJasiek wrote:
number of its future possibilities
I'm not sure I get this.
I lack the time to work this out. The paper might as well omit the related statement without significant impact, so who cares?:)
Quote:
a player's average effect in a single game does not make it possible to accurately estimate their playing skill
At the same time, the paper describes the intention of analysing a player's skill (though so far it should only speak of a "model" of it) from just one game. The skill, as the paper calls it, that the player demonstrates shall then be used as a basis for possibly detecting his cheating in this game.
If you maintain your statement, you must at the same time hold that cheating detection by the paper's means from only one of the player's games is impossible.
Quote:
There is a fairly strong correlation, however.
I do not have a problem with seeing a fairly strong correlation, as long as it is roughly described as "for an average game of a particular, arbitrary player, the paper's tools can indicate a 'cheating' suspicion under the assumption that the model of the player's performance is his performance".
Quote:
Of course you can fit a humanly describable strategic plan to a particular move by an AI.
Not just to one move but also to particular kinds of move sequences.
Quote:
The point is that the AI's move-choosing procedure itself is [...]
...indeed not described as a human-readable strategic plan, right. It is well hidden in the network values, the pure tree search and the code.
Quote:
I think we should note that the only 'proof' of cheating is the cheater's confession or getting caught in the act, for example by a video recording or a trusted proctor. All other anti-cheating solutions are finally based on probabilities, which I think should be called 'evidence' rather than 'proof'.
Right.
Therefore, if "statistical" probabilities shall serve as evidence, they require a theory of thresholds and confidence levels, and validation on large samples.
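To illustrate what such a threshold-and-confidence theory would have to pin down, here is a minimal sketch of one possible filter: a significance test on how often a player matches an engine's top move. All the numbers are hypothetical assumptions for illustration; in particular, the honest-player match rate p0 = 0.55 is invented, not a measured value.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that an honest player,
    matching the engine's top move with probability p per move, matches
    it at least k times in n moves purely by skill and luck."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical inputs: an honest strong player is *assumed* to match the
# engine's first choice on 55% of moves; we observe 75 matches in 100 moves.
p_value = binomial_tail(100, 75, 0.55)

# A pre-agreed significance threshold is exactly the kind of thing that
# needs theory and large-sample validation before tournament use.
suspicious = p_value < 0.001
```

The point of the sketch is not the particular numbers but the structure: without an agreed null model (p0) and an agreed threshold, the same observation can be read as damning or as unremarkable.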
Quote:
I have tested the model on a wide variety of AIs, even AlphaGo Master, which is trained on human games and plays considerably differently from modern AIs such as KataGo.
Good.
Quote:
Furthermore, being able to for example play KataGo's favoured sequences out from time to time will not mark a player as suspicious, but consistently playing in the roughly right part of the board will.
(My reply refers to phases before the endgame phase.)
I disagree. A player can have the skill to always play in roughly the right part of the board, as indicated by AI analysis, in some of his games. Such a player need not be of superhuman level.
A player is suspicious if he also consistently plays locally close to optimal. If we know he is a strong (or very strong) player, we must be extra cautious and tolerant when interpreting his skill.
Quote:
I have seen no evidence of a player's 'familiarity with AI moves' making them stand out in my analysis.
That matches what I would expect. Nevertheless, could you describe your observations so far in more detail, please? We might learn from them.
Quote:
I think you may have misunderstood the purpose of the paper.
I understand that the paper is an early step in metrics analysis; in that respect, I do not think I have misunderstood its purpose.
At the same time, at various places, the paper makes detailed statements that go far beyond the aforementioned purpose. I criticise the paper for such over-interpreting statements.
Furthermore, the paper goes far beyond the aforementioned purpose when suggesting and describing application to cheating detection. I also criticise that the paper rushes ahead too fast, while it even serves as part of the justification for already applying such tools in tournaments.
In other words, the paper has more than one purpose, not just the modest one of an early step in metrics analysis. It does not give the impression of a pure maths paper, such as one about KL-divergence would. On the contrary, the paper is implicitly invoked as strong justification when a tournament announcement refers to "state-of-the-art" anti-cheating tools.
Quote:
Depending on the complexity of a position, this may of course require a larger number of playouts.
Yes, but cheating detection is supposed to be applicable even when, in quite a few positions, there are not enough playouts.
Quote:
RobertJasiek wrote:
The paper's value analysis applied to players creates an unfair prejudice: some players with specific playing styles, studying with specific AI programs or having studied much with AI are in much greater danger of being wrongly indicated as cheaters.
As I said above, this claim is unsubstantiated.
I am not convinced, because I do not accept that the paper's only purpose is early research. (You might rewrite the paper to convince me by removing all details hinting at advanced application or interpretation, but please do not waste your time on doing so:) As a suggestion for future papers: clearly distinguish the current level of understanding from possible future research, and describe applications beyond a paper's scope outside the paper itself.)
Quote:
It certainly was not presenting a robust system that can be applied in cheat detection – if that was the case, then the research would be nearing completion, rather than having just started, and we would already have a product to offer.
Thank you for the clarification!
Quote:
When applying a series of 'cheat filters'
Good in theory, but only good in practice if each filter is itself convincing, and not a roughly 50% coin flip between "cheated" and "not cheated".
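The objection can be made concrete with a standard way of combining independent filters: summing log likelihood ratios. This is a generic sketch, not the paper's method, and the LR values below are invented for illustration; it shows that a near-50/50 filter (LR close to 1) adds essentially nothing, however many such filters are stacked.

```python
from math import log

def combined_log_lr(filter_lrs):
    """Combine independent 'cheat filters' by summing log likelihood ratios.
    Each LR = P(observation | cheating) / P(observation | honest play).
    A filter whose LR is close to 1 is a coin flip and contributes ~0."""
    return sum(log(lr) for lr in filter_lrs)

# Hypothetical filter outputs (assumed values, for illustration only):
weak_only = combined_log_lr([1.05, 1.02, 0.98])  # three coin-flip filters
with_strong = combined_log_lr([1.05, 8.0, 6.0])  # one weak + two informative
```

Only filters that individually move the likelihood ratio well away from 1 make the combined evidence grow; this is the precise sense in which each filter "itself must be convincing".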
Quote:
I believe is more trustworthy than a 'mere' human interpretation made from reviewing a game.
I think what might in time become a useful filter is objective analysis of the data currently presented as graphs: indications of similar progress (such as of "winning chances") during a game between different AIs' moves and a player's moves. Such characteristics are very hard to fake if they occur absolutely consistently before the stage at which a game is already strategically won. Of course, this presumes extensive studies showing that coincidences do not occur merely because of the specific nature of a game's development.
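One simple way to make such graph comparisons objective would be a similarity measure between two per-move winning-chance curves, e.g. the Pearson correlation. This is only a sketch of the idea; the traces below are invented, and a single high value in one game would of course prove nothing without the large-sample studies mentioned above.

```python
from math import sqrt

def curve_similarity(xs, ys):
    """Pearson correlation between two per-move winning-chance curves,
    e.g. a player's moves vs. an engine's preferred moves. Consistently
    near-1.0 values across many games would be the suspicious pattern."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical winning-chance traces over six moves (invented data):
player = [0.50, 0.52, 0.55, 0.53, 0.60, 0.64]
engine = [0.50, 0.53, 0.56, 0.52, 0.61, 0.65]
similarity = curve_similarity(player, engine)
```

A correlation measure of this kind captures "similar progress during the game" in one number, which is exactly the sort of quantity for which thresholds and confidence levels would then have to be established.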