For AlphaZero-like self-play training purposes, neural net input/output representation, and compatibility with existing protocols for computer Go playing and game representation (GTP, SGF), I would like a ruleset that:
* Does not include any hypothetical play, branching of the game (i.e. copying the game to determine something) or any rollback of state upon determining the status of a chain, or a ko, or an area of the board. The game and all statuses are determined through alternating actions alone.
* All possible actions in any phase of the game are either a pass or are "naturally" representable as being associated with one or more locations on the board, such that these locations are disjoint between actions on a given turn (implications: any location has at most one possible action associated with it, the maximum number of possible legal actions on a 19x19 board is 362, i.e. 361 points plus pass, there are no actions like "communicate a list of groups and their proposed statuses to be agreed upon" that cannot naturally be encoded in such a way, etc.).
* Will still produce some result even with fairly badly-behaved players, such as players that play completely at random, or do not understand when plays are necessary or not, or that "mistakenly" take actions that result in the game ending "prematurely". Self-play rules must be able to handle such players.
* Can be efficiently implemented in an actual computer program without too much difficulty. I think the current draft is not too bad on this - only a small number of additional concepts, such as "state" and "atari", are defined beyond those that a computer implementation would need to implement anyway, such as "region", and simple combinations of those concepts are enough to define the most complex concepts like "independent-life-region" without any more layers of definitions. As for efficiency, everything is computable directly; for example, no tree search over move sequences is necessary to implement any of the rules.
* Subject to the above technical requirements/restrictions, still does a reasonable job of matching the results of the Japanese rules in most common situations, so long as the players ARE reasonably well-behaved and act self-interestedly.
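
To make the action-encoding requirement in the list above concrete, here is a minimal sketch of a flat 362-way action space for a 19x19 board, with a mapping to and from GTP vertices. The names (`NUM_ACTIONS`, `PASS`, `action_to_gtp`, `gtp_to_action`) and the choice of index 361 for pass and row-major ordering from the bottom-left are my own assumptions for illustration, not something fixed by the draft rules. The one thing taken from GTP itself is that column letters skip "I".

```python
BOARD_SIZE = 19
# One action per board point plus a single pass: 361 + 1 = 362.
NUM_ACTIONS = BOARD_SIZE * BOARD_SIZE + 1
# Assumption for this sketch: pass is the last index.
PASS = NUM_ACTIONS - 1

# GTP column letters skip "I".
GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"


def action_to_gtp(action: int) -> str:
    """Convert a flat action index into a GTP vertex string or "pass"."""
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action out of range: {action}")
    if action == PASS:
        return "pass"
    # Assumption: row-major layout, row 0 is GTP row 1 (bottom of the board).
    row, col = divmod(action, BOARD_SIZE)
    return f"{GTP_COLUMNS[col]}{row + 1}"


def gtp_to_action(vertex: str) -> int:
    """Convert a GTP vertex string (e.g. "D4") or "pass" into an action index."""
    v = vertex.strip().upper()
    if v == "PASS":
        return PASS
    col = GTP_COLUMNS.find(v[0]) if v else -1
    if col < 0:
        raise ValueError(f"bad column in vertex: {vertex!r}")
    row = int(v[1:]) - 1
    if not 0 <= row < BOARD_SIZE:
        raise ValueError(f"bad row in vertex: {vertex!r}")
    return row * BOARD_SIZE + col
```

With an encoding like this, a policy head only needs 362 outputs in every phase of the game, and any rule that needs an extra kind of action has to reuse either a board location or the pass slot.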
For most area-scoring rulesets, it is not hard to come up with a version that satisfies all the above technical requirements. In fact, many of them pretty much do already, at least after omitting any human provisions for group status agreement. The challenge is to find one for Japanese rules. I'm not aware of any mechanisms right now that do a better job of achieving such an approximation, subject to such requirements, than ko-pass-like rules. Are there any known?
Edit: Maybe not all of the above restrictions are necessary, but having them massively simplifies the process of making self-play training work and making the bot compatible with existing tools, GUIs, etc.
I like to think that the current draft, despite the flaws, actually does a pretty good job of achieving the above objectives. If someone does see a simple change that would be a clear improvement, that would be great. Otherwise, I'm likely to just go ahead with it (after pondering for a little longer whether any simple clear improvements are possible), unless someone comes up with something massively better.
I hope some of this work ends up being actually useful in the end. In the current state, it's pretty clear to me why no Go programmer wants to touch this stuff if they can help it.
