Extending SGF?

pnprog · Post by **pnprog** » Sun Jun 16, 2019 3:35 am

I like spook's proposal; I think JSON is a good fit for this job.

spook wrote:We have to keep in mind that other AIs will have overlap, but may have more or less properties. I don't think you want to define a standard and dedicate it to 1 bot.
So, it should be very flexible.

Very good point. We have no idea what state-of-the-art go bots will look like 10 years from now. Maybe they will use a technology other than neural networks or MCTS, and won't talk in terms of winrate or playouts.

Amtiskaw wrote:Of course there are additional problems when the data isn't UTF-8, but mandating UTF-8 would fix SGF's biggest problems.

Yes, please!

The different encodings are a legacy of the past, and there is no longer any reason to use them today. Enforcing UTF-8 would help so much. As a matter of fact, GoReviewPartner now only outputs UTF-8 SGF or RSGF. It immediately converts from any other encoding it encounters, and tries to do so anyway when the encoding is not specified.
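A minimal Python sketch of that kind of conversion, assuming the charset is declared in the root node's CA[] property. When CA[] is missing or wrong, it tries UTF-8 first (on the assumption that many unlabelled files are UTF-8 in practice) and then ISO-8859-1, the SGF FF[4] default. The function name `to_utf8_sgf` is hypothetical and this is not GoReviewPartner's actual code; the regexes also ignore escaped brackets.

```python
import re

# Simplification: assumes CA[] sits in the root node and contains no escaped "]".
CA_BYTES_RE = re.compile(rb"CA\[([^\]]*)\]")
CA_TEXT_RE = re.compile(r"CA\[[^\]]*\]")


def to_utf8_sgf(raw: bytes) -> str:
    """Decode raw SGF bytes and relabel the result as CA[UTF-8]."""
    candidates = []
    match = CA_BYTES_RE.search(raw)
    if match:
        candidates.append(match.group(1).decode("ascii", "replace").strip())
    # Fallbacks: UTF-8 first, then ISO-8859-1 (FF[4] default), which never fails.
    candidates += ["utf-8", "iso-8859-1"]

    text = ""
    for charset in candidates:
        try:
            text = raw.decode(charset)
            break
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name or bytes that don't fit it

    if CA_TEXT_RE.search(text):
        return CA_TEXT_RE.sub("CA[UTF-8]", text, count=1)
    # No CA[] property at all: add one right after the first node marker.
    return text.replace("(;", "(;CA[UTF-8]", 1)
```

The returned string would then be written back with `open(path, "w", encoding="utf-8")`.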

As a note for later, we may have to specify the units of some of the values we use in the JSON file. For example, a winrate like the one below:

"stats": [
{
  "move": "Q16",
  "winrate": 0.46
},
{
  "move": "D16",
  "winrate": 0.47
},
...
]

Would that mean 46% or 0.46%? Would that be the winrate for the player making that move, or the winrate for Black (as in the AlphaGo teaching tool)? I always use a format like "45.3%/56.5%" in GoReviewPartner to avoid any ambiguity, but that solution is not really satisfying.

Also as a note for later, it would be good to support a simple, standard markup language for comments: something that can add hyperlinks, bullet lists, bold/italic text... something like Markdown.

lightvector · Post by **lightvector** » Mon Jun 17, 2019 6:07 am

pnprog wrote:
spook wrote:We have to keep in mind that other AIs will have overlap, but may have more or less properties. I don't think you want to define a standard and dedicate it to 1 bot.
So, it should be very flexible.
Very good point. We have no idea what state-of-the-art go bots will look like 10 years from now. Maybe they will use a technology other than neural networks or MCTS, and won't talk in terms of winrate or playouts.

And some modern bots already report not just winrate, but also the expected average score, as well as a standard deviation that measures the uncertainty about the score.

pnprog wrote: As a note for later, we may have to specify the units of some of the values we use in the JSON file. For example, a winrate like the one below:
"stats": [
{
  "move": "Q16",
  "winrate": 0.46
},
{
  "move": "D16",
  "winrate": 0.47
},
...
]
Would that mean 46% or 0.46%? Would that be the winrate for the player making that move, or the winrate for Black (as in the AlphaGo teaching tool)? I always use a format like "45.3%/56.5%" in GoReviewPartner to avoid any ambiguity, but that solution is not really satisfying.

I agree with thinking about units. Leela Zero's lz-analyze currently multiplies all probability values by 10000 and then rounds them, but if we're talking file formats, 10000 seems like a pretty arbitrary constant. I would vote for not multiplying at all: fields intended to be probabilities should be floats between 0 and 1, predicted score should be in units of points rather than points times some constant, and if a bot wants to report a signed utility (e.g. a version of winrate that is positive if a player is ahead and negative if behind, possibly blending in a term for a greater score), that could be on a scale of roughly -1 to 1, etc.
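To make the "don't multiply" proposal concrete, here is a small Python sketch that turns one line of lz-analyze output into plain values. It assumes the line is a sequence of `info move <vertex> visits <n> winrate <w> prior <p> ... pv <moves>` entries, with winrate, prior and lcb scaled by 10000 as described above. The set of scaled fields and the function name `parse_lz_analyze` are assumptions for illustration, so check them against the actual engine output.

```python
import json

# Fields assumed to be probabilities multiplied by 10000 and rounded.
SCALED_PROBABILITIES = {"winrate", "prior", "lcb"}


def _number(token: str):
    """Parse an int or float token, leaving anything else as a string."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def parse_lz_analyze(line: str) -> list[dict]:
    """Split one analysis line into per-move dicts with unscaled probabilities."""
    entries = []
    for chunk in line.split("info ")[1:]:
        tokens = chunk.split()
        entry = {}
        i = 0
        while i < len(tokens):
            key = tokens[i]
            if key == "pv":
                entry["pv"] = tokens[i + 1:]  # principal variation runs to the end
                break
            value = tokens[i + 1]
            if key == "move":
                entry[key] = value
            elif key in SCALED_PROBABILITIES:
                entry[key] = int(value) / 10000  # back to a float in [0, 1]
            else:
                entry[key] = _number(value)
            i += 2
        entries.append(entry)
    return entries


line = ("info move Q16 visits 120 winrate 4600 prior 1523 lcb 4512 order 0 pv Q16 D4 "
        "info move D16 visits 80 winrate 4700 prior 1200 lcb 4400 order 1 pv D16")
print(json.dumps(parse_lz_analyze(line)[0]))
```

Note that the engine's rounding already throws away anything finer than 0.0001, so a file format that stores plain floats loses nothing by dropping the constant.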

Mandating values from the view of a consistent player (e.g. Black) rather than alternating by side to move would make it much easier to write tools that graph the winrate or other values, or that scan game records for large differences between consecutive moves to find mistakes. For both of those applications, consistent-view values can be used as-is, while side-to-move values need to be inverted every other move. There's also precedent in chess: my impression is that chess analysis more commonly reports values from a consistent player's view (the first player) than by side to move.
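A minimal Python sketch of the two steps, assuming each stored value is the winrate of the side to move in the position after move i (index 0 is the empty board) and that Black moves first unless told otherwise (e.g. White in a handicap game). The function names and the 10% threshold are illustrative choices, not part of any proposal here.

```python
def _other(player: str) -> str:
    return "W" if player == "B" else "B"


def to_black_view(side_to_move: list[float], first_player: str = "B") -> list[float]:
    """Convert side-to-move winrates into winrates from Black's point of view."""
    result = []
    for i, winrate in enumerate(side_to_move):
        # The first player is to move in even-numbered positions.
        to_move = first_player if i % 2 == 0 else _other(first_player)
        result.append(winrate if to_move == "B" else 1.0 - winrate)
    return result


def find_swings(black_view: list[float], first_player: str = "B",
                threshold: float = 0.10) -> list[tuple[int, str, float]]:
    """Return (move_number, player, loss) for moves that cost the mover >= threshold."""
    swings = []
    for i in range(1, len(black_view)):
        # Move i is played by whoever was to move in position i - 1.
        mover = first_player if i % 2 == 1 else _other(first_player)
        delta = black_view[i] - black_view[i - 1]
        loss = -delta if mover == "B" else delta  # a mistake lowers the mover's winrate
        if loss >= threshold:
            swings.append((i, mover, round(loss, 3)))
    return swings
```

With values already stored from Black's view, `to_black_view` disappears entirely and `find_swings` can run on the file's numbers directly, which is the convenience argued for above.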

xela · Post by **xela** » Sun Dec 29, 2019 5:01 pm

So did this ever go anywhere? It looks like https://www.red-bean.com/sgf/ hasn't been updated in a very long time.

My 2 cents:

SGF is close to human-readable. I do sometimes open up SGF in a text editor to fix broken files or reformat stuff, or use grep on a file of SGFs to find things. JSON is also more or less human-readable. XML isn't, despite what the fans say: it's just too verbose. I'd prefer to stay with SGF so as to not throw out all my old software. But I can see the benefits of a JSON alternative.

As well as bot evaluations, the other thing I'd love to see added to SGF (or a new format) is node labels plus the ability to hyperlink from a comment to another (labelled) node. This would be great for commented games and for SGF joseki dictionaries: the comments could say things like "compare this variation with (link)".

Javaness2 · Post by **Javaness2** » Mon Dec 30, 2019 1:34 am

Can we not just go back to the Ishi format?

Harleqin · Post by **Harleqin** » Mon Dec 30, 2019 5:28 pm

You have two options: keep SGF compatibility, or create a new format. I think that SGF is not too bad, and it has the big advantage that there is already a large(ish) community of programs and people who know how to work with it. On the other hand, SGF has made some restricting decisions that you might want to get rid of.

If you stick to extending SGF, there are two kinds of extensions: adding more properties to nodes (e.g. bot evaluations), and adding more structure (e.g. links). The latter is more interesting. The former is restricted by the decision to allow only short, unstructured property names.

For more properties, I'd propose adding just one new property name and making its value arbitrarily structured. This might look like:

Z[{bots {leela ({version "157" black-win-rate 57.32 prediction (ca if ig hi)})}}]
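A rough Python sketch of a reader for that draft syntax. It assumes `{...}` holds alternating keys and values, `(...)` is a list, `"..."` is a string without escapes, and bare words become numbers when they parse as numbers and strings otherwise. These rules are guesses at the intent of the quick draft, and the input is the text already taken out of `Z[...]`; the names `tokenize` and `parse_z` are hypothetical.

```python
import re

# A bracket, a double-quoted string, or a bare word, with optional leading whitespace.
TOKEN_RE = re.compile(r'\s*(?:([{}()])|"([^"]*)"|([^\s{}()"]+))')


def _atom(word: str):
    for convert in (int, float):
        try:
            return convert(word)
        except ValueError:
            pass
    return word  # symbols such as black-win-rate stay strings


def tokenize(text: str) -> list[tuple[str, object]]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        punct, string, word = m.groups()
        if punct:
            tokens.append(("punct", punct))
        elif string is not None:
            tokens.append(("str", string))
        else:
            tokens.append(("atom", _atom(word)))
        pos = m.end()
    return tokens


def _parse(tokens, i):
    kind, value = tokens[i]
    if (kind, value) == ("punct", "{"):
        mapping, i = {}, i + 1
        while tokens[i] != ("punct", "}"):
            if tokens[i][0] == "punct":
                raise ValueError(f"expected a key, got {tokens[i][1]!r}")
            key = tokens[i][1]
            item, i = _parse(tokens, i + 1)
            mapping[key] = item
        return mapping, i + 1
    if (kind, value) == ("punct", "("):
        items, i = [], i + 1
        while tokens[i] != ("punct", ")"):
            item, i = _parse(tokens, i)
            items.append(item)
        return items, i + 1
    if kind == "punct":
        raise ValueError(f"unexpected {value!r}")
    return value, i + 1


def parse_z(text: str):
    tokens = tokenize(text)
    try:
        value, end = _parse(tokens, 0)
    except IndexError:
        raise ValueError("unbalanced brackets") from None
    if end != len(tokens):
        raise ValueError("trailing tokens after value")
    return value
```

Parsing the example yields nested dicts and lists (`{"bots": {"leela": [{"version": "157", ...}]}}`), which maps directly onto JSON if a tool wants to convert it.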

For links, I think the idea of labelling and then referring is not too bad. For example, mark a node with MK[some-label]. Referring is not so easy, because it has to extend or restrict the syntax inside, e.g., a comment. Maybe C[See the {ref some-label with "other position"}].
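A small Python sketch of how a viewer might resolve such references, under loud assumptions: the `{ref label}` / `{ref label with "text"}` syntax is the draft above, the game tree is flattened into a list of nodes in file order (each a dict of property to values), and references are rendered as Markdown-style links to a hypothetical `#node-N` anchor. `collect_labels` and `resolve_refs` are invented names.

```python
import re

# {ref some-label} or {ref some-label with "link text"}
REF_RE = re.compile(r'\{ref\s+([^\s{}"]+)(?:\s+with\s+"([^"]*)")?\}')


def collect_labels(nodes: list[dict[str, list[str]]]) -> dict[str, int]:
    """Map each MK label to the index of the node that carries it."""
    labels = {}
    for index, props in enumerate(nodes):
        for label in props.get("MK", []):
            if label in labels:
                raise ValueError(f"duplicate label {label!r}")
            labels[label] = index
    return labels


def resolve_refs(comment: str, labels: dict[str, int]) -> str:
    """Replace each {ref ...} with a link; unknown labels are an error."""
    def replace(m: re.Match) -> str:
        label, text = m.group(1), m.group(2)
        if label not in labels:
            raise KeyError(f"unknown label {label!r}")
        return f"[{text or label}](#node-{labels[label]})"
    return REF_RE.sub(replace, comment)
```

Treating an unknown label as an error is one design choice; a lenient viewer might leave the reference as plain text instead.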

(In the above, I avoided the use of square brackets and colons so that there is less need for escaping. It's just a quick draft.)

If you are considering a new format, I believe one should ask whether a tree is the right model. Maybe a general property graph can offer new opportunities, e.g. unifying variations that arrive at the same position, or representing not only moves but also links or ko restrictions as edges. This is hard to do in a human-readable way, but maybe that's rather liberating. Moving to a binary format could also simplify the implementation, since human readability is no longer a concern and you can put in exact type and length tags, so no escaping is needed.
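To illustrate the "type and length tags" point, here is a minimal Python sketch of a tag-length-value layout: a 1-byte type tag and a 4-byte big-endian length in front of each payload. The tag numbers are invented for the example; a real binary format would have to standardize them.

```python
import struct

HEADER = ">BI"                       # 1-byte tag, 4-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)  # 5 bytes, no padding with ">"

# Hypothetical tag numbers, for illustration only.
TAG_PROPERTY_ID = 1
TAG_TEXT = 2


def encode_field(tag: int, payload: bytes) -> bytes:
    """Prefix the raw payload with its tag and exact length."""
    return struct.pack(HEADER, tag, len(payload)) + payload


def decode_fields(data: bytes) -> list[tuple[int, bytes]]:
    """Read back (tag, payload) pairs; no escape sequences are ever needed."""
    fields, pos = [], 0
    while pos < len(data):
        if pos + HEADER_SIZE > len(data):
            raise ValueError("truncated header")
        tag, length = struct.unpack_from(HEADER, data, pos)
        pos += HEADER_SIZE
        payload = data[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        fields.append((tag, payload))
        pos += length
    return fields
```

A comment such as `See C[a] and a backslash \` would travel byte-for-byte here, whereas in SGF the `]` and `\` inside the C[] value would have to be escaped.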

P. S.: no XML.

Life In 19x19
