All times are UTC - 8 hours [ DST ]

 Post subject: SGF Date Renamer
Post #1 Posted: Sun Aug 29, 2010 11:06 pm 
KGS: usagi
# SGF Date Renamer
# by Oliver Richman on 2010-08-30
#
# This is an attempt to introduce a standard format for naming pro game files.
# The script builds a new filename (ignoring the old one) from the DT, PW and PB tags inside each .sgf file.
# Duplicates are handled by appending a counter (-2, -3, ...) to every duplicate entry except the first.
#
# The main requirement for the .sgf is that the DT tag includes something like YYYY[-MM[-DD[,DD,DD-DD]]].
# Please note that it is impossible to determine what name order is used in any file.
# For example, Cho U vs Michael Redmond would be shown as Cho-Michael. For Western players this
# is rather unfortunate. If you don't like this, you can replace $plw[0] with $plrw and
# $plb[0] with $plrb in the line starting with "my $base =". This will then use the player's full name.
# Actually, perhaps this should be the recommended behavior, to help distinguish between common
# last names such as Park, Rin, or O (e.g. O Meien, O Rissei).

use strict;
use warnings;

foreach my $fn (<*.sgf>) {
    # Slurp the whole file; tag values can span several lines.
    open my $FILE, '<', $fn or die "can't open $fn: $!";
    my $sgf = do { local $/; <$FILE> };
    close $FILE;    # close before renaming (Windows will not rename an open file)
    $sgf = '' unless defined $sgf;

    # Date: the first date-like string inside DT[...], padded with 00 for unknown parts.
    my $dt = '0000-00-00';
    if    ( $sgf =~ /DT\[[^\]]*?(\d\d\d\d-\d\d-\d\d)/ ) { $dt = $1; }
    elsif ( $sgf =~ /DT\[[^\]]*?(\d\d\d\d-\d\d)/ )      { $dt = "$1-00"; }
    elsif ( $sgf =~ /DT\[[^\]]*?(\d\d\d\d)/ )           { $dt = "$1-00-00"; }

    # Player names: 'x' when the tag is missing.
    my $plrw = $sgf =~ /PW\[(.*?)\]/ ? $1 : 'x';
    my $plrb = $sgf =~ /PB\[(.*?)\]/ ? $1 : 'x';

    # Short form: the first word of each name ('x' if nothing usable is left).
    my @plw = grep { length } split /\W+/, $plrw;
    my @plb = grep { length } split /\W+/, $plrb;
    @plw = ('x') unless @plw;
    @plb = ('x') unless @plb;

    # Long form: the full name reduced to ASCII letters and digits.
    $plrw =~ tr/a-zA-Z0-9//cd;
    $plrb =~ tr/a-zA-Z0-9//cd;

    my $base = "$dt-$plw[0]-$plb[0]";
    my $dfn  = "$base.sgf";

    # Name taken by another file: append -2, -3, ... until a free name is found.
    my $count = 2;
    while ( $dfn ne $fn && -e $dfn ) {
        $dfn = "$base-$count.sgf";
        $count++;
    }

    next if $dfn eq $fn;    # the file already has its final name
    rename $fn, $dfn or warn "can't rename $fn to $dfn: $!";
}

 Post subject: Re: SGF Date Renamer
Post #2 Posted: Sun Aug 29, 2010 11:57 pm 
KGS: usagi
Okay, I've tested this out on my collection of 851 Go Seigen games and it seems to work just fine. It extracts the date and the players' names without any issues (even from tags like DT[Published 1934-06 in a Magazine]).

Any suggestions welcome :)

 Post subject: Re: SGF Date Renamer
Post #3 Posted: Mon Aug 30, 2010 4:06 am 
A very interesting idea, but as yet I'm not quite sure what this is meant to achieve - standardisation, yes, but for what purpose, and how enforced in practice?

But subject to the caveat you may have answers I have overlooked, here are a couple of initial reservations.

1. Some games cannot be dated properly, e.g. Qianlong era

2. Some games have two valid dates: played and broadcast. Which one do you take? If you prescribe one (e.g. the earlier) what about the case where one database maker isn't aware of an earlier date and another is?

3. Some games have disputed dates. Even Go Seigen games.

4. There are instances of the same players playing each other two or three times on the same day, even with the same colours.

5. There are, and can easily be, instances of different players of the same surnames playing each other on the same day (e.g. Cho Chikun vs. Ch'oe Myeong-hun and Cho U vs. Ch'oe Ch'eol-han).

6. The lack of a standard for players' names means one database could have a certain player under the name Yi, another under Lee and another under Changho.

7. File names could be very long, and very variable in length.

There are also the exceptions you mentioned. Some cases could be treated as exceptions, but a standard with a lot of exceptions doesn't seem to be a lot of use.

For reference, the method we use in GoGoD is simply to use as the filename the YYYY-MM-DD date (00 for unknown) followed by a, b, c etc for duplicates. For us this has the merit of keeping the filenames short and easy to fit into presentation table columns. Other collectors appear to have followed this method but that does not mean it is standard - the order of games within each day varies from collection to collection. Also, we are not trying to fix a name in concrete. We very often change a filename as more data becomes available. However, whilst this is not a standard, it could be called a standard approach, and that seems to have some value. When we talk among ourselves or to other database makers, simply quoting just the date is normally enough to ensure very rapid identification.
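
The date-plus-letter scheme described above could be sketched in Perl roughly as follows. This is only an illustration, not GoGoD's actual tooling: the sub name gogod_style_name and the %taken hash are made up for the example, and it assumes every file gets a letter, including the first game on a given date.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "YYYY-MM-DD" plus the first letter (a, b, c, ...)
# not yet recorded in %$taken for that date, and records the new name.
sub gogod_style_name {
    my ($date, $taken) = @_;
    my $suffix = 'a';
    # Perl's string increment steps through a, b, ..., z, aa, ab, ...
    $suffix++ while $taken->{"$date$suffix"};
    $taken->{"$date$suffix"} = 1;
    return "$date$suffix.sgf";
}

my %taken;
print gogod_style_name('1850-08-22', \%taken), "\n";   # first game on that date
print gogod_style_name('1850-08-22', \%taken), "\n";   # second game, next letter
print gogod_style_name('1934-06-00', \%taken), "\n";   # day unknown, so 00
```

Because the letter depends on the order in which games are seen, two collections can assign different letters to the same game, which is the ordering caveat mentioned above.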

Ales Cieply has suggested the use of PIN numbers as a means of standardisation, but there are problems even there. Who sets the PIN numbers, and how do we handle the case of two players who are the same according to one authority and different according to another?

But the sgf format could do with some fine tuning and standardisation (or at least best practice). I expect there'll be a lot of mileage in this discussion.


PS As a side note, our Go Seigen total is now up to 863 + 2 9x9.

 Post subject: Re: SGF Date Renamer
Post #4 Posted: Mon Aug 30, 2010 5:03 am 
KGS: CarlJung
Good lord! Considering all the thought that has been put into GoGoD, I would not oppose a name change to GoGod.

_________________
FusekiLibrary, an opening library.
SGF converter tools: Wbaduk NGF to SGF | 440 go problems | Fuseki made easy | Tesuji made easy | Elementary training & Dan level testing | Dan Tutor Shortcut To Dan

 Post subject: Re: SGF Date Renamer
Post #5 Posted: Mon Aug 30, 2010 7:24 am 
KGS: usagi
John Fairbairn wrote:
A very interesting idea, but as yet I'm not quite sure what this is meant to achieve - standardisation, yes, but for what purpose, and how enforced in practice?


Your points are valid, but the same criticisms can be applied to the naming conventions that GoGoD uses. For example, what sort of filenames do you use for the Qianlong era games? What sort of filenames do you use for games broadcast on a different date from the one they were played on?

The idea here is to be able to sort and process files by their date. But I guess files do not really need to be renamed to do this. It's just that I have a lot of pro games from a lot of different sources and I want to organize them under one directory by date.

As for a PIN number, the only really useful thing about that would be a hash of the moves of the game -- maybe. Like a public key -- everyone knows what it would be for any particular game. This would help distinguish amongst games played on the same day, but the PIN itself would not be informative.
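
A rough Perl sketch of the move-hash idea, to show that the identifier depends only on the moves and not on the DT, PW or PB tags. The sub name move_hash is made up, SHA-1 (via the core Digest::SHA module) is just one possible choice of hash, and the sketch assumes a single game with no variations and no move-like text inside comments.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: hashes the sequence of B[..]/W[..] moves in an SGF string.
# The lookbehind skips tags such as PB[, PW[, AB[ and AW[; an empty [] is a pass.
sub move_hash {
    my ($sgf) = @_;
    my @moves = $sgf =~ /(?<![A-Z])([BW])\[([a-t]{2}|)\]/g;   # (B, pd, W, dp, ...)
    return sha1_hex(join '', @moves);
}

my $game1 = '(;GM[1]PB[Cho U]PW[Michael Redmond];B[pd];W[dp];B[pp])';
my $game2 = "(;GM[1]DT[2010-08-30]\n;B[pd]\n;W[dp]\n;B[pp])";
print move_hash($game1) eq move_hash($game2) ? "same game\n" : "different games\n";
```

Note that a rotated or mirrored record of the same game would hash differently in this sketch, so it would only identify games stored in the same orientation.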

Your other criticisms are covered by the duplicate handling in the script above, which appends -2, -3, -4, etc. to the end of the filename for duplicates. For example, point 5 ("There are, and can easily be, instances of different players of the same surnames playing each other on the same day (e.g. Cho Chikun vs. Ch'oe Myeong-hun and Cho U vs. Ch'oe Ch'eol-han).") is covered by the duplicate system, similar to the -a -b -c system.

Quote:
7. File names could be very long, and very variable in length.

True. Perhaps I don't really need to include the players' names in the filename.

Quote:
But the sgf format could do with some fine tuning and standardisation (or at least best practice). I expect there'll be a lot of mileage in this discussion.

I'd only request that the first YYYY-MM-DD string be the play date, with 00 for the day and month if they're not known. Formats such as 1850-08-22,24,25 are okay too.
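
For what it's worth, a shortcut list like 1850-08-22,24,25 can be expanded into full dates with something like the Perl sketch below. The sub name expand_dt is made up, and reading a two-number part as MM-DD (same year) is my understanding of the SGF FF4 shortcut rules, so treat that as an assumption. The first element of the result is the play date that would go into the filename.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Hypothetical helper: expands a DT value such as "1850-08-22,24,25" into a
# list of YYYY-MM-DD strings, using 00 for an unknown month or day.
sub expand_dt {
    my ($dt) = @_;
    my ($year, $month);
    my @dates;
    for my $part (split /,/, $dt) {
        if ($part =~ /^(\d{4})(?:-(\d\d))?(?:-(\d\d))?$/) {
            # Full form: YYYY, YYYY-MM or YYYY-MM-DD.
            ($year, $month) = ($1, $2);
            push @dates, join '-', $1, $2 // '00', $3 // '00';
        }
        elsif ($part =~ /^(\d\d)-(\d\d)$/ && defined $year) {
            # Assumed to mean MM-DD in the year seen last.
            $month = $1;
            push @dates, "$year-$1-$2";
        }
        elsif ($part =~ /^(\d\d)$/ && defined $month) {
            # DD in the month seen last.
            push @dates, "$year-$month-$1";
        }
        # Anything else is silently skipped in this sketch.
    }
    return @dates;
}

print join(' ', expand_dt('1850-08-22,24,25')), "\n";
print join(' ', expand_dt('1934-06')), "\n";
```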

Quote:
PS As a side note, our Go Seigen total is now up to 863 + 2 9x9.


Mmm, I've been thinking of buying that CD for a while now.

 Post subject: Re: SGF Date Renamer
Post #6 Posted: Tue Nov 09, 2010 1:23 pm 
KGS: LiKao / Loki
usagi wrote:
As for a PIN number, the only really useful thing about that would be a hash of the moves of the game -- maybe. Like a public key -- everyone knows what it would be for any particular game. This would help distinguish amongst games played on the same day, but the PIN itself would not be informative.


MoyoGo Help wrote:
Dyer signature

Shows the Dyer Signature for the current game. Dyer Signatures are a standardized way to identify a Go game by just a few characters and some popular Go websites use them. Moyo Go Studio also shows the "invariant" Dyer Signature, which is the lexicographic lowest Dyer Signature of all mirror/rotational/color-reversed games.

http://www.andromeda.com/people/ddyer/g ... -spec.html
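
A hedged Perl sketch of how such a signature could be computed. As far as I recall the spec, the signature is built from the SGF coordinates of moves 20, 40 and 60, with a second part from moves 31, 51 and 71; please check those move numbers against the link above before relying on them. The sub name dyer_signature, the '-' separator and the '??' placeholder for games that end early are choices made for this example only, and the "invariant" variant over mirrored and rotated games is not attempted.

```perl
use strict;
use warnings;

# Hypothetical helper: joins the coordinates of moves 20, 40, 60 (part A)
# and 31, 51, 71 (part B). Assumes a main line with no variations and no
# move-like text inside comments.
sub dyer_signature {
    my ($sgf) = @_;
    # The lookbehind skips tags such as PB[, PW[, AB[ and AW[.
    my @coords = $sgf =~ /(?<![A-Z])[BW]\[([a-t]{2}|)\]/g;
    my $pick = sub {
        # Moves are numbered from 1; missing moves and passes become '??'.
        return join '', map {
            my $c = $coords[ $_ - 1 ];
            ( defined $c && length $c ) ? $c : '??';
        } @_;
    };
    return $pick->( 20, 40, 60 ) . '-' . $pick->( 31, 51, 71 );
}

# A two-move record is too short, so every slot is a placeholder.
print dyer_signature('(;GM[1];B[pd];W[dp])'), "\n";
```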

_________________
Sanity is for the weak.

 Post subject: Re: SGF Date Renamer
Post #7 Posted: Wed Nov 10, 2010 3:44 am 
DGS: willemien
John Fairbairn wrote:

For reference, the method we use in GoGoD is simply to use as the filename the YYYY-MM-DD date (00 for unknown) followed by a, b, c etc for duplicates. For us this has the merit of keeping the filenames short and easy to fit into presentation table columns. Other collectors appear to have followed this method but that does not mean it is standard - the order of games within each day varies from collection to collection. Also, we are not trying to fix a name in concrete. We very often change a filename as more data becomes available. However, whilst this is not a standard, it could be called a standard approach, and that seems to have some value. When we talk among ourselves or to other database makers, simply quoting just the date is normally enough to ensure very rapid identification.




Because it is the system used by GoGoD it is the de facto standard :D

_________________
Promotor and Librarian of Sensei's Library
