It is currently Fri Dec 03, 2021 9:53 pm

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 11 posts ] 
Author Message
Offline
 Post subject: Making a hash of it
Post #1 Posted: Mon Aug 02, 2021 11:20 am 
Oza

Posts: 3086
Liked others: 18
Was liked: 4112
Being interested in linguistic corpora, I was looking at some search algorithms, such as Rabin-Karp. These generally involve hash functions, and that got me thinking about go position searches when I moved on to my next task, something go-related. The only one I know about is Kombilo, and I got to wondering how it works.

Using hashing for chess positions is, if not exactly easy in practical terms, not too hard to follow. The limited board size and the convenient marriage of 64 squares and 64 bits make the hashing principle there reasonably straightforward. I imagine most of the cleverness of the top programmers goes into things like bit magic. I used to work with some on shogi, and it was fascinating to see how they grappled with the upgrade to 9x9 squares. The first question was: has anybody got a 9-bit chip?

But go is not only many times bigger - the task seems quite different. If it were just a question of searching for matches of whole-board positions using a cached, pre-hashed databank, that would not seem specially interesting. But Kombilo (a magnificent beast) allows you to make searches of rectangles of any size within a cache built (as I infer) from 19x19 hashing. I can't get my head round that at all except by imagining hashing and caching every possible rectangle, which is clearly nowhere near feasible. As a reference point, on my machine Kombilo seems to use about 2G on a 110,000-game database that occupies about 15MB. That seems insanely economical.

Can anyone lift the veil on the magic?

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #2 Posted: Mon Aug 02, 2021 12:03 pm 
Lives with ko

Posts: 248
Liked others: 42
Was liked: 109
Rank: 2d
John Fairbairn wrote:
But go is not only many times bigger - the task seems quite different. If it were just a question of searching for matches of whole-board positions using a cached, pre-hashed databank, that would not seem specially interesting. But Kombilo (a magnificent beast) allows you to make searches of rectangles of any size within a cache built (as I infer) from 19x19 hashing. I can't get my head round that at all except by imagining hashing and caching every possible rectangle, which is clearly nowhere near feasible. As a reference point, on my machine Kombilo seems to use about 2G on a 110,000-game database that occupies about 15MB. That seems insanely economical.

Can anyone lift the veil on the magic?

Kombilo (and I've used essentially the same algorithm in q5go) stores two bitmaps per game. These hold the positions where white and black stones appeared at any point during the game. It's not quite the same as the final position, since captured stones are also recorded here. I'm not sure I'd use the term "hashes" for these bitmaps, but I suppose one could.

The search is then a two-pass affair. First, for every game, you try to match the pattern against these bitmaps.

Let's say you're looking for
Code:
XO
OX


and your bitmaps contain, somewhere within the 19x19
Code:
OO  XX
.O  X.

then you know that the pattern could match in this game, at this position, if you mirror it, since white and black stones appeared in the right places at some point during the game. So, you have a candidate match. To make certain, you then have to go through all the moves in the candidate games to see if the required pattern is actually reached. The number of mismatches in the candidate is 4 in an empty starting position, and you add/subtract to that number as moves and captures appear. If it reaches zero, you found a match.

I believe Kombilo also uses something else for each game that's more conventionally a hash value. These are used as (hopefully) unique IDs and IIRC they are used to identify games and print annotations such as "Commentary in Go World 72".

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #3 Posted: Mon Aug 02, 2021 2:40 pm 
Dies with sente

Posts: 74
Liked others: 6
Was liked: 60
Rank: now 1kyu-ish
John - Kombilo uses a Dyer signature to identify each game - that's the position of the six moves 20, 40, 60, 31, 51, 71 in SGF format.

There was some discussion on Sensei's Library about hashing and searching at:

https://senseis.xmp.net/?SearchAlgorithms

You can use the Kombilo database in your own Python scripts to mine your SGF files - worth looking at before you rush into any coding. The PDF in the Kombilo Documentation covers this.

Take Care - John

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #4 Posted: Tue Aug 03, 2021 12:56 am 
Lives in gote

Posts: 680
Liked others: 294
Was liked: 342
bernds wrote:
The search is then a two-pass affair.


RIght; when I looked through the source, I was surprised that it doesn't use a simple hash lookup or some fancy regex but just says "your pattern *could* occur in this game", then still goes through the moves one-by-one. I guess with captures and ko this is unavoidable.

Speaking of kombilo, has anyone found a way to get it to run under Python 3? The developer says it's incompatible with Python 3 but I'm wondering how much effort would be involved to make it compatible. (I'm neither a Python nor a C++ expert.)

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #5 Posted: Tue Aug 03, 2021 1:14 am 
Oza

Posts: 3086
Liked others: 18
Was liked: 4112
Thanks for replies. It seems I made a rash assumption that hashing was involved. I did this partly because Ulrich mentions hashing a lot and even offers a function within the program for corner searches with or without hashing. But furtehr scrutiny does suggest he never exactly says he used hashing (hidden pun not intended!).

I also assumed that doing a hash look-up on cached data was the only way to achieve the sort of speed he gets, though he did once say he used C for a portion of the program to achieve speed.

I'm not looking to code anything myself. After all, I have Kombilo already, and have written a workaround to get over the problem that it crashes with date of 2000 and over. But the SL pages mentioned, supplementing the remarks above, look as if they will give me the "tourist" view I was looking for - though the main Ulrich link on the SL page is now dead :(. Is that because it's 2021?

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #6 Posted: Tue Aug 03, 2021 4:08 am 
Lives with ko

Posts: 248
Liked others: 42
Was liked: 109
Rank: 2d
Marcel Grünauer wrote:
Speaking of kombilo, has anyone found a way to get it to run under Python 3? The developer says it's incompatible with Python 3 but I'm wondering how much effort would be involved to make it compatible. (I'm neither a Python nor a C++ expert.)

I looked at it for a while but failed. I'm not enough of a Python export, and had no idea how the integration of Python and C++ code needed to be dealt with. I found it easier to just reimplement the search algorithm.

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #7 Posted: Tue Aug 03, 2021 11:24 am 
Lives with ko

Posts: 248
Liked others: 42
Was liked: 109
Rank: 2d
John Fairbairn wrote:
Thanks for replies. It seems I made a rash assumption that hashing was involved. I did this partly because Ulrich mentions hashing a lot and even offers a function within the program for corner searches with or without hashing.
Oh yeah, now that you mention it, I kind of remember seeing something like this. But I never looked deeply into it since the general case search seems to be fast enough, especially for corner patterns, so I felt that a special case wasn't needed.

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #8 Posted: Wed Aug 04, 2021 11:17 am 
Lives in gote

Posts: 680
Liked others: 294
Was liked: 342
bernds wrote:
I looked at it for a while but failed. I'm not enough of a Python export, and had no idea how the integration of Python and C++ code needed to be dealt with. I found it easier to just reimplement the search algorithm.


I see that q5Go has a pattern matcher; is there a standalone version like libkombilo that can be used in scripts?

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #9 Posted: Wed Aug 04, 2021 11:46 am 
Lives with ko

Posts: 248
Liked others: 42
Was liked: 109
Rank: 2d
Marcel Grünauer wrote:
bernds wrote:
I looked at it for a while but failed. I'm not enough of a Python export, and had no idea how the integration of Python and C++ code needed to be dealt with. I found it easier to just reimplement the search algorithm.


I see that q5Go has a pattern matcher; is there a standalone version like libkombilo that can be used in scripts?

No. What exactly would you need?

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #10 Posted: Wed Aug 04, 2021 12:24 pm 
Dies with sente

Posts: 83
Liked others: 5
Was liked: 29
Marcel Grünauer wrote:
I see that q5Go has a pattern matcher; is there a standalone version like libkombilo that can be used in scripts?

sgfutils has sgfinfo that can select games from a game collection given some partial metainformation or some moves or some board pattern. See sgfinfo.html. It can be used in scripts. It was written for a Linux environment.

Top
 Profile  
 
Offline
 Post subject: Re: Making a hash of it
Post #11 Posted: Wed Aug 04, 2021 2:30 pm 
Lives in gote

Posts: 680
Liked others: 294
Was liked: 342
bernds wrote:
No. What exactly would you need?


See http://gogamespace.com/training-module/ . I want to group problems by shape.

This is how I generate the site contents: I have a large(ish) collection of SGF files, often with many variations. They contain both games and problems.

Then I have a list of pattern definitions. For each pattern I need to find out in which trees and where exactly they occur.

For example, for the "dog face" pattern, I have

Code:
    '''
    *...*
    *.1.*
    .....
    .X.X.
    *...*
    '''


and am anchoring this particular pattern at

Code:
[ CENTER_PATTERN, SIDE_N_PATTERN, SIDE_E_PATTERN, SIDE_W_PATTERN, SIDE_S_PATTERN ]


These constants are from kombilo.

I'm creating a kombilo DB on all the SGF files, then geneate a report where each pattern occurred ((filename, tree path, pattern name). Using this report I then generate the site.


aeb wrote:
sgfutils has sgfinfo that can select games from a game collection given some partial metainformation or some moves or some board pattern. See sgfinfo.html. It can be used in scripts. It was written for a Linux environment.


Thank you; I'll have a look whether this can do what I described above.

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 11 posts ] 

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group