L19 Programming Problem Championship: Round 2 (Strings)

All non-Go discussions should go here.
User avatar
Solomon
Gosei
Posts: 1848
Joined: Tue Apr 20, 2010 9:21 pm
Rank: AGA 5d
GD Posts: 0
KGS: Capsule 4d
Tygem: 치킨까스 5d
Location: Bellevue, WA
Has thanked: 90 times
Been thanked: 835 times

Re: L19 Programming Problem Championship: Round 2

Post by Solomon »

peti29 wrote:Solomon: thank you for the detailed explanation! I almost get it now. Only one thing remains:
How is it guaranteed that we'll use a piece only once?
Say our test case is: '(', '(', ')', ')' then our positive memo table will be [0, 1, 2] and our negative memo table the same,
so when we compare memo table entries we will add 4 for i==2 which is correct, but then we'll add 2 for i==1 resulting in a sum of 6 which is incorrect. :scratch:
Once we've calculated the memo tables, the dynamic programming step is done! As we're stepping through the memo tables and find cases where there are positive values in the same index across the tables, we're just trying to find the max (and we're going through them in regular order). So at i=1, we see values 1 and 1 across the tables, add them, and set that as our current max. Then at i=2, we see values 2 and 2 across the tables, add them, see that this value is higher, and set that as our new max, which is our answer.
peti29
Lives with ko
Posts: 125
Joined: Mon Feb 17, 2014 2:41 am
Rank: KGS 5 kyu
GD Posts: 0
Has thanked: 13 times
Been thanked: 12 times

Re: L19 Programming Problem Championship: Round 2

Post by peti29 »

Solomon: Thank you! I think I understand now.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: L19 Programming Problem Championship: Round 2

Post by Bill Spight »

One virtue, which I did not mention, of reducing the maximum n in the Power Strings problem from the length of the string to a smaller n(max) in the manner used is that you only have to find the divisors of n(max) instead of the string length.

Let me illustrate with a string length of 60, which has several divisors. In the worst case we might have to try substring lengths of 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, and 30. But suppose that we are able to reduce n(max) to 15. Then our possible n's are 15, 5, 3, and 1, which means that we might only have to try substring lengths of 4, 12, and 20, i.e., divisors of 60 which are multiples of 4. OC, a maximum n of 15 is a bit lucky. 12 is perhaps more likely. Then we might only have to try substring lengths of 5, 10, 15, 20, and 30. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
bernds
Lives with ko
Posts: 259
Joined: Sun Apr 30, 2017 11:18 pm
Rank: 2d
GD Posts: 0
Has thanked: 46 times
Been thanked: 116 times

Re: L19 Programming Problem Championship: Round 2

Post by bernds »

Solomon wrote:
Indeed, but when I tried something far simpler like:

Code: Select all

while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).
Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.
User avatar
Solomon
Gosei
Posts: 1848
Joined: Tue Apr 20, 2010 9:21 pm
Rank: AGA 5d
GD Posts: 0
KGS: Capsule 4d
Tygem: 치킨까스 5d
Location: Bellevue, WA
Has thanked: 90 times
Been thanked: 835 times

Re: L19 Programming Problem Championship: Round 2

Post by Solomon »

bernds wrote:
Solomon wrote:
Indeed, but when I tried something far simpler like:

Code: Select all

while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).
Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.
Go for it! I'm curious :).
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: L19 Programming Problem Championship: Round 2

Post by Bill Spight »

Solomon wrote:
bernds wrote:
Solomon wrote:
Indeed, but when I tried something far simpler like:

Code: Select all

while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).
Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.
Go for it! I'm curious :).
My aim is to find n without doing any substring matches. (OC, you have to check it. ;) Unless it is 1, which I doubt.) Suppose that n is odd. Then the probability of any character count being even is 0.5, assuming independence, which is a maybe. The probability of all 26 character counts being even is 0.5^26. Well, there are 223 primes less than √2,000,000, so there is a chance that one of them will divide all of the character counts but not n. However, throw in 676 possible digrams and I think that the chances of finding n via the GCD are damn good. :)
2d or 3d edit:

In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
bernds
Lives with ko
Posts: 259
Joined: Sun Apr 30, 2017 11:18 pm
Rank: 2d
GD Posts: 0
Has thanked: 46 times
Been thanked: 116 times

Re: L19 Programming Problem Championship: Round 2

Post by bernds »

Bill Spight wrote:In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)
Not going to work, since you can't distinguish abcdabcd from abcddcba.

My new attempt also got WA on one of the hidden testcases, so I'll have to do a bit of debugging.
Bill Spight
Honinbo
Posts: 10905
Joined: Wed Apr 21, 2010 1:24 pm
Has thanked: 3651 times
Been thanked: 3373 times

Re: L19 Programming Problem Championship: Round 2

Post by Bill Spight »

bernds wrote:
Bill Spight wrote:In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)
Not going to work, since you can't distinguish abcdabcd from abcddcba.

My new attempt also got WA on one of the hidden testcases, so I'll have to do a bit of debugging.
Yeah, well, that depends upon the psychology of the testers. ;)

But considering the expectation that you inform them if you yourself find a bug, then knowingly submitting a buggy program probably violates the spirit of the competition. :)
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.
bernds
Lives with ko
Posts: 259
Joined: Sun Apr 30, 2017 11:18 pm
Rank: 2d
GD Posts: 0
Has thanked: 46 times
Been thanked: 116 times

Re: L19 Programming Problem Championship: Round 2

Post by bernds »

Solomon wrote:
Go for it! I'm curious :).
Oddly, even after throwing a few more optimizations at it, Kattis reports a longer run-time than it does for my earlier version that does factorization. I'm somewhat surprised: it's got a linear algorithm, I don't have to precompute a big table of prime numbers for every invocation, it should be faster - right? Maybe the extra KMP table blows out the cache; there may be a lesson here.

Code: Select all

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>

#define MAXLEN    2000001

static char line[MAXLEN];

static int kmp_next[MAXLEN*2+1];

static size_t kmp_build (char *str)
{
  if (str[1] == '\0')
    return 1;
  kmp_next[0] = -1;
  kmp_next[1] = 0;
  size_t cmp_pos = 0;
  size_t i = 1;
  size_t n = -1;
  size_t result = 1;
  for (;;)
    {
      if (str[i] == '\0')
        {
          n = i;
          break;
        }
      if (str[cmp_pos] == str[i])
        {
          kmp_next[i + 1] = cmp_pos + 1;
          cmp_pos++;
          i++;
        }
      else if (cmp_pos > 0)
        {
          do
            {
              cmp_pos = kmp_next[cmp_pos];
            }
          while (cmp_pos > 0 && str[cmp_pos] != str[i]);
        }
      else
        {
          kmp_next[i + 1] = 0;
          cmp_pos = 0;
          i++;
        }
    }
  for (i = 0; i <= n / 2;)
    {
      if (str[cmp_pos] == str[i])
        {
          kmp_next[i + 1] = cmp_pos + 1;
          cmp_pos++;
          i++;
        }
      else if (cmp_pos > 0)
        {
          do
            {
              cmp_pos = kmp_next[cmp_pos];
            }
          while (cmp_pos > 0 && str[cmp_pos] != str[i]);
        }
      else
        {
          kmp_next[i + 1] = 0;
          cmp_pos = 0;
          i++;
        }
      if (cmp_pos == n)
        {
          result = n / i;
          break;
        }
    }
  return result;
}

int main (int argc, char **argv)
{
  while (!feof (stdin))
    {
      gets (line);
      if (strcmp (line, ".") == 0)
        break;
      printf ("%ld\n", kmp_build (line));
    }
  return 0;
}
Post Reply