It is currently Fri Apr 19, 2024 12:10 pm

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 54 posts ]  Go to page Previous  1, 2, 3
Author Message
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #41 Posted: Mon May 08, 2017 4:26 pm 
Lives with ko

Posts: 125
Liked others: 13
Was liked: 12
Rank: KGS 5 kyu
Solomon wrote:
Problem C
That's pretty much the hard part, the DP is simpler:
Code:
memo[i + chunk.balance] = max(memo[i + chunk.balance], memo[i] + chunk.size)
I still don't understand :(
I'm not too familiar with DP. What does 'i' symbolize? What does the above line mean? Why is the range ~= max_number_of_strings * max_length_of_strings?
I briefly thought about some kind of sorting but then I didn't know how to handle when certain items needed to be left out.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #42 Posted: Mon May 08, 2017 6:28 pm 
Gosei
User avatar

Posts: 1848
Location: Bellevue, WA
Liked others: 90
Was liked: 837
Rank: AGA 5d
KGS: Capsule 4d
Tygem: 치킨까스 5d
peti29 wrote:
Solomon wrote:
Problem C
That's pretty much the hard part, the DP is simpler:
Code:
memo[i + chunk.balance] = max(memo[i + chunk.balance], memo[i] + chunk.size)
I still don't understand :(
I'm not too familiar with DP. What does 'i' symbolize? What does the above line mean? Why is the range ~= max_number_of_strings * max_length_of_strings?
I briefly thought about some kind of sorting but then I didn't know how to handle when certain items needed to be left out.

The i is the loop variable. I think the best way to show what this memo (memoization table) does is with an example:

Let's say the input we're given are the following:

(((...( [250 ('s, so the balance is +250, and the size is 250]
(((...()...) [200 ('s, followed by 50 )'s, balance +150, size 250]
(((...( [150 ('s, balance +150, size 150]


These are in sorted order as well, and for now to keep things simple we are not considering offset parentheses chunks (so all 3 of the chunks above have an offset value of 0). We have nothing right now, and for our first chunk, memo[chunk.balance] == memo[250] == 0, so we go ahead and add the first chunk to the memo table (memo[250] = 250):
Code:
memo: [ 0, 0, 0, ..., 250,  0,   0,   0, ... ]
        0  1  2       250  251  252  253

What this tells us is that the best choice (based on size) for any combination of chunks (here we just have 1 chunk) that has a balance of +250 has a size of 250. Right now we're just dealing with the positive memo table; if our negative memo table also had a value greater than 0 in index 250, then the answer would be that value + 250 (e.g., if we had processed ()))...) [1 ( and 251 )'s, or a balance of -250 and size 252], the negative memo table would have a value of 252 in index 250, then our answer would be 250 + 252 = 502).

Anyways, so the next chunk we have is (((...()...) [200 ('s, followed by 50 )'s, balance +150, size 250]. We iterate through the memo table from the end and any time we find a value > 0 at index i, we check to see if it's worth adding this chunk or not. That's this part of the code:
Code:
memo[i + chunk.balance] = max(
    memo[i + chunk.balance], // don't bother adding the chunk
    memo[i] + chunk.size)    // add the chunk
)

In this case we definitely should, because memo[250 + 150] == memo[400] == 0. And like the first time, we check memo[chunk.balance] as well (which also == 0, so we update that too) so now our memo table looks like this:
Code:
memo: [ 0, 0, 0, ..., 250, ..., 250, ..., 500, ...]
        0  1  2       150       250       400

You can start to see why the memo table needs to be as large as 300 * 300 now, right? We're only 2 chunks in and we're already up to index 400 (could have gone up to 600 at this point, or 300 * 2, and we can have up to 300 chunks).

Now for the last chunk (((...( [150 ('s, balance +150, size 150]. First index we hit since we're going from the end is i = 400. memo[i + chunk.balance] == memo[400 + 150] == memo[550] == 0, so that's updated to 500 + 150 = 650. But now, when we hit the next index i = 250, we see that memo[i + chunk.balance] == memo[250 + 150] == memo[400] == 500. In this case,
Code:
max(memo[i + chunk.balance], memo[i] + chunk.size)) == max(500, 250 + 150) == max(500, 400) == 500

So this time we don't use the chunk. What does this mean? For sake of simplicity, let's say the only chunks we have that are negatively balanced are ))...) [200 )'s] and ))...) [200 )'s again], so we know the negative memo table's value at index 400 is 400. Look again at the choices we have for our positively balanced chunks. Clearly, it is better to utilize only the first two chunks (for a total size of 500 + 400 = 900) and not bother with the third (for a total size of (250 + 150) + 400 = 800 < 900), and that's what this memo table is tracking for us. So at index 400, we keep what we have, and the same goes for index 150 (why use the 3rd chunk when the 2nd chunk has the same positive balance but greater size?).

Sorry this got so long-winded, but that's essentially what this memoized table is doing for us. It's "standard" DP in the sense that this is almost exactly the technique used in solving problems like the Coin Change problem, a classical DP problem.


This post by Solomon was liked by: jeromie
Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #43 Posted: Mon May 08, 2017 6:42 pm 
Gosei
User avatar

Posts: 1848
Location: Bellevue, WA
Liked others: 90
Was liked: 837
Rank: AGA 5d
KGS: Capsule 4d
Tygem: 치킨까스 5d
bernds wrote:
Solomon wrote:
Problem B
The way I approached this problem was to utilize the preprocessing function from the KMP algorithm (call it kmp_pre).
This is just equivalent to searching for S withing S+S (and ignoring the first match), right? I get the feeling this is how it was intended to be solved. I just went with the factorization. The interesting thing is that your solution looks like it should be more efficient, but runs slightly slower in practice in my tests. This is something I'll probably spend time on to try to understand - is C++ string handling that inefficient?

Indeed, but when I tried something far simpler like:
Code:
while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #44 Posted: Mon May 08, 2017 8:12 pm 
Gosei
User avatar

Posts: 1848
Location: Bellevue, WA
Liked others: 90
Was liked: 837
Rank: AGA 5d
KGS: Capsule 4d
Tygem: 치킨까스 5d
tj86430 wrote:
BTW, is it possible to submit solutions after the round has closed and to get feedback whether the solution would have been accepted? There are some problems I'd like to try, but I don't usually (read: ever) have time to do those within set timeframe. I might try during e.g. summer vacation, though.
Absolutely! Just go to the contest link and open the problems up, you should still be able to submit (assuming you registered an account).

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #45 Posted: Tue May 09, 2017 1:14 am 
Lives with ko

Posts: 125
Liked others: 13
Was liked: 12
Rank: KGS 5 kyu
Solomon: thank you for the detailed explanation! I almost get it now. Only one thing remains:
How is it guaranteed that we'll use a piece only once?
Say our test case is: '(', '(', ')', ')' then our positive memo table will be [0, 1, 2] and our negative memo table the same,
so when we compare memo table entries we will add 4 for i==2 which is correct, but then we'll add 2 for i==1 resulting in a sum of 6 which is incorrect. :scratch:

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #46 Posted: Tue May 09, 2017 3:31 am 
Gosei
User avatar

Posts: 1848
Location: Bellevue, WA
Liked others: 90
Was liked: 837
Rank: AGA 5d
KGS: Capsule 4d
Tygem: 치킨까스 5d
peti29 wrote:
Solomon: thank you for the detailed explanation! I almost get it now. Only one thing remains:
How is it guaranteed that we'll use a piece only once?
Say our test case is: '(', '(', ')', ')' then our positive memo table will be [0, 1, 2] and our negative memo table the same,
so when we compare memo table entries we will add 4 for i==2 which is correct, but then we'll add 2 for i==1 resulting in a sum of 6 which is incorrect. :scratch:
Once we've calculated the memo tables, the dynamic programming step is done! As we're stepping through the memo tables and find cases where there are positive values in the same index across the tables, we're just trying to find the max (and we're going through them in regular order). So at i=1, we see values 1 and 1 across the tables, add them, and set that as our current max. Then at i=2, we see values 2 and 2 across the tables, add them, see that this value is higher, and set that as our new max, which is our answer.


This post by Solomon was liked by: peti29
Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #47 Posted: Tue May 09, 2017 6:19 am 
Lives with ko

Posts: 125
Liked others: 13
Was liked: 12
Rank: KGS 5 kyu
Solomon: Thank you! I think I understand now.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #48 Posted: Tue May 09, 2017 7:07 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
One virtue, which I did not mention, of reducing the maximum n in the Power Strings problem from the length of the string to a smaller n(max) in the manner used is that you only have to find the divisors of n(max) instead of the string length.

Let me illustrate with a string length of 60, which has several divisors. In the worst case we might have to try substring lengths of 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, and 30. But suppose that we are able to reduce n(max) to 15. Then our possible n's are 15, 5, 3, and 1, which means that we might only have to try substring lengths of 4, 12, and 20, i.e., divisors of 60 which are multiples of 4. OC, a maximum n of 15 is a bit lucky. 12 is perhaps more likely. Then we might only have to try substring lengths of 5, 10, 15, 20, and 30. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #49 Posted: Tue May 09, 2017 9:31 am 
Lives with ko

Posts: 259
Liked others: 46
Was liked: 116
Rank: 2d
Solomon wrote:
Indeed, but when I tried something far simpler like:
Code:
while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).


Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #50 Posted: Tue May 09, 2017 9:33 am 
Gosei
User avatar

Posts: 1848
Location: Bellevue, WA
Liked others: 90
Was liked: 837
Rank: AGA 5d
KGS: Capsule 4d
Tygem: 치킨까스 5d
bernds wrote:
Solomon wrote:
Indeed, but when I tried something far simpler like:
Code:
while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).


Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.
Go for it! I'm curious :).

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #51 Posted: Tue May 09, 2017 11:03 am 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
Solomon wrote:
bernds wrote:
Solomon wrote:
Indeed, but when I tried something far simpler like:
Code:
while True:
    s = input()
    if s == '.':
        break
    print(len(s) // (s + s).find(s, 1))
It would pass the first 3 test cases, before hitting TLE on the fourth (I wrote the C++ equivalent, and found it would hit TLE even sooner, at the second test case).


Well, you're not in control of whether the library uses a clever algorithm. I had a look at libstdc++ and it looks like it doesn't (there's a mention of KMP only for a parallel version string search, the default seems to be the naive loop).
I've built a KMP version with a slightly different approach. I think you can just compute the solution while building the kmp prefix table: whenever the length of the prefix is equal to the length of the original string, you've found S within S+S. The first time this occurs, you can stop searching. As an additional optimization you can avoid the string copy by just noticing when your index goes past the length and wrapping it. I implemented this without reference to your program, just looking at general KMP algorithm descriptions elsewhere, and I suspect that this version would be good for spot 2 on the performance leaderboard (assuming it really works). Would you object if I submitted it?
I suspect Bill's method could very slightly improve runtimes further in some cases, but the solution is already linear and that obviously won't change, so I imagine the improvement would be very limited.
Go for it! I'm curious :).


My aim is to find n without doing any substring matches. (OC, you have to check it. ;) Unless it is 1, which I doubt.) Suppose that n is odd. Then the probability of any character count being even is 0.5, assuming independence, which is a maybe. The probability of all 26 character counts being even is 0.5^26. Well, there are 223 primes less than √2,000,000, so there is a chance that one of them will divide all of the character counts but not n. However, throw in 676 possible digrams and I think that the chances of finding n via the GCD are damn good. :)


2d or 3d edit:

In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #52 Posted: Tue May 09, 2017 1:06 pm 
Lives with ko

Posts: 259
Liked others: 46
Was liked: 116
Rank: 2d
Bill Spight wrote:
In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)


Not going to work, since you can't distinguish abcdabcd from abcddcba.

My new attempt also got WA on one of the hidden testcases, so I'll have to do a bit of debugging.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #53 Posted: Tue May 09, 2017 1:12 pm 
Honinbo

Posts: 10905
Liked others: 3651
Was liked: 3374
bernds wrote:
Bill Spight wrote:
In fact, I would be willing to submit a solution that calculates n just from the single character counts, with no string matching at all. Not even to check the result. :shock: I may well do that. As Kirby says, not perfect, but quick. ;)


Not going to work, since you can't distinguish abcdabcd from abcddcba.

My new attempt also got WA on one of the hidden testcases, so I'll have to do a bit of debugging.


Yeah, well, that depends upon the psychology of the testers. ;)

But considering the expectation that you inform them if you yourself find a bug, then knowingly submitting a buggy program probably violates the spirit of the competition. :)

_________________
The Adkins Principle:
At some point, doesn't thinking have to go on?
— Winona Adkins

Visualize whirled peas.

Everything with love. Stay safe.

Top
 Profile  
 
Offline
 Post subject: Re: L19 Programming Problem Championship: Round 2
Post #54 Posted: Tue May 09, 2017 2:05 pm 
Lives with ko

Posts: 259
Liked others: 46
Was liked: 116
Rank: 2d
Solomon wrote:
Go for it! I'm curious :).
Oddly, even after throwing a few more optimizations at it, Kattis reports a longer run-time than it does for my earlier version that does factorization. I'm somewhat surprised: it's got a linear algorithm, I don't have to precompute a big table of prime numbers for every invocation, it should be faster - right? Maybe the extra KMP table blows out the cache; there may be a lesson here.


Code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/time.h>

#define MAXLEN    2000001

static char line[MAXLEN];

static int kmp_next[MAXLEN*2+1];

static size_t kmp_build (char *str)
{
  if (str[1] == '\0')
    return 1;
  kmp_next[0] = -1;
  kmp_next[1] = 0;
  size_t cmp_pos = 0;
  size_t i = 1;
  size_t n = -1;
  size_t result = 1;
  for (;;)
    {
      if (str[i] == '\0')
        {
          n = i;
          break;
        }
      if (str[cmp_pos] == str[i])
        {
          kmp_next[i + 1] = cmp_pos + 1;
          cmp_pos++;
          i++;
        }
      else if (cmp_pos > 0)
        {
          do
            {
              cmp_pos = kmp_next[cmp_pos];
            }
          while (cmp_pos > 0 && str[cmp_pos] != str[i]);
        }
      else
        {
          kmp_next[i + 1] = 0;
          cmp_pos = 0;
          i++;
        }
    }
  for (i = 0; i <= n / 2;)
    {
      if (str[cmp_pos] == str[i])
        {
          kmp_next[i + 1] = cmp_pos + 1;
          cmp_pos++;
          i++;
        }
      else if (cmp_pos > 0)
        {
          do
            {
              cmp_pos = kmp_next[cmp_pos];
            }
          while (cmp_pos > 0 && str[cmp_pos] != str[i]);
        }
      else
        {
          kmp_next[i + 1] = 0;
          cmp_pos = 0;
          i++;
        }
      if (cmp_pos == n)
        {
          result = n / i;
          break;
        }
    }
  return result;
}

int main (int argc, char **argv)
{
  while (!feof (stdin))
    {
      gets (line);
      if (strcmp (line, ".") == 0)
        break;
      printf ("%ld\n", kmp_build (line));
    }
  return 0;
}

Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 54 posts ]  Go to page Previous  1, 2, 3

All times are UTC - 8 hours [ DST ]


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group