nishu271's blog

By nishu271, 12 years ago, In English

Hi all,

I was trying to solve http://www.spoj.com/problems/MCHAOS/. I think my solution is correct with the given constraints. I got TLE with BIT and STL maps initially in the 10th test case. Then I tried to optimize by converting strings into longs (preserving their order) and sorting long long[] instead of vector<string>. This gave WA. I believe there is some problem in the hash() function but I was not able to find it. Can someone help or suggest some other optimization?

Here are the two versions of the code...

#include <stdio.h>
#include <string>
#include <map>
#include <string.h>
#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

typedef long long int ll;

const int NMAX = 100000+5;
ll BIT[NMAX];
int LIM = 0;
map<ll, int> dict;
char vs[NMAX][11];
vector<string> vecs, seq;
ll read (int v) {
    ll sum = 0;
    while (v) {
        sum += BIT[v];
        v -= v &-v;
    }
    return sum;
}

void update (int v, int val) {
    
    while (v <= LIM) {
        BIT[v] += val;
        v += v & -v;
    }
}

string reverse(string s) {
    char c;
    int l = s.length();
    for (int i=0; i <= (l-1)/2; i++) {
        c = s[i];
        s[i] = s[l-i-1];
        s[l-i-1] = c;
    }
    return s;
}
ll hash(string s){   
    ll sum = 0;
    for (int i = 0; i < s.length(); i++) sum = sum * 26 + s[i]-26 + 1;
    for (int i = s.length(); i <= 10; i++) sum *= 26;
    return sum;
}
int main () {
    int N;
    memset(BIT, 0, sizeof(BIT));
    memset(vs, 0, sizeof(vs));
    cin >> N;
    string s;
    for (int i=0; i < N; i++) scanf("%s", vs[i]);
    for (int i=0; i < N; i++) vecs.push_back(vs[i]);
    sort(vecs.begin(), vecs.end());
    for (int i = 0; i < N; i++) {vecs[i] = reverse(vecs[i]);seq.push_back(vecs[i]);}
    sort(seq.begin(), seq.end());
       
    for (int i = 0; i < N; i++) dict[hash(seq[i])] = i+1;
    LIM = N+1;
    ll bad_pairs = 0;
    for (int i = N-1; i >= 0; i--) {
        bad_pairs += read(dict[hash(vecs[i])]);
        update(dict[hash(vecs[i])], 1);
    }
    cout << bad_pairs << endl;
    return 0;
}

After the optimization ...

#include <stdio.h>
#include <string>
#include <map>
#include <string.h>
#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

typedef unsigned long long int ull;

const int NMAX = 100000+5;
ull BIT[NMAX];
int LIM = 0;
map<ull, int> dict,d2;
char vs[NMAX][12], sq[NMAX][12];
ull vecs[NMAX], seq[NMAX];
ull read (int v) {
    ull sum = 0;
    while (v) {
        sum += BIT[v];
        v -= (v &-v);
    }
    return sum;
}

void update (int v, int val) {
    
    while (v <= LIM) {
        BIT[v] += val;
        v += (v & -v);
    }
}

void reverse(char *s) {
    char c;
    int l = strlen(s);
    for (int i=0; i <= l/2-1; i++) {
        c = s[i];
        s[i] = s[l-i-1];
        s[l-i-1] = c;
    }
}
ull hash(char *s){   
    ull sum = 0;
    for (int i = 0; i < strlen(s); i++) sum = sum * 26 + s[i]-'a'+1;
    for (int i = strlen(s); i < 10; i++) sum *= 26;
    return sum;
}
int main () {
    int N;
    memset(BIT, 0, sizeof(BIT));
    memset(vecs, 0, sizeof(vecs));
    memset(seq, 0, sizeof(seq));
    cin >> N;
    string s;
    for (int i=0; i < N; i++) scanf("%s", vs[i]);
    for (int i=0; i < N; i++) {vecs[i] = hash(vs[i]);d2[vecs[i]] = i;}
    sort(vecs, vecs+N);
    for (int i = 0; i < N; i++) {
        strcpy(sq[i],vs[d2[vecs[i]]]);
        reverse(sq[i]);
        seq[i] = hash(sq[i]);
    }
    sort(seq, seq+N);
    for (int i = 0; i < N; i++) dict[seq[i]] = i+1;
    LIM = N+1;
    ull bad_pairs = 0;
    for (int i = N-1; i >= 0; i--) {
        bad_pairs += read(dict[hash(sq[i])]);
        update(dict[hash(sq[i])], 1);
    }
    cout << bad_pairs << endl;
    return 0;
}
  • Vote: I like it
  • 0
  • Vote: I do not like it

| Write comment?
»
12 years ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

This problem & code is too tough for me to think and understand :( But if you think you have problem in hashFunction, then try this :

typedef unsigned long long int64u;
// always use a prime number :)
const int64u hashBase = 31;

int64u hashFunction (char *str) {
    int64u hashValue = 0;
    for (int i = 0; str [i]; ++i)
        hashValue = hashValue * hashBase + int64u (str [i] - 'a' + 1);
    return hashValue;
}
  • »
    »
    12 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks a lot!!!Finally AC. But I cannot understand why a prime number is required. Lets say s[i] and s[j] denote two strings and l[i] and l[j] their corresponding long conversions.

    Then, if s[i] < s[j] then l[i] < l[j].

    I can't see it being violated as long as the hash_base >= 26 here.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      well, I tried with 27 and it worked. So, I guess with 26,there are strings for which the constraint mentioned in my previous comment doesnot hold. I guess as long as hash_base > 26, it should work just fine.

      • »
        »
        »
        »
        12 years ago, # ^ |
        Rev. 4   Vote: I like it 0 Vote: I do not like it

        Well it will be ok as long as you are using hashBase > 26 for this problem. Here string size will be at most 10 characters. So the hashValue will be distinct until you take a hashBase enough large so that hashValue crosses the limit of the data type (here unsigned long long).

        Let, lim = ULLONG_MAX
        A = ULLONG_MAX + 7
        B = 2 * ULLONG_MAx + 7
        7 = A % lim = B % lim
        

        (When a value crosses the size of the data type i.e. overflows, it is moded by the size of data type, so it increases the probability of two different strings having same hashValue)

        And a prime number is always required as hashBase to increase the probability of having all distinct hashValues. Sorry can't give an example of this :(

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          I guess you misunderstood me, as you can see, once I map the words into long long, I sort long long[] instead of sorting vector , this saves time, and is possible only if the conversion function (hash(), here) preserves the order.

          Well, maybe "hash" as a function name here is misleading. I should have named it map or something.

          Using a prime would be helpful when, I intend to use an array as a dictionary. Here, I am only remapping them to distinct numbers.

          Can you explain or give some pointer that explains, how to handle collisions. Lets say I want to map large strings onto an array of size 10**6. The total no. of such strings = 10 ** 6, so that they can be mapped distinctly. In such a case suppose, two strings map to the same number, when using a particular prime, then how to handle that? I hope I made the doubt clear this time :)

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      Do you think that long long can store every number in the universe?

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        the constraint of the problem was that a string could be atmost of length 10, I know long long can store 27 ** 10 and is suitable here.

        Besides I thought of mapping the strings into a limited range, here 100000, preserving there order, but evidently there is no way to do that.

        The hash_base here, is basically the base in 26-ary number system, so as long as, hash_base ** 10 fits into long long, you can map the strings to numbers(preserving order), if say the size increases to x such that 27 ** x overflows long long, you won't EVER be able to map them to numbers(preserving order).