d4rkc0de's blog

By d4rkc0de, history, 9 years ago, In English

Hello CF's . I was trying to find some optimizations to the this problem

So we have n strings , we are trying to find all rare subset of letters of theses strings.

a rares subset is a subset of letters x of a string s, such that the number of appearances of x in all strings is strictly lower than a number min_appear.

example :

let's say we have 3 strings and min_appear = 2

abcd abc cde

than the answer is e,ad,bd,ce,de,abc,abd,bcd,abcd,cde (because the number of their appearances = 1 < 2)

these are all the possible subset of letters and their number of appearances { a=2 ,b=2 ,c=3 ,d=2 ,e=1 ,ab=2 ,ac=2 ,ad=1 ,bc=2 ,bd=1 ,cd=2 ,ce=1 ,de=1 ,abc=2 ,abd=1,bcd=1,abcd=1,cde=1}

The best solution i come with have a n*2^(length of taller string) , i'm wondering if there is a solution with a better complexity.

UDP : Sorry i forgot about subsets of length 3 and 4 . also each two different letters from each string are distinct.

  • Vote: I like it
  • 0
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it 0 Vote: I do not like it

We need some more details about this task. Either you're not asked to report all subsets (which I understood as subsequence?) but count them or it must have some pretty low constraints, such as in string length, number of different characters, or minappear.

Suppose the case where there's only one string s = "abc...xyz". There are distinct subsequences of length k. Let's also say minappear = |s| + 1.

Then reporting all this subsequences would take characters which for |s| = 26 is about 872 million. A bit too much, don't you think?

  • »
    »
    9 years ago, # ^ |
    Rev. 2   Vote: I like it +5 Vote: I do not like it

    I think that by subsets, he means a subset of letters. In this case we have 226 subsets, and every string can be represented simply with a mask indicating whether it contains each letter or not.

    A simple solution involving bitmasks would have complexity O(N * 226), but we would need to know more constraints to see if that's enough (I doubt it).

    Anyway, I still don't understand why he didn't mention other possible subsets for that example test case, like "abc", "cde" or "abcd".

    • »
      »
      »
      9 years ago, # ^ |
      Rev. 4   Vote: I like it 0 Vote: I do not like it

      I added the missing subsets , thanks for pointing it .

      By a subset i mean a subset of letters .

      I know about the O(N * 2^26) solution . i'm wondering if there is a better complexity ?

»
9 years ago, # |
Rev. 4   Vote: I like it +3 Vote: I do not like it

It's obvious that if x is rare than all substrings that contain x are rare ( in your example e is rare so ce,de,cde are also rare ) , this is can be helpful not to generate all possible substrings in such case , so one we have found the first rare we can generate all it's ascendants, right ??