quinoa's blog

By quinoa, history, 4 years ago, In English

Say you have a binary string of length N where each character is equally likely to appear. What is the probability that there will be a substring of K 1 characters?

For instance if K = 2, N = 3 the answer is 3/8 = 37.5%.

000: NO
001: NO
010: NO
011: YES
100: NO
101: NO
110: YES
111: YES

How to solve this problem. Can it be done in polynomial time?

  • Vote: I like it
  • +7
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

What are the bounds on $$$N$$$ and $$$K$$$? There is a pretty straightforward $$$\mathcal{O}(NK)$$$ dynamic programming solution.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    No particular bounds, I'm simply wondering what is the most efficient solution.

    What would be your DP formulation/recursion?

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +11 Vote: I do not like it

      Oh, okay.

      Well its obvious that if a string contains substring of $$$1$$$'s with length $$$\geq K$$$, then it also contains a substring of exactly length $$$K$$$.

      To compute the probability we can look at the complement problem and that is the probability that longest substring of $$$1$$$'s in a string is strictly less than $$$K$$$. It is known that $$$P(A) = 1 - P(\overline{A})$$$. Which means that we can just compute the probability of the complement and subtract it from 1.

      Let $$$C$$$ be equal to all binary strings of length $$$N$$$ that don't contain a substring of $$$1$$$'s with length $$$\geq K$$$, then the answer will be $$$1 - \frac{C}{2^n}$$$, since there are $$$2^n$$$ binary strings.

      To compute $$$C$$$ we use dynamic programming. Let $$$f(i, j)$$$ be equal to number of strings of lenght $$$i$$$ with current substring of $$$1$$$'s of length $$$j$$$ that don't have substrings of $$$1$$$'s with lenght $$$\geq$$$ K. Obviously, we are looking for the value $$$f(N, 0)$$$. The relations are the following, at each step we can either add $$$1$$$ or $$$0$$$ next, if we add $$$1$$$ then we increase our current substring lenght by one and if we add $$$0$$$ then we reset it back to 0.

      $$$ f(i, j) = \left\{ \begin{array}{ll} 0 & \mbox{if } j == k, \\ 1 & \mbox{if } i == 0, \\ f(i - 1, j + 1) + f(i - 1, 0) & \mbox{otherwise } \end{array} \right. $$$

      There are $$$\mathcal{O}(NK)$$$ states and the transitions are $$$\mathcal{O}(1)$$$ so the total complexity is $$$\mathcal{O}(NK)$$$.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Thanks! Thinking about the complement problem was clever, nice solution.

»
4 years ago, # |
  Vote: I like it +22 Vote: I do not like it

Consider the complement problem of counting how many binary strings of length $$$N$$$ exist such that length of longest substring of 1s is strictly less than $$$K$$$. Define $$$DP(i)$$$ to represent number of such binary strings of length $$$i$$$.

Trivially, $$$DP(i) = 2^i$$$ for $$$0 \leq i < K$$$.

Furthermore, $$$DP(i) = DP(i - 1) + DP(i - 2) + \ldots + DP(i - K)$$$ for $$$i \geq K$$$

This is because to get a string of length $$$N$$$, we could do

  • (length N — 1 string) + "0"
  • (length N — 2 string) + "01"
  • $$$\ldots$$$
  • (length N — K string) + "0" + "1" * (K — 1)

This can be done in $$$O(N)$$$ time with a sliding window approach.

If $$$N$$$ is large but $$$K$$$ is small, then this can be easily solved as a linear recurrence in $$$O(K^3log(N))$$$ time (matrix exponentiation).

Linear recurrences can also be solved in $$$O(K^2log(N))$$$ time or even $$$O(Klog(K)log(N))$$$ time (see this blog and this code)

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Let's solve the problem when there can not be k consecutive 1s.

Let $$$dp[i]$$$ be the answer for prefix $$$i$$$.

Now there are two cases. first, if we set 0 at the ith position then the answer will be $$$dp[i-1]$$$.

In the second case, we assume some subarray of length $$$j$$$ filled with 1 ending at ith position and starting at $$$i-j+1$$$ also there will 0 be $$$(i-j)$$$th index. $$$j$$$ will vary from 1 to k-1. In that case, the answer will be $$$dp[i-j-1]$$$. summing up both the cases we get

$$$dp[i]$$$ = $$$\sum\limits_{j = {i-k}}^{i-1}dp[j]$$$

So by using prefix sums we can calculate the answer in $$$O(N)$$$