Anthony_Smith's blog

By Anthony_Smith, history, 3 years ago, In English

Consider the following problem:

There are N cells in a row, each of which must be colored with a color C_i. In one step, you can choose to color any subarray of the N cells with their corresponding colors, but this has a cost equivalent to (number of distinct colors in that subarray)^2. Find the minimum cost to color all N cells.

I was able to quickly think of a simple O(N^2) DP solution: Let dp[n] = the cost to color cells 0 to n. Then, dp[n] transitions from all dp[<n]; O(N) transitions for O(N) states.

However, in this problem, N <= 50,000 (and C_i also <= 50000, but I'm not sure why this matters). This combined with the fact that the cost function is the square of the number of distinct colors makes me think that there is some sort of sqrt-bash solution. Also, dp[n] is monotonically increasing, but I'm not sure how this helps...

Can anyone provide some hints/a solution to this problem?

  • Vote: I like it
  • +10
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I'm a little puzzled, do you also have to minimize steps? Otherwise it seems to me that just picking each subarray of size 1 gives n cost, and we can't seem to do better.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    Well, consider something like {1 3 1 3 1 3 1 3 1 3}. Clearly, using only one step to color all cells costs less than coloring each cell individually. (4 < 10)

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      That's true — my mistake. Thanks for spotting this :)

»
3 years ago, # |
Rev. 2   Vote: I like it -8 Vote: I do not like it

Do you get TLE? I think that there is chance that this solution can get AC.

You said that dp[n] you get by consider dp[<n]... but if dp[i..n] have 7 colors and dp[i-1..n] have 7 colors... then there is no reason to consider dp[i] for the solution. Do you think you can exploit that?

»
3 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

In your dp you are trying to answer the following question for each position:

should I append the current element to the last range/operation, or create a new one?

Appending is "profitable" only if the last range already contains the current color, or else $$$(c + 1)^2 > c^2 + 1^2$$$, where $$$c$$$ is the number of distinct colors in the last range. Hence you should append for free iff the last range starts at or before $$$\text{prev}(i)$$$, the previous occurrence of color $$$c_i$$$.

Let's say dp[i][j] is the best we can do on a prefix of length $$$i$$$ with the last segment starting at position $$$j$$$. This dp should be easy to update with a Fenwick tree, as the only operation appears to be adding $$$1$$$ on range $$$(\text{prev}(i), i]$$$.

You can probably even get rid of the induced extra log factor in the complexity, but I'm too lazy.

  • »
    »
    3 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Uh oh, I was wrong, we should increment $$$(\text{prev}(i), i]$$$ by a linear function $$$2 (i - x) + 1 = -2x + (2i + 1)$$$ of position $$$x$$$.

    This is harder but also doable. You will probably need a Fenwick tree that stores its values as a linear function $$$a x + b$$$ (i.e. two trees: one for $$$a$$$ and one for $$$b$$$).

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Oh wait, it is not linear, as it depends on the number of different colors in that interval. Okay, I give up :(

»
3 years ago, # |
  Vote: I like it +31 Vote: I do not like it

Maximum answer can be $$$N$$$ (We can make $$$N$$$ sub-arrays with $$$1$$$ element in each). For the same reason, we will never make a sub-array with more than $$$\sqrt{N}$$$ distinct colors. That leaves us with $$$\sqrt{N}$$$ transitions from each state.

»
3 years ago, # |
Rev. 2   Vote: I like it +18 Vote: I do not like it

Alright, your sqrt ideas were correct from the beginning. Specifically, as there exists a trivial solution with cost $$$n$$$, an optimal solution cannot contain an interval with $$$c > \sqrt{n}$$$ different colors.

Therefore, each of $$$O(n)$$$ states only has $$$O(\sqrt{n})$$$ meaningful transitions. Namely, we should loop over $$$c$$$ from $$$1$$$ to $$$\sqrt{n}$$$ and consider the transition corresponding to the longest interval with $$$c$$$ colors.

Left endpoints of these $$$\sqrt{n}$$$ intervals should be easy to maintain efficiently with the use of $$$\text{prev}(c_i)$$$ introduced above. Looks like $$$i$$$ should be added for $$$c = 1$$$ and the one after $$$\text{prev}(c_i)$$$ should be deleted.

Upd: similar to above, didn't see it while writing.