obag's blog

By obag, 11 years ago, In English

Hi there, Codeforces community! This question has been bugging me for a while now, so I thought I would share it with you, and see if we can work out the answer together.

The problem is the following: given an array of n integer numbers (positive and negative), find a (contiguous) subarray for which the product is maximum. I would like to find an algorithm with complexity less than O(n2).

For instance, if the array is

then the underlined subarray would be the one we are looking for (its "score" is 7×4 = 28) .

Notice that if we changed "sum" into "min" the problem would be solvable in time. Similarly, if we changed "sum" into "gcd" the problem would still be solvable in time. (I can elaborate if you are interested). The problem is, with other associative operations, such as sum or product, I don't know how to go down from n2 to something less.

Any clue?

  • Vote: I like it
  • +49
  • Vote: I do not like it

| Write comment?
»
11 years ago, # |
Rev. 4   Vote: I like it +12 Vote: I do not like it

Notice that if we changed "sum" into "max" the problem would be solvable in time.

We'd just need to take either the whole array, or nothing. So, this is O(n).

  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Ahah, you're right! I meant "min" (I edited the post, thanks)

    • »
      »
      »
      11 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      LOL , it is still solvable in O(n) time.

      for every i find nearest integer smaller than i'th integer from left and from rightز

      more formally:

      let L[i] equal to largest j such that j<i and A[j]<A[i]

      and R[i] equal to smallest k such that i<k and A[i]>A[k]

      then the answer is max( ( (R[i]-1) — (L[i]+1) + 1) * A[i] ) for all i

      finding the values of L and R can be done using stack in O(n)

      this is similar problem HISTOGRA

      • »
        »
        »
        »
        11 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Yes, this was also my solution indeed. Only difference I was computing those L[i] and R[i] values using a binary search for each i.

        But then you're right, it is possible to compute L and R inductively in O(n) time.

        The point is: can we still solve the problem efficiently if we change that "min" into a "sum"?

  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Not if negative numbers are allowed.

    • »
      »
      »
      11 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      if there exists atleast one non-negative integer, then choose the entire array as the subarray, and the answer will be maximized.
      if not, then choose the least negative integer (single element) as the subarray, and the magnitude of the answer (which will be negative) will be minimized.
      so it is!

      • »
        »
        »
        »
        11 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Hm not sure what I was thinking :D I guess I was thinking about sums. Of course you're right.

»
11 years ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

For the example the answer should be 66.

  • »
    »
    11 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Yes, you're absolutely right. Sorry for that, I fixed the example :)

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +1 Vote: I do not like it

      Hi

      Could you please elaborate this?

      "Similarly, if we changed "sum" into "gcd" the problem would still be solvable in time. (I can elaborate if you are interested)."

      Thanks

      • »
        »
        »
        »
        8 years ago, # ^ |
        Rev. 2   Vote: I like it +58 Vote: I do not like it

        There can be only different gcd values among suffixes of an array (due to Euclide's algorithm complexity). So you can add characters one by one keeping all possible gcd's of current array's suffixes.

        By the way, does anyone know solution for the original problem now, 2 years later?

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          How to implement adding in O(nlogn) ?

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            For each gcd keep the min and max length of suffixes which has such gcd. After that when you want to add element, for each segment compute gcd with this new element and update the borders. Also you should add one new segment which contains only that element. You should glue segments with same gcd when you do so.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
          Rev. 2   Vote: I like it +3 Vote: I do not like it

          I was thinking to try something like the solution of GSS3 Link to the problem... The idea is to save the answer for a suffix and a prefix subaaray for particular segment and also saving the subarray with maximum( sum* length) for that segemnt. Then keep merging them to get the answer for the whole array. I implemented a solution using segment trees (Link).

          You can check on some testcases. I don't know if its completely correct. Just thought to share the idea :P . Maybe it can be implemented without segment trees because we don't need to save the answer for every node.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            But what do you store for the prefix and suffix? The entire difficulty in the sum * length problem is that there isn't 1 optimal suffix/prefix.

            • »
              »
              »
              »
              »
              »
              »
              8 years ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              If the values compared in the merge() function are equal then any of them could produce a larger answer later on while building a tree. I didn't think of it before. It might be the problem in this approach.

              • »
                »
                »
                »
                »
                »
                »
                »
                8 years ago, # ^ |
                  Vote: I like it 0 Vote: I do not like it

                But saving the suffix and prefix with maximum sum * length is not necessarily correct.

                Consider the case where the left 'subtree' is -1 -1 -1 -1 and the right 'subtree' is 100 100 100 100.

                The suffix of the left subtree with greatest sum * length is the empty suffix with sum * length = 0. But the optimal solution is to take the entire string since the sum in the right subtree outweighs the negative sum in the left subtree. (And you can't fix this by adding it as a special case, the optimal suffix in the left subtree could be any length.)

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            For array

             - 2, 3, 7,  - 8,  - 3, 1, 1, 7,  - 2,  - 3

            it produces answer 36, whereas correct answer is 56.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            So I think we got the problem here.. thanks gendelpiekel and Shapo for taking the time to read the code. This might be the problem in this approach.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
          Rev. 4   Vote: I like it +35 Vote: I do not like it

          I have a solution to propose but I'm sure I made mistakes everywhere. Probably I even made a mistake immediately and this makes no sense.

          Firstly, we can restate the problem. Consider the graph (as in x-axis y-axis graph, not nodes + edges graph) of the cumulative sums. We wish to find the greatest (positive) rectangular area between two points.

          Secondly, consider two points which we are considering to use as the 'bottom left' corner of the rectangle, call them P = (x0, y0), Q = (x1, y1). Without loss of generality (wlog) let us say x1 > x0.

          Observe firstly that if x1 >= x0 and y1 >= y0, we will always choose P over Q. So we can say y1 < y0. I.e. we only need to consider a set of points that become lower as you move right.

          Now, which points T = (x, y) are there such that we wish to choose P over Q? I claim they are the points above the line going through (x0, y1) and (x1, y0). Diagram (we should choose P if T is above the red line, otherwise we should choose Q):

          In fact since we only want positive rectangles, the red line should not be a line but a RAY

          Now things get very hand-wavy and probably wrong or suboptimal.

          Let's call the red line the 'optimality line' of P and Q.

          Now we wish to solve the problem: given some potential 'bottom-left' points (call them 'candidates'), which is the best to use for some given query points? We can solve the problem for 2 candidates, but how do we solve it for more than 2?

          Consider 3 candidates. Call them P, Q, R in order of increasing x (remember that this means they are also in order of decreasing y). The optimality lines between P and Q, P and R and Q and R will all interesect at the same point. Example diagram (P = (0,4), Q = (2,3), R = (4,0):

          something something RAY something

          Purple, green and orange areas are where R, Q and P are optimal respectively.

          I believe (*wave hands*) that it should be a similar case for any 3 candidates (*wave hands more*). So for any 3 candidates, we should be able to calculate the x coordinate at which the middle candidate becomes redundant.

          So now, we can sweep from left to right to answer queries. As we move, we calculate when each point will become redundant and remove them as necessary. (*wave hands vigorously*)

          To answer the queries themselves, we can binary search over the currently valid candidates (in order of the y values where they are optimal).

          blah blah something something it seems O(n lg n) maybe. (edit actually maybe n lg^2 n, who knows) (more edit: actually probably still n lg n)

          Okay I'm going to sleep now. I missed quite a few details and this is pretty poorly written because I had to change my solution as I realised some parts were wrong. Tomorrow I will probably wake up, read this and realise I'm totally insane >_>

          edit: 1) actually maybe you need a set to store candidates since you (might?) have to remove from the middle (?) so it would be O(n lg^2 n), 2) there's lot's of implicit details about keeping the candidates in order of decreasing y value, blah blah details something something

          mote edit: actually binary search is separate from set operations so probably still n lg n (okay I'm actually going to sleep now)

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
            Rev. 2   Vote: I like it +8 Vote: I do not like it

            you slightly missed areas of optimality: green area will be not in bottom, but in opposite direction

            * Assume we have only bottom-left candidates (we can calculate in O(N) using stack).
            * Assume that for every 3 consecutive candidates optimality lines aren't parallel (if not, we can completely remove middle candidate, because it won't be optimal anywhere). Then, if we can prove that for every 3 candidates optimality lines intersect in one point (Sympy suspects it is true), we can prove by induction that for every set of candidates their pairwise optimality lines intersect in one point. Therefore, the whole plane divided into sectors, and every sector corresponds to one candidate, which is optimal here.
            * Moreover, we can compute this division in O(K), where K — number of candidates, because every sector is defined by two optimality lines between one candidate and its neighbours (for first and last candidate we pick last and first one, respectively).
            * After that, each query processed in : you just simply find sector containing query poing using binary search.
            So, we have solution in .

            As for other part — nevermind, it's bullshit about induction

            • »
              »
              »
              »
              »
              »
              »
              8 years ago, # ^ |
                Vote: I like it +10 Vote: I do not like it

              Hmm... I'm pretty sure the green area is correct in my diagram. For example consider T = (4, 4), it is clear we should choose Q = (2, 3) as our bottom left point.

              But you're right, there are cases where the 'green' area is above instead of below. Specifically: the green area is below when Q is above (the line) PR, the green area is above when Q is below PR. Also, as you pointed out, there are cases where the lines actually don't intersect, which is when Q is on PR.

              So in fact there are cases where we should add candidates as we sweep from left to right, not just remove (and cases when we don't add or remove the candidate at all).

              Also, I think that we might actually not be able to consider only the two adjacent points when calculating when to add or remove a point. Probably we can't without more observations and maybe not at all.

              So actually, perhaps we can't do better than sqrt decomposition and the final solution is quite similar (or perhaps identical) to the half plane solution you described.

              • »
                »
                »
                »
                »
                »
                »
                »
                8 years ago, # ^ |
                  Vote: I like it 0 Vote: I do not like it

                Yes, you're absolutely right about green area. I found mistake in my thoughts.

          • »
            »
            »
            »
            »
            »
            8 years ago, # ^ |
            Rev. 2   Vote: I like it +16 Vote: I do not like it

            Isn't this idea like convex hull trick in 3D? For each j we want to find i < j maximizing (j - i)(pi - pj) = jpj - ipj - pij + ipi; jpj is constant, so we have a collection of planes z =  - ix - piy + ipi and need to find the topmost of them for (x, y) = (j, pj) each time. It looks hard in general case.

        • »
          »
          »
          »
          »
          8 years ago, # ^ |
            Vote: I like it +18 Vote: I do not like it

          Disclaimer: It took slightly complicated :) I don't think it's the best solution ever, moreover I can miss something. So, comments and questions are welcome!

          Assume we have array of n integer numbers indexed 1 trough n. Let's denote Ai as i-th number in this array.
          First of all, calculate array of prefix sums named S, so that . Now we try to solve the following problem equivalent to the original one:

          Now let's make a hypothesis that we have a data structure, which operates on points in 2-dimensional space and allows to:

          • Preprocess K points (xi, yi) in .
          • For each of Q arbitrary points (aj, bj) computes in amortized .

          Given such a data structure, one can use SQRT-decomposition:

          • Divide array of partial sums S into blocks with consecutive elements in each, treating partial sums as points (Si, i) in 2-dimensional space;
          • For each block build data structure mentioned above, overall ;
          • Calculate answer for partial sums from the same block in a naive way, overall ;
          • To account for partial sums from different blocks, for each block consider partial sums from previous blocks as queries to data structure, overall .

          In total we have complexity.

          As for data structure, I'd like only to mention the key ideas:

          • For fixed set of K points (xi, yi) we can divide plane no more than into K convex polygons (possibly unbounded), each part corresponds to points where we have fixed argmax. This can be done in using algorithm for intersecting halfplanes.
          • For processing one query we only need to figure out the polygon this query lies inside. This can be done using ideas from wikipedia
»
8 years ago, # |
Rev. 2   Vote: I like it +26 Vote: I do not like it

Okay, I think I have an solution.

Let's make a binary search on the answer. Now we want to know if there are indices L, R so that (aL + 1 + ... + aR)(R - L) ≥ M for fixed M. Set Sk = a1 + ... + ak, then the inequality above becomes (SR - SL)(R - L) ≥ M. Solving for SR, we get:

Let's make a set of functions f1, ..., fn so that for x > L. Now we want to see if there are any L, R for which fL(R) ≤ SR, that is, when we set f(x) = min fi(x), if there is any R such that f(R) ≤ SR.

How to compute f? Notice that each fL is a hyperbola. We can easily see that:

  • no two functions intersect in more than one point in their domain (solve equation fi = fj, then see).
  • we want to make a lover envelope of such hyperbolas (hyperbola parts). Funnily enough, all of them are exactly the same, though translated.
  • say , . If a < c, then fi is at first larger than fj, then intersects fj and is smaller forever — thus, if it is a global minimum, it must come after fj. Same if c > a. However, if a = c, compare b and d. If b < d, the first hyperbola is always before the second. (Probably some picture would be handy here...)
  • We can make a total ordering basing on the preceding rules. Then fire up a linear-time lower-envelope "convex-hull" algorithm in the same way we did for lines.

(It is a bit messy, but I gave you an idea). The running time is now , however we can sort everything only in the beginning of the runtime. This brings down the running time to what I stated.