v0rtex's blog

By v0rtex, history, 3 years ago, In English

If you are given an unsorted array of size N and some queries Q which consists of l,r as the range of the subarray. The task is to find the median of this subarray. 1<=N,Q<=5*10^4. Now if we sort the subarray per query the time complexity will be Q*nlogn (here n is the length of the subarray) which will give us TLE. Can anybody share the optimal approach?

  • Vote: I like it
  • +2
  • Vote: I do not like it

»
3 years ago, # |
  Vote: I like it +11 Vote: I do not like it

Let's build merge-sort tree on our array. We can use binary search to check if the given value k is less than or equal to the median of subarray. Let's decompose interval into segments on our merge-sort tree. We can sum up the number of elements that are less than or equal to k in each of the segments with lower_bound or sth. similar. Based on this sum we can make transitions in binary search (if the sum is smaller than (r-l+1)/2 then the median is bigger etc). We can answer each query in O(log^3 N) since we use lower_bound on O(log N) segments in each iteration of binary search. I hope my explanation is clear.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I know the concept of segment tree but I didn't know about merge-sort tree so I was sure that something special is needed for this problem, something like segment tree. Now that you have cleared my doubt, I can solve this. Thanks!

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Merge-sort tree is basically a segment tree. In its segment responsible for interval [l, r] it keeps sorted subarray [l, r].

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    FYI, another name for this structure is a wavelet tree, in case you look it up online.

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it +10 Vote: I do not like it

      Wavelet tree is not the same. It is has same functionality as described but is an optimized version of merge sort tree via fractional cascading.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Wavelet trees are not mergesort trees (even when you use fractional cascading). Wavelet trees are actually sort of the opposite of mergesort trees, because instead of keeping values a_i such that l<=i<=r in its nodes, we keep values a_i such that l<=a_i<=r. Wavelet trees are also usually more versatile, due to their ability to perform more complex updates on the sequence than classic mergesort trees allow.

»
3 years ago, # |
  Vote: I like it +15 Vote: I do not like it

This problem can easily be reduced to SPOJ's MKTHNUM task. There are a variety of solutions to that including a $$$\mathcal{O}(QlogN)$$$ one using persistent segment tree.

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Mo's algorithm should also work here as $$$N, Q$$$ is not that big. Using a built-in order statistic tree or Segment Tree to quickly find median give us an $$$O((N + Q) * \sqrt{N} * log(N))$$$ solution!

  • »
    »
    3 years ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    You don't need the order statistic tree or any another hand-written trees actually.

    This problem is special because it asks only for the median of the range.

    Hence, in this case, we can use the built-in balance binary search tree (set in C++ and TreeSet in Java). We will need two of them in this case. Then, we just need to follow the two-heap algorithm for solving the famous problem "find median of a running stream of integers" (you can easily google for this). Of course, we need to handle delete operations for our task. That's why we used a balanced binary search tree instead of a heap.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Oh I never seen that before, really nice technique!

»
3 years ago, # |
Rev. 3   Vote: I like it +60 Vote: I do not like it

There 3(maybe more) methods of solving this problem.

1) Persistent Segment Tree

Finding the k-th smallest number in a range

In this task you just need find $$$n/2$$$-th smallest number in a range.

Complexity : $$$O($$$ $$$N$$$$$$log(N)$$$ $$$+$$$ $$$Q$$$ $$$*$$$ $$$log$$$$$$(N)$$$$$$)$$$

2) Merge Sort Tree.

Saving the entire subarrays in each vertex:

(Described in the other comment)

Complexity : $$$O($$$ $$$N$$$$$$log(N)$$$ $$$+$$$ $$$Q$$$ $$$*$$$ $$$log^3$$$$$$(N)$$$$$$)$$$

3) Mo's algorithm

Mo's algorithm

If you want to solve it with MO's algorithm you need to use some data structure that can :

  • Add element

  • Delete element

  • Find the $$$n/2$$$-th element of the multiset.

You can use segment tree to do this.

Complexity : $$$O($$$ $$$Q$$$$$$log(Q)$$$ $$$+$$$ $$$(N + Q)$$$ $$$*$$$ $$$log$$$$$$(N)$$$ $$$*$$$ $$$sqrt(N)$$$ $$$)$$$

Sorry for my bad english.

»
3 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Ok I think we can solve it by maintaining one min and one max heap. Like we do in case of finding median of stream of numbers. Of course in this case we should do it offline by sorting the queries. As we proceed in the queries in sorted order we will encounter some addition or deletion so just keep balance between these two heap. Time complexity would be $$$O(NlogN+QlogQ)$$$.

I think it should work but you are welcome to point it out if you think it doesn't.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it
    1. I am not sure what is your sort key. But plainly sorting by left or right query endpoint does not work. It is easy to create test cases to make that strategy get TLE.

    2. Using the built-in heap (for Java and C++) doesn't work. Because you'll need to delete elements from your data structures efficiently. By default, deletion in heap is bad because you have to search the entire heap for that element

    • »
      »
      »
      3 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      No no. I said heap just for understanding. Of course we are going to use multisite for this purpose otherwise it will TLE.

      But I feel your 1st point is valid.

»
14 months ago, # |
  Vote: I like it 0 Vote: I do not like it

You can solve this using a Trie in $$$O((N + Q) \log N \log max(A_i))$$$, build a Trie and insert each number from left to right as a padded decimal string, and while inserting, for each Trie Node in path, append the index of the number in node. ie. If values with indices 4, 6, 9 passed through some node while inserting, then node will consist of $$$[4, 6, 9]$$$, if you insert in left to right order, these will be sorted for each node by default.

This allows us to simulate a trie that only consists of elements in a range, for example, if range query is $$$[L, R]$$$ then for each node binary search the count of indices in range $$$[L, R]$$$, this will be number of values with given prefix in given range, Now we can do all query operations on this trie like any usual trie.

For this problem the operation is $$$Kth$$$ smallest element in Trie $$$(K = N / 2)$$$.

This can be performed as shown in this blog https://codeforces.net/blog/entry/104650

All we do is replace node frequency with binary search on indices in range, This solves the problem, and we can also extend this idea to allow updates (delete old and insert new) by keeping ordered set in each Trie Node, which allows $$$O(\log N \log max(A_i))$$$ updation but higher constant factor.