If you are given an unsorted array of size N and some queries Q which consists of l,r as the range of the subarray. The task is to find the median of this subarray. 1<=N,Q<=5*10^4. Now if we sort the subarray per query the time complexity will be Q*nlogn (here n is the length of the subarray) which will give us TLE. Can anybody share the optimal approach?
Let's build merge-sort tree on our array. We can use binary search to check if the given value k is less than or equal to the median of subarray. Let's decompose interval into segments on our merge-sort tree. We can sum up the number of elements that are less than or equal to k in each of the segments with lower_bound or sth. similar. Based on this sum we can make transitions in binary search (if the sum is smaller than (r-l+1)/2 then the median is bigger etc). We can answer each query in O(log^3 N) since we use lower_bound on O(log N) segments in each iteration of binary search. I hope my explanation is clear.
I know the concept of segment tree but I didn't know about merge-sort tree so I was sure that something special is needed for this problem, something like segment tree. Now that you have cleared my doubt, I can solve this. Thanks!
Merge-sort tree is basically a segment tree. In its segment responsible for interval [l, r] it keeps sorted subarray [l, r].
FYI, another name for this structure is a wavelet tree, in case you look it up online.
Wavelet tree is not the same. It is has same functionality as described but is an optimized version of merge sort tree via fractional cascading.
Wavelet trees are not mergesort trees (even when you use fractional cascading). Wavelet trees are actually sort of the opposite of mergesort trees, because instead of keeping values a_i such that l<=i<=r in its nodes, we keep values a_i such that l<=a_i<=r. Wavelet trees are also usually more versatile, due to their ability to perform more complex updates on the sequence than classic mergesort trees allow.
This problem can easily be reduced to SPOJ's MKTHNUM task. There are a variety of solutions to that including a $$$\mathcal{O}(QlogN)$$$ one using persistent segment tree.
Mo's algorithm should also work here as $$$N, Q$$$ is not that big. Using a built-in order statistic tree or Segment Tree to quickly find median give us an $$$O((N + Q) * \sqrt{N} * log(N))$$$ solution!
You don't need the order statistic tree or any another hand-written trees actually.
This problem is special because it asks only for the median of the range.
Hence, in this case, we can use the built-in balance binary search tree (set in C++ and TreeSet in Java). We will need two of them in this case. Then, we just need to follow the two-heap algorithm for solving the famous problem "find median of a running stream of integers" (you can easily google for this). Of course, we need to handle delete operations for our task. That's why we used a balanced binary search tree instead of a heap.
Oh I never seen that before, really nice technique!
There 3(maybe more) methods of solving this problem.
1) Persistent Segment Tree
Finding the k-th smallest number in a range
In this task you just need find $$$n/2$$$-th smallest number in a range.
Complexity : $$$O($$$ $$$N$$$$$$log(N)$$$ $$$+$$$ $$$Q$$$ $$$*$$$ $$$log$$$$$$(N)$$$$$$)$$$
2) Merge Sort Tree.
Saving the entire subarrays in each vertex:
(Described in the other comment)
Complexity : $$$O($$$ $$$N$$$$$$log(N)$$$ $$$+$$$ $$$Q$$$ $$$*$$$ $$$log^3$$$$$$(N)$$$$$$)$$$
3) Mo's algorithm
Mo's algorithm
If you want to solve it with MO's algorithm you need to use some data structure that can :
Add element
Delete element
Find the $$$n/2$$$-th element of the multiset.
You can use segment tree to do this.
Complexity : $$$O($$$ $$$Q$$$$$$log(Q)$$$ $$$+$$$ $$$(N + Q)$$$ $$$*$$$ $$$log$$$$$$(N)$$$ $$$*$$$ $$$sqrt(N)$$$ $$$)$$$
Sorry for my bad english.
For persistent segment tree, complexity should be $$$O(NlogN+QlogN)$$$.
Ok I think we can solve it by maintaining one min and one max heap. Like we do in case of finding median of stream of numbers. Of course in this case we should do it offline by sorting the queries. As we proceed in the queries in sorted order we will encounter some addition or deletion so just keep balance between these two heap. Time complexity would be $$$O(NlogN+QlogQ)$$$.
I think it should work but you are welcome to point it out if you think it doesn't.
I am not sure what is your sort key. But plainly sorting by left or right query endpoint does not work. It is easy to create test cases to make that strategy get TLE.
Using the built-in heap (for Java and C++) doesn't work. Because you'll need to delete elements from your data structures efficiently. By default, deletion in heap is bad because you have to search the entire heap for that element
No no. I said heap just for understanding. Of course we are going to use multisite for this purpose otherwise it will TLE.
But I feel your 1st point is valid.
You can solve this using a Trie in $$$O((N + Q) \log N \log max(A_i))$$$, build a Trie and insert each number from left to right as a padded decimal string, and while inserting, for each Trie Node in path, append the index of the number in node. ie. If values with indices 4, 6, 9 passed through some node while inserting, then node will consist of $$$[4, 6, 9]$$$, if you insert in left to right order, these will be sorted for each node by default.
This allows us to simulate a trie that only consists of elements in a range, for example, if range query is $$$[L, R]$$$ then for each node binary search the count of indices in range $$$[L, R]$$$, this will be number of values with given prefix in given range, Now we can do all query operations on this trie like any usual trie.
For this problem the operation is $$$Kth$$$ smallest element in Trie $$$(K = N / 2)$$$.
This can be performed as shown in this blog https://codeforces.net/blog/entry/104650
All we do is replace node frequency with binary search on indices in range, This solves the problem, and we can also extend this idea to allow updates (delete old and insert new) by keeping ordered set in each Trie Node, which allows $$$O(\log N \log max(A_i))$$$ updation but higher constant factor.