MasterMind's blog

By MasterMind, history, 5 years ago, In English

The link to the problem Problem

I read the tutorial but I cannot make sense out of it.

would any one explain how to come up with the solution.

any help is really appreciated

  • Vote: I like it
  • +2
  • Vote: I do not like it

»
13 months ago, # |
Rev. 6   Vote: I like it +5 Vote: I do not like it

Yes, the problem is very difficult. At least at my level. Looking at bruteforce I can see that there needs to be some property that will reduce the runtime from $$$O(N^3)$$$ to at least $$$O(N ({\log N})^k)$$$.

When you think about the final array, you know some useful numbers. $$$\frac{N(N+1)}{2}$$$ is the number of elements in it. Similarly, when number x is a median of that array, there must be at least $$$\frac{1}{2} \cdot \frac{N(N+1)}{2}$$$ elements >=x. For example, when there's only 1 unique number in that array, this property holds, when all numbers are unique, this property holds.

So if you can somehow count the number of elements >=x in that array of subarray medians quickly (e.g. $$$O(N)$$$ or $$$O(N \log N)$$$) you can binary search for the moment when this count drops below $$$\frac{1}{2} \cdot \frac{N(N+1)}{2}$$$, all numbers greater than median will have less than half numbers greater than or equal to them.

What you notice first is that you do not have that array with $$$\frac{N(N+1)}{2}$$$ elements. You only have $$$N$$$ elements, but, the condition for counting still holds. If you can count the number of subarrays where there were more than $$$\frac{L}{2}$$$ (L is length of subarray) elements >=x, you also count all of the elements in that array of subarray medians that are >=x. The way you count this is by +1 -1 mapping and count all subarrays where sum of that is >=0. In that case you know that in each subarray there was $$$\geq\frac{L}{2}$$$ elements greater than x. When sum is 0, the element was the median as there was equal number of <x and >=x, when sum is >0 the element might have been median but it also might be less than median, if <0 then x is surely higher than median because more than half of array was smaller.

Now, you have binary search on x and this counting subproblem. This counting subproblem with >=0 is somewhat related to a simpler problem of "counting all the sums in subarrays where sum is equal to K". This one we know how to do in linear time. The difference here is that >=K is a bit more involved, because every time your prefix sum appears, you need to include not just the count of prefixSum-K but all previous sums where prefixSum-K>=0. You can do this with a BIT, counting all sums on the range -N and p+1 (exclusive, where p is the current prefix sum). Or I assume they use "inversion number" for whatever they use to count this. I do not know what that is.

Below is $$$O(N^3)$$$ that calculates this count for each element in a but you want to binary search it and get this sum efficiently.

N = len(a)
for i, m in enumerate(sorted(set(a))):  # binary search on this
	cs = defaultdict(lambda: 0)  # count prefix sums
	cs[0] = 1
	p = 0  # prefix sum
	cnt = 0  # count of subarrays where subarray sum is >= 0
	for t in a:
		p += 1 if t>=m else -1
		cnt += sum(cs[i] for i in range(-N, p+1))  # efficiently sum
		cs[p] += 1  # efficiently update

	if cnt < N*(N+1)//4: break
print(a[i-1])  # median

With BIT, this is a $$$O(N (\log N)^2)$$$ solution. There is a way to not use BIT and get $$$O(N \log N)$$$, this way exploits the fact that your prefix sum p only increases/decreases by 1. I'll skip that here.