Basically you are given an array and you have to keep removing any two unequal elements until the size of the array is minumum.
In the editorial it's said:- We get the following greedy solution — each time we take two characters with maximal occurrences number and delete them.
Can anyone plz tell why this solution always give the optimal answer ? Like why everytime we are selecting the elements with maximum frequency?
Thx a lot !!!
Because that ensures that the difference between the frequencies of the last two remaining distinct elements is minimal.
My code
I have come across this lemma as a subproblem in harder problems so I think its nice to write down a formal proof of this fact.
First this problem is equivalent to the following problem:
Let $$$f[i]$$$ denote the frequency of the value $$$i$$$ in the array. Let's sort $$$f[]$$$ in increasing (non-decreasing) order. There are two cases:
Case 1: $$$f[n]\geq f[1]+f[2]+...+f[n-1]$$$
For this case, first of all note that the maximum number of possible pairs is $$$f[1]+f[2]+...+f[n-1]$$$. It cannot exceed that because in each pair, elements need to be distinct. So we need at least one of $$$f[1], f[2],....f[n-1]$$$ in each pair. And moreover, just by pairing off $$$f[n]$$$ with others in each pair, we can achieve this maximum. If we keep pairing the maximums, easy to see that $$$f[n]$$$ always remains as the maximum and we pair off each $$$f[i]$$$ from $$$1$$$ to $$$n-1$$$ with $$$f[n]$$$.
Case 2: $$$f[n]< f[1]+f[2]+...+f[n-1]$$$
Here, I claim the maximum is $$$floor((f[1]+f[2]+..f[n])/2)$$$.
It is true, since the maximum cannot exceed this, as this is the case where every element gets divided into pairs (except possibly $$$1$$$, if total number of elements is odd).
Moreover, let's prove that the greedy method of pairing off two largest finds this maximum:
we proceed by induction on sum of elements.
We have $$$f[n]<=f[1]+f[2]+...+f[n-1]$$$ -(A)
Inductive hypothesis: if above inequality holds true, we have the number of pairs = $$$floor(sum/2)$$$ by the greedy approach.
Lets pair off $$$1$$$ $$$f[n]$$$ and $$$1$$$ $$$f[n-1]$$$.
Case 1: $$$f[n]$$$ remains the maximum
The statement (A) still holds as $$$f[n]-1<=f[1]+f[2]+...f[n-1]-1$$$ and the sum of elements has decreased by $$$2$$$. We can thus apply inductive hypothesis to get the maximum number of elements as $$$floor((sum-2)/2)+1$$$ which is $$$floor(sum/2)$$$.
Case 2: $$$f[n-2]$$$ becomes the maximum.
This is possible only if $$$f[n]=f[n-1]=f[n-2]$$$. In that case, if $$$f[n]>=2$$$, $$$f[n-2]<=f[n-1]-1+f[n]-1$$$ which means $$$f[n-2]<=f[1]+f[2]+...+f[n-1]-1+f[n]-1$$$. Again we can apply inductive hypothesis as above and get the result.
There is definitely a lack of good resources on greedy algorithms.
It seems that people consider that the solution is obvious, but often it is not.
Hope someone can fix this issue.