abcdqwerty12345's blog

By abcdqwerty12345, history, 3 years ago, In English

Problem

Basically you are given an array and you have to keep removing any two unequal elements until the size of the array is minumum.

In the editorial it's said:- We get the following greedy solution — each time we take two characters with maximal occurrences number and delete them.

Can anyone plz tell why this solution always give the optimal answer ? Like why everytime we are selecting the elements with maximum frequency?

Thx a lot !!!

  • Vote: I like it
  • +9
  • Vote: I do not like it

| Write comment?
»
3 years ago, # |
  Vote: I like it -7 Vote: I do not like it

Because that ensures that the difference between the frequencies of the last two remaining distinct elements is minimal.

My code

»
3 years ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

I have come across this lemma as a subproblem in harder problems so I think its nice to write down a formal proof of this fact.

First this problem is equivalent to the following problem:

Given an array $$$a[]$$$, find the maximum number of disjoint pairs of indices $$$(i, j)$$$ such that $$$a[i]\neq a[j]$$$ and each index is in at-most one pair.

Let $$$f[i]$$$ denote the frequency of the value $$$i$$$ in the array. Let's sort $$$f[]$$$ in increasing (non-decreasing) order. There are two cases:

Case 1: $$$f[n]\geq f[1]+f[2]+...+f[n-1]$$$

For this case, first of all note that the maximum number of possible pairs is $$$f[1]+f[2]+...+f[n-1]$$$. It cannot exceed that because in each pair, elements need to be distinct. So we need at least one of $$$f[1], f[2],....f[n-1]$$$ in each pair. And moreover, just by pairing off $$$f[n]$$$ with others in each pair, we can achieve this maximum. If we keep pairing the maximums, easy to see that $$$f[n]$$$ always remains as the maximum and we pair off each $$$f[i]$$$ from $$$1$$$ to $$$n-1$$$ with $$$f[n]$$$.

Case 2: $$$f[n]< f[1]+f[2]+...+f[n-1]$$$

Here, I claim the maximum is $$$floor((f[1]+f[2]+..f[n])/2)$$$.

It is true, since the maximum cannot exceed this, as this is the case where every element gets divided into pairs (except possibly $$$1$$$, if total number of elements is odd).

Moreover, let's prove that the greedy method of pairing off two largest finds this maximum:

we proceed by induction on sum of elements.

We have $$$f[n]<=f[1]+f[2]+...+f[n-1]$$$ -(A)

Inductive hypothesis: if above inequality holds true, we have the number of pairs = $$$floor(sum/2)$$$ by the greedy approach.

Lets pair off $$$1$$$ $$$f[n]$$$ and $$$1$$$ $$$f[n-1]$$$.

Case 1: $$$f[n]$$$ remains the maximum

The statement (A) still holds as $$$f[n]-1<=f[1]+f[2]+...f[n-1]-1$$$ and the sum of elements has decreased by $$$2$$$. We can thus apply inductive hypothesis to get the maximum number of elements as $$$floor((sum-2)/2)+1$$$ which is $$$floor(sum/2)$$$.

Case 2: $$$f[n-2]$$$ becomes the maximum.

This is possible only if $$$f[n]=f[n-1]=f[n-2]$$$. In that case, if $$$f[n]>=2$$$, $$$f[n-2]<=f[n-1]-1+f[n]-1$$$ which means $$$f[n-2]<=f[1]+f[2]+...+f[n-1]-1+f[n]-1$$$. Again we can apply inductive hypothesis as above and get the result.

»
3 years ago, # |
  Vote: I like it +6 Vote: I do not like it

There is definitely a lack of good resources on greedy algorithms.

It seems that people consider that the solution is obvious, but often it is not.

Hope someone can fix this issue.