While I'm doing problems on CSES like always, I met with a rather interesting problem : CSES-Stick Lengths. After solving it, I discover there was actually atleast 3 ways to do this problem (all in O(NlogN)): the one I used to AC, other one using Ternary Search, and finally, using a prominent solution that I saw others used. The idea for the third was short : You sort the whole array, take the median (N/2) as the target value and calculate the result using it. Here it how it goes on USACO :
#include <bits/stdc++.h>
using namespace std;
//variables used for the current problem
int n,median;
vector<int> p;
long long ans,cnt;
void solve() {
cin >> n;
p.resize(n);
for (int &x : p){
cin >> x;
}
sort(p.begin(),p.end());
median=p[n/2];
for (const int &x : p){
ans+=abs(median-x); //Calculate the cost to modify the stick length
}
cout << ans << "\n";
return;
}
int main(){
ios_base::sync_with_stdio(false);
cin.tie(nullptr);
solve();
return 0;
}
After a while of observation, I myself must admit the solution's correctness. But I just couldn't proof it. Can you guys help me with it ?
I think the proof for this can be supported by the fact that the median of a data stream is always the element that has the lowest total absolute difference from the elements of the data stream. The sum of absolute deviations is minimum when taken around the median.
Rough proof that the median is the point with the smallest sum of differences from all other points.
(1) Median is the center of the data set. Meaning to the left and right of the median there are an equal number of data points.
(2) Now imagine shifting the point on the median to the right until we move past a data point. Now we have more than half of the points on the left of the point. And the distance increased for those points. So the total sum of all the distances increased as more than half of the distances increases when we move off the median.