Given an array $$$arr$$$ of length $$$N$$$, find the sum of count of unique values in a subset for every subset of $$$arr$$$, modulo $$$10^9 + 7$$$.
I can see that the result depends only on frequency of distinct values, and seems like some combinatorial trick, it's a made up question but seems standard, what is the most optimal solution for length $$$N$$$ array?
there are two ways to do this one way is dp
assume each number selection as independent process. assume frequency of that number to f so number of ways we can select that number is
fC1 + fC2 + .... fCf = 2^f — 1
now either we chose that number, which will contribute 1 to our unique sum
or we dont chose it which will contribute 0 to our unique sum.
here we can use DP to calculate, but i have better mathematical way
so assume polynomial representation for this (x^0 + (2^{f_i} — 1)x^1)
Function F = PI(x^0 + (2^{f_i} — 1)*x^1) for all i , summation f_i = n obviously
but what we need is summation of coefficient * power — 1, hmm it seems its just derivative and value calculated x = 1. i.e. DF/Dx,x=1
well that's it .
ans = (sum (2^{fi}-1) / ( 1 + 2^{fi}-1) over all fi) * 2^n = (sum((2^{fi} — 1)/2^fi)) over all fi) * 2^n
lets take example i have array [2,2,3] , f1 = 2 ,f2 = 1
our answer is 8 * ( (2^2 — 1) / 4 + (1 / 2)
= 8 * (3/4 + 1/2)
= 10
10 is our answer.
What are the constraints?
My initial idea was for each element , counting the number of subsets that it does not exist and finding the answer from that. You iterate from left to right in $$$a$$$, let $$$ind_{i,j}$$$ equal to the index of the $$$j$$$th occurence of the element $$$i$$$:
$$$tans_i = \sum_{j=2}^{cnt_i} = (ind_{i,j} - ind_{i,j-1})(ind_{i,j} - ind_{i,j-1} - 1) / 2$$$
Then for each element $$$i$$$ , we know how many subarrays are there such that the subarray doesnt containt $$$i$$$ (which corresponds to $$$tans_i$$$), now what we do is :
$$$ans = \sum_{i=1}^{\max\limits_{i\epsilon [1,n]}{a_i}} = (n * (n+1) / 2) - tans_i$$$
and we print the $$$ans$$$