Let's Discuss a question I recently came across, it seems easy but constraints make it too hard (At least for me).
Given an array A of size N, find the maximum size of subarray in which frequency of all the elements are same.
1 <= N <= 3 x 10^5
1 <= A[i] <= 10^9
O(N^2) Approach I guess will be to maintain 2 maps and consider all subarray.
Can be do it better than O(N^2)?
What will be CF difficulty of this question with this constraints?
First of all N is 3 * 10 ^ 5 so you can use compression to reduce the complexity by logN.(Maps will add a logN factor.) Also you cannot use binary search for the answer, as for example; [1, 2, 2, 1, 3, 3, 4, 4, 4, 3, 4, 3] has a subarray (frequency of all elements are the same) of length 6 and 8 but not 7. It also has a subarray (frequency of all elements are the same) with all elements appearing 2 and 4 times but none where this count is 3. So I believe O(N^2) should be the best complexity.
I think we can maintain two maps here for :
-freq of element and freq of these freq
and we can use sliding window after it to check if the window is valid or not , which we can calculate in const time if the map having freq of freq's size is 1 i.e, there is only one freq of elements present in current subarray . If window is valid then find the max size , if not valid then shrink it upto curr element or while its valid.
Am I missing something??
[1, 2, 3, 3, 2, 1] For this case when you get the second 3 it isnt valid but if you shrink it you cannot get the answer which is the entire array.
we can construct frequency array fr of the elements and sort it in descending order. Now to check if a subarray of length L is possible, we need to iterate over all divisors(d) of L . It is possible to construct a subarray of length L with no of distinct elements d if fr[d] >= (L/d).
[1, 2, 3 4, 1, 2, 3, 1, 2, 1] => Array [4, 3, 2, 1] => fr lets consider L = 4 1 2 are d we can use, we cannot construct a subarray of length 4 with 2 distinct elements.Even though fr[2] = 2 >= 4/2 = 2. I think your solution works for subsequences only.
Sorry, I got confused between subarray and subsequence.
yupp right bro
didn't thought of this one ..
1446D2 - Frequency Problem (Hard Version) sounds similar
Does not look like the problem could be solved in faster than $$$O(N^2)$$$
After some deeper thought I came to a conclusion that this idea (at least as it stood in Rev3) does not work.
I won't delete this coment just so replies to this one don't look out of place, and in case that someone else has an idea how to build on the thought I had.
I think this looks right, but how do you hash the map efficiently?
Right now the only hash i can think of is polynomial hashing, which we can actually support by compressing the numbers.
So instead of a map we use a hist array, and then which each plus or minus 1 in it you can update the hash in O(1), but i dont know how good polynomial hashing is on arrays instead of strings.
Maybe using a few different bases or modulos can make it good enough if it has too much collisions.
Yeah sorry I forgot to explain that. I think polynomial hashing is good enogh. There will be at most $$$3*10^{5}$$$ different maps to hash, so if we choose a modulo like $$$10^{9}+7$$$, then it will be enough. In case you are still worried you can use multiple values as mod.
Using this approach at every step where you add 1 to value occurrences in the map, then you only need to add one value to the hash, and you don't need to recalculate the hash all over again.
I have an approach working in O(n^1.5*log(n)). Denote the length of an optimal subarray as a, frequency of elements in this subarray as f, number of distinct elements in this subarray as d. Then, a = f*d => min(f,d) <= sqrt(a) => min(f,d) <= sqrt(n) 1) Assume f <= sqrt(n). Let's iterate over all possible frequencies 1 <= f <= sqrt(n) and find optimal answer if frequency is exactly f. For each f we can find optimal answer in O(nlogn) by doing the following : Let's maintain some array b, which can do 2 types of operations in O(logn): 1. Adding some number x on some interval [l,r] . 2. Finding the last element equal to 0 . We can move from right to left and when we are at index i, we add 1 to b in range [i;n]. If there are f or more occurances of A[i] in range[i;n], we add -f to b in range(f-th occurance of A[i];n] (We can find f-th occurance of A[i] by maintaining deque of indices for each A[i]). If there are f+1 or more occurances of A[i] in range[i;n], we add f to b in range[(f+1)-th occurance of A[i];n]. After doing it we can find the last element equal to 0,let's denote its index as j, then we can do answer = max(answer,j-i+1), since [i,j] will be "good" subarray and it is also the longest "good" subarray starting from index i and having frequency equal to f . 2) Assume d <= sqrt(n). Let's iterate over all 1 <= i <= n and calculate an array V of sqrt(n) distinct values if we moved from i-th index to the first index. If a 'good' subarray ends at index 1, its distinct numbers must form some prefix of V. By using the idea of cacteyy in the comment above, we can find the length of an optimal subarray ending on index i and having <=sqrt(n) distinct values .
here is the implementation (might have bugs)
https://algnotes.wordpress.com/2012/04/23/longest-subsequence-where-all-elements-have-the-same-frequency/ Found a solution but don't think so it works in O(n)