Hardstone's hard problem

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	djm03178	152
7	adamant	152
9	luogu_official	150
10	awoo	147

hardstone.Orz gives you an integer array $$$A$$$. The length of $$$A$$$ is $$$n$$$ and there are $$$m$$$ distinct numbers in $$$A$$$. Count the number of tuple $$$(l, r)$$$, $$$1 \leq l \leq r \leq n$$$, such that:

Numbers that appear in the interval $$$a[l...r]$$$ appear the same number of times.

For example, $$$A=[1,2,1,2]$$$, then there are $$$8$$$ legal tuples: $$$(1, 1), (2, 2), (3, 3), (4, 4), (1, 2), (2, 3), (3, 4), (1, 4)$$$.

This is an open problem with brain storm. $$$O(n^2m)$$$ brute force using the prefix sum and $$$O(n2^m)$$$ brute force using bitmasks and hashtable are easy to come up with. I am looking for a $$$O(nmlog^k)$$$ solution. Are there any smart data structures?

Note that when $$$A = [1,2,3]$$$, all intervals are legal. For example, $$$[1, 2]$$$ is legal, as both $$$1$$$ and $$$2$$$ appear once. We do not care about $$$3$$$ because $$$3$$$ does not appear.

amenotiomoi proposes a genius randomized idea, which could make my yesterday's idea work: Similar to the Zobrist hashing, we assign a random value to each distinct integer. We record the prefix sum of the hash values in a hashtable (let $$$h_r$$$ be the prefix sum of hash values of $$$a[1...r]$$$). Then, we fix $$$l$$$ and count the number of $$$r$$$ with respect to this $$$l$$$. For each $$$l$$$, we denote $$$p(l, j)$$$ be the first place $$$j$$$ appears after $$$l$$$ (inclusive), somehow like std::string.find(j, l). If $$$j$$$ never appears after $$$l$$$, $$$p(l, j) = \infty$$$. For example, if $$$A=[4,1,2,3]$$$, then $$$p(2, 1)=2, p(2,2)=3, p(2,3)=4, p(2,4) = \infty$$$. The array $$$p$$$ could be fould via binary search in $$$O(mlogn)$$$. Note that $$$p(l, j) \neq p(l, k)$$$ if $$$j \neq k$$$. Then, we sort the pair $$${p(l, j), j}$$$ in the ascending order of $$$p(l, j)$$$, and let $$$q$$$ be the sorted list. The complexity of sorting is $$$O(mlogm)$$$. For two adjacent elements of $$$q$$$, the present and absent numbers could be uniquely determined. For example, $$$A=[1,2,2,2,3]$$$, $$$l=1$$$, $$$2 \leq r \leq 4$$$, then $$$1, 2$$$ appear and $$$3$$$ is absent. Therefore we need to find the number of $$$r$$$, $$$2 \leq r \leq 4$$$, such that $$$1$$$ and $$$2$$$ appear the same number of times with in $$$a[l...r]$$$. Yesterday I was stuck here. But with the genius hashtable, we only need to count $$$r$$$ that $$$(hashvalue(1) + hashvalue(2)) \mid h_r - h_{l-1}$$$. By the pigeon hole principle, the number appear the least number of times appear at most $$$\frac{n}{i}$$$ times, then we only need to enumerate $$$\frac{n}{i}$$$ items for each adjacent pair of $$$q$$$, there are $$$m$$$ adjacent pairs, and querying the hash table is $$$O(logn)$$$ (using std::map) or amortized $$$O(1)$$$ (using std::unordered_map), therefore the overall complexity could be reduced to $$$O(n(mlogn + mlogm + \sum\limits_{i=1}^m\frac{n}{i}logn)) = O(n(mlogn + mlogm+nlogmlogn))$$$. But this is not deterministic, and the error probability is hard to estimate, heavily depending on implementation.

Rev.	By	When	Δ	Comment
en18	dfsof	2023-08-17 10:52:07	7	Tiny change: 'fter $l$, we let $p(l, j) ' -> 'fter $l$, $p(l, j) '
en17	dfsof	2023-08-17 06:48:21	10
en16	dfsof	2023-08-17 06:44:25	2	Tiny change: '2,2,3]$, $r=1$, $2 \l' -> '2,2,3]$, $l=1$, $2 \l'
en15	dfsof	2023-08-17 06:26:34	5	Tiny change: '= O(n(mlogm + mlogn +nlogmlogn' -> '= O(n(mlogn + mlogm+nlogmlogn'
en14	dfsof	2023-08-17 06:23:56	0	(published)
en13	dfsof	2023-08-17 06:23:51	5	Tiny change: 'could be find via bina' -> 'could be fould via bina' (saved to drafts)
en12	dfsof	2023-08-17 06:22:37	0	(published)
en11	dfsof	2023-08-17 06:22:30	36	(saved to drafts)
en10	dfsof	2023-08-17 06:19:31	15	Tiny change: 'r$ be the hash value of $a[1..' -> 'r$ be the prefix sum of hash values of $a[1..'
en9	dfsof	2023-08-17 06:18:58	0	(published)
en8	dfsof	2023-08-17 06:18:53	1	Tiny change: ' $O(mlogm). For two ' -> ' $O(mlogm)$. For two ' (saved to drafts)
en7	dfsof	2023-08-17 06:17:52	0	(published)
en6	dfsof	2023-08-17 06:17:45	691
en5	dfsof	2023-08-17 06:10:29	739
en4	dfsof	2023-08-17 06:02:41	86
en3	dfsof	2023-08-17 06:02:04	542	(saved to drafts)
en2	dfsof	2023-08-17 05:34:18	185
en1	dfsof	2023-08-17 05:13:45	695	Initial revision (published)

History