Task about finding a substring in a string using a suffix array

Today I tried to solve this problem (https://codeforces.net/gym/102069/problem/J).

There are two strings s and t, and q queries. Each query gives l1, r1, l2, r2, and we must answer how many times the string s[l1..r1] occurs within the segment t[l2..r2].
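
For reference, a brute-force answer to one query might look like the sketch below. It assumes 0-indexed, inclusive bounds (the judge's input format may differ) and counts overlapping occurrences separately, i.e. it counts starting positions. It is far too slow for the real limits and is only useful as a checker; the function name is hypothetical.

```python
def count_naive(s, t, l1, r1, l2, r2):
    """Occurrences of s[l1..r1] lying entirely inside t[l2..r2].

    Bounds are 0-indexed and inclusive (an assumption for this sketch).
    Overlapping occurrences are counted separately.
    """
    pattern = s[l1:r1 + 1]
    segment = t[l2:r2 + 1]
    m = len(pattern)
    # try every start position in the segment: O(m * len(segment))
    return sum(1 for i in range(len(segment) - m + 1)
               if segment[i:i + m] == pattern)


print(count_naive("aba", "ababa", 0, 2, 0, 4))  # 2: "aba" starts at 0 and 2
```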

It seems obvious that you can use a suffix array here: binary-search for the block of suffixes that start with the required substring (each comparison checks the substring in O(length of the substring to be found)), and then use a segment tree to count the answer. However, this only passes the first two subtasks (25 points). The problem is that nothing guarantees that the total length of all the substrings we have to check stays below some bound (for example, the length of the string). So the binary search has to be optimized, that is, each comparison must cost less than O(length of the substring to be found). What data structure can be used to do this?
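
A minimal sketch of that baseline, assuming 0-indexed inclusive bounds. The suffix array is built naively by sorting (fine for illustration, not for the real limits), and the final count is a linear scan where the original solution uses a segment tree. The point is the binary search: every comparison slices up to len(p) characters, so one search costs O(len(p) * log n). All function names are hypothetical.

```python
def build_suffix_array(t):
    # naive O(n^2 log n) construction by sorting suffixes; illustration only
    return sorted(range(len(t)), key=lambda i: t[i:])


def pattern_range(t, sa, p):
    """Half-open range [lo, hi) of sa whose suffixes start with p.

    Truncating sorted suffixes to len(p) characters keeps them sorted, so both
    searches are valid; each comparison slices up to len(p) characters.
    """
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count_in_segment(t, sa, p, l2, r2):
    """Occurrences of p inside t[l2..r2] (0-indexed, inclusive)."""
    lo, hi = pattern_range(t, sa, p)
    last = r2 - len(p) + 1  # last start position that keeps p inside the segment
    # linear scan for brevity; the post uses a segment tree for this step
    return sum(1 for k in range(lo, hi) if l2 <= sa[k] <= last)
```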

P.S.: I was able to optimize my solution: inside the binary search, each comparison can be done with another binary search that uses hashes to find where the two substrings first differ. But it still gets only 60 points (it passes only when the string length does not exceed 100,000). What else can be optimized?
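
The hash trick from the P.S. can be sketched like this: polynomial prefix hashes give the hash of any substring in O(1), the LCP of a suffix of t and the pattern is found by binary search over its length, and one comparison then costs O(log len(p)) instead of O(len(p)), so a whole query costs O(log n * log len(p)). The modulus 2^61 - 1 and base 131 are arbitrary choices for this sketch (a randomly chosen base is safer against anti-hash tests), hash collisions can in principle give wrong answers, and all names are hypothetical.

```python
MOD = (1 << 61) - 1
BASE = 131  # arbitrary; a randomly chosen base is safer in practice


class Hasher:
    """Prefix hashes of a string, giving the hash of any substring in O(1)."""

    def __init__(self, text):
        n = len(text)
        self.h = [0] * (n + 1)
        self.pw = [1] * (n + 1)
        for i, ch in enumerate(text):
            self.h[i + 1] = (self.h[i] * BASE + ord(ch)) % MOD
            self.pw[i + 1] = self.pw[i] * BASE % MOD

    def get(self, i, length):
        """Hash of text[i:i + length]."""
        return (self.h[i + length] - self.h[i] * self.pw[length]) % MOD


def lcp_by_hash(ha, i, hb, j, limit):
    """Largest k <= limit with A[i:i+k] == B[j:j+k], found by binary search."""
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ha.get(i, mid) == hb.get(j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def compare(t, ht, i, s, hs, j, m):
    """Compare t[i:i+m] with s[j:j+m] (which must fit inside s): -1, 0 or 1."""
    avail = min(m, len(t) - i)  # the suffix of t may be shorter than m
    k = lcp_by_hash(ht, i, hs, j, avail)
    if k == m:
        return 0
    if k == avail:  # t's suffix is a proper prefix of the pattern
        return -1
    return -1 if t[i + k] < s[j + k] else 1


def pattern_range(t, ht, sa, s, hs, j, m):
    """Half-open range of sa whose suffixes start with s[j:j+m]."""
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix not smaller than the pattern
        mid = (lo + hi) // 2
        if compare(t, ht, sa[mid], s, hs, j, m) < 0:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix greater than the pattern
        mid = (lo + hi) // 2
        if compare(t, ht, sa[mid], s, hs, j, m) <= 0:
            lo = mid + 1
        else:
            hi = mid
    return start, lo
```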

Comments (7)

staniewzki

5 years ago

If you preprocess the LCP array of the suffix array, you can get the LCP of any two suffixes in O(1): it is the minimum of the LCP array between their positions in the suffix array, i.e. an RMQ, which a sparse table answers in O(1). The LCP of two substrings is just that value capped by the two substring lengths, so you get it in O(1) too, and comparing any two substrings lexicographically is also O(1). This makes the binary search O(log n), without the length of the substring in the complexity.
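
A sketch of this idea on a single string text (in this problem it would be s and t concatenated with a separator). Kasai's algorithm builds the LCP array, a sparse table answers range-minimum queries, and the LCP of two suffixes is the minimum of the LCP array strictly after the smaller rank up to the larger one. The naive sorting-based suffix array and all helper names are assumptions of this sketch.

```python
def suffix_array(text):
    # naive construction by sorting suffixes; illustration only
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai(text, sa):
    """Return (rank, lcp), lcp[k] = LCP(text[sa[k-1]:], text[sa[k]:]), lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order; h drops by at most 1 each step
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return rank, lcp


class SparseMin:
    """Static range-minimum queries in O(1) after O(n log n) preprocessing."""

    def __init__(self, a):
        self.table = [list(a)]
        k = 1
        while 2 * k <= len(a):
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + k])
                               for i in range(len(a) - 2 * k + 1)])
            k *= 2

    def query(self, l, r):
        """Minimum of a[l..r], inclusive, l <= r."""
        j = (r - l + 1).bit_length() - 1
        row = self.table[j]
        return min(row[l], row[r - (1 << j) + 1])


def suffix_lcp(rank, st, n, i, j):
    """LCP of text[i:] and text[j:] in O(1)."""
    if i == j:
        return n - i
    a, b = sorted((rank[i], rank[j]))
    return st.query(a + 1, b)


def compare_substrings(text, rank, st, i, len_i, j, len_j):
    """Compare text[i:i+len_i] with text[j:j+len_j] in O(1): -1, 0 or 1."""
    k = min(suffix_lcp(rank, st, len(text), i, j), len_i, len_j)
    if k == len_i and k == len_j:
        return 0
    if k == len_i:
        return -1
    if k == len_j:
        return 1
    return -1 if text[i + k] < text[j + k] else 1
```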

Kavaliro

I solved it with a different approach.

Concatenate the two strings with a delimiter in between (S $ T) and build both the suffix array and the LCP array. Then, for each query, find the range of the suffix array that contains the suffixes starting with the substring from S; this can be done with binary search and a sparse table.
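
A sketch of this step, assuming '$' occurs in neither string. Suffixes sharing a prefix of length len form a contiguous block of the suffix array, so the block for s[l1..r1] can be found by starting at the rank of position l1 in S $ T and binary searching outward while the range minimum of the LCP array stays >= len. The naive suffix array construction and the function names are assumptions of this sketch.

```python
def build(text):
    """Naive suffix array plus rank and LCP arrays (Kasai); illustration only."""
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n  # lcp[k] = LCP of suffixes sa[k-1] and sa[k]
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        h = max(h - 1, 0)
    return sa, rank, lcp


class SparseMin:
    """Static range-minimum queries in O(1)."""

    def __init__(self, a):
        self.table = [list(a)]
        k = 1
        while 2 * k <= len(a):
            prev = self.table[-1]
            self.table.append([min(prev[i], prev[i + k])
                               for i in range(len(a) - 2 * k + 1)])
            k *= 2

    def query(self, l, r):
        """Minimum of a[l..r], inclusive, l <= r."""
        j = (r - l + 1).bit_length() - 1
        row = self.table[j]
        return min(row[l], row[r - (1 << j) + 1])


def occurrence_block(rank, lcp_min, n, pos, length):
    """Inclusive SA range [a, b] of suffixes starting with text[pos:pos+length]."""
    r = rank[pos]
    lo, hi = 0, r  # smallest a with min(lcp[a+1..r]) >= length
    while lo < hi:
        mid = (lo + hi) // 2
        if lcp_min.query(mid + 1, r) >= length:
            hi = mid
        else:
            lo = mid + 1
    a = lo
    lo, hi = r, n - 1  # largest b with min(lcp[r+1..b]) >= length
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if lcp_min.query(r + 1, mid) >= length:
            lo = mid
        else:
            hi = mid - 1
    return a, lo


s, t = "aba", "ababa"
text = s + "$" + t
sa, rank, lcp = build(text)
a, b = occurrence_block(rank, SparseMin(lcp), len(text), 0, 3)  # pattern s[0..2]
print(sorted(sa[a:b + 1]))  # [0, 4, 6]
```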

After you find such a range for each query, the answer to that query is the number of indices in its range whose values (suffix start positions in S $ T) lie in [L, R-len+1], where L and R are the bounds of the query's segment of T (shifted to positions in the concatenated string) and len is the length of the substring being counted.
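
To see why this reduction is right, here is a deliberately slow sketch that follows it literally: T's segment is shifted by |S| + 1 into concatenated coordinates, the block [a, b] is found by a linear scan instead of the sparse-table search, and the answer counts the suffix-array values in [L, R - len + 1]. Bounds are 0-indexed and inclusive, '$' is assumed to occur in neither string, and the function name is hypothetical.

```python
def answer_query(s, t, l1, r1, l2, r2):
    """Count occurrences of s[l1..r1] inside t[l2..r2] via the SA of s + '$' + t.

    0-indexed inclusive bounds; '$' must occur in neither string.
    Deliberately slow: naive suffix array and a linear scan for the block.
    """
    text = s + "$" + t
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    length = r1 - l1 + 1
    pattern = s[l1:r1 + 1]
    # suffixes starting with the pattern form one contiguous block [a, b]
    block = [k for k, p in enumerate(sa) if text.startswith(pattern, p)]
    a, b = block[0], block[-1]
    # shift T's segment into positions of the concatenated string
    L = l2 + len(s) + 1
    R = r2 + len(s) + 1
    # start p is valid iff the occurrence text[p..p+length-1] fits in [L, R]
    return sum(1 for k in range(a, b + 1) if L <= sa[k] <= R - length + 1)
```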

To count those values, you can either build a segment tree where each node stores a sorted vector and binary search in it (O(n*log^2(n) + m*log^2(n)), which probably won't get the full 100 points), or you can process the queries offline. The answer for a query whose corresponding range in the suffix array is [a, b] is:

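Ans = (Number of values >= L in [a,b]) + (Number of values <= R-len+1 in [a,b]) - (Length of [a,b])

This works because every value in [a,b] is counted at least once, and exactly the values in [L, R-len+1] are counted twice. It assumes R-len+1 >= L; otherwise the substring does not fit into the segment and the answer is 0.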

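So you can sweep over the values from 0 to |S|+|T| in increasing order, inserting each one into a structure indexed by suffix-array position (a Fenwick tree, for example), and read off (Number of values >= L) for each query at the right point of the sweep; (Number of values <= R-len+1) is found the same way.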
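
A sketch of the offline variant, assuming the values form a permutation of 0..n-1 (true for a suffix array) and each query is given as (a, b, L, X) with X = R - len + 1. Here (Number of values >= L) is obtained as the block size minus the count of values <= L - 1, so both terms come from the same kind of ascending sweep with a Fenwick tree indexed by suffix-array position. This is one way to realize the sweep described above, not necessarily the commenter's code, and the names are hypothetical.

```python
class Fenwick:
    """Counts marked positions; supports point insert and prefix count."""

    def __init__(self, n):
        self.t = [0] * (n + 1)

    def add(self, i):
        """Mark position i (0-indexed)."""
        i += 1
        while i < len(self.t):
            self.t[i] += 1
            i += i & -i

    def prefix(self, i):
        """Number of marked positions in [0, i)."""
        s = 0
        while i > 0:
            s += self.t[i]
            i -= i & -i
        return s


def count_le_offline(values, asks):
    """For each (a, b, x) in asks, count i in [a, b] with values[i] <= x.

    values must be a permutation of 0..n-1, as a suffix array is.
    """
    n = len(values)
    pos = [0] * n
    for i, v in enumerate(values):
        pos[v] = i
    fw = Fenwick(n)
    res = [0] * len(asks)
    nxt = 0  # smallest value not inserted yet
    for q in sorted(range(len(asks)), key=lambda q: asks[q][2]):
        a, b, x = asks[q]
        while nxt <= min(x, n - 1):  # insert values 0..x in increasing order
            fw.add(pos[nxt])
            nxt += 1
        res[q] = fw.prefix(b + 1) - fw.prefix(a)
    return res


def answer_all(values, queries):
    """queries: (a, b, L, X); counts i in [a, b] with L <= values[i] <= X."""
    le_x = count_le_offline(values, [(a, b, x) for a, b, _, x in queries])
    lt_l = count_le_offline(values, [(a, b, l - 1) for a, b, l, _ in queries])
    out = []
    for (a, b, _, _), le, lt in zip(queries, le_x, lt_l):
        size = b - a + 1
        ge = size - lt  # Number of values >= L
        # Ans = (>= L) + (<= X) - size, clamped at 0 in case X < L
        out.append(max(0, ge + le - size))
    return out
```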

P.S: I'm using an O(n) suffix array algorithm, and its code is just unreadable XD.