Help finding sum of the number of distinct characters in all the distinct substrings of string S.

→ Pay attention

Before contest
Codeforces Round (Div. 1)
6 days

Before contest
Codeforces Round (Div. 2)
6 days

→ Streams

Codeforces Blitz Cup 2025: aryanc403 vs 244mhq (R1)

By aryanc403

Before stream 09:36:10

View all →

→ Top rated

#	User	Rating
1	tourist	3857
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3463
9	ecnerwala	3451
10	heuristica	3431

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	165
2	-is-this-fft-	161
3	Qingyu	160
4	Dominater069	158
5	atcoder_official	157
6	adamant	155
7	Um_nik	151
8	djm03178	150
8	luogu_official	150
10	awoo	148

View all →

→ Find user

→ Recent actions

Detailed →

deepak1527's blog

Help finding sum of the number of distinct characters in all the distinct substrings of string S.

By deepak1527, history, 6 years ago, In English

How to solve the following problem. Find the sum of the number of distinct characters in all the distinct substrings of S. 1<=|S|<=100000. S="aabb"
Set of distinct sub-strings of "aabb" = {a,b,aa,ab,bb,aab,abb,aabb} sum = 1 + 1 + 1 + 2 + 1 + 2 + 2 + 2 = 12.

Thanks!

deepak1527
6 years ago
4

Comments (4)

Write comment?

aviroop123

6 years ago, # |

+28

Can you give the link to the problem so that I can verify that it's not a part of some ongoing contest?

→ Reply

deepak1527

6 years ago, # ^ |

No it's not a part of any ongoing contest, this problem was asked in coding round of a company and that round has ended.

→ Reply

m0nk3ydluffy

6 years ago, # |

+19

I would like to propose a solution. It would be nice if someone can verify it.

The main idea is that for each character, we have to figure out its contribution to the final sum i.e. for each character $$$S_{i}$$$ in our string, we have to find how many distinct substrings exist such that $$$S_{i}$$$ is the first occurrence of its kind in that substring? This can be answered using a Suffix Automaton. (If we didn't have to count distinct substrings, the problem would have an easier solution.)

So we will build a Suffix Automaton on the given string. Lets name the starting node of the automaton as $$$t_{0}$$$. Consider an edge from node $$$u$$$ to node $$$v$$$ using the character $$$c$$$. The contribution of $$$c$$$ to the final sum = $$$dp[u][c] * dp[v]$$$ where,

$$$dp[u][c]$$$ = The number of paths from $$$t_{0}$$$ to $$$u$$$ such that we never use an edge with character $$$c$$$

$$$dp[v]$$$ = The number of paths that begin from node $$$v$$$ and end at any other node.

The above two dp's can be calculated fairly easily. The final answer should be the sum of the contribution of each edge in the Suffix Automaton.

Since you mentioned that this question was asked in the coding round of a company, it should have an easier solution.

→ Reply

MSchallenkamp

6 years ago, # |

We can solve this with a suffix array by recognizing that each unique substring can be represented as a prefix of a suffix.

Let's construct a suffix array. Now we'll walk forward, starting with the first string in the suffix array. For the first string in the suffix array all of the prefixes of this suffix are valid. For the next string only the prefixes longer than the longest common prefix between it and the last string are unique. All the others were counted as part of some earlier suffix. This will take nlogn time to construct the suffix array, nlogn time to find the longest common prefix between every two suffixes, and finally n time to sum all the values for each suffix.

→ Reply