String Algos - Codeforces

andrewtam's blog

String Algos

By andrewtam, history, 3 years ago, In English

After the recent ABC, I realized I suck at string algorithms (the only one I know is hashing). What are the important string algos I should learn (for cp) and in what order should I learn them?

andrewtam
3 years ago
19

Comments (18)

tdpencil

3 years ago, # |

+44

Em, String hashing should be enough for most problems, but if you want to go above and beyond, here are some cool things to learn.

Aho-Corasick.

Trie (Prefix Tree).

Suffix Array.

Suffix Tree.

KMP / Z-Algorithm / Manacher's.

That should be about it.
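
Since hashing is the baseline everything else in this list is compared against, here is a minimal sketch of polynomial prefix hashing. The names `build_prefix_hashes` and `substring_hash`, the base 131, and the modulus 2^61 - 1 are illustrative choices, not something taken from this thread; in a real contest you would probably randomize the base to guard against anti-hash tests.

```python
MOD = (1 << 61) - 1  # a large Mersenne prime (assumed choice)
BASE = 131           # arbitrary base larger than any lowercase character code


def build_prefix_hashes(s):
    """Return (prefix, power), where prefix[i] is the hash of s[:i]."""
    prefix = [0] * (len(s) + 1)
    power = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = (prefix[i] * BASE + ord(ch)) % MOD
        power[i + 1] = power[i] * BASE % MOD
    return prefix, power


def substring_hash(prefix, power, l, r):
    """Hash of the half-open slice s[l:r], computed in O(1)."""
    return (prefix[r] - prefix[l] * power[r - l]) % MOD
```

After one linear pass, two substrings can be compared by comparing their hashes. Equal hashes only mean "equal with high probability", so this is a probabilistic check.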

dblark

3 years ago, # ^ |

+11

I think Suffix Automaton is very important as well.

andrewtam

3 years ago, # ^ |

Thank you for the advice! Is there any particular order I should learn these (in terms of importance/usefulness, but also in terms of difficulty/prerequisites)?

tdpencil

3 years ago, # ^ |

Personally, I would say learn as much as you can in any order, but in terms of broadness I would prioritize Trie and Z-algorithm first because they can be widely applied in a bunch of areas. The rest can be learned after, since they aren't extremely common and implementing them yourself might be tricky.
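
For reference, a trie can be sketched in a few lines. This is only an illustrative version: the class name `Trie` and its methods are made up for this example, and it uses a dict per node rather than the fixed-size arrays people usually write in contests.

```python
class Trie:
    """Prefix tree storing nodes in flat lists; node 0 is the root."""

    def __init__(self):
        self.children = [{}]     # children[node] maps a character to a child index
        self.terminal = [False]  # terminal[node] is True if a word ends at node

    def insert(self, word):
        node = 0
        for ch in word:
            nxt = self.children[node].get(ch)
            if nxt is None:
                # Create a new node and link it from the current one.
                nxt = len(self.children)
                self.children[node][ch] = nxt
                self.children.append({})
                self.terminal.append(False)
            node = nxt
        self.terminal[node] = True

    def _walk(self, s):
        """Follow s from the root; return the final node index or None."""
        node = 0
        for ch in s:
            node = self.children[node].get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and self.terminal[node]

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None
```

Insertion and lookup both take time proportional to the length of the word, independent of how many words are stored.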

andrewtam

3 years ago, # ^ |

I see. Thanks again!

Errichto

3 years ago, # |

+95

FYI you can get red without knowing any string algorithms other than hashing.

andrewtam

3 years ago, # ^ |

This is actually very interesting to me. I wonder why so many coders can get very far into their competitive programming journeys without really using or learning string algorithms. I myself am an example of this: I have been doing cp for over a year now and have not really touched any string algorithms. It's not that string algorithms aren't useful; I have seen them in plenty of problems and contests, but I just never bothered to learn them for some reason. From what I have observed of other coders, this is true for a decent number of them as well.

Errichto

3 years ago, # ^ |

+18

Hard string problems are rare nowadays.

AnandOza

3 years ago, # ^ |

+12

+1, I've still only solved 1 problem in contest that required a "string algorithm" (1466G - Song of the Sirens), and it was a while after I got red. [*]

Still, it's nice to know a few so you don't end up completely stuck on a problem that requires them (and so you don't feel like there's a big gap in your knowledge).

[*] but of course, I'm not sure if there were any problems I should have solved along the way, but lacked the strings knowledge.

malvika.shalvika

3 years ago, # ^ |

Here's the strings problem that appeared in the latest ICPC Regionals from my country (Asia-Kanpur 2019).

Would it be possible to solve this without any fancy strings algorithms?

egor_bb

3 years ago, # |

+25

Combine hashing with the idea that "if the total length of the strings is bounded by $$$L$$$, the number of different lengths among the strings is at most $$$\sqrt{2L}$$$" and you can be completely fine for quite a long time. However, at some point you will notice that in some problems you are basically doing the same thing as, for example, a suffix array, but with hashes. At that moment you will know that learning suffix structures is beneficial for you.

Before this magical moment, knowing how suffix structures work will not be as rewarding: even if you learn how they can be applied, solving problems (beyond educational ones) involving them during contests will be extra tough.

Still, learning simple stuff like the prefix function or the Z-function is rewarding at any level.
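
To show how small the "simple stuff" is, here is a sketch of the Z-function together with a pattern search built on it. The helper `find_occurrences` and the `"\x00"` separator are assumptions made for this example; the separator must not appear in either string.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:]; z[0] is left as 0."""
    n = len(s)
    z = [0] * n
    l = r = 0  # [l, r) is the rightmost segment known to match a prefix of s
    for i in range(1, n):
        if i < r:
            # Reuse the value computed for the mirrored position inside the segment.
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def find_occurrences(pattern, text):
    """Start indices in text of all (possibly overlapping) occurrences of pattern."""
    combined = pattern + "\x00" + text  # separator assumed absent from both strings
    z = z_function(combined)
    m = len(pattern)
    return [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]
```

Both functions run in linear time in the length of their input.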

andrewtam

3 years ago, # ^ |

If you don't mind, could you please elaborate on your statement: "if the total length of strings is limited with L, the number of different lengths among strings ≤ √(2L)." I'm not sure what is meant by this.

egor_bb

3 years ago, # ^ |

If a problem gives you $$$N$$$ strings, their total length is usually bounded by some number $$$L$$$ (otherwise you could get something like $$$10^5 \times 10^5$$$ characters in the input, which is impossible). Then, if you can solve the problem in close-to-linear time for all strings of the same length, repeating that for each of the at most $$$\sqrt{2L}$$$ distinct lengths gives something like an $$$O(L\sqrt{L})$$$ solution, which can often be squeezed into the time limit with a couple of heuristics.
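
The bound on the number of distinct lengths follows from a short counting argument (here $$$k$$$ is just a name for the number of distinct lengths). If the strings have $$$k$$$ distinct positive lengths, the cheapest way to achieve that is with lengths $$$1, 2, \dots, k$$$, so

$$$L \ge 1 + 2 + \dots + k = \frac{k(k+1)}{2} > \frac{k^2}{2},$$$

which gives $$$k < \sqrt{2L}$$$. For example, with $$$L = 10^5$$$ there can be at most $$$446$$$ distinct lengths, since $$$447 \cdot 448 / 2 = 100128 > 10^5$$$.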

This idea helps when you have some string queries. If the intended solution requires, for example, a suffix array, a time limit that lets slow intended implementations pass will often let a fast hash-based implementation squeeze in too.
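
As an illustration of the trick, the sketch below counts how often each pattern occurs in one text by doing a single linear pass over the text per distinct pattern length. It is a simplified setting chosen for this example, not the solution to any specific problem; the names `poly_hash` and `count_occurrences` and the hash parameters are assumptions, and the answers are only correct up to hash collisions.

```python
from collections import Counter, defaultdict

MOD = (1 << 61) - 1  # assumed modulus
BASE = 131           # assumed base


def poly_hash(s):
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def count_occurrences(text, patterns):
    """For each pattern, count its (possibly overlapping) occurrences in text."""
    n = len(text)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(text):
        prefix[i + 1] = (prefix[i] * BASE + ord(ch)) % MOD
        power[i + 1] = power[i] * BASE % MOD

    by_length = defaultdict(list)  # length -> indices of patterns with that length
    for idx, p in enumerate(patterns):
        by_length[len(p)].append(idx)

    result = [0] * len(patterns)
    for length, indices in by_length.items():
        if length == 0 or length > n:
            continue
        # One linear pass over the text per distinct length.
        windows = Counter(
            (prefix[i + length] - prefix[i] * power[length]) % MOD
            for i in range(n - length + 1)
        )
        for idx in indices:
            result[idx] = windows[poly_hash(patterns[idx])]
    return result
```

With total pattern length at most $$$L$$$, the outer loop runs at most $$$\sqrt{2L}$$$ times, so the whole thing costs roughly $$$O(|text| \sqrt{L} + L)$$$.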

andrewtam

3 years ago, # ^ |

Got it, thanks!

egor_bb

3 years ago, # ^ |

To give you an example, https://codeforces.net/contest/547/problem/E.

Along with the suffix array solution, it has a simple hash-based solution involving the described idea.

tdpencil

3 years ago, # ^ |

Oh, I see. So because there are only about sqrt(L) different lengths, we don't have to check all substrings; we can just check the ones with those sqrt(L) lengths. Then we can just use prefix sums and calculate the answer.

egor_bb

3 years ago, # ^ |

Close enough. You cannot store the prefix sums explicitly, as that leads to $$$O(NQ)$$$ complexity. You need to optimise it a little bit (think in the direction of binary search).
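
One possible reading of the binary search hint (an interpretation, not necessarily the intended solution): instead of a full prefix-sum array per pattern, keep a sorted list of the indices at which that pattern occurs and count the entries inside a query range with two binary searches. The function name `count_in_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_in_range(sorted_positions, lo, hi):
    """Number of entries p in sorted_positions with lo <= p <= hi."""
    return bisect_right(sorted_positions, hi) - bisect_left(sorted_positions, lo)
```

The lists take memory proportional to the total number of occurrences rather than $$$O(N)$$$ per pattern, and each query costs a logarithmic factor.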

BERNARD

3 years ago, # |

Eertree is a cool data structure that deals with palindromes (also called "Palindromic tree").

Usually you won't need anything this sophisticated (maybe for hard problems, but even then it's rare). The most important string algorithms and data structures are the basic ones like hashing and tries, and maybe the more complicated (but still not hard) Z-array and Manacher's algorithm.
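
For comparison with the Eertree, here is a sketch of Manacher's algorithm using the common trick of interleaving a separator so that odd and even palindromes are handled uniformly. The function names and the `#` separator are choices made for this example; the separator is assumed not to occur in the input.

```python
def manacher(s):
    """Palindrome radii over t = '#' + '#'.join(s) + '#'.

    p[i] is the radius of the longest palindrome centered at t[i], which equals
    the length of the corresponding palindrome in s.
    """
    t = "#" + "#".join(s) + "#"
    n = len(t)
    p = [0] * n
    center = right = 0  # palindrome reaching furthest right: t[2*center-right .. right]
    for i in range(n):
        if i < right:
            # Start from the mirrored center's radius, clipped to the known region.
            p[i] = min(right - i, p[2 * center - i])
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < n
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    return p


def longest_palindromic_substring(s):
    p = manacher(s)
    best_center = max(range(len(p)), key=p.__getitem__)
    radius = p[best_center]
    start = (best_center - radius) // 2  # map back from t to s
    return s[start:start + radius]
```

The whole computation is linear in the length of the string.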
