Simple Linear and Effectively Duval Algorithm for Lyndon Factorization

Revision en20, by SPyofgame, 2021-04-23 02:58:45



Table of content

Teleporter Description
I. Lyndon Definitions Definitions of Lyndon word, Lyndon factorization, ...
II. Duval Algorithm Talk about the duval algorithm and how it works
III. Real Life Applications Motivation to learn the algorithm
IV. Programming Applications The code is short and simple to implement
V. My Questions Are there any other applications ?
................................................................... ..........................................................................................................................










I. Lyndon Factorization



1) String Concatenation

  • Definition: The operation of joining character strings end-to-end

  • Property I: Non-empty string $$$S$$$ is a concatenation of all substrings of itself

  • Property II: Non-empty string $$$S$$$ is a concatenation of any empty string at any position in itself with itself

  • Property III: Let $$$A, B, C$$$ the non-empty string then $$$A + B + C = A + (B + C) = (A + B) + C$$$

  • For convention, let define the operator + is string concatenation



2) Lyndon Word

  • Definition: A nonempty string that is strictly smaller in lexicographic order than all of its rotations.

  • Property I: Lyndon word is nonempty and lexicographically strictly smaller than any of its proper suffixes.

  • Property II: Let $$$S, T, Z$$$ is nonempty word. $$$S$$$ is Lyndon word if $$$S < T\ \ \ \forall\ S = Z + T$$$

  • Property III: Lyndon word cant be factorized. It means that the only factor in its factorization is itself.



3) Lyndon Factorization

  • Definition: Split the string into many lyndon words in such a way that the words in the sequence are in lexicographically non-increasing order.

  • Property I: For $$$s = s_1 + s_2 + \dots + s_k$$$ where $$$s_1, s_2, \dots, s_k$$$ are lyndon words and in non-increasing order ($$$s1 \geq s2 \geq \dots \geq s_k$$$)

  • Property II: Lyndon factorization is unique.

  • Property III: The last Lyndon Factor is Lexicographically Smallest Suffix of the string



4) Least Starting Position (LSP)

  • Definition: The minimal position of some substrings that make it LMSR.

  • Property I: Let $$$X$$$ the substring of $$$S$$$ that its starting position $$$p$$$ is LSP. Then some Lyndon Factors in $$$X$$$ has the LSP $$$p$$$

  • Property II: $$$K$$$-repeated String, where each substring has size $$$L$$$ then there are $$$K$$$ LSP: $$$0, L, 2L, \dots, (K-1)L$$$

  • Property III: The Circular String start from LSP of given string is Lexicographically Minimal String Rotation











II. Duval Algorithm



  • Exist Time: 1983

  • Duval algorithm is an effecient algorithm for listing the Lyndon words of length at most $$$n$$$ with a given alphabet size $$$s$$$ in lexicographic order in $$$O(n)$$$

  • The position of the last Lyndon factorized word from Duval algorithm provides minimum cyclic string

The pseudo algorithm
Implementation - O(n) Time - O(1) Auxiliary Space
Detail Implementation - O(n) Time - O(1) Auxiliary Space










III. Real Life Applications



1) Finger print identification:
  • We can encode the finger print into many detailed circular strings. How to search such finger print again from those in very huge data base ? Circular comparision using lyndon factorization is requried.

2) Biological genetics:
  • In some cases, we need to know whether these two group's genetics are belonged to the same bacteria, virus.

3) Games:
  • Well, ofcourse we can apply the algorithm in some language/words-related games










IV. Programming Applications



1) Least rotation to make smallest lexicographical ordered string.


The problem:
  • Given a string $$$S$$$ of size $$$N$$$

  • A right-rotation is that move the leftmost character to rightmost of $$$S$$$

  • Find the least right-rotation to make $$$S$$$ become the smallest lexicographical ordered string

Important Property: After those right-rotations, the string will be Minimum Acyclic String



The solution:
  • One thing we can come in mind is to use hash or string algorithms in $$$O(n\ log\ n)$$$, but they are kinda complex

  • I will make a new blog about all other approachs


  • Bruteforces Solution: Let $$$t = s + s$$$. Then for each substring of size $$$|s|$$$, we compare and find the smallest
Bruteforces Solution - O(n^2) Time - O(n) Auxiliary Space
Optimized Bruteforces Solution - O(n) to O(n^2) Time - O(n) Auxiliary Space

  • Duval Solution: We can apply lyndon factorization with a little bit observation
Detail Duval Solution - O(n) Time - O(n) Auxiliary Space
None-comment Duval Solution - O(n) Time - O(n) Auxiliary Space
Optimized Duval Solution - O(n) Time - O(1) Auxiliary Space

Practices Problem:











V. My Question



1) Are there any other programming applications for Lyndon Factorization ?
  • The algorithm is simple to code while running pretty fast as its low complexities. It would be kinda sad if there is only one main application, isnt it :(

2) Are there any other problems for Lyndon Factorization ?
  • To remember something better and understand the algorithm deeper, we need to practice right :D It would be nice if there are some more problems
Tags string algorithm, lyndon words, lyndon factorization, duval algorithm

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en26 English SPyofgame 2021-04-25 17:04:35 91
en25 English SPyofgame 2021-04-23 07:30:37 5 Tiny change: '+ ... + sx) cases\n ' -> '+ ... + sx + sy) cases\n '
en24 English SPyofgame 2021-04-23 05:33:30 0 (published)
en23 English SPyofgame 2021-04-23 05:33:05 128759 Reverted to en21
en22 English SPyofgame 2021-04-23 05:32:05 128759 (saved to drafts)
en21 English SPyofgame 2021-04-23 03:18:18 240
en20 English SPyofgame 2021-04-23 02:58:45 634
en19 English SPyofgame 2021-04-22 19:45:39 2 Tiny change: 'x[i] = s[i mod n]$ (mean' -> 'x[i] = s[i\ mod\ n]$ (mean'
en18 English SPyofgame 2021-04-22 19:27:38 30
en17 English SPyofgame 2021-04-22 19:26:32 17
en16 English SPyofgame 2021-04-22 19:19:53 0 (published)
en15 English SPyofgame 2021-04-22 19:19:36 137 (saved to drafts)
en14 English SPyofgame 2021-04-22 19:13:13 49
en13 English SPyofgame 2021-04-17 12:52:26 1961
en12 English SPyofgame 2021-04-15 02:55:21 60
en11 English SPyofgame 2021-04-14 22:41:27 24 Tiny change: '------\n\n' -> '------\n\n--------------------\n\n'
en10 English SPyofgame 2021-04-14 16:57:32 9 Tiny change: '-\n\n- **Better Solution:' -> '-\n\n- **Bruteforces Solution:'
en9 English SPyofgame 2021-04-14 16:57:02 5097
en8 English SPyofgame 2021-04-14 16:54:37 1196
en7 English SPyofgame 2021-04-14 15:25:50 76
en6 English SPyofgame 2021-04-14 15:19:41 6687 Tiny change: 'oiler>\n\n------' -> 'oiler>\n\n\n\n------'
en5 English SPyofgame 2021-04-14 14:29:22 76
en4 English SPyofgame 2021-04-14 04:57:52 12
en3 English SPyofgame 2021-04-14 04:55:32 1 Tiny change: '.... |\n\n\n\n------' -> '.... |\n\n \n\n------'
en2 English SPyofgame 2021-04-13 16:58:24 439
en1 English SPyofgame 2021-04-13 16:54:43 13595 Initial revision (published)