SPyofgame's blog

By SPyofgame, history, 4 years ago, In English



Table of content

Teleporter Description
I. Lyndon Definitions Definitions of Lyndon word, Lyndon factorization, ...
II. Duval Algorithm Talk about the duval algorithm and how it works
III. Real Life Applications Motivation to learn the algorithm
IV. Programming Applications The code is short and simple to implement
V. My Questions Are there any other applications ?
................................................................... ..........................................................................................................................










I. Lyndon Factorization



1) String Concatenation

  • Definition: The operation of joining character strings end-to-end

  • Property I: Non-empty string $$$S$$$ is a concatenation of all substrings of itself

  • Property II: Non-empty string $$$S$$$ is a concatenation of any empty string at any position in itself with itself

  • Property III: Let $$$A, B, C$$$ the non-empty string then $$$A + B + C = A + (B + C) = (A + B) + C$$$

  • For convention, let define the operator + is string concatenation



2) Lyndon Word

  • Definition: A nonempty string that is strictly smaller in lexicographic order than all of its rotations.

  • Property I: Lyndon word is nonempty and lexicographically strictly smaller than any of its proper suffixes.

  • Property II: Let $$$S, T, Z$$$ is nonempty word. $$$S$$$ is Lyndon word if $$$S < T\ \ \ \forall\ S = Z + T$$$

  • Property III: Lyndon word cant be factorized. It means that the only factor in its factorization is itself.



3) Lyndon Factorization

  • Definition: Split the string into many lyndon words in such a way that the words in the sequence are in lexicographically non-increasing order.

  • Property I: For $$$s = s_1 + s_2 + \dots + s_k$$$ where $$$s_1, s_2, \dots, s_k$$$ are lyndon words and in non-increasing order ($$$s1 \geq s2 \geq \dots \geq s_k$$$)

  • Property II: Lyndon factorization is unique.

  • Property III: The last Lyndon Factor is Lexicographically Smallest Suffix of the string



4) Least Starting Position (LSP)

  • Definition: The minimal position of some substrings that make it LMSR.

  • Property I: Let $$$X$$$ the substring of $$$S$$$ that its starting position $$$p$$$ is LSP. Then some Lyndon Factors in $$$X$$$ has the LSP $$$p$$$

  • Property II: $$$K$$$-repeated String, where each substring has size $$$L$$$ then there are $$$K$$$ LSP: $$$0, L, 2L, \dots, (K-1)L$$$

  • Property III: The Circular String start from LSP of given string is Lexicographically Minimal String Rotation











II. Duval Algorithm



  • Exist Time: 1983

  • Duval algorithm is an effecient algorithm for listing the Lyndon words of length at most $$$n$$$ with a given alphabet size $$$s$$$ in lexicographic order in $$$O(n)$$$

  • The position of the last Lyndon factorized word from Duval algorithm provides minimum cyclic string

The pseudo algorithm
Implementation - O(n) Time - O(1) Auxiliary Space
Detail Implementation - O(n) Time - O(1) Auxiliary Space










III. Real Life Applications



1) Finger print identification:
  • We can encode the finger print into many detailed circular strings. How to search such finger print again from those in very huge data base ? Circular comparision using lyndon factorization is requried.

2) Biological genetics:
  • In some cases, we need to know whether these two group's genetics are belonged to the same bacteria, virus.

3) Games:
  • Well, ofcourse we can apply the algorithm in some language/words-related games










IV. Programming Applications



1) Least rotation to make smallest lexicographical ordered string.


The problem:
  • Given a string $$$S$$$ of size $$$N$$$

  • A right-rotation is that move the leftmost character to rightmost of $$$S$$$

  • Find the least right-rotation to make $$$S$$$ become the smallest lexicographical ordered string

Important Property: After those right-rotations, the string will be Minimum Acyclic String



The solution:
  • One thing we can come in mind is to use hash or string algorithms in $$$O(n\ log\ n)$$$, but they are kinda complex

  • Some other approachs can be found here


  • Bruteforces Solution: Let $$$t = s + s$$$. Then for each substring of size $$$|s|$$$, we compare and find the smallest
Bruteforces Solution - O(n^2) Time - O(n) Auxiliary Space
Optimized Bruteforces Solution - O(n) to O(n^2) Time - O(n) Auxiliary Space

  • Duval Solution: We can apply lyndon factorization with a little bit observation
Detail Duval Solution - O(n) Time - O(n) Auxiliary Space
None-comment Duval Solution - O(n) Time - O(n) Auxiliary Space
Optimized Duval Solution - O(n) Time - O(1) Auxiliary Space

Practices Problem:











V. My Question



1) Are there any other programming applications for Lyndon Factorization ?
  • The algorithm is simple to code while running pretty fast as its low complexities. It would be kinda sad if there is only one main application, isnt it :(

2) Are there any other problems for Lyndon Factorization ?
  • To remember something better and understand the algorithm deeper, we need to practice right :D It would be nice if there are some more problems
  • Vote: I like it
  • +137
  • Vote: I do not like it

»
4 years ago, # |
Rev. 3   Vote: I like it +8 Vote: I do not like it

In the Dual Elimination Solution, if it has a name, please tag me in

There is a funny thing that the solution still work fine with odd length candidate list

And I think there is an improvement to make it completely Linear

EDIT: I gonna make another blog for that

»
4 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by SPyofgame (previous revision, new revision, compare).

Added a little bit more main properties and definitions that might help you understand the algorithm

»
4 years ago, # |
  Vote: I like it +1 Vote: I do not like it

You can use Lyndon words to construct a De Bruijn Sequence. See https://github.com/bqi343/USACO/blob/master/Implementations/content/combinatorial%20(11.2)/DeBruijnSeq.h implementation from Benq, which cites this problem.

P.S. sorry for full URL, it seems CF does not like it being embedded.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Thank you, I will add the De Bruijn Sequence Construction soon

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

It is failing on this testcase

input : abaabbaa

correct output :aaabaabb

user output :aabaabba

code