Hello!
Today I will present a data structure that I have worked on for my Bachelor thesis a while ago, called (by me) a PreTree (link to thesis). This is a type of "compressed" trie that is arguably easier to implement than a regular compressed trie, and it achieves merging in amortized $$$O(N L)$$$ complexity for $$$N$$$ keys of equal length $$$L$$$. I will present an implementation as a segment tree, though, as it is easier to understand.
Prerequisite: know segment trees, and "sparse" segment trees (a.k.a. bitwise tries).
Main idea
We will keep a regular segment tree, with the twist that the root contains the minimum key. Basically, a PreTree on keys in the universe $$$[b, e)$$$ is a binary tree $$$T$$$ where either $$$T$$$ is empty, or the following conditions hold:
- $$$T$$$ holds the lowest key present in the range $$$[b, e)$$$
- $$$T$$$'s left son is a PreTree on values $$$[b, \frac{b + e}{2})$$$
- $$$T$$$'s right son is a PreTree on values $$$[\frac{b + e}{2}, e)$$$
Because $$$T$$$ is a one-to-one mapping between tree vertices and keys, then the memory complexity of $$$T$$$ is obviously $$$O(N)$$$.
Implementation
In order to make the implementation easier, we will define a single big operation called "merging" two PreTrees. Suppose we have two trees $$$T1$$$ and $$$T2$$$. Merging the two trees would be done as the following recursive procedure:
function Merge(T1, T2, b, e):
if T1 is empty then return T2
if T2 is empty then return T1
m = (b + e) / 2
if T1.k > T2.k then swap T1 and T2
T1.l = Merge(T1.l, T2.l, b, m)
T1.r = Merge(T1.r, T2.r, m, e)
T1.l = T1.r = empty
if T1.k != T2.k:
## We have to insert T1 into the corresponding child
if T2.k >= m:
T1.r = Merge(T1.r, T2, m, e)
else:
T1.l = Merge(T1.l, T2, b, m)
return T1
Insertion
In order to insert a key into the tree, just merge it with a singleton tree.
Erase
This is a bit more involved. Notice, however, that any PreTree $$$T$$$ is a min-heap on the keys. Find the key and then pop it as you would do in a heap.
Predecessors
In order to find the greatest key smaller than a given $$$k$$$, one can do the following:
function Smaller(T, k):
if T is empty or T.k >= k:
return NOT_FOUND
ret = Smaller(T.r, k)
if ret is NOT_FOUND:
ret = Smaller(T.l, k)
if ret is NOT_FOUND:
ret = T
return ret
Finding the smallest key greater than or equal to a given $$$k$$$ is possible, albeit a little more complicated. The simplest way would be to also keep the maximum on the subtree. I will omit the actual implementation.
Performance
The Merge
function seems not too efficient at first glance. However, the amortized complexity of it is bounded by $$$O(N log(U))$$$, where $$$U$$$ is the size of the universe, and $$$N$$$ is the total number of nodes. In order to see why is that, we will have to look at the depths of each node in the tree.
Let's suppose $$$T1$$$ and $$$T2$$$ have distinct keys. Each non-trivial call to the Merge
function increases the depth of at least one vertex. The reason that is true is because, after each call, the root node with the higher key will get "demoted" one level downwards, therefore increasing its depth by one. As depths of a node can never exceed $$$O(log(U))$$$, the total complexity cannot exceed $$$O(N log(U))$$$.
Follow-up 1: Prove that this still holds for non-distinct keys!
Follow-up 2: What about erase operations? How do they affect the amortized analysis?
Follow-up 3: What about (Treap) splitting operations? Can you split T into two trees, depending on some (monotonic) predicate? How does that affect complexity?
Applications
One of the most standard applications of this is to improve the $$$O(N log^2(N))$$$ bound on various problems involving the small-to-large trick. It has the additional improvement of having $$$O(N)$$$ memory without making the implementation a lot harder.
Example problems:
- 343D. Water Tree | AC Submission (here we can implicitly encode the keys in the nodes' value, yielding a very elegant and short implementation)
Could you tell me about some other problems involving some form of sparse segment trees and other tasks where this could come handy, to add them to the blog?