I recently completed a project, to solve a very general version of the associative range query problem using Rust's polymorphism. These are solved with some variation of segment trees; I prefer to call my generalized version an arq tree: "arq" is pronounced "arc" which has a similar meaning to segment, but also stands for "associative range query".
Associativity and Semigroups
You being with a finite array of elements a_1, a_2, a_3, ..., a_n. Each a_i belongs to a semigroup (S, +), where + stands for any associative binary operation on S.
Associative Law: +: S x S -> S such that (a + b) + c = a + (b + c) for all a, b, c in S.
Because + is associative, we can drop the parentheses and talk about range aggregates of the form a_l + a_{l+1} + ... a_r.
The ARQ Problem
In the Associative Range Query problem, we wish to support two types of operations: - Given bounds l and r, compute the aggregate a_l + a_{l+1} + ... a_r - Given bounds l and r, and a function f: S -> S, replace a_i with f(a_i) for all l <= i <= r.
In typical instances where computing a + b or f(a) take O(1) time, we wish to support each operation in O(log n) time.
Identity and Monoids
The segment tree is often presented as operating on monoids instead of semigroups. A monoid is simply a semigroup with a special identity element:
Identity Law: id in S such that a + a + id = id + a = a for all a in S.
Here's how you would represent Semigroup and Monoid as traits in Rust. The programmer must verify that each function satisfies its corresponding law:
trait Semigroup {
fn compose(&self, other: &Self) -> Self;
}
trait Monoid: Semigroup {
fn identity() -> Self;
}
In practice the trait bound Monoid turns out to be equivalent to Semigroup + Clone, and either of the two would suffice for our purposes. A semigroup can trivially be extended into a monoid by adding an identity element, and a monoid can implement Clone via composition with the identity element:
impl<T: Semigroup + Clone> Semigroup for Option<T> {
fn compose(&self, other: &Self) -> Self {
match self {
Some(ref a) => match other {
Some(ref b) => Some(a.compose(b)),
None => self.clone()
},
None => other.clone()
}
}
}
impl<T: Semigroup + Clone> Monoid for Option<T> {
fn identity() -> Self {
None
}
}
impl<T: Monoid> Clone for T {
fn clone(&self) -> Self {
self.compose(T::identity())
}
}
Note that if a type implements Monoid, then Clone is not needed since a copy can be generated by composition with the indentity. We could have made our arqtree operate on elements of a type implementing Semigroup + Clone, but I think it's more convenient to work with Monoid. And so, we have our arqtree API:
pub struct ArqTree<T> {
val: Vec<T>
}
impl<T: Monoid> ArqTree<T> {
fn modify(&mut self, pos: usize, f: &dyn Fn(&T) -> T) {
// implement modify
}
fn query(&self, l: usize, r: usize) -> T {
// implement query
}
}
However, this will not suffice when we need to do range updates! In order to update an entire range efficiently, we will need to apply f lazily, storing it in internal nodes of the tree. If multiple updates are performed, we may have to store a composition of updates for postponed application. While one may implement a composition operation f \circle g which simply calls applies g first and then f, this makes the cost of function application no longer O(1)!
Thus, we must switch from function pointers to an explicit, composable representation of our update operator. For instance, instead of storing a function that adds 10, we might store the number 10 to remember that we should add 10 later. The composition of an add-10 with an add-5 is an add-15, which still takes O(1) time.
pub struct ArqTree<T: ArqSpec> {
app: Vec<Option<T::F>>,
val: Vec<T::M>,
}
impl<T: ArqSpec> ArqTree<T>
where
T::F: Clone,
{ ... }
pub trait ArqSpec {
/// Type of data representing an endomorphism.
// Note that while a Fn(M) -> M may seem like a more natural representation
// for an endomorphism, compositions would then have to delegate to each of
// their parts. This representation is more efficient.
type F;
/// Type of monoid elements.
type M;
/// For eager updates, compose() ho be unimplemented!(). For lazy updates:
/// Require for all f,g,a: apply(compose(f, g), a) = apply(f, apply(g, a))
fn compose(f: &Self::F, g: &Self::F) -> Self::F;
/// For eager updates, apply() can assume to act on a leaf. For lazy updates:
/// Require for all f,a,b: apply(f, op(a, b)) = op(apply(f, a), apply(f, b))
fn apply(f: &Self::F, a: &Self::M) -> Self::M;
/// Require for all a,b,c: op(a, op(b, c)) = op(op(a, b), c)
fn op(a: &Self::M, b: &Self::M) -> Self::M;
/// Require for all a: op(a, identity()) = op(identity(), a) = a
fn identity() -> Self::M;
}