I want to talk about segment trees or, as I prefer to call them, arqtrees. The "arq" is pronounced "arc", which has a similar meaning to "segment", but it also stands for "associative range query". I believe this name gets to the crux of what this data structure is: you start with a semigroup (S, +), where + stands for any associative binary operation. In other words, it satisfied the following law:
(a + b) + c = a + (b + c) for all a, b, c in S.
In other words, parentheses can be ignored and so it makes sense to talk about range queries of the form:
a_i + a_{i+1} + ... + a_j.
Note that some arqtree introductions talk about monoids instead of semigroups. The only difference is that monoids have an identity element id, satisfying the following law:
a + id = id + a = a for all a in S.
Here's how you would represent Semigroup and Monoid as traits in Rust. Each function must satisfy its corresponding law: ~~~~~ trait Semigroup { fn compose(&self, other: &Self) -> Self; }
trait Monoid: Semigroup { fn identity() -> Self; } ~~~~~ Any semigroup can trivially be extended into a monoid by adding an identity element:
impl<T: Semigroup + Clone> Semigroup for Option<T> {
fn compose(&self, other: &Self) -> Self {
match self {
Some(ref a) => match other {
Some(ref b) => Some(a.compose(b)),
None => self.clone()
},
None => other.clone()
}
}
}
impl<T: Semigroup + Clone> Monoid for Option<T> {
fn identity() -> Self {
None
}
}
Note that if a type implements Monoid, then Clone is not needed since a copy can be generated by composition with the indentity. We could have made our arqtree operate on elements of a type implementing Semigroup + Clone, but I think it's more convenient to work with Monoid. And so, we have our arqtree API:
pub struct ArqTree<T> {
val: Vec<T>
}
impl<T: Monoid> ArqTree<T> {
fn modify(&mut self, pos: usize, f: &dyn Fn(&T) -> T) {
// implement modify
}
fn query(&self, l: usize, r: usize) -> T {
// implement query
}
}
However, this will not suffice when we need to do range updates! In order to update an entire range efficiently, we will need to apply f lazily, storing it in internal nodes of the tree. If multiple updates are performed, we may have to store a composition of updates for postponed application. While one may implement a composition operation f \circle g which simply calls applies g first and then f, this makes the cost of function application no longer O(1)!
Thus, we must switch from function pointers to an explicit, composable representation of our update operator. For instance, instead of storing a function that adds 10, we might store the number 10 to remember that we should add 10 later. The composition of an add-10 with an add-5 is an add-15, which still takes O(1) time.
pub struct ArqTree<T: ArqSpec> {
app: Vec<Option<T::F>>,
val: Vec<T::M>,
}
impl<T: ArqSpec> ArqTree<T>
where
T::F: Clone,
{ ... }
pub trait ArqSpec {
/// Type of data representing an endomorphism.
// Note that while a Fn(M) -> M may seem like a more natural representation
// for an endomorphism, compositions would then have to delegate to each of
// their parts. This representation is more efficient.
type F;
/// Type of monoid elements.
type M;
/// For eager updates, compose() ho be unimplemented!(). For lazy updates:
/// Require for all f,g,a: apply(compose(f, g), a) = apply(f, apply(g, a))
fn compose(f: &Self::F, g: &Self::F) -> Self::F;
/// For eager updates, apply() can assume to act on a leaf. For lazy updates:
/// Require for all f,a,b: apply(f, op(a, b)) = op(apply(f, a), apply(f, b))
fn apply(f: &Self::F, a: &Self::M) -> Self::M;
/// Require for all a,b,c: op(a, op(b, c)) = op(op(a, b), c)
fn op(a: &Self::M, b: &Self::M) -> Self::M;
/// Require for all a: op(a, identity()) = op(identity(), a) = a
fn identity() -> Self::M;
}