"Merging treaps" -- or how to merge sorted sets in good complexity...

Revision en1, by bicsi, 2022-11-01 11:03:35

... without doing much of anything.

I've been obsessed for the last two days about problem Data Centers from EGOI 2022. In particular, it has a subtask where $$$1 \leq N \leq 10^5$$$ and $$$1 \leq S \leq 5000$$$, where it seems $$$O(NS)$$$ wouldn't fit TL. I've been conjecturing that using treaps would amortize to good complexity, and after two days of burning grey matter, I've finally found a proof, and it's pretty cool to make a blog out of it.

The problem basically starts with an array $$$[v_1, v_2, ..., v_N]$$$. It then asks us to iteratively simulate the operation: "Subtact $$$x_i$$$ from the biggest $$$m_i$$$ elements.". All this while keeping the array sorted.

We'll show that just using treaps (or any BBST for that matter) yields a good amortized complexity. More exactly, we'll prove $$$O((N + Q) \log V \log N)$$$, and it should actually be better in practice.

Naive solution with treaps

Of course, the "naive" solution (which is much of what we'll implement here), does three things:

  1. Extract the $$$m_i$$$ suffix of the sorted array (we'll call it $$$B$$$, and the remainder $$$A$$$)
  2. Subtract $$$x_i$$$ from all elements of $$$B$$$
  3. Merge the resulting sequences $$$A$$$ and $$$B$$$

Using treaps, it's easy to see that steps 1 and 2 can be done in $$$O(\log N)$$$ via split and lazy propagation. Step 3 is the heaviest, requiring potentially $$$O(N)$$$ and maybe even $$$O(N \log N)$$$ steps depending on implementation.

Interweaving number

For a given merging process of sequences $$$A$$$ and $$$B$$$, let $$$k$$$ be the number of interweaving segments. See example below:

A = {2, 5,          15,   25, 30}
B = {      7, 9, 12,    21      }

In the example above, we have $$$k = 5$$$ interweaving segments (of lengths $$$2$$$, $$$3$$$, $$$1$$$, $$$1$$$, and $$$2$$$ respectively).

First, it is rather straightforward to write an implementation of merge on BBST that has complexity $$$O(k \log N)$$$ (via $$$k$$$ binary searches and $$$k$$$ splits).

Secondly, there can be at most $$$O((N + Q) \log V)$$$ merging interweaving segments over an array of size $$$N$$$ and $$$Q$$$ queries.

Potential method

For a given (increasing) sequence $$$A = [x_1, x_2, ..., x_n]$$$, let's define:

$$$\pi(A) = \sum_{i=1}^{n-1} \log(x_{i+1} - x_{i}) $$$

(all logarithms are base-2)

Let's suppose we are merging two sequences $$$A$$$ and $$$B$$$, and there are $$$k$$$ interweaving segments. See example below:

             <-d1->         <--d2-->      <--d3-->       <---d4--->
A = {[--a1--],                      [-a2-],                        [--a3--]}
B = {              [---b1---],                     [-b2-]                  }

In the picture above, the segments [-----] are placeholder to some sequence of elements. Numbers $$$d_i$$$ are the distances between numbers, e.g. $$$d_1 = first(b_i) - last(a_1)$$$.

The difference between potentials before and after merge $$$\Delta \pi = \pi (A) + \pi (B) - \pi (A \cup B)$$$ is:

$$$\Delta \pi = (\log (d_1 + b_1 + d_2) + \log (d_2 + a_2 + d_3) + \cdots + \log (d_{k-1} + a_{k/2} + d_{k})) - (\log d_1 + \log d_2 + \cdots + \log d_k) \quad (1)$$$

Nonetheless, we can see that:

$$$\Delta \pi \geq (\log (d_1 + d_2) + \cdots + \log (d_{k-1} + d_k)) - (\log d_1 + \cdots + \log d_k) \quad (2) $$$

(more naturally, the "worst case" is when doing a Faro shuffle, i.e. interweaving element by element).

However, because $$$\log$$$ is concave, we can use that $$$\log (\frac{a + b}{2}) \geq \frac{\log a + \log b}{2}$$$. Reformulating, we obtain:

$$$\log(a+b) \geq 1 + \frac{1}{2}(\log a + \log b) \quad (3)$$$

We'll add and subtract $$$\log (d_k + d_1)$$$ to inequality $$$(2)$$$, use inequality $$$(3)$$$, and simplify everything to finally obtain:

$$$\Delta \pi \geq k - \log (d_1 + d_k)$$$

(thanks freak93 for the easy one-liner about concavity of $$$\log$$$ )

The initial potential of the sequence is bounded by $$$N \log V$$$, and each merge operation adds at most $$$\log V$$$ to the potential, yielding a total of $$$(N + Q) \log V$$$ potential. Then, the number of interweavings is at most $$$(N + Q) \log V$$$.

Extra

It's not hard to see the power of this result. In essence, you can do all sorts of operations like: adding/subtracting some value $$$x$$$ to an arbitrary subsequence, dividing subsequence by some value, square rooting subsequence, and so on. All while keeping the sequence sorted.

Also, doing $$$\log N$$$ binary searches on treaps might be inconvenient. Another approach that is easier to code and faster is to use the treap union function (source: Wikipedia):

function union(t1, t2):
    if t1 = nil:
        return t2
    if t2 = nil:
        return t1
    if priority(t1) < priority(t2):
        swap t1 and t2
    t<, t> ← split t2 on key(t1)
    return join(union(left(t1), t<), key(t1),
                    union(right(t1), t>))

Conclusions

  • It feel like this is a bit more general method to this
  • Replace join treap operation in your library with merge
  • I feel like this trick is well known in some countries...
Tags treaps, bbst, merge, amortized

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en2 English bicsi 2022-11-01 11:13:42 41 Tiny change: '_i \leq V$. It then' -> '_i \leq V$ for all $i$. It then'
en1 English bicsi 2022-11-01 11:03:35 5258 Initial revision (published)