allthecode's blog

By allthecode, 11 years ago, In English

Motivating problem: You have an array of size N with N different numbers; calculate the sum of the numbers in the interval of a to b.

What is the proof that a Segment Tree performs those queries in log(n) time? Or where can I find the proof? Thanks.

  • Vote: I like it
  • -4
  • Vote: I do not like it

| Write comment?
»
11 years ago, # |
  Vote: I like it +13 Vote: I do not like it

The segment tree has O(log(n)) levels. The algorithm visits at most O(1) nodes per level during the query, because of this:

Let [L, R) be the interval covered by current vertex. Let denote # number covered by [a, b) and . the other nodes.

Only these situations can occur:

  1. if R<=a || b<=L, we dont care about this node, situation looks like ..........
  2. if a<=L && b<=R, whole node is in interval [a,b), situation looks like ##########
  3. L==a or b==R, #######... or .......###
  4. none of the previous options ...#####..

The situations 1 and 2 cost O(1) time.

The situation 3 divides the interval into two smallers ones. At most one of children will have type 3, the others will be 1 or 2.

  • ...####### -> ...## + ##### (3 -> 3, 2)
  • .......### -> ..... + ..### (3 -> 1, 3)
  • .....##### -> ..... + ##### (3 -> 1, 2)

The situation 4 can divide into:

  • ....#####.. -> ...## + ###.. (4 -> 3, 3)
  • ......##... -> ..... + .##.. (4 -> 1, 4)
  • .....###... -> ..... + ###.. (4 -> 1, 3)

So there will be only one child of type 4 or there will be at most two children of type 3.

At the beginning of query we have one of the stituations 1, 2, 3, 4 on top level. It is easy to see that there can be at most 1 node of type 4 per level and if there is no type 4 there can be at most two nodes of type 3 per level.

So there can be only 4 = O(1) nodes visited in each level (at most two of type {3,4})

That means O(log(n)) visited nodes. Inside each node we spent O(1) time.

»
11 years ago, # |
  Vote: I like it +8 Vote: I do not like it

Well, firstly, you have to build the tree in O(nlogn). Every vertex has: interval covered, sum-to-be-added and overall-sum. So when you go deeper if the interval searched is bigger than the covered you just get the sum. So in the worst case you will have query (a = n/2, b=n/2+1) for which you will go to 2 different leaves of the tree. This means 2*logn operations at most in one query. The sum-to-be-added value is used if you have updates for an interval for the same reason. When you reach a fully covered interval you update that value, so that you can use it in the queries by adding that value times the length of the answer to the result. PS: The update trick with sum-to-be added is called lazy propagation, so you can look for it on the net.

»
11 years ago, # |
  Vote: I like it 0 Vote: I do not like it

A short idea of lazy propagation:

Let's take the same type you propose, where queries are for sum of interval.

Imagine that you want to increase all the numbers in the interval corresponding to node n by x. You don't need to update the whole subtree; instead, just mark in n "for this subtree, each element is actually plus x". Now, you don't traverse deeper into the subtree anymore.

Now, what happens if you need to traverse some of the nodes of this subtree later? You must pass through n. And at that point, you can update the sons of n — if all elements in the interval of n are larger by x, then so will be all the elements in the intervals of the sons of n (since those 2 together make the interval of n). So, you just copy the information "for this subtree, each element is actually plus x" into the sons of n.

Now, you must make sure you won't repeat that again for the same information. In node n, you remember the sum of its interval [z,k). Increase that sum by x * (k - z) (all the k - z elements are increased by x). And set x = 0. If there are more queries which end at node n, x can be further modified.

The details of how the updates are done differ slightly between different types of trees, but the overall idea stays the same: if you know that an operation is applied to a subtree, store it at the root.

»
11 years ago, # |
  Vote: I like it +3 Vote: I do not like it

It´s because a segment tree have log(n) levels, and in every step, if your querry is outside the interval of the node that you are you can discard this node, if all your querry is inside the interval of your node you return this sum, else you need to go to the next level. but if you see this, you will do this maximum 2 times for level, then your complexity will be 2(log(n)). :)