[Tutorial] Fast and simple RSQ with range addition (not segment tree!)

TL;DR: `range_sum_range_add` with only two BITs.

Suppose we have a (sub)task that requires processing the following queries:

range_sum(l, r)
range_add(l, r, x)

The most familiar solution is a segment tree with lazy propagation. And maybe you didn't know, but such a segment tree does not need pushes!

code of segment tree without pushes

template<class T>
struct rsq_add_segt {
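	explicit rsq_add_segt(size_t sz = 0, const T &val = 0) {
		for(d=1; d<sz; d<<=1);
		t.assign(d*2, {T(), T()});
		//only the first sz leaves hold val; inner nodes are sums of their children
		for(size_t i=0; i<sz; ++i) t[i+d].first = val;
		for(size_t i=d; i-->1; ) t[i].first = t[i*2].first + t[i*2+1].first;
	}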
	
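	//vector<auto> is a non-standard extension; an explicit template parameter is portable
	template<class U>
	rsq_add_segt(const vector<U> &vals): rsq_add_segt(size(vals)) {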
		for(size_t i=0; i<size(vals); ++i) t[i+d].first = vals[i];
		for(size_t i=d; i-->1; ) t[i].first = t[i*2].first + t[i*2+1].first;
	}
	
	void add(size_t l, size_t r, const T &val) {
		if(l>=r) return ;
		_add(l, r, val, 0, d, 1);
	}
	
	T operator()(size_t l, size_t r) const {
		if(l>=r) return {};
		return _calc(l, r, 0, d, 1);
	}
	
	private:
	size_t d;
	//t[i] = pair of #(sum on range of node) and #(total additions on range of node)
	vector<pair<T,T>> t;
	
	void _add(size_t i, size_t j, const T &val, size_t l, size_t r, size_t v) {
		//adding to whole range
		if(i==l && j==r){
			//sum increases by value * length_of_range
			t[v].first+=val*T(r-l);
			//total addition increases by value
			t[v].second+=val;
			return ;
		}
		size_t m = (l+r)>>1;
		if(i<m) _add(i,min(j,m),val,l,m,v*2);
		if(m<j) _add(max(i,m),j,val,m,r,v*2+1);
		//update current node: sum on range is sum of children + own addition
		t[v].first = t[v*2].first + t[v*2+1].first + t[v].second * T(r-l);
	}
	
	T _calc(size_t i, size_t j, size_t l, size_t r, size_t v) const {
		//sum of whole range already known
		if(i==l && j==r) return t[v].first;
		size_t m = (l+r)>>1;
		return 
		//query split between children
		(j<=m ? _calc(i,j,l,m,v*2) : i>=m ? _calc(i,j,m,r,v*2+1)
		 : _calc(i,m,l,m,v*2) + _calc(m,j,m,r,v*2+1))
		//and adding own addition
		  + t[v].second * T(j-i);
	}
};

Every time I see a segment tree with pushes in this problem, my heart bleeds... But! Even this is unnecessary, because this problem does not need a segment tree at all!

Reducing `range_sum_range_add` to `range_sum_position_add`

Everything below uses 1-based indexing, and by range I mean the half-interval $$$[L, R)$$$.

Let me remind you that a sum on a range can be reduced to sums on prefixes (or suffixes): it is the difference of two prefix sums. In the same way, adding on a range can be reduced to adding on prefixes (or suffixes): add $$$x$$$ on the longer prefix and $$$-x$$$ on the shorter one.

How?

OK, suppose we have some changes to the array (additions on prefixes). For each $$$i$$$ we have a value $$$a_i$$$, meaning that this value is added to every element of the $$$i$$$-th prefix. How do we calculate the sum on a particular prefix $$$k$$$? Every addition whose prefix lies inside, i.e. $$$i \leq k$$$, must be counted fully, as $$$a_i \times i$$$. Every addition whose prefix extends outside, i.e. $$$k < i$$$, touches only $$$k$$$ elements of our prefix and must be counted as $$$a_i \times k$$$.
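Written as a formula (only a restatement of the paragraph above; $$$S(k)$$$ is a name introduced here for the total contribution of all prefix additions to the sum on prefix $$$k$$$):

$$$S(k) = \sum_{i \le k} a_i \cdot i \; + \; k \cdot \sum_{i > k} a_i = \sum_{i} a_i \cdot \min(i, k)$$$

For example, with $$$n = 5$$$, adding $$$2$$$ on prefix $$$3$$$ and $$$1$$$ on prefix $$$5$$$ gives the array $$$[3, 3, 3, 1, 1]$$$; for $$$k = 4$$$ the formula yields $$$2 \cdot 3 + 1 \cdot 4 = 10 = 3 + 3 + 3 + 1$$$.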

Let's keep two classic range_sum_position_add data structures. The first, call it f, stores $$$a$$$ as is. The second, call it t, stores $$$a_i \times i$$$. This means that to process adding $$$x$$$ on prefix $$$i$$$, we call f.position_add(i, x) and t.position_add(i, x*i).

To answer a prefix sum query for prefix $$$i$$$ we need the sum of two parts:

all values inside: it will be t.range_sum(1, i+1),
all values outside: it will be f.range_sum(i+1, n+1) * i.
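The prefix/prefix scheme above can be sketched roughly as follows. This is only an illustration under assumptions: the names `Bit` and `RangeAddRangeSum`, the use of `long long`, and an array that starts as all zeros are choices made here, not part of the original code library. Positions are 1-indexed and ranges are half-intervals $$$[L, R)$$$, as in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Classic Fenwick tree (hypothetical helper): point add, sums, positions 1..n.
struct Bit {
    std::vector<long long> s;
    explicit Bit(std::size_t n) : s(n + 1, 0) {}
    void position_add(std::size_t i, long long x) {   // requires 1 <= i <= n
        assert(i >= 1 && i < s.size());
        for (; i < s.size(); i += i & (~i + 1)) s[i] += x;
    }
    long long prefix(std::size_t i) const {           // sum over positions 1..i
        long long r = 0;
        for (; i > 0; i -= i & (~i + 1)) r += s[i];
        return r;
    }
    long long range_sum(std::size_t l, std::size_t r) const {   // sum over [l, r), l >= 1
        return l < r ? prefix(r - 1) - prefix(l - 1) : 0;
    }
};

// Range add / range sum on top of two Bits: f stores a_i, t stores a_i * i.
struct RangeAddRangeSum {
    std::size_t n;
    Bit f, t;
    explicit RangeAddRangeSum(std::size_t n_) : n(n_), f(n_), t(n_) {}

    void prefix_add(std::size_t i, long long x) {     // add x to positions 1..i
        if (i == 0) return;                           // empty prefix: nothing to do
        f.position_add(i, x);
        t.position_add(i, x * static_cast<long long>(i));
    }
    long long prefix_sum(std::size_t i) const {       // sum of positions 1..i
        // additions inside the prefix count fully, the rest count i times
        return t.range_sum(1, i + 1)
             + f.range_sum(i + 1, n + 1) * static_cast<long long>(i);
    }
    void range_add(std::size_t l, std::size_t r, long long x) {  // add x on [l, r)
        if (l >= r) return;
        prefix_add(r - 1, x);
        prefix_add(l - 1, -x);
    }
    long long range_sum(std::size_t l, std::size_t r) const {    // sum on [l, r)
        return l < r ? prefix_sum(r - 1) - prefix_sum(l - 1) : 0;
    }
};
```

A nonzero initial array is not handled in this sketch; one option is to feed it in as $$$n$$$ single-element range additions.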

That's all! With the help of a Binary Indexed Tree, the most popular rsq structure, we get a fast, non-recursive and short implementation of the required data structure.

We can replace prefix/prefix with any of the other three combinations and get similar formulas. For example, my code library has the prefix/suffix version, so that both nested BITs need only prefix summation and suffix addition:

code

template<class T>
struct rsq_add {
	explicit rsq_add(size_t sz = 0): f(sz) {}
	
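	//vector<auto> is a non-standard extension; an explicit template parameter is portable
	template<class U>
	rsq_add(const vector<U> &vals): f(size(vals)) {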
		for(size_t i=0; i<size(vals); ++i) {
			T x = vals[i]; if(i) x-=T(vals[i-1]);
			f[i].first+=x;
			f[i].second+=x*T(i);
			if(size_t j = i|(i+1); j<size(f)) {
				f[j].first+=f[i].first;
				f[j].second+=f[i].second;
			}
		}
	}
	
	void add(size_t l, size_t r, const T &val) {
		if(l>=r) return ;
		add_suf(l, val);
		add_suf(r, -val);
	}
	
	T operator()(size_t l, size_t r) const {
		if(l>=r) return T();
		return sum_until(r) - sum_until(l);
	}
	
	inline T operator[](size_t i) const { return sum_until(i+1) - sum_until(i); }
	
	private:
	vector<pair<T,T>> f;
	
	void add_suf(size_t pos, const T &val) {
		T m = val * T(pos);
		for(size_t i=pos; i<size(f); i|=i+1) 
			f[i].first+=val, f[i].second+=m;
	}
	
	T sum_until(size_t pos) const {
		T a{}, b{};
		for(size_t i=pos; i--; i&=i+1) 
			a+=f[i].first, b+=f[i].second;
		return a*T(pos) - b;
	}
};

Bonus: `sqrt` versions

It is well known that with classic sqrt decomposition each query can be processed in $$$O(\sqrt n)$$$. But with the described reduction to range_sum_position_add we can achieve $$$O(1)$$$ / $$$O(\sqrt n)$$$ or $$$O(\sqrt n)$$$ / $$$O(1)$$$ versions for range_add / range_sum.

The first is simple: adding to one element is accompanied by adding to the block that contains this element, and a range sum is the sum of $$$O(\sqrt n)$$$ whole blocks plus $$$O(\sqrt n)$$$ corner elements.

The second needs reducing range_sum_position_add to position_sum_range_add: let's maintain suffix sums (which give the sum on a range as the difference of two suffix sums in $$$O(1)$$$); changing one element $$$i$$$ affects only the suffix sums from $$$0$$$ to $$$i$$$, and this range addition takes $$$O(\sqrt n)$$$.
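A minimal sketch of such an $$$O(1)$$$ add / $$$O(\sqrt n)$$$ sum structure might look like this. The name `FastAddSqrt`, the 0-indexed positions and the block length choice are assumptions made for illustration. It only implements range_sum_position_add; using two of them in the roles of f and t would give the range_add / range_sum version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name. Point add in O(1), sum on [l, r) in O(sqrt n), 0-indexed.
struct FastAddSqrt {
    std::size_t n, b;                      // b = block length, about sqrt(n)
    std::vector<long long> elem, block;    // single elements and per-block totals
    explicit FastAddSqrt(std::size_t n_) : n(n_), b(1), elem(n_, 0) {
        while (b * b < n) ++b;
        block.assign(n / b + 1, 0);
    }
    void position_add(std::size_t i, long long x) {
        assert(i < n);
        elem[i] += x;          // the element itself
        block[i / b] += x;     // and the block that contains it
    }
    long long range_sum(std::size_t l, std::size_t r) const {
        assert(l <= r && r <= n);
        long long s = 0;
        while (l < r && l % b != 0) s += elem[l++];          // left corner elements
        while (l + b <= r) { s += block[l / b]; l += b; }    // whole blocks
        while (l < r) s += elem[l++];                        // right corner elements
        return s;
    }
};
```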
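One possible sketch of the $$$O(\sqrt n)$$$ add / $$$O(1)$$$ sum structure, under the same assumptions as before (hypothetical name `FastSumSqrt`, 0-indexed positions, half-open ranges): each suffix sum is stored as its own value plus a lazy value shared by its block, so a range addition on suffix sums $$$0..i$$$ touches at most $$$O(\sqrt n)$$$ cells.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name. Point add in O(sqrt n), sum on [l, r) in O(1), 0-indexed.
struct FastSumSqrt {
    std::size_t n, b;                     // b = block length, about sqrt(n)
    std::vector<long long> val, lazy;     // suffix(j) = val[j] + lazy[j / b]
    explicit FastSumSqrt(std::size_t n_) : n(n_), b(1), val(n_ + 1, 0) {
        while (b * b < n) ++b;
        lazy.assign(n / b + 1, 0);
    }
    void position_add(std::size_t i, long long x) {
        assert(i < n);
        std::size_t blk = i / b;
        // suffix sums 0..i all grow by x:
        for (std::size_t k = 0; k < blk; ++k) lazy[k] += x;        // whole blocks before i's block
        for (std::size_t j = blk * b; j <= i; ++j) val[j] += x;    // part of i's own block
    }
    long long suffix(std::size_t j) const {     // sum of positions j..n-1, j in [0, n]
        return val[j] + lazy[j / b];
    }
    long long range_sum(std::size_t l, std::size_t r) const {
        assert(l <= r && r <= n);
        return suffix(l) - suffix(r);
    }
};
```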

Comments (3)

kpw29

3 years ago, # |

+23

Cool post, but the trick isn't new, Petr described it in 2013. Sorry to kill the fun :/

https://petr-mitrichev.blogspot.com/2013/05/fenwick-tree-range-updates.html

oversolver

3 years ago, # ^ |

yeah, I remember this post and discussion on cf, but it was like "magic code without description", now I understand this trick more generally.

agarus

Nice trick. Some steps seem to be missing. One of those things is: how to get the final answer to an L,R query after a number of sum updates, right before the "That's all!" part. What should we do with t and f?

Should the answer be: ans = initial + updates = (pref[r] - pref[l-1]) + ...? I got it in 20 minutes, but it would be nice to have it with an explanation: that you need f(i+1, n+1) * i - t(1, i+1) and why. And maybe, maybe, maybe some good self-explanatory naming for f(l,r) and t(l,r), if you care.

Oh, no, ans is a little bit more complicated: updates = f(r+1, n+1) * (r + 1) + t(1, r) - (f(l+1, N+1) * (l +1) + t(1,l) ).

oversolver's blog
