prefix sum overview

Revision en2, by THE_THUNDERSTORM_BEGINS, 2025-03-08 22:08:12

this blog will talk about prefix sums in general , how to use them , nice tricks , etc ...

this blog was made to summarize HCPC-2025 training for prefix sums in Homs university / Syria.

overview

1. what is prefix sum?

2. get range sum in $$$O(1)$$$

3. find number of sub arrays with sum equal to $$$x$$$ (map trick)

4. even more prefix sums

lets start:

1. what is prefix sum?

prefix sum of an array a of size $$$n$$$ is another array of size $$$n+1$$$ as follows:

        a(1)       a(2)           a(3)                 ...  a(n)
p(0)=0  p(1)=a(1)  p(2)=a(1)+a(2) p(3)=a(1)+a(2)+a(3)  ...  p(n)=a(1)+a(2)+...+a(n)

so in general $$$p(i)$$$ contains the sum of all elements up to $$$i$$$

and we introduce another element at the front representing the empty prefix

we can build prefix sum array in simple one line of code

for(int i=1;i<=n;i++)p[i]=p[i-1]+a[i];

this is the only info we care about (for now).

2.get range sum in $$$O(1)$$$

as we want the sum in range $$$[L,R]$$$

we realize that it is the same as $$$p(R)-p(L-1)$$$

a(L)+a(L+1)+...+a(R)=(a(1)+a(2)+...+a(L-1)+a(L)...+a(R))-(a(1)+a(2)+...+a(L-1)) = p(R)-p(L-1)

3. find the number of sub arrays with sum equal to $$$x$$$

given the array $$$a$$$ of size $$$n$$$

every sub array sum is represented by two prefix sums $$$p(L-1),p(R)$$$

first we initialize a map

map<int,int> cnt;

and loop through the array

lets say we are considering index $$$i$$$

at this point every prefix sum $$$p(j)$$$ such that $$$j<i$$$ is considered in the map

it is necessary to consider the empty prefix at first

cnt[0]=1;

and then at each index $$$i$$$

we do *something to get all sub arrays ending at index $$$i$$$ and have the sum equal to $$$x$$$ count and add it to the final answer

and then do

cnt[p[i]]++;

what to *do each time?

well we want the following equation $$$ p(i)-V=x $$$ when V is the value that we are searching

then $$$V=p(i)-x$$$

so we simply add $$$cnt[p[i]-x]$$$ to the final answer each time

note that $$$V=p(L-1)$$$ that's why we considered the empty prefix at the first place

there is a problem at CSES mentioning this trick

4. even more prefix sums

here i will mention other prefix stuff that might be useful to use later,

1.prefix xor

we can create an array containing the xor of each prefix, this way we can get the xor of any range

for(int i=1;i<=n;i++)p[i]=p[i-1]^a[i];
//xor of a range [L,R] is p[R]^p[L-1]

this is possible because

X^X=0 , A^B=B^A , X^0=X

2.prefix $$$mul$$$,$$$and$$$ , $$$or$$$ , $$$max$$$ , $$$min$$$ , $$$gcd$$$ and rarely $$$lcm$$$

these queries are sadly not usual to get in a range as we will need advanced data structures (aka sparse table) to get them in $$$O(1)$$$ (or $$$log(n)$$$ in case of gcd,lcm)

but you can get them in $$$log(n)$$$ complexity by using segment trees (out of topic but worth mentioning)

if we want $$$mul$$$ in range we wouldd need to make sure a lot of things are satisfied , like there should not be zero (as it will make all prefixes ahead zeros and we cant cancel them),and overflow cannot happen (which could mean a lot of things), in our case we will focus on $$$mul$$$ in range with $$$mod$$$ so we can get inverse mod in $$$O(log(n))$$$,or even $$$O(1)$$$ if the mod is small enough

it is also important to mention that we can get $$$(and,or)$$$ queries in range using simple prefix sum

by applying prefix sum on each bit as follows:

for(int j=0;j<30;j++){
    for(int i=1;i<=n;i++){
        p[j][i]=p[j][i-1]+((a[i]>>j)&1);
    }
}

and then we can get $$$or$$$ or $$$and$$$ queries in a range $$$[L,R]$$$ as follows:

int AND(int L,int R){
    int res=0;
    for(int j=0;j<30;j++){
        if(p[j][R]-p[j][L-1]==R-L+1)res+=1<<j;
    }
    return res;
}

int OR(int L,int R){
    int res=0;
    for(int j=0;j<30;j++){
        if(p[j][R]-p[j][L-1]>0)res+=1<<j;
    }
    return res;
}

but in certain problems it is enough to get them in a prefix (or a suffix maybe?)

3. 2D prefix sum

the idea is we want to get sum in a rectangle in $$$O(1)$$$

given the 2D array $$$a$$$ of $$$n$$$ rows and $$$m$$$ columns

   a(1,1) a(1,2)  ...  a(1,m)
   a(2,1) a(2,2)  ...  a(2,m)
   ...
   a(n,1) a(n,2)  ...  a(n,m)

we do a prefix sum on each row of $$$a$$$ first so that

p(i,j)=a(i,1)+a(i,2)+...+a(i,j)

and then we do another prefix sum on columns of $$$p$$$ so that

s(i,j)=p(1,j)+p(2,j)+...+p(i,j)

actually we don't need two more arrays for that, we can simply apply these operations on the same array twice, or one more array just to keep the original array.

the code is as follows:

for(int i=1;i<=n;i++)
for(int j=1;j<=m;j++)p[i][j]=a[i][j]+p[i][j-1];

for(int j=1;j<=m;j++)
for(int i=1;i<=n;i++)p[i][j]+=p[i-1][j];

we can get sum of elements in a rectangle with upper left corner at $$$(i,j)$$$ and with $$$x$$$ rows and $$$y$$$ columns like this

p[i+x-1][j+y-1]-p[i+x-1][j-1]-p[i-1][j+y-1]+p[i-1][j-1]

using inclusion exclusion principle (which is out of topic right now)

but i can explain why this is true

let the array $$$f$$$ represent how may times we added an element from the array $$$a$$$

first each $$$f(i,j)=0$$$

i will add each component to see how the frequency of elements change

notice that adding $$$p(i,j)$$$ represents increasing the value of $$$f(i1,j1)$$$ $$$i1<=i&&j1<=j$$$ by one

i=2,j=2,x=4,y=3
             +p[i+x-1][j+y-1]    -p[i+x-1][j-1]     -p[i-1][j+y-1]   +p[i-1][j-1]
  0 0 0 0 0  1 1 1 1 0           0 1 1 1 0          -1 0 0 0 0        0 0 0 0 0
  0 0 0 0 0  1 1 1 1 0           0 1 1 1 0           0 1 1 1 0        0 1 1 1 0
  0 0 0 0 0  1 1 1 1 0           0 1 1 1 0           0 1 1 1 0        0 1 1 1 0
  0 0 0 0 0  1 1 1 1 0           0 1 1 1 0           0 1 1 1 0        0 1 1 1 0
  0 0 0 0 0  1 1 1 1 0           0 1 1 1 0           0 1 1 1 0        0 1 1 1 0

4.sweep line

the idea is simple

given an array of zeros $$$a$$$

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

you can add $$$V$$$ to a range $$$[L,R]$$$ by adding $$$V$$$ to $$$a(L)$$$ and subtracting $$$V$$$ from $$$a(R+1)$$$

0 0 0 V 0 0 0 0 0 -V 0 0 0 0 0

and then applying prefix sum on the array

0 0 0 V V V V V V 0 0 0 0 0 0 0

well that is not useful when talking about one query , the thing is you can add multiple values to multiple ranges and then apply prefix sum , this is considered offline approach in multiple cases.

5. sweep line on a 2D array

adding V to rectangle with upper left corner at $$$(i,j)$$$ and with $$$x$$$ rows and $$$y$$$ columns

add V to both $$$(i,j)$$$ $$$(i+x,j+y)$$$

subtract V from both $$$(i+x,j)$$$ $$$(i,j+y)$$$

and then apply prefix sum 2D like we learned

i=2,j=2,x=2,y=3
   0 0 0 0 0    0 0 0 0 0     0 0 0 0 0
   0 0 0 0 0    0 V 0 0 -V    0 V V V 0
   0 0 0 0 0    0 0 0 0 0     0 V V V 0
   0 0 0 0 0    0 -V 0 0 V    0 0 0 0 0 
   0 0 0 0 0    0 0 0 0 0     0 0 0 0 0

you can actually extend this to 3D array (worth mentioning).

problem solving

CSES-1662

spoj-CSUMQ

CF-835C

CF-1398C

CF-1291D

AC-abc164D

CF-2072D

CF-1501B

CF-433B

CF-296C

CF-816B

CF-1722E

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en2 English THE_THUNDERSTORM_BEGINS 2025-03-08 22:08:12 2 Tiny change: 'j)$ $(i,j+x)$\n\nand ' -> 'j)$ $(i,j+y)$\n\nand '
en1 English THE_THUNDERSTORM_BEGINS 2025-03-08 21:52:16 7907 Initial revision (published)