Detailed Editorial for problem "The Struggle" from XXII Opencup, Grand Prix of XiAn

Hello, Codeforces!

"The Struggle" (Codeforces Gym 103329F) is a problem I authored which appeared in the HDU Multi-university Training, the Ptz Summer Camp and the Open Cup. Despite appearing in contests with a total of ~1300 three-person teams, I know of few people (possibly no more than 5) who have learned and independently implemented the solution.

The problem is a lot of fun and the solution is quite easy to implement (the actual implementation is under 2 KB). hos_lyric said that this is a good problem! From this blog you will learn how the algorithm works and how to implement the solution. There shall be no more mystery, and you will be able to solve, today, an Open Cup problem that few people have solved!

The problem statement is very simple: Given an ellipse $$$E$$$ that is contained in $$$(0,4 \times 10^6) \times (0,4 \times 10^6)$$$, calculate the value $$$\sum_{(x, y) \in E}(x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$ over all integer points $$$(x,y)$$$. In this problem, $$$\oplus$$$ is the bitwise XOR operation.

While the solution may not seem obvious, let us first consider an easier case: how should we compute $$$\sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} (x \oplus y)^{33} x^{-2} y^{-1} \mod 10^9+7$$$? That is, if the area is a square $$$[0,2^n-1] \times [0,2^n-1]$$$, how do we calculate the value? (For our purposes we shall take $$$0^{-2} = 0^{-1} \equiv 0 \mod 10^9+7$$$.)

This is quite simple! It can be done in $$$O(n \cdot 2^n)$$$ time, i.e. $$$O(N \log N)$$$ for side length $$$N = 2^n$$$, using the fast Walsh-Hadamard transform (also known as FWHT, FWT, or fast XOR convolution). The convolution computes $$$c_i = \sum_{x = 0}^{2^n-1}\sum_{y = 0}^{2^n-1} [x \oplus y = i]a_xb_y$$$. If we set $$$a_i = i^{-2}$$$ and $$$b_i = i^{-1}$$$, then $$$\sum_{i = 0}^{2^n-1} c_i \times i^{33}$$$ is the answer for our easier case.
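As a minimal sketch of the easier case above (not the author's code), the following computes the square sum via an XOR convolution and compares it with a direct double loop. The names `power`, `fwht`, `xor_convolution`, `square_sum_fwht` and `square_sum_brute` are made up for this illustration, and the index $$$0$$$ is given weight $$$0$$$, following the convention stated earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
const u64 MOD = 1000000007ULL;

// Modular exponentiation: base^exp mod MOD.
u64 power(u64 base, u64 exp) {
    u64 result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// In-place, unnormalized Walsh-Hadamard transform modulo MOD.
// The size must be a power of two.
void fwht(std::vector<u64>& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t len = 1; len < n; len *= 2)
        for (std::size_t s = 0; s < n; s += 2 * len)
            for (std::size_t j = s; j < s + len; ++j) {
                u64 p = v[j], q = v[j + len];
                v[j] = (p + q) % MOD;
                v[j + len] = (p + MOD - q) % MOD;
            }
}

// c[k] = sum of a[i] * b[j] over all pairs with (i xor j) == k.
std::vector<u64> xor_convolution(std::vector<u64> a, std::vector<u64> b) {
    assert(a.size() == b.size());
    fwht(a);
    fwht(b);
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = a[i] * b[i] % MOD;
    fwht(a);  // applying the transform twice multiplies by the size
    const u64 inv_n = power(a.size() % MOD, MOD - 2);
    for (u64& x : a) x = x * inv_n % MOD;
    return a;
}

// Sum over 0 <= x, y < 2^n of (x xor y)^33 * x^(-2) * y^(-1), via FWHT.
u64 square_sum_fwht(int n) {
    const std::size_t size = std::size_t(1) << n;
    std::vector<u64> a(size, 0), b(size, 0);  // index 0 keeps weight 0
    for (std::size_t i = 1; i < size; ++i) {
        const u64 inv = power(i, MOD - 2);
        a[i] = inv * inv % MOD;  // x^(-2)
        b[i] = inv;              // y^(-1)
    }
    const std::vector<u64> c = xor_convolution(a, b);
    u64 total = 0;
    for (std::size_t k = 0; k < size; ++k)
        total = (total + c[k] * power(k, 33)) % MOD;
    return total;
}

// The same sum by a direct O(4^n) double loop, for checking only.
u64 square_sum_brute(int n) {
    const std::size_t size = std::size_t(1) << n;
    u64 total = 0;
    for (std::size_t x = 1; x < size; ++x)
        for (std::size_t y = 1; y < size; ++y) {
            const u64 ix = power(x, MOD - 2), iy = power(y, MOD - 2);
            const u64 term = power(x ^ y, 33) * ix % MOD * ix % MOD * iy % MOD;
            total = (total + term) % MOD;
        }
    return total;
}
```

For small `n`, `square_sum_fwht(n)` and `square_sum_brute(n)` should agree; the FWHT version is the one that scales.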

We shall then consider: what if the square is different from $$$[0,2^n-1] \times [0,2^n-1]$$$? What if the square we want to calculate on is $$$[x\times2^n,x\times2^n+2^n-1] \times [y\times2^n,y\times2^n+2^n-1]$$$?

This case turns out to be just as simple! Within such a square only the lowest $$$n$$$ bits of each coordinate vary, while the higher bits are fixed by $$$x$$$ and $$$y$$$. Hence for $$$0 \le i,j < 2^n$$$ we have $$$(x\times 2^n+i) \oplus (y\times 2^n+j) = 2^n(x \oplus y)+(i \oplus j)$$$. Based on this observation we can simply set $$$a_i = (x\times 2^n+i)^{-2}, b_j = (y\times 2^n+j)^{-1}$$$ and calculate $$$c_k = \sum_{i = 0}^{2^n-1}\sum_{j = 0}^{2^n-1} [i \oplus j = k]a_ib_j$$$. Then $$$\sum_{k = 0}^{2^n-1} c_k \times (k+(x \oplus y)2^n)^{33}$$$ is the answer. The complexity is again $$$O(n \cdot 2^n)$$$.
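The reindexing for a shifted square can be checked on small inputs. The sketch below is only an illustration with made-up names (`block_sum_grouped`, `block_sum_brute`). It deliberately uses a naive quadratic XOR convolution to stay short; any fast XOR convolution could be substituted for the double loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
const u64 MOD = 1000000007ULL;

// Modular exponentiation: base^exp mod MOD (so power(0, e) == 0 for e > 0).
u64 power(u64 base, u64 exp) {
    u64 result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Sum over the square [x*2^n, x*2^n + 2^n - 1] x [y*2^n, y*2^n + 2^n - 1],
// grouping pairs (i, j) of low bits by k = i xor j.
u64 block_sum_grouped(u64 x, u64 y, int n) {
    assert(n >= 0 && n <= 12);  // keep the naive convolution cheap
    const u64 size = 1ULL << n;
    std::vector<u64> a(size), b(size), c(size, 0);
    for (u64 i = 0; i < size; ++i) {
        const u64 inv_x = power(x * size + i, MOD - 2);
        a[i] = inv_x * inv_x % MOD;             // (x*2^n + i)^(-2)
        b[i] = power(y * size + i, MOD - 2);    // (y*2^n + i)^(-1)
    }
    // Naive XOR convolution: c[k] = sum of a[i] * b[j] with (i xor j) == k.
    for (u64 i = 0; i < size; ++i)
        for (u64 j = 0; j < size; ++j)
            c[i ^ j] = (c[i ^ j] + a[i] * b[j]) % MOD;
    const u64 high = (x ^ y) * size;  // shared high part of every xor value
    u64 total = 0;
    for (u64 k = 0; k < size; ++k)
        total = (total + c[k] * power(high + k, 33)) % MOD;
    return total;
}

// The same sum, term by term, straight from the definition.
u64 block_sum_brute(u64 x, u64 y, int n) {
    const u64 size = 1ULL << n;
    u64 total = 0;
    for (u64 u = x * size; u < (x + 1) * size; ++u)
        for (u64 v = y * size; v < (y + 1) * size; ++v) {
            const u64 iu = power(u, MOD - 2), iv = power(v, MOD - 2);
            const u64 term = power(u ^ v, 33) * iu % MOD * iu % MOD * iv % MOD;
            total = (total + term) % MOD;
        }
    return total;
}
```

Both functions treat the inverse of $$$0$$$ as $$$0$$$, so they also agree on the square that touches the origin.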

After making the above observations, we can come up with a quite efficient algorithm already! The algorithm simply works as the following pseudocode:

int solve(square S = [x*2^n,x*2^n+2^n-1]*[y*2^n,y*2^n+2^n-1]){
    if(S contains no point of the ellipse){
        return 0;
    }
    if(S is completely in the ellipse){
        return the value calculated by the above discussed FWHT method.
    }
    Let the four sub-squares be S1,S2,S3,S4;
    return solve(S1)+solve(S2)+solve(S3)+solve(S4);
}
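To make the recursion concrete, here is a hedged sketch that only performs the decomposition into maximal aligned squares, without computing any sums. As a simplifying assumption it uses a disc with integer centre and radius as a stand-in for the ellipse; `Disc`, `Square`, `decompose`, `covered_points` and `brute_count` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-in convex region (an assumption for this sketch): a disc with
// integer centre (cx, cy) and integer radius r.
struct Disc {
    long long cx, cy, r;
    bool contains(long long x, long long y) const {
        const long long dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }
};

// Aligned square of lattice points [x0, x0 + side - 1] x [y0, y0 + side - 1].
struct Square {
    long long x0, y0, side;
};

// Collects the squares on which the recursion would stop with a full block.
void decompose(const Disc& d, long long x0, long long y0, long long side,
               std::vector<Square>& out) {
    assert(side >= 1 && (side & (side - 1)) == 0);
    const long long x1 = x0 + side - 1, y1 = y0 + side - 1;
    // Lattice point of the square nearest to the centre (clamped coordinates).
    const long long nx = std::max(x0, std::min(d.cx, x1));
    const long long ny = std::max(y0, std::min(d.cy, y1));
    if (!d.contains(nx, ny)) return;  // the square misses the disc entirely
    // The disc is convex, so four corners inside means the whole square is.
    if (d.contains(x0, y0) && d.contains(x0, y1) &&
        d.contains(x1, y0) && d.contains(x1, y1)) {
        out.push_back({x0, y0, side});
        return;
    }
    // A 1x1 square is always handled by one of the two cases above.
    const long long h = side / 2;
    decompose(d, x0, y0, h, out);
    decompose(d, x0 + h, y0, h, out);
    decompose(d, x0, y0 + h, h, out);
    decompose(d, x0 + h, y0 + h, h, out);
}

// Number of lattice points covered by the collected squares.
long long covered_points(const std::vector<Square>& squares) {
    long long total = 0;
    for (const Square& s : squares) total += s.side * s.side;
    return total;
}

// Direct count of lattice points of the disc inside [0, size)^2.
long long brute_count(const Disc& d, long long size) {
    long long total = 0;
    for (long long x = 0; x < size; ++x)
        for (long long y = 0; y < size; ++y)
            if (d.contains(x, y)) ++total;
    return total;
}
```

The collected squares are disjoint and cover exactly the lattice points of the region, so `covered_points` should match `brute_count`. Summing `side` over the collected squares gives the quantity that the complexity analysis is concerned with.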

What is the complexity of this algorithm? Unfortunately, it is $$$O(n \log^2 n)$$$, where $$$n$$$ now denotes the coordinate range. The analysis is not simple and is omitted here, but I did carry it out for simpler regions such as $$$|x-y|<c$$$, and there are indeed two logs. This is not fast enough.

To optimize this algorithm, we perform the FWT from the bottom up and, at each layer, handle the squares that need to be calculated at that layer. After computing the pointwise product of the FWT arrays, we do not apply the inverse FWT right away; instead we "accumulate" the product into the result array. (See the author's solution for better understanding.)
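The "accumulate" step relies on the linearity of the transform: pointwise products can be summed in the transformed domain and inverted once at the end. The sketch below demonstrates only this fact, on plain integers without a modulus; it is not the author's full bottom-up scheme, and `wht`, `naive_xor_conv` and `sum_of_xor_convs` are names made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<long long>;

// In-place, unnormalized Walsh-Hadamard transform on plain integers.
void wht(Vec& v) {
    const std::size_t n = v.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t len = 1; len < n; len *= 2)
        for (std::size_t s = 0; s < n; s += 2 * len)
            for (std::size_t j = s; j < s + len; ++j) {
                const long long p = v[j], q = v[j + len];
                v[j] = p + q;
                v[j + len] = p - q;
            }
}

// Reference: c[k] = sum of a[i] * b[j] over all pairs with (i xor j) == k.
Vec naive_xor_conv(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec c(a.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i ^ j] += a[i] * b[j];
    return c;
}

// Sum of the XOR convolutions of several pairs, using a single inverse
// transform: products are accumulated in the transformed domain.
Vec sum_of_xor_convs(const std::vector<std::pair<Vec, Vec>>& pairs) {
    assert(!pairs.empty());
    const std::size_t n = pairs[0].first.size();
    Vec acc(n, 0);
    for (std::pair<Vec, Vec> p : pairs) {  // copy, the inputs stay untouched
        assert(p.first.size() == n && p.second.size() == n);
        wht(p.first);
        wht(p.second);
        for (std::size_t i = 0; i < n; ++i) acc[i] += p.first[i] * p.second[i];
    }
    wht(acc);  // one inverse for all pairs (up to the factor n)
    for (long long& x : acc) x /= static_cast<long long>(n);  // exact division
    return acc;
}
```

With $$$m$$$ pairs this uses $$$2m+1$$$ transforms instead of $$$3m$$$; under a modulus the final division becomes a multiplication by the modular inverse of the size.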

One issue in the complexity analysis of this problem is to prove that the sum of the side lengths of all squares is $$$O(n \log n)$$$. This can be proved under the condition that the border is a monotone function, and the boundary of the ellipse can be split into four monotone pieces. The idea of the proof is that the y-interval corresponding to each x-interval is a fixed common part plus some "extra" intervals, and for x-intervals of the same size, the total length of the "extra" y-intervals cannot exceed $$$n$$$. Since there are only $$$\log n$$$ sizes of x-intervals, the proof is done.

For implementation details, please refer to the author's solution below.

#include <bits/stdc++.h>
using namespace std;
using ll = long long; 
#define tcT template<class T
#define tcTU tcT, class U
#define FOR(i,a,b) for (int i = (a); i < (b); ++i)
#define F0R(i,a) FOR(i,0,a)
#define ROF(i,a,b) for (int i = (b)-1; i >= (a); --i)
#define R0F(i,a) ROF(i,0,a)
#define each(a,x) for (auto& a: x)
const int mod = 1000000007;
constexpr int pct(int x) { return __builtin_popcount(x); } // # of bits set
ll fdiv(ll a, ll b) { return a/b-((a^b)<0&&a%b); } // divide a by b rounded down
tcTU> T lstTrue(T lo, T hi, U f) { lo --; assert(lo <= hi); while (lo < hi) { T mid = lo+(hi-lo+1)/2; f(mid) ? lo = mid : hi = mid-1; } return lo; }

const int MX = (2<<22)+10;
ll a,b,c,d,e,f;
int N,miv[MX],xv2[MX],yv2[MX],resv[MX],li[MX*2],ri[MX*2];

inline int mul(int x,int y){return 1ll*x*y%mod;}
inline int add(int x,int y){return x+y>=mod?x+y-mod:x+y;}
inline int sub(int x,int y){return x-y<0?x-y+mod:x-y;}
inline int sq(int x){return 1ll*x*x%mod;}
int mpow(int a,int b){return b == 0 ? 1 : ( b&1 ? mul(a,sq(mpow(a,b/2))) : sq(mpow(a,b/2)));}

void solve(){
    cin>>a>>b>>c>>d>>e>>f;
    ll xbnd = lstTrue(0,4000000,[&](ll x){return (4*c*a-e*e)*x*x<=4*c*f;});
    ll ybnd = lstTrue(0,4000000,[&](ll x){return (4*c*a-e*e)*x*x<=4*a*f;});
    int cn = max(b+xbnd,d+ybnd)+10;
    N = 1;while(N<cn)N*=2;
    F0R(i,N){
        yv2[i] = miv[i];
        xv2[i] = 1ll*yv2[i]*yv2[i]%mod;
        resv[i] = 0;
    }
    F0R(ii,N){
        if(ii<b-xbnd || ii>b+xbnd){
            li[ii+N] = 1;ri[ii+N] = 0;
            continue;
        }
        ll i = ii-b,cv = e*e*i*i-4*c*(a*i*i-f),ce = sqrt(cv);
        while(ce*ce>cv)ce-=1; while((ce+1)*(ce+1)<=cv)ce+=1;
        ri[ii+N] = fdiv(-e*i+ce,c*2)+d;
        li[ii+N] = fdiv(-e*i-ce+c*2-1,c*2)+d;
    }
    int msk = N-2;
    R0F(i,N){
        li[i] = N-((N-max(li[i*2],li[i*2+1]))&msk);
        ri[i] = ((min(ri[i*2],ri[i*2+1])+1)&msk)-1;
        if(pct(i) == 1)msk-=msk&-msk;
    }
    auto conv = [&](int* xxa,int i){
        for(int s =0;s<N;s+=i*2){
            int* f1 = xxa+s,*f2 =xxa+s+i;
            for(int j=0;j<i;j++){ int c1 = f1[j],c2 = f2[j]; f1[j]=add(c1,c2); f2[j]=sub(c1,c2); }
        }
    };
    for(int i = 1;i<N;i*=2){
        int s;
        function<void(int,int)> calc= [&](int l,int r){
            for(int j=l;j<r;j+=i){
                int *a = xv2+s,*b = yv2+j,*res = resv+(s^j);
                for(int k=0;k<i;k++) res[k]=(1ll*a[k]*b[k]+res[k])%mod;
            }
        };
        for(s =0;s<N;s+=i){
            int id = (N+s)/i;
            if(li[id]>ri[id])continue;
            if(li[id/2]>ri[id/2]){
                calc(li[id],ri[id]+1);
            }else{
                calc(li[id],li[id/2]);
                calc(ri[id/2]+1,ri[id]+1);
            }
        }
        conv(xv2,i);conv(yv2,i);conv(resv,i);
    }
    for(int i = 1;i<N;i*=2) conv(resv,i);
    int ans = 0;
    F0R(i,N) ans=add(ans,mul(resv[i],mpow(i,33)));
    ans=mul(ans,mpow(N,mod-2));
    cout<<ans<<"\n";
}

int main() {
    int T;cin>>T;
    miv[0] = miv[1]= 1;
    FOR(i,2,MX) miv[i] = mod-(long long)mod/i*miv[mod%i]%mod;
    while(T--){
        solve();
    }
    return 0;
}

During the HDU competition the problem was $$$\sum_{(x, y) \in E}(x \oplus y)^{3} x^{-2} y^{-1} \mod 10^9+7$$$. The team Inverted Cross wrote a data-structure-based solution which runs in $$$O(3 n \log n)$$$ time (with a large constant). Unfortunately, the program was not fast enough.

Another solution, written by Inverted Cross, which only works when the exponent 33 is replaced by 3:

#include<bits/stdc++.h>
#define ll long long
#define ull unsigned long long
#define For(i,j,k) for (int i=(int)(j);i<=(int)(k);i++)
#define Rep(i,j,k) for (int i=(int)(j);i>=(int)(k);i--)
using namespace std;

const int max_N=1<<22;
const int mo=1000000007;
int power(int x,int y){
    int s=1;
    for (;y;y/=2,x=1ll*x*x%mo)
        if (y&1) s=1ll*s*x%mo;
    return s;
}

long long tim=0;

struct Solver{
    int fac[max_N];
    int inv[max_N];
    int nn;
    
    Solver(){
        inv[0]=inv[1]=1;
        for (int i=2;i<max_N;i++)
            inv[i]=1ll*inv[(mo%i)]*(mo-mo/i)%mo;
    }
    
    int t[max_N*2][4],pl[max_N*2],pr[max_N*2];
    int fl[max_N*2],lvl[max_N*2];
    void pushup(int k){
        int ls=2*k+fl[k];
        int rs=2*k+1-fl[k],w=1ll*lvl[k]*t[rs][0]%mo,ww=1ll*lvl[k]*lvl[k]%mo;
        t[k][0]=(t[ls][0]+t[rs][0]>=mo?t[ls][0]+t[rs][0]-mo:t[ls][0]+t[rs][0]);
        t[k][1]=(t[ls][1]+t[rs][1]>=mo?t[ls][1]+t[rs][1]-mo:t[ls][1]+t[rs][1]);
        t[k][1]=(t[k][1]+w>=mo?t[k][1]+w-mo:t[k][1]+w);
        t[k][2]=(t[ls][2]+t[rs][2]+1ll*lvl[k]*w+2ll*lvl[k]*t[rs][1])%mo;
        t[k][3]=(t[ls][3]+t[rs][3]+1ll*ww*w+3ll*ww*t[rs][1]+3ll*lvl[k]*t[rs][2])%mo;
    }
    unsigned long long ansl,ansr;
    void query(int l,int r,int x){
        l+=nn-1; r+=nn+1;
        for (;l^r^1;l>>=1,r>>=1){
            if (!(l&1)){
                int S=pl[l^1]^x; S-=S&(lvl[l>>1]-1);
                ansl=(ansl+((1ll*t[l^1][0]*S%mo+3ll*t[l^1][1])*S+3ll*t[l^1][2])%mo*S+t[l^1][3]);
            }
            if (r&1){
                int S=pl[r^1]^x; S-=S&(lvl[r>>1]-1);
                ansr=(ansr+((1ll*t[r^1][0]*S%mo+3ll*t[r^1][1])*S+3ll*t[r^1][2])%mo*S+t[r^1][3]);
            }
        }
    }
    int calc(int n,int *ly,int *ry){
        nn=1; int my=n;
        for (int i=1;i<=n;i++) my=max(my,ry[i]);
        for (;nn<=my;nn<<=1);
        for (int d=1,nw=nn>>1;d<nn;d<<=1,nw>>=1)
            for (int i=d;i<d+d;i++) lvl[i]=nw;
        for (int i=0;i<nn;i++){
            t[i+nn][0]=inv[i]; pl[i+nn]=pr[i+nn]=i;
            t[i+nn][1]=t[i+nn][2]=t[i+nn][3]=0;
        }
        for (int i=nn-1;i>=1;i--){
            fl[i]=0,pushup(i);
            pl[i]=pl[i*2];
            pr[i]=pr[i*2+1];
        }
        int x=0,ans=0,maxv=0;
        for (int i=1;i<nn;i++){
            int v=i^(i-1);
            for (;v!=(v&(-v));v-=v&(-v));
            for (int j=2*v-1;j>=v;j--) fl[j]^=1;
            maxv=max(maxv,2*v-1);
            x^=nn/v/2;
            if (x<=n&&ly[x]!=-1&&ly[x]<=ry[x]){
                for (;maxv;--maxv) pushup(maxv);
                ansl=ansr=0;
                query(ly[x],ry[x],x),ansl%=mo,ansr%=mo;
                ans=(ans+1ll*inv[x]*inv[x]%mo*(ansl+ansr))%mo;
            }
        }
        return ans;
    }    
}PJY;
long long a,b,c,d,e,f;
void calc(int n,int *ly,int *ry) {
    long long A,B,C;
    for (int x=0; x<=n; x++){
        A=c,B=-2ll*c*d+1ll*e*(x-b),C=1ll*a*(x-b)*(x-b)+1ll*c*d*d-1ll*e*(x-b)*d-f;
        if (B*B-4*A*C<0) ly[x]=ry[x]=-1; else
        {
            long double len=sqrt(B*B-4*A*C),l=(-B-len)/(2.0*A),r=(-B+len)/(2.0*A);
            ly[x]=floor(l-(1e-12))+1,ry[x]=floor(r+(1e-12));
        }
    }
}
int ly[max_N*2],ry[max_N*2];
void solve(){
    scanf("%lld%lld%lld%lld%lld%lld",&a,&b,&c,&d,&e,&f);
    calc(2*b,ly,ry);
    int n=2*b;
    for (;ly[n]==-1;--n);
    printf("%d\n",PJY.calc(n,ly,ry));
}
int main(){
    int T;
    scanf("%d",&T);
    while (T--) solve();
}

Rev.	By	When	Δ	Comment
en23	satoshi	2021-11-02 02:33:56	0	(published)