[Tutorial] An interesting counting problem related to square product

The statement:

Given three integers $$$n, k, p$$$ with $$$1 \leq k \leq n < p$$$.

Count the number of arrays $$$a[]$$$ of size $$$k$$$ that satisfy

$$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
$$$a_i \times a_j$$$ is a perfect square $$$\forall 1 \leq i < j \leq k$$$

Since the answer can be large, output it modulo $$$p$$$.

For convenience, you can assume $$$p$$$ is the large constant prime $$$10^9 + 7$$$.

You can also submit the problem for $$$k = 3$$$ here.

Extra Tasks

Solved A: Can we also use phi function or something similar to solve for $$$k = 3$$$ in $$$O(\sqrt{n})$$$ ?

Solved B: Can we also use phi function or something similar to solve for general $$$k$$$ in $$$O(\sqrt{n})$$$ ?

Solved C: Can we also solve the problem where duplicates are allowed: $$$a_i \leq a_j\ (\forall\ i < j)$$$ instead of $$$a_i < a_j (\forall\ i < j)$$$ ?

Solved D: Can we solve the problem where there is no restriction between $$$k, n, p$$$ ?

Solved E: Can we solve for negative integers, where $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$ ?

F: Can we solve for a specific range, where $$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$ ?

G: Can we solve for cube product $$$a_i \times a_j \times a_k$$$ efficiently ?

H: Can we solve if it is given $$$n$$$ and queries for $$$k$$$ ?

I: Can we solve if it is given $$$k$$$ and queries for $$$n$$$ ?

J: Can we also solve the problem where there is no order: just simply $$$1 \leq a_i \leq n$$$ ?

K: Can we solve for $$$q$$$-product $$$a_{i_1} \times a_{i_2} \times \dots \times a_{i_q} = x^q$$$ (for given constant $$$q$$$) ?

M: Given $$$0 \leq \delta \leq n$$$, can we also solve the problem when $$$1 \leq a_1 \leq a_1 + \delta \leq a_2 \leq a_2 + \delta \leq \dots \leq a_k \leq n$$$ ?

*Marked as solved only if tested with at least $$$10^6$$$ queries

Solution for k = 1

The answer is simply $$$n$$$.

Solution for k = 2

Algorithm

We need to count the number of pairs $$$(a, b)$$$ such that $$$1 \leq a < b \leq n$$$ and $$$a \times b$$$ is a perfect square.

Every positive integer $$$x$$$ can be represented uniquely as $$$x = u \times p^2$$$ for positive integers $$$u, p$$$ with $$$u$$$ as small as possible (that is, $$$u$$$ is squarefree). Here $$$p$$$ is just an integer, not the modulus from the statement.

Let $$$x = u \times p^2$$$ and $$$y = v \times q^2$$$ (again with minimal $$$u$$$, $$$v$$$).

We can easily prove that $$$x \times y$$$ is a perfect square if and only if $$$u = v$$$: the product is $$$uv \times (pq)^2$$$, and the product of two squarefree numbers is a square only when they are equal.

So fix a squarefree number $$$u$$$; you just need to count the number of ways to choose $$$p^2$$$.

The answer is the sum of these counts over all squarefree $$$u$$$.
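Before the optimized implementations, here is a minimal brute-force sketch of the grouping idea, useful only for checking small inputs. The names `squarefree_part` and `count_square_pairs` are made up for this illustration and do not appear in the implementations below.

```cpp
#include <cassert>
#include <map>

// Squarefree part u of x, where x = u * p^2 (trial division, fine for small x).
long long squarefree_part(long long x) {
    assert(x >= 1);
    long long u = 1;
    for (long long d = 2; d * d <= x; ++d) {
        int exponent = 0;
        while (x % d == 0) { x /= d; ++exponent; }
        if (exponent % 2 == 1) u *= d;  // odd exponent: d stays in the squarefree part
    }
    return u * x;  // the leftover x is 1 or a single prime with exponent 1
}

// Number of pairs 1 <= a < b <= n such that a * b is a perfect square.
long long count_square_pairs(int n) {
    std::map<long long, long long> group_size;  // squarefree part -> members seen so far
    long long pairs = 0;
    for (int a = 1; a <= n; ++a)
        pairs += group_size[squarefree_part(a)]++;  // pair a with earlier members of its group
    return pairs;
}
```

For $$$n = 10$$$ the groups with more than one member are $$$\{1, 4, 9\}$$$ and $$$\{2, 8\}$$$, which gives $$$3 + 1 = 4$$$ pairs.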

Implementation

Implementation using factorization

vector<int> prime; /// prime list
vector<int> lpf;   /// Lowest prime factor, lpf[x] is smallest prime divisor of x
void sieve(int lim = LIM) /// O(n)
{
    prime.assign(1, 2);
    lpf.assign(lim + 1, 2);

    lpf[1] = 1; /// For easier calculation but can cause inf loops
    for (int i = 3; i <= lim; i += 2) {
        if (lpf[i] == 2) prime.push_back(lpf[i] = i);
        for (int j = 0; j < sz(prime) && prime[j] <= lpf[i] && prime[j] * i <= lim; ++j)
            lpf[prime[j] * i] = prime[j];
    }
}

/// mask(x) is smallest positive number that mask(x) * x is a perfect square
int getMask(int x) /// O(log n)
{
    int mask = 1;
    while (x > 1) {
        int p = lpf[x], f = 0;
        do x /= p, f++; while (p == lpf[x]);
        if (f & 1) mask *= p; /// if the current exponent is odd, we multiply mask by the current prime
    }
    return mask;
}

int cnt[LIM];
int magic(int n) /// O(n log max(a))
{
    memset(cnt, 0, sizeof(cnt[0]) * (n + 1));

    ll res = 0;
    for (int a = 1; a <= n; ++a) /// Check all cases of a
        res += cnt[getMask(a)]++;

    res %= MOD;
    return res;
}

int main() /// O(n log max(a))
{
    int n;
    cin >> n;
    sieve(n + 500);
    cout << magic(n);    
    return 0;
}

Implementation 1

bool is_squarefree[LIM]; /// is_squarefree[x] becomes false once x has been reached as i * j * j
int solve(int n)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        res += 1LL * (j - 1) * (j - 2) / 2;
    }

    res %= MOD;
    return res;
}

Implementation 2

bool is_squarefree[LIM]; /// is_squarefree[x] becomes false once x has been reached as i * j * j
int solve(int n)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));

    long long res = 0;
    for (int i = 1; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (int j = 1; i * j * j <= n; ++j)
        {
            is_squarefree[i * j * j] = false;
            res += j - 1;
        }
    }

    res %= MOD;
    return res;
}

Implementation related to Möbius function

/// Linear Sieve
vector<bool> isPrime; /// Characteristic function = A010051
vector<int> prime;    /// Prime list              = A000040
vector<int> lpf;      /// Lowest prime factor     = A020639
vector<int> mu;       /// Mobius                  = A008683
vector<int> phi;      /// Euler totient phi       = A000010
void sieve(int n)
{
    if (n < 1) return ;
    /// Extension part
    mu.assign(n + 1, 1);
    phi.assign(n + 1, 1);
    /// Main part
    prime.clear();
    lpf.assign(n + 1, 0);
    isPrime.assign(n + 1, true);
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x)
    {
        if (isPrime[x]) /// Func[Prime]
        {
            lpf[x] = x;
             mu[x] = -1;
            phi[x] = x - 1;
            prime.push_back(x);
        }
 
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
            if (lpf[x] == p)
            {
                 mu[x * p] = 0;
                phi[x * p] = phi[x] * p;
            }
            else 
            {
                 mu[x * p] = -mu[x];
                phi[x * p] = phi[x] * phi[p];
            }
        }
    }
}

long long solve(int n)
{
    sieve(n + 1);
    long long res = 0;
    for (int x = 1; x <= n; ++x) if (mu[x]) 
    {
        int t = sqrt(n / x);
        res += 1LL * t * (t - 1) / 2;
    }

    res %= MOD;
    return res;
}

Complexity

So about the complexity....

For the implementation using factorization, it is $$$O(n \log n)$$$.

Hint 1

Hint 2

Proof

Bonus

For Implementation 1 and Implementation 2, the complexity is linear.

Hint 1

Hint 2

Hint 3

Hint 4

Proof

For the last implementation, the complexity is also linear.

Hint 1

Hint 2

Proof

Solution for general k

Using the same logic as above, we can easily solve the problem: for each squarefree $$$u$$$, choose $$$k$$$ of the available squares $$$p^2$$$.

Now you are facing the familiar binomial coefficient problem.

The implementation here assumes that $$$p$$$ is prime and $$$p > max(n, k)$$$.

You can still solve the problem for squarefree $$$p$$$ using Lucas' theorem and CRT.

But to keep things simple and focus on the counting problem, we will assume $$$p$$$ is a large constant prime.
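As a sanity check for small inputs, the counting can be sketched without any modulus. This is only an illustration under the assumption that the result fits in `long long`; the helper names `binom_small` and `count_arrays_small` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Exact C(n, k) for small arguments (no modulus), via the multiplicative formula.
long long binom_small(long long n, long long k) {
    if (k < 0 || k > n) return 0;
    long long result = 1;
    for (long long i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;  // equals C(n - k + i, i) after each step
    return result;
}

// Sum over squarefree u of C(cnt(u), k), where cnt(u) = #{p >= 1 : u * p^2 <= n}.
long long count_arrays_small(int n, int k) {
    assert(n >= 1 && k >= 1);
    std::vector<bool> seen(n + 1, false);
    long long total = 0;
    for (int u = 1; u <= n; ++u) {
        if (seen[u]) continue;  // u was reached from a smaller squarefree number, so skip it
        long long cnt = 0;
        for (long long p = 1; u * p * p <= n; ++p) {
            seen[u * p * p] = true;
            ++cnt;
        }
        total += binom_small(cnt, k);
    }
    return total;
}
```

For example, with $$$n = 18$$$ and $$$k = 3$$$ the group $$$\{1, 4, 9, 16\}$$$ contributes $$$4$$$ arrays and $$$\{2, 8, 18\}$$$ contributes $$$1$$$.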

O(n) solution

const int LIM = 1e7 + 17;
const int MOD = 1e9 + 7;

int fact[LIM + 1]; /// factorial:         fact[n] = n!
int invs[LIM + 1]; /// inverse modular:   invs[n] = n^(-1)
int tcaf[LIM + 1]; /// inverse factorial: tcaf[n] = (n!)^(-1)
void precal_nck(int n = LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

int solve(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(n);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        res += nck(j - 1, k);
    }

    res %= MOD;
    return res;
}

A better solution for k = 2

Idea

In the above approach, we fix a squarefree $$$u$$$ and count $$$p^2$$$.

But what if we fix $$$p^2$$$ and count $$$u$$$ instead?

The outer loop is now $$$O(\sqrt{n})$$$, but the total is still $$$O(n)$$$ because of the inner loop.

Swap for loop implementation

bool used[LIM]; /// used[x] = true once x has been counted
int solve(int n)
{
    memset(used, false, sizeof(used[0]) * (n + 1));
    long long res = 0;

    int t = sqrt(n);
    while (t * t < n) ++t;
    while (t * t > n) --t;
    for (int j = t; j > 1; --j)
    {
        for (int i = 1; i * j * j <= n; ++i)
        {
            if (!used[i * j * j])
            {
                used[i * j * j] = true;
                res += j - 1;
            }
        }
    }

    res %= MOD;
    return res;
}

Approach

Let $$$f(n)$$$ be the number of pairs $$$(a, b)$$$ such that $$$1 \leq a < b \leq n$$$ and $$$(a, b, n)$$$ is a three-term geometric progression.

Let $$$g(n)$$$ be the number of pairs $$$(a, b)$$$ such that $$$1 \leq a \leq b \leq n$$$ and $$$(a, b, n)$$$ is a three-term geometric progression.

Let $$$F(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$.

But why do we need these functions anyway

It is not hard to prove that $$$g(n) = f(n) + 1$$$.

This interesting sequence $$$g(n)$$$ is A000188, which has many properties, such as

Number of solutions to $$$x^2 \equiv 0 \pmod n$$$.
Square root of largest square dividing $$$n$$$.
Max $$$gcd \left(d, \frac{n}{d}\right)$$$ over all divisors $$$d$$$.

Well, to keep the problem simple, I am going to skip all the proofs and use this property (you can still follow the link in the sequence for references).

$$$g(n) = \underset{d^2 | n}{\Large \Sigma} \phi(d)$$$.

From this property, we can solve the problem in $$$O(\sqrt{n})$$$.
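One way to see where the $$$O(\sqrt{n})$$$ bound comes from is sketched below. It is derived only from the property above and may differ from the hidden solution: the term $$$d = 1$$$ contributes $$$\phi(1) = 1$$$, which cancels the $$$-1$$$ in $$$f(x) = g(x) - 1$$$, and then the two sums are swapped.

```latex
\begin{aligned}
F(n) &= \sum_{x=1}^{n} f(x) = \sum_{x=1}^{n} \bigl(g(x) - 1\bigr)
      = \sum_{x=1}^{n} \sum_{\substack{d^2 \mid x \\ d \geq 2}} \phi(d) \\
     &= \sum_{d=2}^{\lfloor \sqrt{n} \rfloor} \phi(d) \left\lfloor \frac{n}{d^2} \right\rfloor
\end{aligned}
```

Only $$$\phi(d)$$$ for $$$d \leq \sqrt{n}$$$ is needed, so sieving $$$\phi$$$ up to $$$\sqrt{n}$$$ dominates the running time.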

Hint 1

Hint 2

Hint 3

Hint 4

Solution

Yet this paper also takes you to something similar.

Implementation

O(sqrt n log log sqrt n) solution

#include <iostream>
#include <cstring>
#include <numeric>
#include <cmath>

using namespace std;

const int MOD = 1e9 + 7;
const int LIM = 1e7 + 17;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;

int euler[SQRT_LIM + 10]; /// a little slack, since sieve_phi(n) writes euler[0..n]
void sieve_phi(int n)
{
    iota(euler, euler + n + 1, 0);
    for (int x = 2; x <= n; x++) if (euler[x] == x)
        for (int j = x; j <= n; j += x)
            euler[j] -= euler[j] / x;
}

int solve(int n)
{
    sieve_phi(ceil(sqrt(n) + 1) + 1);
    
    long long res = 0;
    for (int p = 2; p * p <= n; ++p)
        res += 1LL * euler[p] * (n / (p * p));

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n;
    cin >> n;
    cout << solve(n);
    return 0;
}

O(sqrt n) solution

#include <iostream>
#include <cstring>
#include <numeric>
#include <vector>
#include <cmath>

using namespace std;

const int MOD = 1e9 + 7;

vector<int> lpf;
vector<int> prime;
vector<int> euler;
void linear_sieve_phi(int n)
{
    lpf.assign(n + 1, 0);
    euler.assign(n + 1, 1);
    for (int x = 2; x <= n; ++x)
    {
        if (lpf[x] == 0)
        {
            prime.push_back(lpf[x] = x);
            euler[x] = x - 1;                    
        }
        for (int i = 0; i < prime.size() && x * prime[i] <= n; ++i)
        {
            lpf[x * prime[i]] = prime[i];
            if (x % prime[i] == 0) {
                euler[x * prime[i]] = euler[x] * prime[i];    
                break;
            }
            euler[x * prime[i]] = euler[x] * euler[prime[i]];
        }
    }
}

int solve(int n)
{
    linear_sieve_phi(ceil(sqrt(n) + 1) + 1);
    
    long long res = 0;
    for (int p = 2; p * p <= n; ++p)
        res += 1LL * euler[p] * (n / (p * p));

    res %= MOD;
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);
    
    int n;
    cin >> n;
    cout << solve(n);
    return 0;
}

A better solution for general k

Extra task A, B

Algorithm

As clyring described here

Let $$$f_k(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ such that $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.

Let $$$g_k(n)$$$ be the number of tuples $$$(a_1, a_2, \dots, a_k, n)$$$ such that $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.

Let $$$F_k(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k(p)$$$.

Let $$$s_k(n)$$$ be the number of ways to choose $$$p^2$$$ among those $$$k$$$ numbers when you fix a squarefree $$$u$$$ (though we are doing it in reverse).

The formula
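The formula itself is not visible here, so the following is a reconstruction from the implementations below, not a quote. The names $$$h_k$$$ and $$$c_k$$$ are introduced only for this sketch, and $$$g(x)$$$ is the square root of the largest square dividing $$$x$$$, as in the $$$k = 2$$$ section. Fix the largest element $$$x$$$ of the array: the other $$$k - 1$$$ elements are chosen among the $$$g(x) - 1$$$ smaller numbers with the same squarefree part. Möbius inversion then turns $$$h_k(g(x))$$$ into a sum over $$$d^2 \mid x$$$.

```latex
\begin{aligned}
h_k(m) &= \binom{m - 1}{k - 1}, \qquad
c_k(d) = \sum_{e \mid d} \mu\!\left(\frac{d}{e}\right) h_k(e), \\
\text{answer}(n, k) &= \sum_{x=1}^{n} h_k\bigl(g(x)\bigr)
 = \sum_{x=1}^{n} \sum_{d^2 \mid x} c_k(d)
 = \sum_{d=1}^{\lfloor \sqrt{n} \rfloor} c_k(d) \left\lfloor \frac{n}{d^2} \right\rfloor
\end{aligned}
```

The first implementation computes $$$c_k(d)$$$ by iterating over the divisors of $$$d$$$, and the second one computes all $$$c_k(d)$$$ at once with a sieve over primes.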

Implementation

O(sqrt n log sqrt n)

const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;

/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

/// Linear Sieve
vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
int mu[SQRT_LIM + 10];       /// mobius                  = A008683
void linear_sieve(int n)
{
    if (n < 1) return ;
    /// Extension Sieve || You can add something more
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    fill_n(mu, n + 1, 1);
    /// Main Sieve || Without this, you can barely achieve linear complexity
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x) /// For each number
    {
        if (isPrime[x]) /// Func[Prime]
        {
            mu[x] = -1;
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
            mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
        }
    }
}

/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
    for (int u = n; u >= 1; --u)
    {
        divisors[u].clear();
        for (int v = u; v <= n; v += u)
            divisors[v].push_back(u);
    }
}

/// Solving for n, k
long long solve(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p - 1, k - 1);

        sum %= MOD;
        res += sum * (n / (d * d));
    }

    res = (res % MOD + MOD) % MOD; /// sum can be negative after the reduction, so normalize into [0, MOD)
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);

    /// Assumming constant p = 10^9 + 7
    int n, k;
    cin >> n >> k;
    cout << solve(n, k);
    return 0;
}

O(sqrt n log log sqrt n)

vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
void linear_sieve(int n)
{
    if (n < 1) return ;
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x)
    {
        if (isPrime[x]) /// Func[Prime]
        {
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
        }
    }
}

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d - 1, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[d] can be negative after the subtractions, so normalize into [0, MOD)
    return ans;
}

Complexity

The complexity of the first implementation is $$$O(\sqrt{n} \log \sqrt{n})$$$

Hint 1

Hint 2

Hint 3

Proof

The complexity of the second implementation is $$$O(\sqrt{n} \log \log \sqrt{n})$$$

Hint 1

Proof

Solution for duplicate elements in the array

Extra task C

Idea

It is not hard to prove that we can use the same algorithm as described in task A, B or in the original task.

Hint

Proof

Using the same algorithm, the core of the calculation is to find the number of non-decreasing integer sequences of size $$$k$$$ whose elements are in $$$[1, n]$$$.

The formula is $$$C(n + k - 1, k)$$$.

Can you prove it?
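If you want to convince yourself before proving it, here is a small brute-force sketch. It compares a direct enumeration of non-decreasing sequences with the standard stars-and-bars count $$$C(n + k - 1, k)$$$. The names `binom_exact` and `count_nondecreasing` are made up for this check.

```cpp
#include <cassert>

// Exact C(n, k) for small arguments (no modulus).
long long binom_exact(long long n, long long k) {
    if (k < 0 || k > n) return 0;
    long long result = 1;
    for (long long i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;  // equals C(n - k + i, i) after each step
    return result;
}

// Brute force: number of sequences lo <= a_1 <= a_2 <= ... <= a_len <= n.
long long count_nondecreasing(int lo, int n, int len) {
    if (len == 0) return 1;
    long long total = 0;
    for (int first = lo; first <= n; ++first)  // pick the first element, recurse on the rest
        total += count_nondecreasing(first, n, len - 1);
    return total;
}
```

For $$$n = 3, k = 2$$$ both sides give $$$6$$$: $$$(1,1), (1,2), (1,3), (2,2), (2,3), (3,3)$$$.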

Hint 1

Hint 2

Hint 3

Proof

Now it is done, and that is it.

The idea is the same as what clyring described here but represented in the other way

Implementation

O(n) solution


int fact[2 * LIM + 10]; /// solve() calls precal_nck(2 * n + 1), so the tables must cover that range
int invs[2 * LIM + 10];
int tcaf[2 * LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

bool is_squarefree[LIM];
int solve(int n, int k)
{
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        for (j = 1; i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        res += nck(k + j - 2, k);
    }

    res %= MOD;
    return res;
}

O(sqrt n log sqrt n + k) solution

const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;

/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

/// Linear Sieve
vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
int mu[SQRT_LIM + 10];       /// mobius                  = A008683
void linear_sieve(int n)
{
    if (n < 1) return ;
    /// Extension Sieve || You can add something more
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    fill_n(mu, n + 1, 1);
    /// Main Sieve || Without this, you can barely achieve linear complexity
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x) /// For each number
    {
        if (isPrime[x]) /// Func[Prime]
        {
            mu[x] = -1;
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
            mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
        }
    }
}

/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
    for (int u = n; u >= 1; --u)
    {
        divisors[u].clear();
        for (int v = u; v <= n; v += u)
            divisors[v].push_back(u);
    }
}

/// Solving for n, k
long long solve(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
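    precal_nck(t + k); /// nck(p + k - 2, k - 1) needs factorials up to about t + k, so fact[], invs[], tcaf[] must be sized accordingly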
    precal_div(t);

    long long res = 0;
    for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d)
            sum += mu[d / p] * nck(p + k - 2, k - 1); /// the binomial depends on the divisor p, not on d

        sum %= MOD;
        res += sum * (n / (d * d));
    }

    res = (res % MOD + MOD) % MOD; /// sum can be negative after the reduction, so normalize into [0, MOD)
    return res;
}

int main()
{
    ios::sync_with_stdio(false);
    cin.tie(NULL);

    /// Assumming constant p = 10^9 + 7
    int n, k;
    cin >> n >> k;
    cout << solve(n, k);
    return 0;
}

O(sqrt n log log sqrt n + k) solution

int fact[SQRT_LIM + 10];
int invs[SQRT_LIM + 10];
int tcaf[SQRT_LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

vector<int> prime;           /// prime list              = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10];      /// lowest prime factor     = A020639
void linear_sieve(int n)
{
    if (n < 1) return ;
    prime.clear();
    prime.reserve(n / log(n - 1));
    memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
    memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
    isPrime[0] = isPrime[1] = false;
    for (int x = 2; x <= n; ++x)
    {
        if (isPrime[x]) /// Func[Prime]
        {
            lpf[x] = x;
            prime.push_back(x);
        }
        for (int p : prime) /// Func[Prime * X] <- Func[Prime]
        {
            if (p > lpf[x] || x * p > n) break;
            isPrime[x * p] = 0;
            lpf[x * p] = p;
        }
    }
}

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t); /// only primes up to t are used below; the sieve arrays hold about SQRT_LIM entries
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
    for (int d = 1; d * d <= n; ++d) 
        res[d] = nck(d + k - 2, k - 1);

    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] -= res[d];

    long long ans = 0;
    for (int d = 1; d * d <= n; ++d)
        ans += res[d] * (n / (d * d));

    ans = (ans % MOD + MOD) % MOD; /// res[d] can be negative after the subtractions, so normalize into [0, MOD)
    return ans;
}

Complexity

In the first implementation it is obviously linear.

Hint 1

Hint 2

The complexity of the second and third implementations is also easy to show.

Hint 1

Hint 2

Sadly, since $$$k \leq n$$$, the extra $$$+ k$$$ term means the complexity is still $$$O(n)$$$ in the worst case, and with an even larger constant factor than the first implementation.

But it is still efficient enough to solve the problem when $$$k$$$ is small.

Solution when there is no restriction between k, n, p

Extra task D

Idea

First of all, the result does depend on how you calculate binomial coefficients, but they are calculated independently of the counting part, even if you somehow manage to run the binomial coefficient loop first.

Therefore, even if there is no restriction between $$$k, n, p$$$, the counting part and the algorithm do not change.

You just need to change how you calculate binomial coefficients, and that is all for this task.

This deserves more detail, but as the blog is not about the nck problem, I will keep it quick.

For large prime $$$p > max(n, k)$$$

Just use the usual factorial formula (since $$$p > max(n, k)$$$, no factorial involved is divisible by $$$p$$$).
To divide under the modulus, take the modular inverse (it always exists for nonzero residues modulo a prime).
This is a standard problem, just be careful about overflow.
You can also precalculate factorials, inverses and inverse factorials in linear time.

For general prime $$$p$$$

We can strip all factors $$$p$$$ when calculating $$$n!$$$.
You also need to know how many times the factor $$$p$$$ appears in $$$1 \dots n$$$.
Then combine both parts when calculating the answer.
If we do not do this, $$$n!$$$ might be divisible by $$$p$$$ and would have no inverse.
With precalculation you can answer queries in $$$O(1)$$$.
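The steps above can be sketched as follows. This is a minimal illustration, not the blog's own implementation: the struct `BinomModPrime` and its members are hypothetical names, the modulus is assumed to be a prime below $$$2^{31}$$$, and the inverse is computed per query with Fermat's little theorem, so a query costs $$$O(\log p)$$$ here (precalculating inverse factorials would bring it down to $$$O(1)$$$).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: C(n, k) mod a prime p that may be <= n.
struct BinomModPrime {
    long long p;
    std::vector<long long> fact;  // fact[i] = i! with every factor p stripped, mod p
    std::vector<int> cnt;         // cnt[i]  = exponent of p in i!

    BinomModPrime(int limit, long long prime)
        : p(prime), fact(limit + 1, 1), cnt(limit + 1, 0) {
        assert(limit >= 0 && prime >= 2);
        for (int i = 1; i <= limit; ++i) {
            long long rest = i;
            cnt[i] = cnt[i - 1];
            while (rest % p == 0) { rest /= p; ++cnt[i]; }  // strip and count factors p
            fact[i] = fact[i - 1] * (rest % p) % p;
        }
    }

    // Fast exponentiation modulo p.
    long long power(long long base, long long exp) const {
        long long result = 1 % p;
        for (base %= p; exp > 0; exp >>= 1, base = base * base % p)
            if (exp & 1) result = result * base % p;
        return result;
    }

    long long choose(int n, int k) const {
        if (k < 0 || k > n) return 0;
        if (cnt[n] - cnt[k] - cnt[n - k] > 0) return 0;  // a factor p survives in C(n, k)
        long long denom = fact[k] * fact[n - k] % p;
        return fact[n] * power(denom, p - 2) % p;         // Fermat inverse, p is prime
    }
};
```

For example, `BinomModPrime(10, 3).choose(5, 2)` evaluates $$$C(5, 2) = 10$$$ modulo $$$3$$$, even though $$$5! $$$ itself is divisible by $$$3$$$.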

For squarefree $$$p$$$

Factorize $$$p = p_1 \times p_2 \times \dots \times p_q$$$ where all $$$p_i$$$ are prime.
Strip all factors $$$p_i$$$ when calculating $$$n!$$$.
Remember to count how many times each factor $$$p_i$$$ appears in $$$1 \dots n$$$.
When answering a query, we just combine all those parts back.
Remember that inverses can be taken with Euler's theorem as $$$a^{\phi(p) - 1}$$$, and $$$\phi(p)$$$ can be calculated while factorizing $$$p$$$.
Remember that the stripped $$$n!$$$ must not be divisible by any $$$p_i$$$, otherwise you will get a wrong answer.
With precalculation you can answer queries in $$$O(\log p)$$$.

For general positive modulo $$$p$$$

Factorize $$$p = p_1^{f_1} \times p_2^{f_2} \times \dots \times p_q^{f_q}$$$ where all $$$p_i$$$ are distinct primes.
We calculate $$$C(n, k)$$$ modulo $$$p_i^{f_i}$$$ for each $$$i = 1 \dots q$$$.
To do that, we need to calculate $$$n!$$$ modulo $$$p_i^{f_i}$$$, which is described here.
To get the final answer we can use CRT.
This is rather hard to code and debug and easy to get wrong, so you must be careful.
I will leave the implementation to you, lovely readers.
Depending on how you calculate things, the query complexity might increase.
There are few (efficient or at least fully correct) papers about this, but you can read the one written here.
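The CRT step is the easiest part to isolate, so here is a hedged sketch of just that step. It assumes you already have each residue of $$$C(n, k)$$$ modulo $$$p_i^{f_i}$$$; the function names `ext_gcd` and `crt_coprime` are made up for this illustration, and it further assumes every modulus is below $$$2^{31}$$$ and the product of all moduli fits in `long long`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Extended Euclid: returns g = gcd(a, b) and sets x, y so that a*x + b*y = g.
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1;
    long long g = ext_gcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Combine (residue, modulus) pairs with pairwise coprime moduli.
// Returns the unique x in [0, product of moduli) matching every congruence.
long long crt_coprime(const std::vector<std::pair<long long, long long>> &congruences) {
    long long x = 0, mod = 1;  // invariant: the answer is congruent to x modulo mod
    for (const auto &c : congruences) {
        long long r = c.first, m = c.second;
        long long inv, unused;
        long long g = ext_gcd(mod % m, m, inv, unused);
        assert(g == 1);                         // moduli must be pairwise coprime
        inv = (inv % m + m) % m;                // inverse of mod modulo m
        long long diff = ((r - x) % m + m) % m;
        long long step = diff * inv % m;        // how many multiples of mod to add
        x += mod * step;
        mod *= m;
    }
    return x;
}
```

For example, the residues $$$2 \bmod 3$$$, $$$3 \bmod 5$$$ and $$$2 \bmod 7$$$ combine to $$$23$$$.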

Implementation

O(n) for prime p > max(n, k)

/// SPyofgame linear template for precalculating factorials under large prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
    fact[0] = fact[1] = 1;
    invs[0] = invs[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int i = 2; i <= n; ++i)
    {
        fact[i] = (1LL * fact[i - 1] * i) % MOD;
        invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
        tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
    }
}

/// Calculating binomial coefficient queries
int nck(int n, int k)
{
    k = min(k, n - k);
    if (k < 0) return 0;

    long long res = fact[n];
    res *= tcaf[k];         res %= MOD;
    res *= tcaf[n - k];     res %= MOD;
    return res;
}

O(n log mod + sqrt(mod)) for prime p or squarefree p

vector<int> factor;
int factorize(int n) /// Calculating phi(n) while factorizing (n) in O(sqrt n)
{
    factor.clear();
    int phi = n;

    if (!(n & 1))
    {
        n >>= __builtin_ctz(n);
        factor.push_back(2);
        phi -= phi / 2;
    }

    for (int x = 3; x * x <= n; x += 2)
    {
        if (n % x == 0)
        {
            do n /= x; while (n % x == 0);
            factor.push_back(x);
            phi -= phi / x;
        }
    }

    if (n > 1)
    {
        factor.push_back(n);
        phi -= phi / n;
    }

    return phi;
}

int f[LIM];    /// f[x] = nck(n, x)
int fact[LIM]; /// n! 
int tcaf[LIM]; /// n!^(-1)
int divp[LIM]; /// x but ignore all factors p[i]
int cntp[LIM][LOG_LIM]; /// cntp[x][i] = Number of time factor p[i] appear in 1..x
/// Assumes n, k are globals and powMOD, mulMOD are modular helpers defined elsewhere
void precal(int MOD) /// Calculate f[x] for all x = 1 -> n in O(n log mod + sqrt mod)
{
    int PHIMOD = factorize(MOD);
    for (int x = 1; x <= n; ++x) /// For each part x in n!
    {
        int &t = divp[x] = x;
        for (int i = 0; i < factor.size(); ++i) /// Ignore all factor p[i] of p
        {
            cntp[x][i] = cntp[x - 1][i];
            for (; t % factor[i] == 0; t /= factor[i]) /// Count how many times p[i] appears in 1..n
                ++cntp[x][i];
        }
    }

    fact[0] = fact[1] = 1;
    tcaf[0] = tcaf[1] = 1;
    for (int x = 2; x <= n; ++x) /// Finding n! and n!^(-1)
    {
        fact[x] = (1LL * fact[x - 1] * divp[x]) % MOD;
        tcaf[x] = powMOD(fact[x], PHIMOD - 1, MOD);
    }

    memset(f, 0, sizeof(f[0]) * k);
    for (int x = k; x <= n; ++x)
    {
        /// Calculate nck % p normally
        f[x] = fact[x];
        mulMOD(f[x], tcaf[k], MOD);
        mulMOD(f[x], tcaf[x - k], MOD);
        for (int i = 0; i < factor.size(); ++i) /// Bringing those factors back
        {
            int p = cntp[x][i] - cntp[k][i] - cntp[x - k][i];
            f[x] = 1LL * f[x] * powMOD(factor[i], p, MOD) % MOD;
        }
    }
}

Complexity

In the first implementation it is obviously linear.

Hint

And for the second implementation.

Hint 1

Hint 2

So you get $$$O(n \times \log p + \sqrt{p})$$$ in total.

Bonus

You can still optimize this, but at that point why not go straight to solving for non-squarefree $$$p$$$ too?

Solution when numbers can also be negative

Extra task E

Idea

This is the same as extra task C: only the counting part needs to change.

As we only care about integers, let us not bring complex numbers into this problem.

If the array contains both a negative and a positive number, their product is negative, so the array does not satisfy the condition.

Be careful, there are zeros too.

When the numbers are all unique, or $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$

There are 4 cases:

This gives us the formula $$$task_E(n, k) = 2 \times task_B(n, k) + 2 \times task_B(n, k - 1)$$$.

Hint 1

Hint 2

Hint 3

Proof

Remember that when $$$k = 0$$$ the answer is $$$0$$$; otherwise you might get a wrong result from a negative argument in the binomial coefficient formula.
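To test the formula for distinct elements on tiny inputs, a brute force over all $$$k$$$-subsets of $$$\{-n, \dots, n\}$$$ is enough. The sketch below is only a checker, with hypothetical names `is_perfect_square` and `count_signed_brute`; it treats $$$0$$$ as a perfect square and is limited to $$$n \leq 10$$$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if v is a perfect square (0 counts, negative numbers never do).
bool is_perfect_square(long long v) {
    if (v < 0) return false;
    long long r = 0;
    while ((r + 1) * (r + 1) <= v) ++r;  // integer square root, fine for tiny v
    return r * r == v;
}

// Brute force over all k-subsets of {-n, ..., n}.
long long count_signed_brute(int n, int k) {
    assert(n >= 0 && n <= 10 && k >= 1);
    const int size = 2 * n + 1;
    long long total = 0;
    for (unsigned mask = 0; mask < (1u << size); ++mask) {
        if (__builtin_popcount(mask) != k) continue;
        std::vector<int> chosen;
        for (int bit = 0; bit < size; ++bit)
            if ((mask >> bit) & 1u) chosen.push_back(bit - n);  // map bit index to value
        bool ok = true;
        for (std::size_t i = 0; i < chosen.size() && ok; ++i)
            for (std::size_t j = i + 1; j < chosen.size() && ok; ++j)
                ok = is_perfect_square(1LL * chosen[i] * chosen[j]);
        total += ok;
    }
    return total;
}
```

For $$$n = 4, k = 2$$$ the only valid positive pair is $$$(1, 4)$$$, so the formula gives $$$2 \times 1 + 2 \times 4 = 10$$$, which matches the brute force.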

So what if I mix the problem with task C too ?

When the numbers can have duplicates, or $$$-n \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$

There are 5 cases:

Once again, you can simplify it to fewer cases for easier calculation.

There are 2 main cases:

This gives us the formula $$$task_E(n, k) = 1 + 2 \times \overset{k}{\underset{t = 1}{\Large \Sigma}} task_C(n, t)$$$, where $$$task_C$$$ is the count with duplicates allowed from extra task C.

Why the formula is 2 * ...?

No I mean why there is no binomial coefficients for selecting the number of zeros ?

So where does the 1 come from? Why isn't it 2 instead?

But this gives you an $$$O(k)$$$ solution.

You can do better with math

Hint 1

Hint 2

Solution
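The hidden solution is not visible here, so the following is only one possible closed form, sketched with the hockey-stick identity. For a fixed squarefree $$$u$$$ with $$$m_u$$$ available squares, the number of non-decreasing sequences of length $$$t$$$ is $$$C(m_u + t - 1, t)$$$, and summing over $$$t$$$ collapses to a single binomial coefficient. The symbol $$$m_u$$$ is introduced just for this sketch.

```latex
\sum_{t=1}^{k} \binom{m + t - 1}{t} = \binom{m + k}{k} - 1
\quad\Longrightarrow\quad
\text{answer} = 1 + 2 \sum_{\substack{u \leq n \\ u \text{ squarefree}}}
\left( \binom{m_u + k}{k} - 1 \right),
\qquad m_u = \left\lfloor \sqrt{n / u} \right\rfloor
```

As a quick check, $$$n = 1, k = 2$$$ gives $$$1 + 2 \times (C(3, 2) - 1) = 5$$$, matching the sequences $$$(-1,-1), (-1,0), (0,0), (0,1), (1,1)$$$.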

Implementation

O(sqrt n log log sqrt n) when the numbers are unique


long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t);

    memset(res, 0, sizeof(res[0]) * (t + 1));
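    for (int d = 1; 1LL * d * d <= n; ++d) 
    {
        if (k >= 1) res[d] += nck(d - 1, k - 1) * 2; /// all positive or all negative
        if (k >= 2) res[d] += nck(d - 1, k - 2) * 2; /// one zero plus (k - 1) numbers of the same sign
        res[d] %= MOD;
    }

    /// Dirichlet convolution with mu, one prime at a time
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;

    long long ans = (k == 1); /// for k = 1 the single array [0] is valid as well
    for (int d = 1; 1LL * d * d <= n; ++d)
        ans = (ans + res[d] * ((n / (d * d)) % MOD)) % MOD; /// reduce at every step to avoid overflow

    return ans;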
}
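Since the snippet above relies on helpers defined elsewhere in this blog (`linear_sieve`, `precal_nck`, `nck`, `prime`), here is a self-contained sketch of the same idea for the unique case. The names `power_mod` and `solve_signed_unique` are hypothetical, and the factorial table and the simple prime sieve are stand-ins for those helpers. It assumes $$$k \geq 1$$$ and an $$$n$$$ small enough that looping up to $$$\sqrt{n}$$$ is fine.

```cpp
#include <cassert>
#include <vector>

const long long MOD = 1000000007LL;

// Hypothetical helper: base^exp modulo MOD by binary exponentiation.
long long power_mod(long long base, long long exp) {
    long long result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Count arrays -n <= a_1 < ... < a_k <= n with all pairwise products square.
long long solve_signed_unique(long long n, int k) {
    assert(n >= 1 && k >= 1);
    int s = 0;
    while (1LL * (s + 1) * (s + 1) <= n) ++s;  // s = floor(sqrt(n))

    // Factorial tables up to s, enough for nck(d - 1, *) with d <= s.
    std::vector<long long> fact(s + 1), inv_fact(s + 1);
    fact[0] = 1;
    for (int i = 1; i <= s; ++i) fact[i] = fact[i - 1] * i % MOD;
    inv_fact[s] = power_mod(fact[s], MOD - 2);
    for (int i = s; i > 0; --i) inv_fact[i - 1] = inv_fact[i] * i % MOD;
    auto nck = [&](int a, int b) -> long long {
        if (b < 0 || b > a) return 0;
        return fact[a] * inv_fact[b] % MOD * inv_fact[a - b] % MOD;
    };

    // Weight of d before the Mobius step: same-sign arrays plus arrays with one zero.
    std::vector<long long> res(s + 1, 0);
    for (int d = 1; d <= s; ++d)
        res[d] = 2 * (nck(d - 1, k - 1) + nck(d - 1, k - 2)) % MOD;

    // Dirichlet convolution with mu: for each prime p, subtract res[d] from res[d * p].
    std::vector<bool> composite(s + 1, false);
    for (int p = 2; p <= s; ++p) {
        if (composite[p]) continue;
        for (int m = 2 * p; m <= s; m += p) composite[m] = true;
        for (int d = s / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;
    }

    long long ans = (k == 1) ? 1 : 0;  // the single array [0] when k = 1
    for (int d = 1; d <= s; ++d)
        ans = (ans + res[d] * ((n / (1LL * d * d)) % MOD)) % MOD;
    return ans;
}
```

For example, `solve_signed_unique(4, 2)` counts $$$(1,4)$$$, $$$(-4,-1)$$$, four pairs $$$(0, x)$$$ and four pairs $$$(-x, 0)$$$.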

O(kn) = O(n^2) when duplicates are allowed

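bool is_squarefree[LIM];
long long brute(int n, int k)
{
    /// is_squarefree[i] stays true until i is reached as i0 * j * j from a smaller squarefree i0
    memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
    precal_nck(2 * n + 1);

    long long res = 0;
    for (int i = 1, j; i <= n; ++i) if (is_squarefree[i]) 
    {
        /// Mark every number whose squarefree part is i
        for (j = 1; 1LL * i * j * j <= n; ++j)
            is_squarefree[i * j * j] = false;

        /// The group has (j - 1) numbers, choose k of them with repetition
        res += nck(k + j - 2, k);
    }

    res %= MOD;
    return res;
}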


long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n log sqrt n) = O(n sqrt n log n) when duplicates are allowed

long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    /// We only care for d that 1 <= d <= sqrt(n)
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);
    precal_div(t);

    long long res = 0;
    for (int d = 1; 1LL * d * d <= n; ++d) /// For each fixed d^2
    {
        long long sum = 0;
        for (int p : divisors[d]) /// For each (p | d), the binomial depends on p, not on d
            sum += mu[d / p] * nck(p + k - 2, k - 1);

        sum = (sum % MOD + MOD) % MOD; /// keep it non-negative
        res = (res + sum * ((n / (d * d)) % MOD)) % MOD;
    }

    return res;
}
}

long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n log log sqrt n) = O(n sqrt n log log n) when duplicates are allowed

long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
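    for (int d = 1; 1LL * d * d <= n; ++d) 
        res[d] = nck(d + k - 2, k - 1);

    /// Dirichlet convolution with mu, one prime at a time
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;

    long long ans = 0;
    for (int d = 1; 1LL * d * d <= n; ++d)
        ans = (ans + res[d] * ((n / (d * d)) % MOD)) % MOD; /// reduce at every step to avoid overflow

    return ans;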
}

long long solve(int n, int k)
{
    long long res = 1;
    for (int t = 1; t <= k; ++t)
        res += brute(n, t) * 2;

    res %= MOD;
    return res;
}

O(k sqrt n + sqrt n log log sqrt n) = O(n sqrt n) when duplicates are allowed

long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
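    for (int d = 1; 1LL * d * d <= n; ++d) 
    {
        for (int len = 1; len <= k; ++len) /// len = number of nonzero elements
            res[d] += nck(d + len - 2, len - 1) * 2;
        res[d] %= MOD;
    }

    /// Dirichlet convolution with mu, one prime at a time
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;

    long long ans = 1; /// the array made of k zeros
    for (int d = 1; 1LL * d * d <= n; ++d)
        ans = (ans + res[d] * ((n / (d * d)) % MOD)) % MOD;

    return ans;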
}

O(k + sqrt n log log sqrt n) = O(n) when duplicates are allowed


long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
    int t = ceil(sqrt(n) + 1) + 1;
    linear_sieve(t);
    precal_nck(t + k);

    memset(res, 0, sizeof(res[0]) * (t + 1));
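    for (int d = 1; 1LL * d * d <= n; ++d) 
        res[d] = nck(d + k - 1, k - 1) * 2 % MOD; /// sum over every length, collapsed by the hockey-stick identity

    /// Dirichlet convolution with mu, one prime at a time
    for (int p : prime)
        for (int d = t / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;

    long long ans = 1; /// the array made of k zeros
    for (int d = 1; 1LL * d * d <= n; ++d)
        ans = (ans + res[d] * ((n / (d * d)) % MOD)) % MOD;

    return ans;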
}
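For readers who want to run the last version on its own, here is a self-contained sketch under stated assumptions. `power_mod` and `solve_signed_with_duplicates` are hypothetical names, and the factorial table and the simple prime sieve replace the blog's `precal_nck` and `linear_sieve`. It assumes $$$k \geq 1$$$ and $$$\sqrt{n} + k$$$ small enough to build a factorial table.

```cpp
#include <cassert>
#include <vector>

const long long MOD = 1000000007LL;

// Hypothetical helper: base^exp modulo MOD by binary exponentiation.
long long power_mod(long long base, long long exp) {
    long long result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Count arrays -n <= a_1 <= ... <= a_k <= n with all pairwise products square.
long long solve_signed_with_duplicates(long long n, int k) {
    assert(n >= 1 && k >= 1);
    int s = 0;
    while (1LL * (s + 1) * (s + 1) <= n) ++s;  // s = floor(sqrt(n))

    // Factorial tables up to s + k, enough for nck(d + k - 1, k - 1) with d <= s.
    int lim = s + k;
    std::vector<long long> fact(lim + 1), inv_fact(lim + 1);
    fact[0] = 1;
    for (int i = 1; i <= lim; ++i) fact[i] = fact[i - 1] * i % MOD;
    inv_fact[lim] = power_mod(fact[lim], MOD - 2);
    for (int i = lim; i > 0; --i) inv_fact[i - 1] = inv_fact[i] * i % MOD;
    auto nck = [&](int a, int b) -> long long {
        if (b < 0 || b > a) return 0;
        return fact[a] * inv_fact[b] % MOD * inv_fact[a - b] % MOD;
    };

    // Weight of d before the Mobius step, doubled for the two signs.
    std::vector<long long> res(s + 1, 0);
    for (int d = 1; d <= s; ++d)
        res[d] = 2 * nck(d + k - 1, k - 1) % MOD;

    // Dirichlet convolution with mu: for each prime p, subtract res[d] from res[d * p].
    std::vector<bool> composite(s + 1, false);
    for (int p = 2; p <= s; ++p) {
        if (composite[p]) continue;
        for (int m = 2 * p; m <= s; m += p) composite[m] = true;
        for (int d = s / p; d > 0; --d)
            res[d * p] = (res[d * p] - res[d] + MOD) % MOD;
    }

    long long ans = 1;  // the array made of k zeros
    for (int d = 1; d <= s; ++d)
        ans = (ans + res[d] * ((n / (1LL * d * d)) % MOD)) % MOD;
    return ans;
}
```

As a small check by hand, $$$n = 8, k = 3$$$ gives $$$1 + 2 \times (8 + 10 + 12) = 61$$$, which is what the sketch returns.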

Complexity

By now you are probably tired of calculating the complexity again and again, since these computations are already familiar to you.

So I am going to skip this part, as the proof is the same as the ones you can read above.

Contribution

Yurushia for pointing out the linear complexity of the squarefree sieve.
clyring for fixing typos, and for the approach to tasks A, B, C, D, E, G, H, J.

Rev.	By	When	Δ	Comment
en60	SPyofcode	2021-11-09 12:21:19	4
en59	SPyofcode	2021-11-09 12:20:38	6	Tiny change: 'n subimit [https://' -> 'n subimit here: [https://'
en58	SPyofcode	2021-11-09 12:19:51	265
en57	SPyofcode	2021-11-09 12:12:22	3178
en56	SPyofcode	2021-11-05 19:03:46	3619
en55	SPyofcode	2021-11-01 08:53:10	64
en54	SPyofcode	2021-11-01 08:43:50	158
en53	SPyofcode	2021-11-01 06:55:47	2	Tiny change: ' for $k = 3$ in $O(\s' -> ' for $k = 2$ in $O(\s'
en52	SPyofcode	2021-11-01 06:44:45	71
en51	SPyofcode	2021-11-01 06:40:09	16954	(published)
en50	SPyofcode	2021-11-01 06:22:49	49881	(saved to drafts)
en49	SPyofcode	2021-11-01 05:27:15	49
en48	SPyofcode	2021-11-01 05:18:00	184
en47	SPyofcode	2021-10-31 19:17:30	40
en46	SPyofcode	2021-10-31 19:10:27	5305
en45	SPyofcode	2021-10-31 12:29:39	196
en44	SPyofcode	2021-10-31 12:15:21	229
en43	SPyofcode	2021-10-31 12:12:51	5175
en42	SPyofcode	2021-10-31 12:04:20	4012	(published)
en41	SPyofcode	2021-10-31 07:41:25	0	(saved to drafts)
en40	SPyofcode	2021-10-31 07:10:39	916	Tiny change: ') solution">\n\n```c' -> ') solution for k = 3">\n\n```c'
en39	SPyofcode	2021-10-31 06:59:31	6996
en38	SPyofcode	2021-10-30 20:35:28	436
en37	SPyofcode	2021-10-30 19:51:26	8
en36	SPyofcode	2021-10-30 19:50:57	168
en35	SPyofcode	2021-10-30 19:49:17	38
en34	SPyofcode	2021-10-30 19:48:18	710
en33	SPyofcode	2021-10-30 19:43:42	6
en32	SPyofcode	2021-10-30 19:42:58	714
en31	SPyofcode	2021-10-30 19:36:20	3825
en30	SPyofcode	2021-10-30 16:20:57	8
en29	SPyofcode	2021-10-30 16:19:47	2877
en28	SPyofcode	2021-10-30 16:18:30	56
en27	SPyofcode	2021-10-30 16:16:24	6543
en26	SPyofcode	2021-10-30 11:38:14	769
en25	SPyofcode	2021-10-30 06:48:18	76
en24	SPyofcode	2021-10-30 06:45:41	23	Reverted to en22
en23	SPyofcode	2021-10-30 06:41:21	23
en22	SPyofcode	2021-10-30 05:34:13	2034	Tiny change: ' summary="Hint 2">\n\nThe ' -> ' summary="Proof">\n\nThe '
en21	SPyofcode	2021-10-30 04:47:01	229
en20	SPyofcode	2021-10-29 20:05:05	68	Tiny change: '---\n\n## Tasks\n\n' -> '---\n\n## Extra Tasks\n\n'
en19	SPyofcode	2021-10-29 19:57:01	603
en18	SPyofcode	2021-10-29 19:50:27	5436	Tiny change: '------\n\n' -> '------\n\n-------------------------\n\n-------------------------\n\n'
en17	SPyofcode	2021-10-29 17:43:10	26
en16	SPyofcode	2021-10-29 12:50:12	3892
en15	SPyofcode	2021-10-29 12:16:06	1739
en14	SPyofcode	2021-10-29 03:52:02	2
en13	SPyofcode	2021-10-28 18:40:23	452	Tiny change: '## Statement:\' -> '## The statement:\'
en12	SPyofcode	2021-10-28 18:31:38	1487
en11	SPyofcode	2021-10-28 18:09:33	748
en10	SPyofcode	2021-10-28 13:42:36	278
en9	SPyofcode	2021-10-28 13:31:37	99
en8	SPyofcode	2021-10-28 13:23:42	78
en7	SPyofcode	2021-10-28 13:21:55	1383
en6	SPyofcode	2021-10-28 13:08:47	885	Tiny change: 'e product effective' -> 'e product $a_i \times a_j \times a_k$ effective'
en5	SPyofcode	2021-10-28 13:01:32	1	Tiny change: 'n $O\left(sqrt{n} \l' -> 'n $O\left(\sqrt{n} \l'
en4	SPyofcode	2021-10-28 13:00:39	478
en3	SPyofcode	2021-10-28 12:53:55	748	(published)
en2	SPyofcode	2021-10-28 12:42:55	7656
en1	SPyofcode	2021-10-28 12:05:23	3517	Initial revision (saved to drafts)
