The statement:
Given three integers $$$n, k, p$$$, $$$(1 \leq k \leq n < p)$$$.
Count the number of array $$$a[]$$$ of size $$$k$$$ that satisfied
- $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$
- $$$a_i \times a_j$$$ is perfect square $$$\forall 1 \leq i < j \leq n$$$
Since the number can be big, output it under modulo $$$p$$$.
For convenient, you can assume $$$p$$$ is a large constant prime $$$10^9 + 7$$$
Yet you can submit the problem for $$$k = 3$$$ here.
Notice that in this blog, we will solve for generalized harder variants
For original problem you can see in this blog [Tutorial] An interesting counting problem related to square product
Extra Tasks
These are harder variants, and generalization from the original problem. You can see more detail here
*Marked as solved only if tested with atleast $$$10^6$$$ queries
Solved A: Can we also use phi function or something similar to solve for $$$k = 2$$$ in $$$O(\sqrt{n})$$$ or faster ?
Solved B: Can we also use phi function or something similar to solve for general $$$k$$$ in $$$O(\sqrt{n})$$$ or faster ?
Solved C: Can we also solve the problem where there can be duplicate: $$$a_i \leq a_j\ (\forall\ i < j)$$$ and no longer $$$a_i < a_j (\forall\ i < j)$$$ ?
Solved D: Can we solve the problem where there is no restriction between $$$k, n, p$$$ ?
Solved E: Can we solve for negative integers, whereas $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$ ?
Solved F: Can we solve for a specific range, whereas $$$L \leq a_1 < a_2 < \dots < a_k \leq R$$$ ?
Solved G: Can we solve for cube product $$$a_i \times a_j \times a_k$$$ effectively ?
H: Can we solve if it is given $$$n$$$ and queries for $$$k$$$ ?
I: Can we solve if it is given $$$k$$$ and queries for $$$n$$$ ?
J: Can we also solve the problem where there are no order: Just simply $$$1 \leq a_i \leq n$$$ ?
K: Can we also solve the problem where there are no order: Just simply $$$0 \leq a_i \leq n$$$ ?
M: Can we solve for $$$q$$$-product $$$a_{i_1} \times a_{i_2} \times \dots \times a_{i_q} = x^q$$$ (for given constant $$$q$$$) ?
N: Given $$$0 \leq \delta \leq n$$$, can we also solve the problem when $$$1 \leq a_1 \leq a_1 + \delta + \leq a_2 \leq a_2 + \delta \leq \dots \leq a_k \leq n$$$ ?
O: What if the condition is just two nearby elements and not all pairs. Or you can say $$$a_i \times a_{i+1} \forall 1 \leq i < n$$$ is a perfect square ?
A better solution for k = 2
Extra task A
Idea
In the above approach, we fix $$$u$$$ as a squarefree and count $$$p^2$$$.
But what if I fix $$$p^2$$$ to count $$$u$$$ instead ?
Yet you can see that the first loop now is $$$O(\sqrt{n})$$$, but it will still $$$O(n)$$$ total because of the second loop
int solve(int n)
{
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
long long res = 0;
int t = sqrt(n);
while (t * t < n) ++t;
while (t * t > n) --t;
for (int j = t; j > 1; --j)
{
for (int i = 1; i * j * j <= n; ++i)
{
if (!used[i * j * j])
{
used[i * j * j] = true;
res += j - 1;
}
}
}
res %= MOD;
return res;
}
Let $$$f(n)$$$ is the number of pair $$$(a, b)$$$ that $$$1 \leq a < b \leq n$$$ and $$$(a, b, n)$$$ is a three-term geometric progression.
Let $$$g(n)$$$ is the number of pair $$$(a, b)$$$ that $$$1 \leq a \leq b \leq n$$$ and $$$(a, b, n)$$$ is a three-term geometric progression.
Let $$$F(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$.
Well, you see, since $$$(a, b, n)$$$ form a three-term geometric progression then you have $$$b^2 = an$$$.
It is not hard to prove that $$$F(N)$$$ will be our answer as we count over all possible squarefree $$$u$$$ for every fixed $$$p^2$$$.
We use $$$g(n)$$$ because its beautiful property, which make the calculation a lot easier.
So it is no hard to prove that $$$g(n) = f(n) + 1$$$.
This interesting sequence $$$g(n)$$$ is A000188, having many properties, such as
- Number of solutions to $$$x^2 \equiv 0 \pmod n$$$.
- Square root of largest square dividing $$$n$$$.
- Max $$$gcd \left(d, \frac{n}{d}\right)$$$ for all divisor $$$d$$$.
Well, to make the problem whole easier, I gonna skip all the proofs to use this property (still, you can use the link in the sequence for references).
$$$g(n) = \underset{d^2 | n}{\Large \Sigma} \phi(d)$$$.
From this property, we can solve the problem in $$$O(\sqrt{n})$$$.
$$$F(N) = \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$.
and
$$$f(n) = g(n) + 1$$$.
We have
$$$F(n)$$$
$$$= \overset{n}{\underset{p=2}{\Large \Sigma}} g(p) - n$$$
$$$= \overset{n}{\underset{p=2}{\Large \Sigma}} \underset{d^2 | p}{\Large \Sigma} \phi(d) - n$$$
Break the loop, make it simpler.
Reduce a for loop to constant.
$$$F(N)$$$
$$$= \overset{n}{\underset{p=1}{\Large \Sigma}} f(p)$$$
$$$= -n + \overset{n}{\underset{p=2}{\Large \Sigma}} g(p)$$$
$$$= -n + \overset{n}{\underset{p=2}{\Large \Sigma}} \underset{d^2 | p}{\Large \Sigma} \phi(d)$$$
$$$= -n + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=1}{\Large \Sigma}} \underset{1 \leq u \times p^2 \leq n}{\Large \Sigma} \phi(p)$$$
$$$= -n + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=1}{\Large \Sigma}} \phi(p) \times \left \lfloor \frac{n}{p^2} \right \rfloor$$$
$$$= \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{p=2}{\Large \Sigma}} \phi(p) \times \left \lfloor \frac{n}{p^2} \right \rfloor$$$
Well, you can sieve $$$\phi(p)$$$ for all $$$2 \leq p \leq \sqrt{n}$$$ in $$$O\left(\sqrt{n} \log \log \sqrt{n} \right)$$$ or improve it with linear sieve to $$$O(\sqrt{n})$$$.
Therefore you can solve the whole problem in $$$O(\sqrt{n})$$$.
Yet this paper also takes you to something similar.
Implementation
#include <iostream>
#include <cstring>
#include <numeric>
#include <cmath>
using namespace std;
const int MOD = 1e9 + 7;
const int LIM = 1e7 + 17;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
int euler[SQRT_LIM];
void sieve_phi(int n)
{
iota(euler, euler + n + 1, 0);
for (int x = 2; x <= n; x++) if (euler[x] == x)
for (int j = x; j <= n; j += x)
euler[j] -= euler[j] / x;
}
int solve(int n)
{
sieve_phi(ceil(sqrt(n) + 1) + 1);
long long res = 0;
for (int p = 2; p * p <= n; ++p)
res += 1LL * euler[p] * (n / (p * p));
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
cout << solve(n);
return 0;
}
#include <iostream>
#include <cstring>
#include <numeric>
#include <vector>
#include <cmath>
using namespace std;
const int MOD = 1e9 + 7;
vector<int> lpf;
vector<int> prime;
vector<int> euler;
void linear_sieve_phi(int n)
{
lpf.assign(n + 1, 0);
euler.assign(n + 1, 1);
for (int x = 2; x <= n; ++x)
{
if (lpf[x] == 0)
{
prime.push_back(lpf[x] = x);
euler[x] = x - 1;
}
for (int i = 0; i < prime.size() && x * prime[i] <= n; ++i)
{
lpf[x * prime[i]] = prime[i];
if (x % prime[i] == 0) {
euler[x * prime[i]] = euler[x] * prime[i];
break;
}
euler[x * prime[i]] = euler[x] * euler[prime[i]];
}
}
}
int solve(int n)
{
linear_sieve_phi(ceil(sqrt(n) + 1) + 1);
long long res = 0;
for (int p = 2; p * p <= n; ++p)
res += 1LL * euler[p] * (n / (p * p));
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
int n;
cin >> n;
cout << solve(n);
return 0;
}
Complexity
The complexity is depended on euler totient phi sieve and calculating for answer.
But as you can see, both can be done in $$$O(\sqrt n)$$$.
A better solution for general k
Extra task B
Idea
As what clyring decribed here.
We can generalize those definition above into $$$k$$$-term.
Let $$$f_k(n)$$$ is the number of set $$$(a_1, a_2, \dots, a_k, n)$$$ that $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.
Let $$$g_k(n)$$$ is the number of set $$$(a_1, a_2, \dots, a_k, n)$$$ that $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression.
Let $$$F_k(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k(p)$$$.
Let $$$s_k(n)$$$ is the number of way to choose $$$p^2$$$ among those $$$k$$$ numbers when you fix squarefree $$$u$$$ (though we are doing in reverse).
$$$s_k(n) = C^{n-1}_{k-1}$$$
$$$f_k(n) = F_k(n) - F_k(n-1) = s_k(g_k(n)) = \underset{d^2 | n}{\Large \Sigma} (\mu \times s_k)(d) = \underset{d^2 | n}{\Large \Sigma} \underset{p | d}{\Large \Sigma} \mu(p) \times s_k\left(\frac{d}{p}\right)$$$
$$$F_k(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k(p)$$$
$$$= F_k(0) + \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}}(\mu \times s_k)(d) \times \left \lfloor \frac{n}{d^2} \right \rfloor$$$
$$$= \overset{\left \lfloor \sqrt{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}} \left \lfloor \frac{n}{d^2} \right \rfloor \times \left ( \underset{p | d}{\Large \Sigma} \mu(p) \times s_k\left(\frac{d}{p}\right) \right)$$$
Implementation
const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;
/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
/// Linear Sieve
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
int mu[SQRT_LIM + 10]; /// mobius = A008683
void linear_sieve(int n)
{
if (n < 1) return ;
/// Extension Sieve || You can add something more
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
fill_n(mu, n + 1, 1);
/// Main Sieve || Without this, you barely able to achive linear complexity
prime.clear();
prime.reserve(n / log(n - 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x) /// For each number
{
if (isPrime[x]) /// Func[Prime]
{
mu[x] = -1;
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
}
}
}
/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
for (int u = n; u >= 1; --u)
{
divisors[u].clear();
for (int v = u; v <= n; v += u)
divisors[v].push_back(u);
}
}
/// Solving for n, k
long long solve(int n, int k)
{
/// We only care for d that 1 <= d <= sqrt(n)
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
precal_div(t);
long long res = 0;
for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(p - 1, k - 1);
sum %= MOD;
res += sum * (n / (d * d));
}
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
/// Assumming constant p = 10^9 + 7
int n, k;
cin >> n >> k;
cout << solve(n, k);
return 0;
}
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
void linear_sieve(int n)
{
if (n < 1) return ;
prime.clear();
prime.reserve(n / log(n - 1));
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x)
{
if (isPrime[x]) /// Func[Prime]
{
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
}
}
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d - 1, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
Complexity
The complexity of the first implementation is $$$O(\sqrt{n} \log \sqrt{n})$$$.
The complexity depends on Möbius function, traverse over all divisor of each number and calculating binomial coefficients.
We can sieve for Möbius function in linear.
We only care about $$$d$$$ that $$$1 \leq d \leq \sqrt{n}$$$.
We can use factorial formula for binomial coefficients as $$$1 \leq k \leq n < p$$$
As we only use $$$d$$$ that $$$d^2 \leq n$$$, let $$$t = \sqrt{n}$$$ for convenient.
We can use linear sieve for $$$\mu{d}$$$ in $$$O(t)$$$.
We can use divisor sieve for finding all divisors $$$p$$$ for each $$$v$$$ in $$$O(t \log t)$$$.
We can calculate the result in $$$O(t \log t)$$$.
Therefore the overall complexity is $$$O(t \log t)$$$.
The complexity of the second implementation is $$$O(\sqrt{n} \log \log \sqrt{n})$$$.
The complexity depends on sieve, calculating result and calculating binomial coefficients.
The sieve and binomial coefficients is obviously $$$O(t)$$$ complexity for $$$t = \sqrt{n}$$$.
The calculating part is just simply related to the sum of inversion of primes.
Therefore we achived $$$O(t \log t)$$$ complexity for $$$t = \sqrt{n}$$$.
Solution for duplicates elements in array
Extra task C
Idea
It is no hard to prove that we can use the same algorithm as described in task A, B or in original task.
Literally what we do up to now just optimize the counting part. The counting, which is the core is as the same as original.
The counting part is only about the number of valid sequences bounded by $$$n$$$, or you can say, the number of possible $$$p^2$$$ you can choose when you fix squarefree number $$$u$$$.
And those stuffs above $$$\mu(n)$$$, $$$\phi(n)$$$, sieving or anything else are just optimizations.
The core of calculating is to find out the number of non-decreasing integer sequence of size $$$k$$$ where numbers are in $$$[1, n]$$$.
$$$\binom{n + k - 1}{k}$$$
and when you fix one number as a similar idea:
$$$s_k(n) = \binom{n + k - 2}{k - 1}$$$
Can you prove it ?
It is standard stars and bars problem.
Instead of thinking the number of way to select $$$a_i$$$.
You can think of the number of value $$$a_i$$$ you have used.
Let $$$x_v$$$ is the number of value $$$v$$$ you used, or the number of $$$a_i = v$$$ in this sequence.
Each sequence can be represented uniquely as a sequence of $$$x_1, x_2, \dots, x_n$$$.
Since we selected $$$k$$$ in total we have $$$x_1 + x_2 + \dots + x_n = k$$$.
Let $$$x_v$$$ is the number of value $$$v$$$ you used, or the number of $$$a_i = v$$$ in this sequence.
For each box $$$v = 1 \dots n$$$, draw $$$x_v$$$ stars ⭐.
You can make $$$k - 1$$$ cut between these box (not cut into the box), and split into $$$k$$$ intervals.
The sum of each interval is the value of selected element.
Since there are $$$n + k - 1$$$ boxes, and you use $$$k - 1$$$ cuts, there are the total of $$$\binom{n + k - 1}{k}$$$ different ways.
Now it is done, just that it !
The idea is the same as what clyring described here but represented in the other way.
Implementation
int fact[SQRT_LIM + 10];
int invs[SQRT_LIM + 10];
int tcaf[SQRT_LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
bool is_squarefree[LIM];
int solve(int n, int k)
{
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
precal_nck(2 * n + 1);
long long res = 0;
for (int i = 1, j; i <= n; ++i) if (is_squarefree[i])
{
for (j = 1; i * j * j <= n; ++j)
is_squarefree[i * j * j] = false;
res += nck(k + j - 2, k);
}
res %= MOD;
return res;
}
const int LIM = 5e6 + 56;
const int SQRT_LIM = ceil(sqrt(LIM) + 1) + 1;
const int MOD = 1e9 + 7;
/// Precalculating factorials under prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
/// Linear Sieve
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
int mu[SQRT_LIM + 10]; /// mobius = A008683
void linear_sieve(int n)
{
if (n < 1) return ;
/// Extension Sieve || You can add something more
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
fill_n(mu, n + 1, 1);
/// Main Sieve || Without this, you barely able to achive linear complexity
prime.clear();
prime.reserve(n / log(n - 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x) /// For each number
{
if (isPrime[x]) /// Func[Prime]
{
mu[x] = -1;
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
mu[x * p] = (lpf[x] == p) ? 0 : -mu[x];
}
}
}
/// Divisor sieve
vector<int> divisors[SQRT_LIM];
void precal_div(int n) /// O(n log n)
{
for (int u = n; u >= 1; --u)
{
divisors[u].clear();
for (int v = u; v <= n; v += u)
divisors[v].push_back(u);
}
}
/// Solving for n, k
long long solve(int n, int k)
{
/// We only care for d that 1 <= d <= sqrt(n)
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
precal_div(t);
long long res = 0;
for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(d + k - 2, k - 1);
sum %= MOD;
res += sum * (n / (d * d));
}
res %= MOD;
return res;
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(NULL);
/// Assumming constant p = 10^9 + 7
int n, k;
cin >> n >> k;
cout << solve(n, k);
return 0;
}
int fact[SQRT_LIM + 10];
int invs[SQRT_LIM + 10];
int tcaf[SQRT_LIM + 10];
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
vector<int> prime; /// prime list = A000040
bool isPrime[SQRT_LIM + 10]; /// characteristic function = A010051
int lpf[SQRT_LIM + 10]; /// lowest prime factor = A020639
void linear_sieve(int n)
{
if (n < 1) return ;
prime.clear();
prime.reserve(n / log(n - 1));
memset(lpf, 0, sizeof(lpf[0]) * (n + 1));
memset(isPrime, true, sizeof(isPrime[0]) * (n + 1));
isPrime[0] = isPrime[1] = false;
for (int x = 2; x <= n; ++x)
{
if (isPrime[x]) /// Func[Prime]
{
lpf[x] = x;
prime.push_back(x);
}
for (int p : prime) /// Func[Prime * X] <- Func[Prime]
{
if (p > lpf[x] || x * p > n) break;
isPrime[x * p] = 0;
lpf[x * p] = p;
}
}
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t + k);
precal_nck(t + k);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d + k - 2, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
Complexity
In the first implementation it is obviously linear.
The complexity is depended mainly on sieving part and binomial coefficient precalculation
Sieving can be done in linear
Binomial coefficient can also done been linear assumming $$$p$$$ is a sufficient large prime greater than $$$max(n, k)$$$
The second and third implementation is also easy to show its complexity
Like task B the algorithm is depended on calculating result and binomial coefficient precalculation
Yet we proved calculating result only take $$$O(\sqrt{n} \log \sqrt{n})$$$ [2] and $$$O(\sqrt{n} \log \log \sqrt{n})$$$ [3]
And we need to calculate binomial coefficient upto $$$\binom{d + k - 2}{k - 1}$$$ for $$$d = \left \lfloor \sqrt{n} \right \rfloor$$$ so we need $$$O(\sqrt{n} + k)$$$ precalculation time.
Sadly, since $$$k \leq n$$$. We also conclude that the complexity is $$$O(n)$$$, and even worse it also contains large constant factor compared to that in the first implementation.
But it is still effecient enough to solve problem where $$$k$$$ is small.
You optimize the problem using this
Then you can solve the problem in $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q p)$$$ ($$$q$$$ is depended on how you implement it).
Solution when there are no restriction between k, n, p
Extra task D
Idea
So first of all, the result do depend on how you calculate binomial coefficient but they are calculated independently even if you can somehow manage to use the for loop of binomial coeffient go first.
Therefore even if there is no restriction between $$$k, n, p$$$, the counting part and the algorithm doesnt change.
You just need to change how you calculate binomial coefficient, and that is all for this task.
Let just ignore the fact that though this need more detail, but as the blog is not about nck problem I will just make it quick.
For large prime $$$p > max(n, k)$$$
- Just using normal combinatorics related to factorial (since $$$p > max(n, k)$$$ nothing will affect the result)
- For taking divides under modulo you can just take modular inversion (as a prime always exist such number)
- Yet this is standard problem, just becareful of the overflow part
- You can also optimize by precalculating factorial, inversion number and inversion factorial in linear too
For general prime $$$p$$$
- We can just ignore factors $$$p$$$ in calculating $$$n!$$$.
- You also need to know how many times factor $$$p$$$ appears in $$$1 \dots n$$$
- Then combining it back when calculating for the answer.
- If we dont do this $$$n!$$$ become might divides some factors of $$$p$$$.
- By precalculation you can answer queries in $$$O(1)$$$
For squarefree $$$p$$$
- Factorize $$$p = p_1 \times p_2 \times p_q$$$ that all $$$p_i$$$ is prime.
- Ignore all factors $$$p_i$$$ when calculate $$$n!$$$.
- Remember to calculate how many times factors $$$p_i$$$ appear in $$$1 \dots n$$$.
- When query for the answer we just combine all those part back.
- Remember you can just take modulo upto $$$\phi(p)$$$ which you can also calculate while factorizing $$$p$$$.
- Remember that $$$n!$$$ must not divides any factor $$$p_i$$$ otherwise you will get wrong answer.
- By precalculation you can answer queries in $$$O(\log p)$$$
For general positive modulo $$$p$$$
- Factorize $$$p = p_1^{f_1} \times p_2^{f_2} \times p_q^{f_q}$$$ that all $$$p_i$$$ is unique prime.
- We calculate $$$C(n, k)$$$ modulo $$$p_i^{f_i}$$$ for each $$$i = 1 \dots q$$$.
- To do that, we need to calculate $$$n!$$$ modulo $$$p_i^{f_i}$$$ which is described here.
- To get the final answer we can use CRT.
- Yet this is kinda hard to code and debug also easy to make mistake so you must becareful
- I will let the implementation for you lovely readers.
- Yet depends on how you calculate stuffs that might increase your query complexity
- There are few (effective or atleast fully correct) papers about this but you can read the one written here
Implementation
/// SPyofgame linear template for precalculating factorials under large prime modulo
int fact[SQRT_LIM + 10]; /// fact[n] = n!
int invs[SQRT_LIM + 10]; /// invs[n] = n^(-1)
int tcaf[SQRT_LIM + 10]; /// tcaf[n] = (n!)^(-1)
void precal_nck(int n = SQRT_LIM)
{
fact[0] = fact[1] = 1;
invs[0] = invs[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int i = 2; i <= n; ++i)
{
fact[i] = (1LL * fact[i - 1] * i) % MOD;
invs[i] = MOD - 1LL * (MOD / i) * invs[MOD % i] % MOD;
tcaf[i] = (1LL * tcaf[i - 1] * invs[i]) % MOD;
}
}
/// Calculating binomial coefficient queries
int nck(int n, int k)
{
k = min(k, n - k);
if (k < 0) return 0;
long long res = fact[n];
res *= tcaf[k]; res %= MOD;
res *= tcaf[n - k]; res %= MOD;
return res;
}
vector<int> factor;
int factorize(int n) /// Calculating phi(n) while factorizing (n) in O(sqrt n)
{
factor.clear();
int phi = n;
if (!(n & 1))
{
n >>= __builtin_ctz(n);
factor.push_back(2);
phi -= phi / 2;
}
for (int x = 3; x * x <= n; x += 2)
{
if (n % x == 0)
{
do n /= x; while (n % x == 0);
factor.push_back(x);
phi -= phi / x;
}
}
if (n > 1)
{
factor.push_back(n);
phi -= phi / n;
}
return phi;
}
int f[LIM]; /// f[x] = nck(n, x)
int fact[LIM]; /// n!
int tcaf[LIM]; /// n!^(-1)
int divp[LIM]; /// x but ignore all factors p[i]
int cntp[LIM][LOG_LIM]; /// cntp[x][i] = Number of time factor p[i] appear in 1..x
void precal(int MOD) /// Calculate f[x] for all x = 1 -> n in O(n log mod + sqrt mod)
{
int PHIMOD = factorize(MOD);
for (int x = 1; x <= n; ++x) /// For each part x in n!
{
int &t = divp[x] = x;
for (int i = 0; i < factor.size(); ++i) /// Ignore all factor p[i] of p
{
cntp[x][i] = cntp[x - 1][i];
for (; t % factor[i] == 0; t /= factor[i]) /// Count how many times p[i] appears in 1..n
++cntp[x][i];
}
}
fact[0] = fact[1] = 1;
tcaf[0] = tcaf[1] = 1;
for (int x = 2; x <= n; ++x) /// Finding n! and n!^(-1)
{
fact[x] = (1LL * fact[x - 1] * divp[x]) % MOD;
tcaf[x] = powMOD(fact[x], PHIMOD - 1, MOD);
}
memset(f, 0, sizeof(f[0]) * k);
for (int x = k; x <= n; ++x)
{
/// Calculate nck % p normally
f[x] = fact[x];
mulMOD(f[x], tcaf[k], MOD);
mulMOD(f[x], tcaf[x - k], MOD);
for (int i = 0; i < factor.size(); ++i) /// Bringing those factors back
{
int p = cntp[x][i] - cntp[k][i] - cntp[x - k][i];
f[x] = 1LL * f[x] * powMOD(factor[i], p, MOD) % MOD;
}
}
}
Complexity
In the first implementation it is obviously linear.
Each number is visited once and the transistion only cost $$$O(1)$$$
And for the second implementation.
As you might notice $$$p$$$ is atleast prime, or atmost a squarefree number.
Since the complexity also depends on the factors of $$$p$$$, in the worst case, $$$p = 2 \times 3 \times 5 \times 7 \times 11 \times 13 \times 17 \times 19 \times \dots$$$ having most factors that is about $$$\log(p)$$$.
The total complexity also depended on factorizing $$$p$$$ and calculating $$$\phi{p}$$$.
Yet for normal trial division you can solve in $$$O(\sqrt{p})$$$ time.
So you got $$$O(n \times \log p + \sqrt{p})$$$ in final.
You can also do all of this in $$$O(n \log p) + Õ(p^{1/4})$$$
Though you can still optimize this but by doing that why dont you just go straight up to solve for non squarefree $$$p$$$ too ?
Solution when numbers are also bounded by negative number
Extra task E
Idea
Yet this is the same as extra task C where only the counting part should be changed.
As we only care about integer therefore let not use complex math into this problem.
If there exist a negative number and a positive number, the product will be negative thus the sequence will not satisfied.
Becareful, there are the zeros too.
When the numbers are all unique, or $$$-n \leq a_1 < a_2 < \dots < a_k \leq n$$$
The sequence contain positive integers only.
The sequence contain negative integers only.
The sequence contain a zero, and the rest are positive integers.
The sequence contain a zero, and the rest are negative integers.
Thus give us the formula of $$$task_E(n, k) = 2 \times task_B(n, k) + 2 \times task_B(n, k - 1)$$$.
There cant be both negative number and positive number at the same time
You can flip all the sign to positive, thus negative integers sequence are now the same as positive sequences.
Without zero, the problem has no difference from the original task.
With zero, any product with it is already a perfect square.
Hence zeros dont affect anything other than the remaining number you have not selected.
The sequence contain both negative and positive that satified the problem is not existed as their product is not a perfect square (we only care in integers not complex)
The sequence contain positive integers only:
- This is the original task as positive integers only means $$$1 \leq a_i \leq n$$$.
The sequence contain negative integers only:
- Flip all the sign doesnt change the product for each pairs.
- The problem now is the same as the sequence contain positive integers only.
The sequence contain a zero, and the rest are positive integers:
- The product of a zero with any number is still a perfect square.
- So the zero only reduce the number you have to select.
- Therefore this is same as the original task but now you have to select $$$k - 1$$$ numbers.
The sequence contain a zero, and the rest are negative integers:
- Just also flip the sign, the answer is the same as above.
Remember that when $$$k = 0$$$ the answer is $$$0$$$ otherwise you might somewhat having wrong result for negative number in binomial coefficients formula
So what if I mix the problem with task C too ?
When the numbers can have duplicates, or $$$-n \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$
The sequence contain zeros only.
The sequence contain positive integers only.
The sequence contain negative integers only.
The sequence contain $$$0 < t < k$$$ positive integers, and the rest $$$k - t$$$ are zeros.
The sequence contain $$$0 < t < k$$$ positive integers, and the rest $$$k - t$$$ are zeros.
Yet once again you can simplified it with less cases for easier calculation.
The sequence contain zeros only.
The sequence contain $$$0 < t \leq k$$$ non-zero integers, and the rest $$$k - t$$$ are zeros.
Thus give us the formula of $$$task_E(n, k) = 1 + 2 \times \overset{k}{\underset{t = 1}{\Large \Sigma}} task_B(n, t)$$$.
You might wondering where is the $$$2 \times \dots$$$ come from.
Yet it is the same as unique number cases.
You have the answer for positive numbers are the same as negative number when you flip the sign.
So you have $$$1 \times \dots$$$ for positive, and $$$1 \times \dots$$$ for negative.
Therefore you have $$$2 \times \dots$$$ in total
You might wondering why we dont have a formula like this:
$$$task_E(n, k) = 1 + \overset{k}{\underset{t = 1}{\Large \Sigma}} C^t_k \times task_B(n, t)$$$.
But you know, since the sequence is ordered.
No matter which number you selected as zero there is one and only one way to sort it back.
You see, there is only one case for fully zeros sequence.
But this give you a $$$O(k)$$$ solution.
You can do better with math
Isnt this sum is familiar to some standard math problem related to binomial coefficient ?
This formula is related to $$$\overset{k}{\underset{t = 0}{\Large \Sigma}} \binom{n + t}{t} = \binom{n + k + 1}{k}$$$
$$$s_k(n) = \overset{k}{\underset{t = 1}{\Large \Sigma}} \binom{d + t - 2}{t - 1} \times 2 = \binom{d + k - 1}{k - 1} \times 2$$$
Implementation
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
{
res[d] += (k >= 1) * nck(d - 1, k - 1) * 2;
res[d] += (k >= 2) * nck(d - 1, k - 2) * 2;
}
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
And for duplicates (mixed with task C), we have:
bool is_squarefree[LIM];
int brute(int n, int k)
{
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (n + 1));
precal_nck(2 * n + 1);
long long res = 0;
for (int i = 1, j; i <= n; ++i) if (is_squarefree[i])
{
for (j = 1; i * j * j <= n; ++j)
is_squarefree[i * j * j] = false;
res += nck(k + j - 2, k);
}
res %= MOD;
return res;
}
long long solve(int n, int k)
{
long long res = 1;
for (int t = 1; t <= k; ++t)
res += brute(n, t) * 2;
res %= MOD;
return res;
}
long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
/// We only care for d that 1 <= d <= sqrt(n)
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t + k);
precal_div(t);
long long res = 0;
for (int d = 1; d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(d + k - 2, k - 1);
sum %= MOD;
res += sum * (n / (d * d));
}
res %= MOD;
return res;
}
long long solve(int n, int k)
{
long long res = 1;
for (int t = 1; t <= k; ++t)
res += brute(n, t) * 2;
res %= MOD;
return res;
}
long long res[SQRT_LIM + 10];
long long brute(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t + k);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d + k - 2, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
long long solve(int n, int k)
{
long long res = 1;
for (int t = 1; t <= k; ++t)
res += brute(n, t) * 2;
res %= MOD;
return res;
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t + k);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
for (int t = 1; t <= k; ++t)
res[d] += nck(d + t - 2, t - 1) * 2;
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 1;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
long long res[SQRT_LIM + 10];
long long solve(int n, int k)
{
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t + k);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] += nck(d + k - 1, k - 1) * 2;
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 1;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
Complexity
So.. you might be tired of calculating complexity again and again for those are too familar to you.
So I gonna skip this as the proof is as the same as what you can read above.
But as a bonus, you optimize the problem using this
Then you can solve the problem in $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q \ p)$$$ ($$$q$$$ is depended on how you implement it).
Conclusion
For unique array, the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n})$$$
For duplicate array, the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n} + k)$$$
For duplicate array and small prime $$$p > max(n, k)$$$, the complexity can be $$$O(\sqrt{n} \log \log \sqrt{n} + \sqrt{p} \log^q \ p)$$$
Solution when numbers are also bounded by a specific range
Extra task F
Idea
We split into 4 cases
Case $$$1$$$: $$$L \leq 0 \leq R$$$
Case $$$2$$$: $$$0 \leq L \leq R$$$
Case $$$3$$$: $$$L \leq R \leq 0$$$
Case $$$4$$$: $$$L > R$$$
You can easilier solve for each cases in linear
It is $$$task_B(|L|, k) + task_B(|L|, k - 1) + task_B(|R|, k) + task_B(|R|, k - 1)$$$
This is what we are going to discuss about.
A simple formula is fox each squarefree $$$u$$$ in $$$1 \dots R$$$.
Count the number of $$$p^2$$$ in $$$[L, R]$$$.
Using combinatorics to count the number of way to select $$$p^2$$$ for each $$$u$$$.
The answer is sum over all result for $$$u$$$.
Same as case 2 when you flip the sign but also remember to swap $$$(L, R)$$$ too.
Just simply $$$0$$$.
Now assumming $$$1 \leq L \leq R$$$, here is how we solve it.
As a simple approach, we can just do like original problem for general $$$k$$$.
We just need to iterative for each fixed squarefree $$$u$$$ and count the number of way to select $$$p^2$$$ as usual but a bit of change.
Highly notice that fixed squarefree $$$u$$$ might also in $$$[1, L - 1]$$$ and not just in $$$[L, R]$$$.
Also remember, just count the $$$p^2$$$ in $$$[L, R]$$$.
By doing so trivially we can have the complexity of $$$O((R - L + 1) \log R)$$$ time and $$$O(R)$$$ space.
Ofcourse we can just optimize it by using squarefree sieve in $$$O(R - L + 1)$$$ time.
Yet again, let use linear sieve and linear precalculation for binomial coefficient too.
We just need $$$O(R - L + 1)$$$ space.
But this lead us to another problem, how can we check if $$$[u]$$$ is squarefree when we literally remove the ability to check for any $$$u < L$$$ ???
A simple approach is just to redefine the squarefree sieve.
You see, when we go to the first $$$u$$$ that not marked in the sieve, we mark all form of $$$u \times p^2$$$ except $$$u$$$.
Why dont we change a little bit, let say, we check for the first $$$u \times c^2$$$ for some $$$c > 0$$$. And then mark all form of $$$u \times p^2$$$ except $$$u \times c^2$$$.
Yes, we just move the $$$u$$$ to the a number $$$u \times c^2$$$ that suitable for it and marked only once for each unique $$$u$$$.
How do we find $$$c$$$ ?
We can take $$$c$$$ as the smallest number that $$$u \times c^2 \geq L$$$.
So by this definition, $$$c = \left \lceil \sqrt{\frac{L}{u}} \right \rceil$$$.
Is it over ? Cant we do better ?
- Yes
- Yes
- Yes
- Yes
Factorizing numbers.
Well, not sure what "bad" you are talking about.
But if it is about the complexity then each number $$$x$$$ in $$$[L, R]$$$ can be factorized in $$$O(\sqrt{x})$$$.
Yet for $$$L \approx 1$$$ this will about to be $$$O(R \sqrt{R} \log{R})$$$ if you implement badly.
Though it is so, let just ignore the time complexity for now.
Yes, you can use pollard rho to factorize in expected time $$$Õ(x^{1/4})$$$.
Also remember to combine it with miller rabin for prime cases.
For more detail, you can read this https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
Yes, you can sieve for first $$$\sqrt{r}$$$ number for primes.
And for each number in $$$[L, R]$$$ you just need $$$min(\sqrt{x}, \sqrt{r} / \log {r})$$$ time to factorize it.
But in practice this is significant fast.
Though I am not quite sure what is the actual complexity of doing so.
There are rare amount of papers related about this specific topics and even worse that most of it are not well researched.
You can just iterative for each form of $$$u \times p^2$$$ that lies in $$$[L, R]$$$.
Also remember that each number should be visited once.
Not sure what you mean though.
When you factorize $$$x$$$, you can get the form $$$x = u \times p^2$$$.
Just remember dont take the whole $$$p^2$$$ but take $$$p$$$ and iterate using $$$p^2$$$, each step increase $$$p$$$ by one.
But for each prime factor $$$f$$$ in $$$x$$$ that appear more than once, we reduce $$$x$$$ by $$$f^2$$$ and increase $$$p$$$ by $$$f$$$ where $$$p$$$ is initially $$$1$$$.
- Yes
- Yes
- Yes
- Yes
- Yes
- Yes
- Yes
So what is that the better solution and how do we implement it steps by steps ?
Sieve all prime up to $$$\sqrt{R}$$$.
Initialize $$$u_x$$$ — the squarefree part of each number $$$x$$$ in $$$[L, R]$$$ is $$$x$$$ itself.
Also initialize $$$p_x$$$ — the squarefree factor part of each number $$$x$$$ in $$$[L, R]$$$ is simply $$$1$$$.
Now for each prime $$$p$$$ we sieved, you factorize each number $$$x$$$, reduce $$$u_x$$$ by $$$p^2$$$ while increase $$$p_x$$$ by $$$p$$$.
Now for each number $$$x$$$ in $$$[L, R]$$$, you can do squarefree sieve like normal, with the steps of $$$p_x$$$ initially with $$$u_x$$$.
Implementation
Let $$$Z = max(|L|, |R|)$$$
bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
if (l > r) return 0;
if (r < 0) return solve(-r, -l, k);
if (l <= 0 && 0 <= r)
{
long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
while (res >= MOD) res -= MOD;
return res;
}
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r + l - 1));
precal_nck(r - l + 1);
int tot = r - l + 1;
long long res = 0;
for (int i = 1; i <= r; ++i)
{
int j = sqrt(l / i);
while (i * j * j > l) --j;
while (i * j * j < l) ++j;
if (is_squarefree[i * j * j - l])
{
int cnt = 0;
for (; i * j * j <= r; ++j)
{
++cnt;
--tot;
is_squarefree[i * j * j - l] = false;
}
res += nck(cnt, k);
if (tot == 0) break;
}
}
res %= MOD;
return res;
}
long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
if (k == 0) return 0;
if (k == 1) return n;
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d - 1, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
bool is_squarefree[LIM + 10];
int solve(int l, int r, int k)
{
if (l > r) return 0;
if (r < 0) return solve(-r, -l, k);
if (l <= 0 && 0 <= r)
{
long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
while (res >= MOD) res -= MOD;
return res;
}
int t = ceil(sqrt(r + 1) + 1) + 1;
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r + l - 1));
precal_nck(r - l + 1);
linear_sieve(t);
long long res = 0;
for (int x = l; x <= r; ++x) if (is_squarefree[x - l])
{
int u = x;
int v = 1;
for (int p : prime)
{
int q = p * p;
if (q > u) break;
while (u % q == 0)
{
u /= q;
v *= p;
}
}
int cnt = 0;
for (; u * v * v <= r; ++v)
{
is_squarefree[u * v * v - l] = false;
++cnt;
}
res += nck(cnt, k);
}
res %= MOD;
return res;
}
long long res[SQRT_LIM + 10];
long long taskB(int n, int k)
{
if (k == 0) return 0;
if (k == 1) return n;
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d <= n; ++d)
res[d] = nck(d - 1, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d <= n; ++d)
ans += res[d] * (n / (d * d));
ans %= MOD;
return ans;
}
#include <algorithm>
#define all(x) (x).begin(), (x).end()
bool is_squarefree[LIM + 10];
int squarefree[LIM + 10];
int sqf_factor[LIM + 10];
int solve(int l, int r, int k)
{
if (l > r) return 0;
if (r < 0) return solve(-r, -l, k);
if (l <= 0 && 0 <= r)
{
long long res = 0LL + taskB(abs(l), k) + taskB(abs(l), k - 1) + taskB(abs(r), k) + taskB(abs(r), k - 1);
while (res >= MOD) res -= MOD;
return res;
}
int t = ceil(sqrt(r + 1) + 1) + 1;
memset(is_squarefree, true, sizeof(is_squarefree[0]) * (r + l - 1));
precal_nck(r - l + 1);
linear_sieve(t);
for (int x = l; x <= r; ++x)
{
squarefree[x - l] = x;
sqf_factor[x - l] = 1;
}
long long res = 0;
for (int p : prime)
{
for (int q = p * p, x = max(q, (l + q - 1) / q * q); x <= r; x += q)
{
while (squarefree[x - l] % q == 0)
{
squarefree[x - l] /= q;
sqf_factor[x - l] *= p;
}
}
}
for (int x = l; x <= r; ++x)
{
int u = squarefree[x - l];
int p = sqf_factor[x - l];
if (is_squarefree[x - l])
{
int cnt = 0;
for (; u * p * p <= r; ++p)
{
is_squarefree[u * p * p - l] = false;
++cnt;
}
res += nck(cnt, k);
}
}
res %= MOD;
return res;
}
Complexity
Well, you might feel bored of long chain of proving for just a simple complexity, now I just make it quick.
Which part you find hard to understand, you can comment below if you cant accept this as an exercise.
For the first implemenetation it is obviously linear time in $$$max(|L|, |R|)$$$ and linear space in $$$|L - R|$$$
For the second implementation it is harder to prove. Yet it is bounded by the complexity of factorizing each number in range $$$[L, R]$$$.
For the third implementation it is based on the fact that:
- The sum of inversion of prime squared is constant.
- Each number $$$x$$$ will not be factorized more than $$$O(\log(\sqrt{x}))$$$.
- You can precalculate binomial coefficient in linear and query for constant time.
- You can sieve in linear, and only sieve to the square root of $$$R$$$.
- For each number in the range we care, we just need to visit once.
Yet you can furthur optimize it to $$$O\left(\frac{\sqrt{R} \log \log \sqrt{R}}{\log \sqrt{R}} + \frac{R - L}{\pi^2}\right)$$$.
*I am working on a $$$O {\Large ( } Õ(\sqrt{R}) + Õ(\sqrt{R - L}) {\Large ) }$$$ solution right now as it is much harder to apply the same difficult math idea described in task B.
Solution when the product you must find is a perfect cube
Extra task G
Idea
For $$$k < 3$$$, the formula is easy to shown as the second condition always satisfied.
Obviously $$$res = 0$$$.
Obviously $$$res = n$$$.
Obviously $$$res = \frac{n(n-1)}{2}$$$.
Becareful of overflow.
For $$$k > 3$$$, you can prove that every number you selected must share same cubefree therefore just make a cubefree sieve in linear.
$$$a_1 < a_2 < \dots < a_k$$$.
What if $$$a_1 \times a_2 \times a_3$$$ is a perfect cube and also does $$$a_1 \times a_2 \times a_4$$$
Let $$$a_x = b_x \times c_x^3$$$ for $$$b_x$$$ cubefree and $$$c_x$$$ is some integer.
Let $$$b_{12} = b_1 \times b_2$$$
We have $$$b_{12} \times b_3$$$ and $$$b_{12} \times b_4$$$ are all cubes.
The fundamental theorem of arithmetic states that every positive integer can be represented in exactly one way apart from rearrangement as a product of prime with positive integer power hence we can express number as prime factorization form as:
$$$b_x = p_0^{f_0} \times p_1^{f_1} \times \dots$$$ for unique prime $$$p_i$$$ and positive integer $$$f_i$$$.
By definition cubefree $$$b_x$$$ we have $$$f_z \in {0, 1, 2}$$$ for all $$$x, z$$$.
[1] If $$$b_x \times b_y$$$ is a cube and you know $$$b_x$$$ then you can easily express $$$b_y$$$ uniquely (based on the theorem above)
[2] We also have $$$b_{12} \times b_3$$$ and $$$b_{12} \times b_4$$$ are all cubes.
From [1] and [2] you can prove that $$$b_3 = b_4$$$.
Without loss of generality we can conclude that for any different $$$4$$$ numbers you selected $$$a_x, a_y, a_z, a_t$$$.
Then you can easily prove that $$$b_x = b_y = b_z = b_t$$$ for $$$b_p$$$ is cubefree part of $$$a_p$$$
But still, you can apply the same idea used in extra task B to achive better complexity.
Instead of throwing a bunch of math using weird formulas with long chain of theorems and proving stuffs.
We can see the algorithm in the other way then come to the formula.
We have discussed that in extra task B we count things for squarefree.
So we just make a squarefree sieve, and for each fixed squarefree $$$u$$$ we count the number of ways to select $$$p^2$$$.
In order to make it faster, we just reverse the loop, using combinatorics to count $$$u$$$ for each fixed $$$p^2$$$.
For those $$$p > \sqrt{n}$$$ or you can say $$$p^2 > n$$$, the answer is obviously $$$0$$$.
So the complexity is depended on how you count $$$u$$$ for each $$$p^2$$$.
We are discussing that in extra task G we count things for cubefree.
So we just make a cubefree sieve, and for each fixed cubefree $$$u$$$ we count the number of ways to select $$$p^3$$$.
In order to make it faster, we just reverse the loop, using combinatorics to count $$$u$$$ for each fixed $$$p^3$$$.
For those $$$p > \sqrt[3]{n}$$$ or you can say $$$p^3 > n$$$, the answer is obviously $$$0$$$.
So the complexity is depended on how you count $$$u$$$ for each $$$p^3$$$.
So now we come for the formula:
Let $$$f_k^3(n)$$$ is the number of set $$$(a_1, a_2, \dots, a_k, n)$$$ that $$$1 \leq a_1 < a_2 < \dots < a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression in 3 dimensional space.
Let $$$g_k^3(n)$$$ is the number of set $$$(a_1, a_2, \dots, a_k, n)$$$ that $$$1 \leq a_1 \leq a_2 \leq \dots \leq a_k \leq n$$$ and $$$(a_1, a_2, \dots, a_k, n)$$$ is a $$$(k+1)$$$-term geometric progression in 3 dimensional space.
Let $$$F_k^3(n)$$$ once again, is our answer.
Let $$$s_k^3(n)$$$ is the number of way to choose $$$p^3$$$ among those $$$k$$$ numbers when you fix cubefree $$$u$$$ (though we are doing in reverse).
What is the differences when you count cubefree instead of squarefree ?
$$$s_k^3(n) = C^{n-1}_{k-1}$$$.
Yes, just that simple.
It just count the number of way to select set of size $$$k$$$ among those possibility of choosing $$$p^3$$$.
The formula is not depended by squarefree, cubefree stuffs but only the parameters changes (the "$$$n$$$") ins the formula.
$$$F_k^3(p) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k^3(p)$$$
Just the same way they does, summarize those $$$f_k^3(p)$$$ means you make every possibilities to fix $$$p^3$$$ among those you can fix.
The calculation is not overlap each other, as for each $$$f_k^3(p)$$$, the sequence is unique to each other as its alway end at $$$p$$$.
$$$f_k^3(n) = F_k^3(n) - F_k^3(n-1) = s_k^3(g_k^3(n)) = \underset{d^3 | n}{\Large \Sigma} (\mu \times s_k^3)(d) = \underset{d^3 | n}{\Large \Sigma} \underset{p | d}{\Large \Sigma} \mu(p) \times s_k^3\left(\frac{d}{p}\right)$$$
Well, the formula isnt that much difference.
The only thing changed is that we try to fix $$$d^3$$$ instead of $$$d^2$$$.
The $$$\mu(n)$$$ function is not depended on squarefree or cubefree, so it shouldnt be in higher dimensional Möbius function.
Even if it is expected to somewhat related to calculate squarefree, it is not how we use in the formula as it just appear when you swap two for loop using dirichlet convolution.
$$$F_k^3(n) = \overset{n}{\underset{p=1}{\Large \Sigma}} f_k^3(p)$$$
$$$= F_k^3(0) + \overset{\left \lfloor \sqrt[3]{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}}(\mu \times s_k^3)(d) \times \left \lfloor \frac{n}{d^3} \right \rfloor$$$
$$$= \overset{\left \lfloor \sqrt[3]{n} \right \rfloor}{\underset{d = 1}{\Large \Sigma}} \left \lfloor \frac{n}{d^3} \right \rfloor \times \left ( \underset{p | d}{\Large \Sigma} \mu(p) \times s_k^3\left(\frac{d}{p}\right) \right)$$$
So now we got the formula, just apply it to the problem
You can also reduce $$$O(\log n)$$$ factor to $$$O(\log \log n)$$$ too.
Yet in task $$$k = 3$$$ things got differently.
By the theorem used above that if $$$b_x \times b_y$$$ is a cube and $$$b_x, b_y$$$ are cubefree numbers. Then if you know $$$b_x$$$ you can always find one and only one $$$b_y$$$.
Let $$$a_x = b_x \times c_x^3$$$ for $$$b_x$$$ cubefree and $$$c_x$$$ is some integer.
The fundamental theorem of arithmetic states that every positive integer can be represented in exactly one way apart from rearrangement as a product of prime with positive integer power hence we can express number as prime factorization form as:
$$$b_x = p_0^{f_0} \times p_1^{f_1} \times \dots$$$ for unique prime $$$p_i$$$ and positive integer $$$f_i$$$.
By definition cubefree $$$b_x$$$ we have $$$f_z \in {0, 1, 2}$$$ for all $$$x, z$$$.
If $$$b_x \times b_y$$$ is a cube and you know $$$b_x$$$ then you can easily express $$$b_y$$$ uniquely (based on the theorem above)
So instead of a brute-forces, you can fix $$$2$$$ numbers and then you factorize it to the form of cubefree number.
Yet you can know the number of ways to select $$$b_x \times b_y$$$ with precalculation.
Then you can just iterate all over $$$b_z$$$ and count the total of ways to select $$$b_x \times b_y\times b_z$$$.
Since they are ordered, you must becareful, you can update new $$$b_x \times b_y (x < y)$$$ exists whenever your $$$z > y$$$.
Implementation
int cnt[LIM];
vector<int> appear[LIM];
int solveG(int n, int k)
{
linear_sieve(n * n);
for (int i = 1; i <= n; ++i)
appear[i].clear();
for (int i = 1; i <= n; ++i)
{
for (int j = i + 1; j <= n; ++j)
{
int u = 1;
for (int x = i * j; x > 1; )
{
int a = lpf[x], b = a * a * a;
for (; x % b == 0; x /= b);
for (; x % a == 0; x /= a) u *= a;
}
appear[j + 1].push_back(u);
}
}
long long res = 0;
memset(cnt, 0, sizeof(cnt[0]) * (n * n + 1));
for (int i = 1; i <= n; ++i)
{
for (int u : appear[i]) ++cnt[u];
int v = 1;
for (int x = i; x > 1; )
{
int a = lpf[x], b = a * a * a;
for (; x % b == 0; x /= b);
if (x % (a * a) == 0)
{
x /= a * a;
v *= a;
}
else if (x % a == 0)
{
x /= a;
v *= a * a;
}
}
res += cnt[v];
}
res %= MOD;
return res;
}
bool is_cubefree[LIM + 10];
int solve(int n, int k)
{
if (k == 0) return 0;
if (k == 1) return n;
if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
if (k == 3) return solveG(n, k);
memset(is_cubefree, true, sizeof(is_cubefree[0]) * (n + 1));
precal_nck(n);
long long res = 0;
for (int i = 1, j; i <= n; ++i) if (is_cubefree[i])
{
for (j = 1; i * j * j * j<= n; ++j)
{
is_cubefree[i * j * j * j] = false;
}
res += nck(j - 1, k);
}
res %= MOD;
return res;
}
vector<int> divisor[LIM];
int solveG(int n, int k)
{
linear_sieve(n);
for (int i = 1; i <= n; ++i)
divisor[i].clear();
for (int j = 1; j <= n; ++j)
{
int x = 1;
for (int t = j; t > 1; )
{
if (x > n) break;
int a = lpf[t];
ll b = 1LL * a * a * a;
while (t % b == 0)
{
t /= b;
if (1LL * x * a > n) goto skip;
x *= a;
}
if (t % a == 0)
{
if (1LL * x * a > n) goto skip;
x *= a;
do t /= a; while (t % a == 0);
}
}
for (int i = x; i <= n; i += x)
divisor[i].push_back(j);
skip:{};
}
int res = 0;
for (int i = 1; i <= n; ++i)
{
ll t = 1LL * i * i * i;
for (int x = 0; x < divisor[i].size(); ++x)
{
for (int y = x + 1; y < divisor[i].size(); ++y)
{
int a = divisor[i][x];
int b = divisor[i][y];
int c = t / (a * b);
res += (n >= c && c > b && 1LL * a * b * c == 1LL * i * i * i);
}
}
}
return res;
}
int res[LIM + 10];
int solve(int n, int k)
{
if (k == 0) return 0;
if (k == 1) return n;
if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
if (k == 3) return solveG(n, k);
int t = ceil(cbrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
precal_div(t);
long long res = 0;
for (int d = 1; d * d * d <= n; ++d) /// For each fixed p^2
{
long long sum = 0;
for (int p : divisors[d]) /// For each (p | d)
sum += mu[d / p] * nck(p - 1, k - 1);
sum %= MOD;
res += sum * (n / (d * d * d));
}
res %= MOD;
}
int ceil_sqrt(ll x)
{
int t = sqrt(x);
while (1LL * t * t > x) --t;
while (1LL * t * t < x) ++t;
return t;
}
#define sz(x) int((x).size())
#define all(x) (x).begin(), (x).end()
#define rall(x) (x).rbegin(), (x).rend()
#define lb(x, v) lower_bound(all(x), v) - (x).begin()
#define ub(x, v) upper_bound(all(x), v) - (x).begin()
ll scm[LIM];
vector<int> divisor[LIM];
int solveG(int n, int k)
{
linear_sieve(n);
fill_n(scm, n + 1, 1);
for (int i = 1; i <= n; ++i)
divisor[i].clear();
for (int p : prime)
{
ll q = 1LL * p * p * p;
for (int t = p; ; t *= q)
{
for (int i = t; i <= n; i += t)
scm[i] = (scm[i] > n / p) ? n + 1 : scm[i] * p;
if (1LL * t > n / q) break;
}
}
for (int j = 1; j <= n; ++j)
for (int i = scm[j]; i <= n; i += scm[j])
divisor[i].push_back(j);
int res = 0;
for (int i = 1; i <= n; ++i)
{
ll t = 1LL * i * i * i;
for (int x = 0; x < divisor[i].size(); ++x)
{
int a = divisor[i][x];
int v = ceil_sqrt(t / a);
int y0 = (divisor[i].back() < v) ? sz(divisor[i]) : lb(divisor[i], v);
for (int y = y0 - 1; y > x; --y)
{
int b = divisor[i][y];
int c = t / (a * b);
if (c > n) break;
res += (1LL * a * b * c == t);
}
}
}
return res;
}
int res[LIM + 10];
int solve(int n, int k)
{
if (k == 0) return 0;
if (k == 1) return n;
if (k == 2) return (1LL * n * (n - 1) / 2) % MOD;
if (k == 3) return solveG(n, k);
int t = ceil(sqrt(n) + 1) + 1;
linear_sieve(t);
precal_nck(t);
memset(res, 0, sizeof(res[0]) * (t + 1));
for (int d = 1; d * d * d <= n; ++d)
res[d] = nck(d - 1, k - 1);
for (int p : prime)
for (int d = t / p; d > 0; --d)
res[d * p] -= res[d];
long long ans = 0;
for (int d = 1; d * d * d <= n; ++d)
ans += res[d] * (n / (d * d * d));
ans %= MOD;
return ans;
}
Complexity
The complexity is obviously $$$O(1)$$$ and we cant optimize further more lol.
In the first implementation, we use a cubefree sieve in $$$O(n)$$$ and precalculatings for bionomial coefficients is also $$$O(n)$$$.
In the second and third implementation. We just apply the idea we used in task B.
In the first implementation as you must using factorization, the cost is $$$O(\log n)$$$ for each number, hence you got that complexity.
You can reduce the $$$O(\log n)$$$ part to $$$O(\log \log n)$$$.
You can remove that $$$O(\log \log n)$$$ part too.
If the problem is mixed with task J, then it might be able to improve furthur more, about $$$O(n \sqrt n)$$$ or even linear.
No papers in Russian, Chinese, English related to this task has been found.
Not a single useful sequence related to this sequence $$$(k = q = 3)$$$ has been found.
Yet this is very hard and specific problem related deep into number theory even with tricks related GEF, DFT, FFT, NTT as recommended.
Yet for the second implementations things gone crazy.
The complexity is $$$O\left( \underset{p=1}{\Large \Sigma} \left \lfloor \frac{n}{f(p)} \right \rfloor \right)$$$ for $$$f(p) = min(x \times p | x \times p \text{ is cube and } x \in \mathbb{N}^*)$$$
Ignore the fact that having $$$O(\log x)$$$ of factorizing but can be optimized to ignore that the part
The complexity is $$$O\left( \underset{p=1}{\Large \Sigma} d(p)^2 \right)$$$ for $$$d(p)$$$ is the number of divisors of $$$p^3$$$ that in $$$[1, n]$$$.
Though it might be possible to improve this from $$$O(d(p)^2)$$$ to $$$O(d(p)^{1 + ε})$$$
It has unprovable complexity, though I tried to search for papers, and blogs, even with the help of some GMs I cant find nothing good enough to claim its real complexity.
Yet it might be not the real complexity but also kind of an illusion assumptioning high constant and faking its real higher complexity.
I am making the assumption that the running time is $$$(a\times n^b)$$$ for some constant $$$a, b$$$ as $$$n$$$ approach $$$10^8$$$.
Tested for thousands of samples upto $$$n = 10^{8}$$$, using Power Progression Function Algorithm, the power $$$b$$$ seems to be reduced for larger and larger $$$n$$$ as it approach $$$0.5$$$.
Yet for samples upto $$$n = 10^8$$$ it seems to be $$$(225.014140682692 * x^{0.590337564727864})$$$ but I am not really sure.
It seems possible to be $$$(a \times \sqrt{n})$$$ running time with heavy optimization and far deep into the world of number theory to prove.
Needed significant more time and work though.
And the third implementation is also hard to prove
$$$O\left(\underset{\text{prime } p \leq n}{\Large \Sigma}\ \ \underset{p^{3u+1} \leq n^3}{\Large \Sigma} \left \lfloor \frac{n^3}{p^{3u+1}} \right \rfloor \right)$$$
$$$O\left(\overset{n}{\underset{p = 1}{\Large \Sigma}} \underset{a | p^3}{\Large \Sigma} \left(g(a) + \log\left(\frac{p^3}{a}\right)\right) \right)$$$ for $$$g(a) \leq d(a)$$$ is the number of $$$b \times c$$$ that $$$(b|p^3)$$$ and $$$(a < b \leq c \leq n)$$$
Contribution
Yurushia for pointing out the linear complexity of squarefree sieve.
clyring for fixing typos, and the approach for tasks A, B, C, D, E, G, H, J.
errorgorn for adding details, and the approach for task F, J, M, O, better complexity for C, E, G.
cuom1999 for participating $$$O(n^2)$$$ approach for problem G.
vinfat for participating approach related to factorize $$$p^3$$$ into $$$3$$$ product partions in problem G though failed to achive better complexity (editted: confirmed that the complexity seems to be better now).
Lihwy, jalsol for combinatorics calculation and the proof of stars and bars in task C.
Editorial Slayers Team Monarcle DeMen100ns Duy_e OnionEgg KasenIbaraki FireGhost Shironi for reviewing, fixing typos and feed backs.