I have been exploring the idea of computing Binomial Coefficients $$$\binom{n}{k}\mod m$$$ where $$$n$$$,$$$k$$$ and $$$m$$$ are all 64 bit integers.ie $$$m$$$ is any integer rather than being a prime. I have found few solutions here where they first factorise $$$m$$$ into $$$p_1^{e_1}p_2^{e_2}...p_t^{e_t}$$$ and then proceed for each $$$p_i^k$$$ in a manner similar to this method on math.stackexchange To elborate:
- Express as $$$(\binom{n}{k}\mod p_1^{e_1})\mod p_2^{e_2})$$$ ... $$$\mod p_t^{e_t})$$$
- then strip off all the $$$ p_i $$$ and its epponents from n!, k! and (n-k)!
- and compute $$$netexpo_i =$$$ total exponents of $$$p_i$$$ in n! — total exponents of $$$p_i$$$ in (n-k)! — total exponents of $$$p_i$$$ in k!
- if $$$netexpo_i \ge$$$ total exponents of $$$p_i$$$ in $$$m!$$$ then answer is $$$0$$$, if not proceed
- You Repeat step 2-4 for all $$$p_i$$$ in $$$m$$$.
- and then result is $$$ n! * ModularInverse((n — k)!) * ModularInverse(k!) * (p_1)^{netexpo_1}* (p_2)^{netexpo_2}... (p_t)^{netexpo_t}$$$
- Details of this method for each $$$(\binom{n}{k}\mod p_1^{e_1})$$$ have been discussed on math.stackexchange (at least for cases when each $$$p_i$$$ is 32 bit integer).
However among the solutions on yosupo none of the ones that I checked scale for large 64 bit $$$m$$$. Just changing the data type won't fix as it overshoots memory even when the largest prime factor of $$$m$$$ is 32-bit. So there can be two parts to the problem:
- m is 64-bit but all of its prime factors are 32-bit
- m is 64-bit and one of its prime factor is 64-bit
None of the solutions submitted on yosupo addresses either. My understanding is if they are implementing the algorithm similar to the discussion on math.stackexchange just scaling up datatype should work for the first part (ie $$$m$$$ is 64-bit but all of its prime factors are 32-bit). But to my surprise I found even in those cases it overshoots memory(or perhaps my edited code was buggy).
Any suggestions on how we can modify the code to scale to 64 bit for each of $$$n,k,m $$$ assuming that the largest prime factor of $$$m$$$ is within 32-bit(ie the fist part) I should call it out upfront that I have already gone through https://codeforces.net/blog/entry/65178 , https://codeforces.net/blog/entry/12878 , https://codeforces.net/blog/entry/55311 and https://codeforces.net/blog/entry/53039 .
To summarise all the ideas:
- $$$O(p^2)$$$ Time per Query, $$$O(1)$$$ Memory — initial answer by Aryaman Maithani on math.stackexchange
- $$$O(p)$$$ Time per Query, $$$O(1)$$$ Memory — proved using polynomial that in a subgroup of size $$$p$$$ product of only the constant term remains and other terms vanish . ie $$$\prod_{i=pk+1}^{p(k+1)-1}(p^2+i) \mod p^2 = (p-1)! \mod p^2 $$$. — updated answer by Aryaman Maithani on math.stackexchange
- $$$O(p)$$$ Time preprocessing, $$$O(p^{1/2})$$$ Time per Query, $$$O(p^{1/2})$$$ Memory — idea is to break each block of $$$p$$$ into sub-blocks of size $$$p^{1/2}$$$ each and then preprocess for each blocks of $$$O(p^{1/2})$$$. This idea works because $$$\prod_{i=1}^{t}(p^2+p+i) \mod p^2 = \prod_{i=1}^{t}(i) \mod p^2 $$$. The method is useful when we can't afford $$$O(p)$$$ memory like the following two methods.
- $$$O(p)$$$ Time preprocessing, $$$O(log_{p}{n})$$$ Time per Query, $$$O(p)$$$ Memory — Similar to Granville's idea. Basically the above 3 are also implementation of Granville's but with lesser degree of memorization that what Granville proposed.
- $$$O(p)$$$ Time preprocessing, $$$O(log_2{n})$$$ Time per Query, $$$O(p)$$$ Memory — Similar to Min25's idea, it uses as Stirling number of the first kind and Lagrange's interpolation to implement it. There are few solutions on yosupo use Interpolation/NTT/FFT or some kind of domain transformation, I would put them in this category.
Where $$$p$$$ is the largest prime factor of $$$m$$$ and assing it has $$$e_{i}=2$$$ or lesser