Addendum: Optimized variant of Barrett reduction + proof re: ultimate NTT article

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	djm03178	151
7	adamant	151
9	luogu_official	150
10	awoo	147

This is an extension of my article proposing the "ultimate" NTT. Here I will present a simplified and optimized variant of the Barrett reduction algorithm on that page. As it relaxes certain bounds of the standard parameters of the algorithm, I will also present a proof of the correctness of the variant, at least for this specific case.

First, the algorithm:

function mulMod(a: Long, b: Long): Long {
    xh = multiplyHighUnsigned(a, b) // high word of product
    xl = a * b // low word of product

    g = multiplyHighUnsigned( ((xh << 2) | (xl >>> 62)) , BARR_R)

    t = xl - g * MOD - MOD
    t += (t >> 63) & MOD
    return t
}

Notice that the middle routine has been greatly simplified; instead of computing the high and middle words of $$$xr$$$, we simply take the top 64 bits of $$$x$$$, in effect computing $$$\displaystyle \left\lfloor \left\lfloor \frac{x}{2^{62}} \right\rfloor \frac{r}{2^{64}} \right\rfloor$$$ in place of $$$\displaystyle \left\lfloor \frac {xr}{4^{63}} \right\rfloor$$$. Experimentally, this seems to work (and indeed, this variant is how Barrett reduction is normally implemented for multi-word cases), however a naive adjustment of the bounds of the Barrett reduction algorithm would relax the pre-normalization bound of $$$t$$$ to $$$[-m, 2m)$$$. This is bad news, as there is a possibility of overflowing into the sign bit, thus forcing the use of a modulus one bit shorter (e.g. $$$4611686018326724609$$$, $$$g = 3$$$) so that we could easily correct the result with two normalization rounds.

So the following is a proof of correctness for this relaxed algorithm, at least for the very specific use case in the described NTT:

On the Project Nayuki article on Barrett reduction (note that I will use $$$n$$$ instead of $$$m$$$ for the modulus in accordance to the notation on that page), one important lemma is the inequality

$$$\displaystyle\frac{x}{n} - 1 < \frac{xr}{4^k} ≤ \frac{x}{n} \implies \frac{x}{n} - 2 < \left\lfloor \frac{x}{n} - 1 \right\rfloor ≤ \left\lfloor \frac{xr}{4^k} \right\rfloor ≤ \frac{x}{n}$$$

We shall show that this inequality holds even if $$$\dfrac{xr}{4^k}$$$ is replaced by $$$\dfrac{(x-\delta) r}{4^k}$$$, with $$$0 ≤ \delta < 2^{62}$$$, the bits that have been omitted.

First we need to obtain a tighter bound on $$$r$$$. Given that $$$r$$$, $$$k$$$, and $$$n$$$ are fixed in our algorithm, we can verify that $$$\dfrac {4^k} n - r < 2^{-9}$$$. Let this value be $$$\varepsilon$$$. We thus obtain $$$\displaystyle \frac{4^k}{n} - \varepsilon < r < \frac{4^k}{n}$$$

Multiply by $$$x-\delta \geq 0$$$: $$$\displaystyle (x-\delta)\left(\frac{4^k}{n} - \varepsilon\right) ≤ (x-\delta)r ≤ (x-\delta)\frac{4^k}{n}$$$

Given that $$$\delta$$$ is nonnegative we can relax the right bound to $$$x\dfrac{4^k}{n}$$$.

Divide by $$$4^k$$$: $$$\displaystyle (x-\delta)\left(\frac{1}{n} - \frac \varepsilon {4^k}\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

Recompose the leftmost expression: $$$\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) + \frac{\delta\varepsilon} {4^k} ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$\displaystyle\frac{\delta\varepsilon} {4^k} \geq 0$$$, so relax the bound: $$$\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$x < n^2 < 4^k \implies \dfrac{x}{4^k} < 1$$$, so further relax the bound: $$$\displaystyle \frac x n - \left({\varepsilon} + \frac \delta n\right) < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$\delta < 2^{62}$$$ while $$$n$$$ is very slightly below $$$2^{63}$$$, so it can be verified that $$$\dfrac \delta n$$$ must be $$$< \dfrac34$$$.

$$$\dfrac34 + \varepsilon < 1$$$, therefore $$$\displaystyle \frac x n - 1 < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

We can thus follow the rest of the proof in the Nayuki article to prove that our $$$t$$$ is in $$$[-n, n)$$$.

Samples (Kotlin, so YMMV on benchmarks for other languages)

74853038 for 993E - Nikita and Order Statistics, 328 ms faster

74852214 for 986D - Perfect Encoding, 452 ms faster

Addendum

There are other applications of this proof besides NTT. For example, I found a safe prime $$$9223372036854771239$$$, which is a good property for a rolling-hash modulus to have, as any number in $$$[2, m-2]$$$ will have multiplicative order at least $$$\dfrac{m-1}{2}$$$. It seems that $$$\varepsilon$$$ and $$$\dfrac\delta n - \dfrac 1 2$$$ will both be rather small for moduli so close below a power of 2.

Rev.	By	When	Δ	Comment
en18	Spheniscine	2020-05-20 04:26:03	2	remove redundant parentheses
en17	Spheniscine	2020-03-31 12:38:10	495	Tiny change: 'her small as these primes are so close ' -> 'her small for moduli so close '
en16	Spheniscine	2020-03-31 11:49:36	184
en15	Spheniscine	2020-03-31 10:26:25	167
en14	Spheniscine	2020-03-31 10:21:20	20
en13	Spheniscine	2020-03-31 09:12:27	4	Tiny change: 'laced by $g = \dfrac{(x-' -> 'laced by $\dfrac{(x-'
en12	Spheniscine	2020-03-31 08:54:58	105	Tiny change: '$t$ to $[-n, 2n)$. This i' -> '$t$ to $[-m, 2m)$. This i'
en11	Spheniscine	2020-03-31 08:47:39	17	Tiny change: 'relax the final bounds of $t$ to' -> 'relax the pre-normalization bound of $t$ to'
en10	Spheniscine	2020-03-31 08:46:44	74
en9	Spheniscine	2020-03-31 08:44:52	97
en8	Spheniscine	2020-03-31 08:43:59	1442	Tiny change: ' < 2^{62}$.\n\nFirst' -> ' < 2^{62}$, the bits that have been omitted.\n\nFirst' (published)
en7	Spheniscine	2020-03-31 08:26:01	226
en6	Spheniscine	2020-03-31 08:22:50	105
en5	Spheniscine	2020-03-31 08:12:26	956	Tiny change: 'c{4^k}{n}$ and forget about it thereafter.' -> 'c{4^k}{n}$.\n\nDivide by $4^k$: '
en4	Spheniscine	2020-03-31 07:39:29	250
en3	Spheniscine	2020-03-31 07:35:05	79
en2	Spheniscine	2020-03-31 07:33:12	211	Tiny change: 'alue be $\epsilon$' -> 'alue be $\varepsilon$.'
en1	Spheniscine	2020-03-31 07:25:50	565	Initial revision (saved to drafts)

Samples (Kotlin, so YMMV on benchmarks for other languages)

Addendum

History