Addendum: Optimized variant of Barrett reduction + proof re: ultimate NTT article

Revision en17, by Spheniscine, 2020-03-31 12:38:10

This is an extension of my article proposing the "ultimate" NTT. Here I will present a simplified and optimized variant of the Barrett reduction algorithm on that page. As it relaxes certain bounds of the standard parameters of the algorithm, I will also present a proof of the correctness of the variant, at least for this specific case.

First, the algorithm:

function mulMod(a: Long, b: Long): Long {
    xh = multiplyHighUnsigned(a, b) // high word of product
    xl = a * b // low word of product

    g = multiplyHighUnsigned( ((xh << 2) | (xl >>> 62)) , BARR_R)

    t = xl - g * MOD - MOD
    t += (t >> 63) & MOD
    return t
}

Notice that the middle routine has been greatly simplified; instead of computing the high and middle words of $$$xr$$$, we simply take the top 64 bits of $$$x$$$, in effect computing $$$\displaystyle \left\lfloor \left\lfloor \frac{x}{2^{62}} \right\rfloor \frac{r}{2^{64}} \right\rfloor$$$ in place of $$$\displaystyle \left\lfloor \frac {xr}{4^{63}} \right\rfloor$$$. Experimentally, this seems to work (and indeed, this variant is how Barrett reduction is normally implemented for multi-word cases), however a naive adjustment of the bounds of the Barrett reduction algorithm would relax the pre-normalization bound of $$$t$$$ to $$$[-m, 2m)$$$. This is bad news, as there is a possibility of overflowing into the sign bit, thus forcing the use of a modulus one bit shorter (e.g. $$$4611686018326724609$$$, $$$g = 3$$$) so that we could easily correct the result with two normalization rounds.

So the following is a proof of correctness for this relaxed algorithm, at least for the very specific use case in the described NTT:

On the Project Nayuki article on Barrett reduction (note that I will use $$$n$$$ instead of $$$m$$$ for the modulus in accordance to the notation on that page), one important lemma is the inequality

$$$\displaystyle\frac{x}{n} - 1 < \frac{xr}{4^k} ≤ \frac{x}{n} \implies \frac{x}{n} - 2 < \left\lfloor \frac{x}{n} - 1 \right\rfloor ≤ \left\lfloor \frac{xr}{4^k} \right\rfloor ≤ \frac{x}{n}$$$

We shall show that this inequality holds even if $$$\dfrac{xr}{4^k}$$$ is replaced by $$$\dfrac{(x-\delta) r}{4^k}$$$, with $$$0 ≤ \delta < 2^{62}$$$, the bits that have been omitted.

First we need to obtain a tighter bound on $$$r$$$. Given that $$$r$$$, $$$k$$$, and $$$n$$$ are fixed in our algorithm, we can verify that $$$\dfrac {4^k} n - r < 2^{-9}$$$. Let this value be $$$\varepsilon$$$. We thus obtain $$$\displaystyle \frac{4^k}{n} - \varepsilon < r < \frac{4^k}{n}$$$

Multiply by $$$x-\delta \geq 0$$$: $$$\displaystyle (x-\delta)\left(\frac{4^k}{n} - \varepsilon\right) ≤ (x-\delta)r ≤ (x-\delta)\frac{4^k}{n}$$$

Given that $$$\delta$$$ is nonnegative we can relax the right bound to $$$x\dfrac{4^k}{n}$$$.

Divide by $$$4^k$$$: $$$\displaystyle (x-\delta)\left(\frac{1}{n} - \frac \varepsilon {4^k}\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

Recompose the leftmost expression: $$$\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) + \frac{\delta\varepsilon} {4^k} ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$\displaystyle\frac{\delta\varepsilon} {4^k} \geq 0$$$, so relax the bound: $$$\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$x < n^2 < 4^k \implies \dfrac{x}{4^k} < 1$$$, so further relax the bound: $$$\displaystyle \frac x n - \left({\varepsilon} + \frac \delta n\right) < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

$$$\delta < 2^{62}$$$ while $$$n$$$ is very slightly below $$$2^{63}$$$, so it can be verified that $$$\dfrac \delta n$$$ must be $$$< \dfrac34$$$.

$$$\dfrac34 + \varepsilon < 1$$$, therefore $$$\displaystyle \frac x n - 1 < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$$$

We can thus follow the rest of the proof in the Nayuki article to prove that our $$$t$$$ is in $$$[-n, n)$$$.

Samples (Kotlin, so YMMV on benchmarks for other languages)

74853038 for 993E - Никита и порядковая статистика, 328 ms faster

74852214 for 986D - Идеальное кодирование, 452 ms faster

Addendum

There are other applications of this proof besides NTT. For example, I found a safe prime $$$9223372036854771239$$$, which is a good property for a rolling-hash modulus to have, as any number in $$$[2, m-2]$$$ will have multiplicative order at least $$$\dfrac{m-1}{2}$$$. It seems that $$$\varepsilon$$$ and $$$\dfrac\delta n - \dfrac 1 2$$$ will both be rather small for moduli so close below a power of 2.

Tags fft, ntt, proof

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en18 English Spheniscine 2020-05-20 04:26:03 2 remove redundant parentheses
en17 English Spheniscine 2020-03-31 12:38:10 495 Tiny change: 'her small as these primes are so close ' -> 'her small for moduli so close '
en16 English Spheniscine 2020-03-31 11:49:36 184
en15 English Spheniscine 2020-03-31 10:26:25 167
en14 English Spheniscine 2020-03-31 10:21:20 20
en13 English Spheniscine 2020-03-31 09:12:27 4 Tiny change: 'laced by $g = \dfrac{(x-' -> 'laced by $\dfrac{(x-'
en12 English Spheniscine 2020-03-31 08:54:58 105 Tiny change: '$t$ to $[-n, 2n)$. This i' -> '$t$ to $[-m, 2m)$. This i'
en11 English Spheniscine 2020-03-31 08:47:39 17 Tiny change: 'relax the final bounds of $t$ to' -> 'relax the pre-normalization bound of $t$ to'
en10 English Spheniscine 2020-03-31 08:46:44 74
en9 English Spheniscine 2020-03-31 08:44:52 97
en8 English Spheniscine 2020-03-31 08:43:59 1442 Tiny change: ' < 2^{62}$.\n\nFirst' -> ' < 2^{62}$, the bits that have been omitted.\n\nFirst' (published)
en7 English Spheniscine 2020-03-31 08:26:01 226
en6 English Spheniscine 2020-03-31 08:22:50 105
en5 English Spheniscine 2020-03-31 08:12:26 956 Tiny change: 'c{4^k}{n}$ and forget about it thereafter.' -> 'c{4^k}{n}$.\n\nDivide by $4^k$: '
en4 English Spheniscine 2020-03-31 07:39:29 250
en3 English Spheniscine 2020-03-31 07:35:05 79
en2 English Spheniscine 2020-03-31 07:33:12 211 Tiny change: 'alue be $\epsilon$' -> 'alue be $\varepsilon$.'
en1 English Spheniscine 2020-03-31 07:25:50 565 Initial revision (saved to drafts)