MSVC Compiler intrinsics _umul128 and _udiv128

№	Пользователь	Рейтинг
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

№	Пользователь	Вклад
1	cry	167
2	Um_nik	163
3	maomao90	162
3	atcoder_official	162
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	Dominater069	153
9	nor	153

For some algorithms it is necessary to calculate the modulo of the product of two 64-bit integers (rho algorithm, polynomial rolling hash with a 64-bit modulo). This can be done easily on GCC and Clang using the built-in type `__int128`.↵
↵
However, on MSVC this type is not available. [This blog post](https://codeforces.net/blog/entry/96759) explores the options for implementing fast modular multiplication across various platforms. The author presents several approaches to modular multiplication, but only two of those actually work properly on the MSVC compiler. Those implementations are fine, but one of them is somewhat slow while the other uses a Karatsuba-like approach and I found it too complex to be comfortable with simply copy-pasting it into my template.↵
↵
I did a bit of googling and stumbled across the compiler intrinsics [umul128](https://docs.microsoft.com/en-us/cpp/intrinsics/umul128?view=msvc-170) and [udiv128](https://docs.microsoft.com/en-us/cpp/intrinsics/udiv128?view=msvc-170), which can be used to implement modular multiplication in a way that is fairly straightforward.↵
↵
Note that both __int128 and the MSVC compiler intrinsics are only supported on 64-bit architectures. They don't seem to be available on the Codeforces MSVC compiler, but they work fine for me locally. This is good enough for me as I have a cross platform coding environment and I want my code to work when on Windows machines with the Microsoft compiler.↵
↵
Here is the snippet that I use for modular multiplication. Feel free to share any micro-optimizations you might have.↵
↵
~~~~~↵
template <class T>↵
constexpr int sgn(T x) {↵
    return (x > 0) - (x < 0);↵
}↵
↵
long long mod_mul(long long a, long long b, long long m = MOD) {↵
#ifdef _MSC_VER↵
    unsigned long long x = llabs(a);↵
    unsigned long long y = bllabs(b);↵
    unsigned long long h, l, r;↵
    l = _umul128(x, y, &h);↵
    _udiv128(h, l, m, &r);↵
    return asgn(a) * sgn(b) >= 0 ? r : m - r;↵
#else↵
    long long r = a * static_cast<__int128>(b) % m;↵
    return r >= 0 ? r : r + m;↵
#endif↵
}↵
~~~~~↵
↵

Rev.	Кто	Когда	Δ	Комментарий
en7	JaroPaska	2022-08-28 11:23:44	82
en6	JaroPaska	2022-08-28 11:23:21	2	Tiny change: '128(llabs((a)), llabs(b' -> '128(llabs(a), llabs(b'
en5	JaroPaska	2022-08-28 11:22:56	178
en4	JaroPaska	2022-08-28 11:14:33	109
en3	JaroPaska	2022-08-28 04:19:06	16	Tiny change: 'integers (Pollard Rho, polynomi' -> 'integers (rho algorithm, polynomi'
en2	JaroPaska	2022-08-28 04:15:24	2
en1	JaroPaska	2022-08-28 04:14:38	2020	Initial revision (published)

Rev.

Язык

Кто

Когда

Комментарий

en7