Блог пользователя oToToT

Автор oToToT, история, 2 года назад, По-английски

TL; DR: #include <ext/random> and use __gnu_cxx::sfmt19937 just as std::mt19937.

Hi all,

I'm here to share a thing that seems not that well-known in the whole community.

There is a SIMD-oriented Fast Mersenne Twister (SFMT) implementation in GCC's library.
As its name suggested, it's a faster MT-like PRNG implementation based on SIMD.
SFMT is not exactly MT, but it shares some similarities and is much faster than the traditional MT.
If you are interested in more details about it, you could check this paper.

I test the two codes below in the Codeforces' custom test using GNU G++20 11.2.0 (64 bit, winlibs).
The result looks good.

sfmt19937 (998 ms)
mt19937 (3416 ms)

p.s. there's also __gnu_cxx::sfmt19937_64 if you want a 64-bit PRNG.

  • Проголосовать: нравится
  • +84
  • Проголосовать: не нравится

»
2 года назад, # |
Rev. 2   Проголосовать: нравится +32 Проголосовать: не нравится

std::mt19937 is well-known to be slow, so I had implemented a couple of much faster PRNGs here. Replacing x = rnd() with x ^= rnd() and printing x (since somehow the compiler can optimize away the wyhash calls) shows that Wyhash is the fastest (at 686ms), followed by Lehmer (at 858ms), compared to SFMT (at 920ms).

  • »
    »
    2 года назад, # ^ |
    Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

    Thanks for these cool PRNG information :)

    Also, if pragma is used, SFMT (~0.87s) can have a similar performance than Lehmer w/o pragma.

    pragma I tested
    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится +18 Проголосовать: не нравится

      Surprisingly, when we add these pragmas, Wyhash (technically Wyrand) becomes a bit slower (~900ms).

      Just for some more trivia, MT and SFMT both fail certain tests in TestU01 benchmarks (more information here). I am not sure about 32-bit Lehmer, but 64-bit Lehmer passes them, and so does Wyrand (32-bit as well as 64-bit). However, for usual purposes, it seems like SFMT shouldn't pose issues (at least it is strictly better than MT in most aspects).

      • »
        »
        »
        »
        2 года назад, # ^ |
        Rev. 2   Проголосовать: нравится +17 Проголосовать: не нравится

        Personally, I have been thinking of which PRNG to implement (to meet the requirements of C++ RNGs) as a practice. PCG, Splitmix, Xorshift are the simpler ones that I want to implement, but PCG already has one implemented in their library, and Splitmix has one too (Though it only qualifies as a UniformRandomBitGenerator, not a RandomNumberGenerator). I'm fine (as I should be) at implementing Xorshift, but I am well aware of its flaws. Which RNG would you recommend me to implement as a practice?

        UPD: Lehmer64 seems to fit the conditions I wanted, thanks for mentioning it!

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    You are not allowed to view the requested page.

    • »
      »
      »
      2 года назад, # ^ |
      Rev. 2   Проголосовать: нравится +19 Проголосовать: не нравится

      My bad, here's the code I used for benchmarking instead.

      Spoiler

      You would need to change the std::cerr to std::cout for printing the time in custom invocation.

»
2 года назад, # |
Rev. 2   Проголосовать: нравится -35 Проголосовать: не нравится

.

»
2 года назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

Out of curiosity, does it have any use cases in competitive programming? I mean, it is good to know and probably to have in a prewritten library, but have anyone ever encountered the need of speeding up the random calls in a competition?

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I had onced faced a TLE in an ICPC-style competition when I tried to squeeze treaps, and I barely got AC using a simple linear congruential generator. But yeah, it wasn't intended, and I don't think this really matters as much.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Chances that the official solution only requires some simple data structure while I'm too stupid to solve it with treap. In that case, I might need a faster PRNG to fit my code into the TL.