oToToT's blog

By oToToT, history, 2 years ago, In English

TL; DR: #include <ext/random> and use __gnu_cxx::sfmt19937 just as std::mt19937.

Hi all,

I'm here to share a thing that seems not that well-known in the whole community.

There is a SIMD-oriented Fast Mersenne Twister (SFMT) implementation in GCC's library.
As its name suggested, it's a faster MT-like PRNG implementation based on SIMD.
SFMT is not exactly MT, but it shares some similarities and is much faster than the traditional MT.
If you are interested in more details about it, you could check this paper.

I test the two codes below in the Codeforces' custom test using GNU G++20 11.2.0 (64 bit, winlibs).
The result looks good.

sfmt19937 (998 ms)
mt19937 (3416 ms)

p.s. there's also __gnu_cxx::sfmt19937_64 if you want a 64-bit PRNG.

  • Vote: I like it
  • +84
  • Vote: I do not like it

»
2 years ago, # |
Rev. 2   Vote: I like it +32 Vote: I do not like it

std::mt19937 is well-known to be slow, so I had implemented a couple of much faster PRNGs here. Replacing x = rnd() with x ^= rnd() and printing x (since somehow the compiler can optimize away the wyhash calls) shows that Wyhash is the fastest (at 686ms), followed by Lehmer (at 858ms), compared to SFMT (at 920ms).

  • »
    »
    2 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    Thanks for these cool PRNG information :)

    Also, if pragma is used, SFMT (~0.87s) can have a similar performance than Lehmer w/o pragma.

    pragma I tested
    • »
      »
      »
      2 years ago, # ^ |
        Vote: I like it +18 Vote: I do not like it

      Surprisingly, when we add these pragmas, Wyhash (technically Wyrand) becomes a bit slower (~900ms).

      Just for some more trivia, MT and SFMT both fail certain tests in TestU01 benchmarks (more information here). I am not sure about 32-bit Lehmer, but 64-bit Lehmer passes them, and so does Wyrand (32-bit as well as 64-bit). However, for usual purposes, it seems like SFMT shouldn't pose issues (at least it is strictly better than MT in most aspects).

      • »
        »
        »
        »
        2 years ago, # ^ |
        Rev. 2   Vote: I like it +17 Vote: I do not like it

        Personally, I have been thinking of which PRNG to implement (to meet the requirements of C++ RNGs) as a practice. PCG, Splitmix, Xorshift are the simpler ones that I want to implement, but PCG already has one implemented in their library, and Splitmix has one too (Though it only qualifies as a UniformRandomBitGenerator, not a RandomNumberGenerator). I'm fine (as I should be) at implementing Xorshift, but I am well aware of its flaws. Which RNG would you recommend me to implement as a practice?

        UPD: Lehmer64 seems to fit the conditions I wanted, thanks for mentioning it!

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You are not allowed to view the requested page.

    • »
      »
      »
      2 years ago, # ^ |
      Rev. 2   Vote: I like it +19 Vote: I do not like it

      My bad, here's the code I used for benchmarking instead.

      Spoiler

      You would need to change the std::cerr to std::cout for printing the time in custom invocation.

»
2 years ago, # |
Rev. 2   Vote: I like it -35 Vote: I do not like it

.

»
2 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Out of curiosity, does it have any use cases in competitive programming? I mean, it is good to know and probably to have in a prewritten library, but have anyone ever encountered the need of speeding up the random calls in a competition?

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I had onced faced a TLE in an ICPC-style competition when I tried to squeeze treaps, and I barely got AC using a simple linear congruential generator. But yeah, it wasn't intended, and I don't think this really matters as much.

  • »
    »
    2 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Chances that the official solution only requires some simple data structure while I'm too stupid to solve it with treap. In that case, I might need a faster PRNG to fit my code into the TL.