grhkm's blog

By grhkm, history, 4 years ago, In English

I am messing around in godbolt.org (compiler), and I tried to compile this code with -O2

unsigned int func(unsigned int x) {
    return x / 7;
}

Surprisingly, this is what I get

func:
        mov     eax, edi             # edi is argument (x). Move x to eax (register)
        imul    rax, rax, 613566757  # eax *= 613566757 (unsigned 64-bit multiplication, rax is 64-bit version of eax)
        shr     rax, 32              # eax >>= 32 (gets upper 32 bit of product)
        sub     edi, eax             # x -= eax
        shr     edi                  # x >>= 1
        add     eax, edi             # eax += x
        shr     eax, 2               # x >>= 2
        ret                          # return x

What is imul rax, rax, 613566757 doing? Where does the constant come from?

Also, this translated code works as intended:

#include<bits/stdc++.h>
using namespace std;
typedef long long ll;

int main() {
        int i = 137;

        long long edi = i;
        long long eax = i;
        eax *= 613566757LL;
        eax >>= 32;
        edi -= eax;
        edi >>= 1;
        eax += edi;
        eax >>= 2;
        cout << i << " " << i / 7 << " " << eax << endl;
}

Thank you very much.

(Originally I had int instead of unsigned int, but it's more complicated)

  • Vote: I like it
  • +41
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

So, when compiled with -O2, division is as fast as multiplication and isn't as expensive?

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    As you can see as in the divide by 7 example, it's optimization is not just one multiplication, but one multiplication and a couple of additions and shifts. So it's a little more expansive than multiplication, although not much.

    However, this optimization only works when the divisor is known at compile time, otherwise the compiler cannot figure out the factors, shifts and additions!

    Also, as with all compiler optimizations, you can't expect the compiler to always perform this optimization.