kevlu8's blog

By kevlu8, 3 months ago, In English

GCC has many optimization pragmas that can be placed at the top of a source file. You would generally expect them to speed up your code by the same amount as the equivalent command-line argument, but this is not always the case.

Theoretically, you would expect

#pragma GCC optimize("O3")

to optimize your code the same way as

g++ main.cpp -O3

But it doesn't! Let's take a look at an example program:

#pragma GCC optimize("O3")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005

int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);
    int tgt = 19999473;
    unordered_set<int> s(arr, arr+SZ);
    for (int i = 0; i < SZ; i++) {
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}

This is a simple, well-known solution to the Two-Sum problem using a hash set.

Here's a table showing the runtime of the program with and without the pragma (running on a Ryzen 7 7700X, compiled with no other arguments, mean of 5 trials):

Optimization              Time (s)
None, without pragma      ~1.79
None, with pragma         ~0.98
-O3, without pragma       ~0.36
-O3, with pragma          ~0.36

As you can see, the pragma on its own (~0.98 s) does much worse than the -O3 flag (~0.36 s), even though they should be equivalent. Why is this?

Looking into the assembly code generated, we can see that the code generated by the -O3 command-line argument actually does not contain any occurrences of unordered_set, whilst the code generated by the pragma contains loads of occurrences. What does this mean?

This actually tells us that -O3 performs more optimization (specifically, inlining) than the pragma. This is further demonstrated by the following example:

#pragma GCC optimize("O3")
static int return5() {
    return 5;
}
int main() {
    return return5();
}

With the pragma, the generated assembly code (simplified) is:

main:
    jmp return5
return5:
    mov eax, 5
    ret

With -O3, the generated assembly code is:

main:
    mov eax, 5
    ret

Anyone can see that return5 should be inlined. It's even a static function! But the pragma doesn't inline it, whilst -O3 does. Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get optimized. After adding __attribute__((always_inline)) to the function, it finally gets inlined. Why this is the case is beyond me. This is a very minor example, but as the first example shows, these kinds of small improvements matter more and more as the program gets more complex.

There are probably many more examples of optimizations that -O3 does that the pragma doesn't, but the most important thing to take away from this is that the pragma is not quite equivalent to -O3.

Most of the time, there is no reason to use #pragma GCC optimize("O3") over -O3, because you can just modify your compile-time command-line arguments. The only place where this is necessary would be competitive programming, since most judges compile with -O2, and sometimes you're able to squeeze into the time limit by using O3 and AVX2.

What can we do with this information? Not much, really. Just be aware that the pragma is not equivalent to -O3, and that you should use -O3 over the pragma whenever possible. However, in situations where specifying -O3 in the command line is not possible, the pragma is a passable alternative.

One final note: make sure that if you use the pragma, you use it at the top of the file, before any includes. If you use it in the middle of the file, it will only apply to the code after the pragma.

Thanks for reading my first blog post! I hope you enjoyed it!


»
3 months ago, # |
  Vote: I like it +4 Vote: I do not like it

For your second example, I see that GCC 14 does inline the function, which doesn't happen with earlier versions.

I wonder whether "in the source file" in the docs means only the current file, excluding the headers it includes, which could explain the first example.

  • »
    »
    3 months ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    I did consider the possibility of the pragma only optimizing functions defined in the source file, but it doesn't seem to be the case:

    File header.h:

    int sum100() {
       int sum = 0;
       for (int i = 1; i <= 100; i++) sum += i;
       return sum;
    }
    

    File main.cpp:

    #include "header.h"
    int main() {
        return sum100();
    }
    

    Without the pragma (and only compiling with g++ main.cpp), sum100() actually does the computation, whilst with the pragma it doesn't and just directly returns the value.

»
3 months ago, # |
  Vote: I like it +19 Vote: I do not like it

weren't you in that one github PR?

  • »
    »
    3 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    all hail king *fuck you*

  • »
    »
    3 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Yup, that's me, for better or for worse...

    • »
      »
      »
      3 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      actually crazy that people know you from that now

»
3 months ago, # |
  Vote: I like it 0 Vote: I do not like it

So should we use #pragma GCC optimize("O3") or not?

»
3 months ago, # |
  Vote: I like it +53 Vote: I do not like it

Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get optimized.

While you tried all of the inline flags listed in gcc/Optimize-Options, it might not be enough:

Depending on the target and how GCC was configured, a slightly different set of optimizations may be enabled at each -O level than those listed here. You can invoke GCC with -Q --help=optimizers to find out the exact set of optimizations that are enabled at each level.

Only one way to find out the exact set of optimizations!

[Two collapsed listings, titled "O0" and "O3", are not preserved here.]

As you can see, -finline was missing from the enabled optimizations. Simply add it as follows:

#pragma GCC optimize("O3,inline")
main.cpp
> g++ main.cpp && time ./a.out
9999468 10000005
./a.out  0.38s user 0.12s system 99% cpu 0.498 total

This yields performance similar to the -O3 flag! For more information, please refer to "C++ and the -O3 compilation flag".

If you want a 1:1 match with -O3 and not just -finline, try to match the pragma against the output of g++ -O3 -Q --help=optimizers. That is beyond the scope of this comment and is left as an exercise for the reader.

The only place where this is necessary would be competitive programming, since most judges compile with -O2

The one judge you should worry about is LeetCode.
  • »
    »
    3 months ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    Ah, you're right! Can't believe that I forgot about -finline... haha

    In that case, #pragma GCC optimize("O3,inline") would be pretty close to optimal, while still being relatively short.

    I've done some light digging into Clang's optimization options, but Clang doesn't seem to have anything similar to #pragma GCC optimize. I guess this choice was intentional, as LeetCode test data is usually weak and constant-factor optimization is often enough to pass with suboptimal solutions. I do find it weird that LeetCode includes a useless pragma at the end of the file though...