GCC has many optimization pragmas that can be placed at the top of a file. In general, they should speed up your code as much as the equivalent command-line argument would; however, this is not always the case.
Theoretically, you would expect
#pragma GCC optimize("O3")
to optimize your code the same way as
g++ main.cpp -O3
But it doesn't! Let's take a look at an example program:
#pragma GCC optimize("O3")
#include <bits/stdc++.h>
using namespace std;

#define SZ 10000005
int arr[SZ] = {};

int main() {
    iota(arr, arr+SZ, 1);
    int tgt = 19999473;
    unordered_set<int> s(arr, arr+SZ);
    for (int i = 0; i < SZ; i++) {
        if (s.count(tgt-arr[i])) {
            cout << arr[i] << ' ' << tgt-arr[i] << '\n';
            break;
        }
    }
}
This is a pretty simple and well-known program: it solves the Two-Sum problem using a hash set.
Here's a chart showing the runtime of the program with and without the pragma (running on Ryzen 7 7700X, compiled with no other arguments, mean of 5 trials):
Optimization | Time (s)
---|---
None, without pragma | ~1.79
None, with pragma | ~0.98
-O3, without pragma | ~0.36
-O3, with pragma | ~0.36
As you can see, the pragma does much worse than the -O3 flag, even though they should be equivalent. Why is this?
Looking into the generated assembly, we can see that the code produced by the -O3 command-line argument contains no occurrences of unordered_set, whilst the code produced by the pragma contains loads of them. What does this mean?
This tells us that -O3 performs more optimization (specifically, inlining) than the pragma. This is further demonstrated by the following example:
#pragma GCC optimize("O3")

static int return5() {
    return 5;
}

int main() {
    return return5();
}
With the pragma, the generated assembly code (simplified) is:
main:
        jmp     return5
return5:
        mov     eax, 5
        ret
With -O3, the generated assembly code is:
main:
        mov     eax, 5
        ret
Anyone can see that return5 should be inlined. It's even a static function! But the pragma doesn't inline it, whilst -O3 does. Even after adding inline-functions,inline-small-functions,inline-functions-called-once to the pragma, it still doesn't get inlined. Only after adding __attribute__((always_inline)) to the function does it finally get inlined. Why this is the case is beyond me. Although this is a very minor example, as the first example showed, these kinds of small improvements matter more and more as a program gets more complex.
There are probably many more optimizations that -O3 performs and the pragma doesn't, but the most important takeaway is that the pragma is not quite equivalent to -O3.
Most of the time there is no reason to use #pragma GCC optimize("O3") over -O3, because you can just modify your compile-time command-line arguments. The only place where it is really necessary is competitive programming, since most judges compile with -O2, and sometimes you can squeeze under the time limit by using O3 and avx2.
What can we do with this information? Not much, really. Just be aware that the pragma is not equivalent to -O3, and that you should prefer -O3 over the pragma whenever possible. However, in situations where specifying -O3 on the command line is not possible, the pragma is a passable alternative.
One final note: if you do use the pragma, make sure to put it at the top of the file, before any includes. If you put it in the middle of the file, it will only apply to the code after the pragma.
Thanks for reading my first blog post! I hope you enjoyed!
For your second example, I did see that gcc 14 inlines the function, which doesn't happen with previous versions. I suspect whether "in the source file" in the docs means only the current file, excluding the headers it includes, which could explain the first example.

I did consider the possibility of the pragma only optimizing functions defined in the source file, but it doesn't seem to be the case:
File header.h:
File main.cpp:
Without the pragma (and only compiling with g++ main.cpp), sum100() actually does the computation, whilst with the pragma it doesn't and just directly returns the value.

weren't you in that one github PR?
all hail king *fuck you*
Yup, that's me, for better or for worse...
actually crazy that people know you from that now
so we should use this #pragma GCC optimize("O3") or what?
While you tried all of the inline flags listed in gcc/Optimize-Options, it might not be enough. Only one way to find out the exact set of optimizations!
As you can see, -finline was missing from the optimizer; simply add it to the pragma, and it yields performance similar to the -O3 flag! For more information, please refer to "C++ and the -O3 compilation flag". If you want a 1:1 match with -O3 and not just -finline, try to match the pragma against g++ -O3 -Q --help=optimizers. For the scope of this comment, that is left as an exercise to the reader.

Not only do they use
#pragma GCC optimize ("O2")
instead of adding -O2, they added it at the bottom of the merged code. You can see it yourself by adding the following snippet:

If you click on the icon next to C++ in the LeetCode submit UI, you will see this:
So it is compiled with clang++, and clang++ main.cpp -finline && time ./a.out doesn't improve the runtime. Does anyone know a command similar to g++ -Q --help=optimizers for clang++?

Ah, you're right! Can't believe that I forgot about -finline... haha

In that case, #pragma GCC optimize("O3,inline") would be pretty close to optimal, while still being relatively short.

I've done some light digging on clang optimizations, but it doesn't seem like clang has anything similar to #pragma GCC optimize. I guess this choice was intentional, as LeetCode test data is usually weak and constant optimization is usually enough to pass with suboptimal solutions. I do find it weird that LeetCode includes a useless pragma at the end of the file, though...