
B0JACK's blog

By B0JACK, 4 years ago, In English

I was wondering what a good estimate is for the upper bound on the number of operations that can be done in a second, so I decided to test it and come up with a result. But the more I test on different systems, the more confusing it becomes. Below is the code I used for testing; I found widely varying results:

#include <iostream>
#include <ctime>
using namespace std;
#define int long long

const int N = 9e6;
int a[N], b[N];

signed main()
{
    int tt = 1000*clock()/CLOCKS_PER_SEC;

    for(int t=1; t<=100; t++)
    {
        for(int i=0; i<N; i++)
            a[i] = i%897896;
    }

    cout<<1000*clock()/CLOCKS_PER_SEC-tt<<"ms\n";
}

The results are as follows. Note that I have a 64-bit Linux OS with g++ 9.3.0:

  • On my system with g++ -std=c++17 : 2436 ms
  • On my system with g++ -std=c++17 -static -O2 : 1551 ms
  • On Codeforces Custom Test with C++17 : 4641 ms
  • On Codeforces Custom Test with C++17 (x64) : 892 ms
  • On Windows 8.1 (x64 VBox) with C++14 : 2015 ms

I wanted to ask: what is the reason behind such drastic variation among different systems? Also, what is a good estimate for the number of operations that can be done in a second?


»
4 years ago, # |

First of all, if you don't submit as x64, your code runs as a 32-bit binary. Here you wrote "#define int long long", so every calculation you do is a 64-bit calculation, which is VERY slow on a 32-bit target (which is the reason for the 4641 ms). Codeforces x64 is extremely quick because the Codeforces grading servers are extremely quick, and if you don't waste time on massive calculations it will almost always be quicker than your computer. Codeforces also turns on O2 by default.
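
For comparison, here is a minimal variant of the benchmark with plain 32-bit int (a sketch of mine, not part of the original post). Compiling both this and the long long version once with -m32 and once with -m64 (assuming your g++ has the 32-bit runtime installed) shows how much of the gap comes from doing 64-bit arithmetic on a 32-bit target:

#include <iostream>
#include <ctime>

const int N = 9000000;
int a[N];                 // plain 32-bit int, unlike the original benchmark

int main()
{
    long long tt = 1000LL * clock() / CLOCKS_PER_SEC;

    for (int t = 1; t <= 100; t++)
        for (int i = 0; i < N; i++)
            a[i] = i % 897896;   // same work as the original, but in 32 bits

    std::cout << 1000LL * clock() / CLOCKS_PER_SEC - tt << "ms\n";
}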

On your system, if you don't turn on O2, the compiler translates the code rather naively, so it's not that fast. When you turn on O2, the mod calculations become much faster (division by a constant can be replaced by a cheaper multiply-and-shift sequence), which is the reason for the speedup.
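
To illustrate what the optimizer does with that constant divisor, here is a rough sketch of the "multiply by a precomputed reciprocal" trick (my own illustration; g++ emits its own variant of this, not this exact code). It needs a 64-bit build because of unsigned __int128, and is valid for x and d below 2^32, which covers i < 9e6 and d = 897896:

#include <cassert>
#include <cstdint>

int main()
{
    const uint64_t d = 897896;
    const uint64_t m = ~0ULL / d + 1;                 // floor(2^64 / d) + 1

    for (uint64_t x = 0; x < 9000000; x++)
    {
        uint64_t q = (unsigned __int128)x * m >> 64;  // exact x / d, no division instruction
        uint64_t r = x - q * d;                       // exact x % d
        assert(r == x % d);                           // sanity check against the real modulo
    }
}

With -O2 the compiler bakes such a reciprocal in at compile time, so the hot loop ends up with multiplications and shifts instead of divisions.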

I don't know how to explain the Windows time difference.

  • »
    »
    4 years ago, # ^ |

    Thank you for your reply. So in your opinion, what is a good estimate for the number of operations per second? What estimate do you use, and do you change it across different judges?

»
4 years ago, # |

Picture yourself with a 4-digit calculator, having to multiply two 4-digit numbers. That's roughly what Codeforces-x86 is facing here, with that modulo operation. No wonder it's more than 4 times faster with Codeforces-x64.
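
To make the analogy a bit more concrete (my own expansion, not part of the comment): splitting each operand into two machine-word halves gives $$$(aB + b)(cB + d) = acB^2 + (ad + bc)B + bd$$$, so one double-width multiplication costs four single-width ones plus carry handling, and a 64-bit division or modulo on a 32-bit x86 target is usually a call to a library helper routine that is slower still.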

On a side note, one does not both use #define int long long and know the number of operations their program typically does in a second. Choose one please.

  • »
    »
    4 years ago, # ^ |

    Thank you for your reply. I'm sorry, but I wasn't able to understand the side note. Why can't we know the number of operations if we use long long instead of int?

    • »
      »
      »
      4 years ago, # ^ |

      We can, actually; my remark is an exaggeration. The point is that the attitude of "let's just use the largest data type everywhere" is at odds with knowing the number of operations per second more or less exactly.

      More specifically, many contest environments are still 32-bit. In extreme cases like the one mentioned in your post, that can mean more than a 4x drop in performance compared to compiling and running the same program in a 64-bit environment.

      Among other concerns, requiring twice the memory may mean more cache misses, which slows the program down too. This slowdown depends too much on the details, though; there is no universal recipe.
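
      As a small illustration of the memory point (my own sketch, not from the discussion above): the same 9e6-element array takes twice the space with long long, so only half as many elements fit in each cache line.

      #include <iostream>

      int main()
      {
          const long long N = 9000000;
          std::cout << "int array:       " << N * sizeof(int)       << " bytes\n";  // ~36 MB on typical platforms
          std::cout << "long long array: " << N * sizeof(long long) << " bytes\n";  // ~72 MB
          std::cout << "ints per 64-byte cache line:       " << 64 / sizeof(int)       << "\n";  // typically 16
          std::cout << "long longs per 64-byte cache line: " << 64 / sizeof(long long) << "\n";  // typically 8
      }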

      Anyway, jef provided a much more practical answer to your original question already.

»
4 years ago, # |

"What should be a good estimate for the number of operations under a second?" — On the order of $$$10^9$$$. It doesn't get more precise than that. There are a lot of factors affecting how many operations a second you get, such as caching, compiler optimizations and branch prediction just to name a few. Not all operations take the same amount of time to execute. And of course different systems can run at very different speeds.

If you want to know whether your algorithm will run in time, just plug the maximum input sizes into your complexity. If you get less than $$$10^8$$$, you're probably fine. At $$$5 \cdot 10^8$$$ it starts to get dangerous. $$$10^9$$$ has a good chance of TLE, and anything above that will very likely TLE. What I just described is for typical implementations; you'll need to adjust accordingly for things such as bitsets, messy and complicated implementations, and algorithms that have bad complexity but are fast in practice. It all comes down to experience, really. There's no single number of "operations" you can do in a second, because it all depends.
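
As a concrete instance of this rule of thumb (my own arithmetic, just to illustrate): for $$$n = 2 \cdot 10^5$$$ and an $$$O(n \log n)$$$ solution you get roughly $$$2 \cdot 10^5 \cdot 18 \approx 3.6 \cdot 10^6$$$ operations, which is comfortably safe; for $$$n = 10^4$$$ and $$$O(n^2)$$$ you get $$$10^8$$$, usually fine but with little headroom; and for $$$n = 10^5$$$ and $$$O(n^2)$$$ you get $$$10^{10}$$$, which is hopeless without a better algorithm.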

  • »
    »
    4 years ago, # ^ |

    Thank you for your extensive answer; I will always keep these things in mind.