I was wondering what a good estimate is for the upper bound on the number of operations that can be done in one second. So I decided to test and come up with a result. But the more I test on different systems, the more confusing it becomes. Below is the code I used for testing, and I found widely varying results:
#include <iostream>
#include <ctime>
using namespace std;
#define int long long
const int N = 9e6;
int a[N], b[N];
signed main()
{
    int tt = 1000*clock()/CLOCKS_PER_SEC;   // start time in ms
    for(int t=1; t<=100; t++)
    {
        for(int i=0; i<N; i++)
            a[i] = i%897896;
    }
    cout<<1000*clock()/CLOCKS_PER_SEC-tt<<"ms\n";   // elapsed ms
}
The results are as follows. Note: I am on a 64-bit Linux OS with g++ 9.3.0.
- On my system with g++ -std=c++17: 2436 ms
- On my system with g++ -std=c++17 -static -O2: 1551 ms
- On Codeforces Custom Test with C++17: 4641 ms
- On Codeforces Custom Test with C++17 (x64): 892 ms
- On Windows 8.1 (x64 VBox) with C++14: 2015 ms
I wanted to ask: what is the reason behind such a drastic variation among different systems? Also, what would be a good estimate for the number of operations per second?
First of all, if you don't submit as x64, it uses 32-bit calculations. Here, you did "#define int long long", so every calculation you do is a 64-bit calculation, which is VERY slow on a 32-bit system (which is the reason for the 4641 ms). Codeforces x64 is extremely quick because the Codeforces grading servers are extremely quick, and if you don't waste time on massive calculations it will almost always be quicker than your computer. Codeforces also turns on O2 by default.
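For a concrete look at that gap, here is a minimal sketch (my own, not part of the original post) that times the same loop with 32-bit and 64-bit arithmetic side by side. Built for a 32-bit target (e.g. with -m32 on x86), the second loop should be much slower:

#include <cstdio>
#include <ctime>

const int N = 9000000;
static int a32[N];        // 32-bit elements, 32-bit index math
static long long a64[N];  // 64-bit elements, 64-bit index math

int main()
{
    long long t0 = 1000LL * clock() / CLOCKS_PER_SEC;
    for (int t = 1; t <= 100; t++)
        for (int i = 0; i < N; i++)
            a32[i] = i % 897896;              // 32-bit modulo
    long long t1 = 1000LL * clock() / CLOCKS_PER_SEC;
    for (int t = 1; t <= 100; t++)
        for (long long i = 0; i < N; i++)
            a64[i] = i % 897896;              // 64-bit modulo
    long long t2 = 1000LL * clock() / CLOCKS_PER_SEC;
    printf("32-bit: %lld ms, 64-bit: %lld ms\n", t1 - t0, t2 - t1);
}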
On your system, if you don't turn on O2, the compiler translates the loop rather naively, so it's not that fast. But when you turn on O2, it makes the mod calculations much faster, which is the reason for the speedup.
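As an aside, for this particular loop the modulo can also be avoided entirely by hand, since i grows by one each iteration. This is just a sketch of the idea (a drop-in replacement for the inner loop above), not a claim about what the compiler actually emits:

// Same effect as a[i] = i % 897896, but with a running
// remainder instead of a division; r wraps to 0 exactly
// when i % 897896 would.
long long r = 0;
for (long long i = 0; i < N; i++)
{
    a[i] = r;
    if (++r == 897896) r = 0;
}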
I don't know how to explain the Windows time difference.
Thank you for your reply. So in your opinion, what would be a good estimate for the number of operations per second? What estimate do you use? And do you change your estimate across different judges?
Picture yourself with a 4-digit calculator, having to multiply two 4-digit numbers. That's roughly what Codeforces-x86 is facing here, with that modulo operation. No wonder it's more than 4 times faster with Codeforces-x64.
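To sketch why (this is my own illustration, not from the thread): a 32-bit CPU has to assemble one 64-bit multiplication out of several 32-bit ones, schoolbook style, and 64-bit division or modulo is worse still, typically compiled to a call into a runtime helper such as libgcc's __umoddi3.

#include <cstdint>

// Low 64 bits of x*y built from 32-bit halves, schoolbook style:
// roughly the work a 32-bit target does for one 64-bit multiply.
uint64_t mul64(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
    uint64_t lo  = (uint64_t)xl * yl;                      // full 32x32 -> 64 product
    uint64_t mid = (uint64_t)xl * yh + (uint64_t)xh * yl;  // cross terms
    return lo + (mid << 32);  // xh*yh only affects bits above 63, so it is dropped
}

That's three 32-bit multiplies plus shifts and adds, where a 64-bit machine needs a single instruction.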
On a side note, one does not both use
#define int long long
and know the number of operations their program typically does in a second. Choose one, please.

Thank you for your reply. I am sorry, but I was not able to understand the side note. Why can't we know the number of operations if one uses long long instead of int?
We can, actually; my remark is an exaggeration. The point is, the attitude of "let's just use the largest data type everywhere" is at odds with knowing the number of operations in a second more or less exactly.
More specifically, many of the contest environments are still 32-bit. In extreme cases like the one mentioned in your post, it can mean more than 4x drop in performance, compared to compiling and running the same program in a 64-bit environment.
Among other concerns, requiring 2x more memory may mean more cache misses, which slows the program down too. This slowdown depends too much on the details, though; there is no universal recipe.
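Here is a toy benchmark of the memory-traffic side (my own sketch, with arbitrary sizes): summing a large array of long long moves twice the bytes of the int version, so a memory-bound loop can slow down even on a 64-bit machine.

#include <cstdio>
#include <ctime>

const int N = 1 << 24;  // 16M elements: 64 MB as int, 128 MB as long long
static int s32[N];
static long long s64[N];

template <class T>
long long sweep(const T *a)  // sequential sum; memory-bound at this size
{
    long long t0 = 1000LL * clock() / CLOCKS_PER_SEC;
    long long sum = 0;
    for (int pass = 0; pass < 20; pass++)
        for (int i = 0; i < N; i++)
            sum += a[i];
    printf("(sum %lld) ", sum);  // use the sum so the loop is not optimized away
    return 1000LL * clock() / CLOCKS_PER_SEC - t0;
}

int main()
{
    printf("int: %lld ms\n", sweep(s32));
    printf("long long: %lld ms\n", sweep(s64));
}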
Anyway, jef provided a much more practical answer to your original question already.
"What should be a good estimate for the number of operations under a second?" — On the order of $$$10^9$$$. It doesn't get more precise than that. There are a lot of factors affecting how many operations a second you get, such as caching, compiler optimizations and branch prediction just to name a few. Not all operations take the same amount of time to execute. And of course different systems can run at very different speeds.
If you want to know whether your algorithm will run in time, just plug the maximum input sizes into your complexity. If you get less than $$$10^8$$$, you're probably fine. If you get $$$5 \cdot 10^8$$$ it starts to get dangerous. $$$10^9$$$ has a good chance of TLE, and anything above that will very likely TLE. What I just described is for typical implementations. You'll need to adjust accordingly for stuff such as bitsets, messy and complicated implementations, and algorithms that have bad complexity but are fast in practice. It all comes down to experience really. There's no single number of "operations" you can do in a second, because it all depends.
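To make that concrete with made-up numbers: say $$$n \le 10^5$$$ and the limit is one second. An $$$O(n^2)$$$ solution needs on the order of $$$10^{10}$$$ basic operations and will almost certainly TLE, while an $$$O(n \log n)$$$ one needs roughly $$$10^5 \cdot 17 \approx 2 \cdot 10^6$$$ and is comfortably fast.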
Thank you for your extensive answer, will always keep these things in mind.