We know that, setting aside the extra memory usage, long long is faster than int on x64
It’s possible to access data items that are narrower than the width of a machine’s data bus, but it’s typically more costly than accessing items whose widths match that of the data bus. For example, when reading a 16-bit value on a 64-bit machine, a full 64 bits worth of data must still be read from memory. The desired 16-bit field then has to be masked off and possibly shifted into place within the destination register.
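In code terms, that mask-and-shift step looks roughly like this (a hypothetical helper just for illustration; in reality the compiler and hardware do this for you, often with a dedicated zero-extending load):

#include <cstdint>

// Illustration only: the machine reads a full 64-bit word, then the 16-bit
// field at byte offset `off` is shifted down and masked off.
uint16_t extract16(uint64_t word, unsigned off) {
    return static_cast<uint16_t>((word >> (off * 8)) & 0xFFFFu);
}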
But if we are also depending on vectorization to speed up execution, ints perform better.
I say this based on the fact that the vectorization width is higher with 32-bit integers.
For example, consider the following loop:
for (int i = 0; i < n; i++)
c[i] = a[i] + b[i];
With #define int long long
main.cpp:114:5: remark: vectorized loop (vectorization width: 2, interleaved count: 2) [-Rpass=loop-vectorize]
for (int i = 0; i < n; i++)
Without #define int long long
main.cpp:114:5: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
for (int i = 0; i < n; i++)
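(For reference, those remarks come from Clang's optimization reports; an invocation along these lines should reproduce them, assuming the source file is main.cpp and -O3:)

clang++ -O3 -Rpass=loop-vectorize main.cpp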
So plain scalar access speeds up but vectorization slows down with 64-bit integers. What would the final outcome be, then? Would 64-bit integers be better, or 32-bit?
For essentially all these constant-factor tradeoffs, the answer is "it depends on your workload". The golden rule of performance optimization is that you have to benchmark your actual workload/code, and tradeoffs like this are why.
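For example, a minimal timing harness for the loop above could look like the sketch below (the array size, repeat count, and std::chrono are arbitrary choices of mine; on a judge you would time your real hot loop):

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<int> a(n, 1), b(n, 2), c(n);  // switch the element type to long long to compare

    auto start = std::chrono::steady_clock::now();
    for (int iter = 0; iter < 100; iter++)    // repeat to get a measurable duration
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    auto stop = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    // Print a checksum so the compiler cannot discard the work entirely.
    std::printf("checksum %lld, elapsed %lld ms\n", (long long)c[0], (long long)ms);
}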
I see, that's a pain
Oh I forgot to mention: there are nontrivial microarchitectural differences between different processors too, and they're typically impossible to fully comprehend from first principles, so you have to benchmark your actual code on the actual judge too!
So benchmarking on Custom Invocation is a must if I'm depending on these kinds of optimizations. Thanks.
Btw, after submitting to Codeforces for a long time, have you gotten a general idea of what kind of stuff generally performs better in their system, or is it really just "benchmark, don't assume"?
Well, competitive programming typically does not require heavy performance optimization; any reasonable implementation should pass (as long as setters pick good time limits). I have a little performance intuition, but I'm pretty often wrong about more complex things.
int32_t is generally better for arrays because of vectorization and smaller memory footprint (which helps with caching)
I think int64_t might be faster for a single variable, but I'm not sure.
Did some benchmarking. Turns out int64_t is not faster than int32_t on x86-64, at least in terms of incrementing a variable.
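Roughly the shape of the test, for the record (not my exact code; volatile is just there so the compiler can't fold the whole loop into one addition):

#include <cstdio>

int main() {
    volatile long long x = 0;   // swap long long for int32_t to get the other data point
    for (long long i = 0; i < 1000000000; i++)
        x = x + 1;              // forced load/add/store each iteration because x is volatile
    std::printf("%lld\n", (long long)x);
}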
That test probably isn't that valuable, though, because incrementing a variable doesn't move data around (i.e., it doesn't use the data bus).
It's hard for me to imagine any real case where int64 calculations would be faster than int32.
I've actually noticed better execution speed on Codeforces with 64-bit integers on the 64-bit compiler more than a few times.
That's because it is GCC 9 vs GCC 7, not 64-bit vs 32-bit.
I always thought otherwise; I only realized this today. Wow, thanks!
Here
Okay, maybe that's true in this example. However, if you make the summation type the same as the data type, the results are significantly different. So "everything in int32" still wins.
https://quick-bench.com/q/neGxBwoIbwcie6juMrShJSERPAw
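(The comparison there is roughly of this shape; quick-bench runs Google Benchmark, and the array size below is illustrative rather than the exact setup from the link:)

#include <benchmark/benchmark.h>
#include <cstdint>
#include <numeric>
#include <vector>

// 32-bit data summed into a 32-bit accumulator.
static void SumInt32(benchmark::State& state) {
    std::vector<int32_t> v(1 << 16, 1);
    for (auto _ : state) {
        int32_t s = std::accumulate(v.begin(), v.end(), int32_t{0});
        benchmark::DoNotOptimize(s);
    }
}
BENCHMARK(SumInt32);

// Same element count, but 64-bit data and a 64-bit accumulator.
static void SumInt64(benchmark::State& state) {
    std::vector<int64_t> v(1 << 16, 1);
    for (auto _ : state) {
        int64_t s = std::accumulate(v.begin(), v.end(), int64_t{0});
        benchmark::DoNotOptimize(s);
    }
}
BENCHMARK(SumInt64);

BENCHMARK_MAIN();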