kustizus's blog

By kustizus, history, 2 months ago, In English

In C++ programming, structs are commonly used to group related variables together. However, I realize that the order of declaration of struct members can have a significant impact on the memory size that the struct occupies. This is due to memory alignment in C++, which can lead to the insertion of padding (unused memory) between struct members.

In this post, we will explore how to arrange struct members in a way that optimizes memory usage and minimizes unnecessary padding.

What is Memory Alignment and Padding?

Memory alignment refers to how data types are stored in memory at specific byte boundaries. On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.

For example:

int usually requires 4-byte alignment.

long long requires 8-byte alignment.

char and bool can be stored at any memory address.

If struct members are not aligned properly, the compiler may insert padding to ensure each data type adheres to its alignment requirements. This padding results in wasted memory.

Example of a Non-Optimized Struct

Let’s look at a simple struct with members declared in a non-optimal order:

#include <bits/stdc++.h>
using namespace std;

struct {
    int b;        // 4 bytes
    char c;       // 1 byte
    long long a;  // 8 bytes
    bool d;       // 1 byte
} st;

signed main() {
    cout << sizeof(st) << endl;  // Output the size of the struct
    return 0;
}

In this struct:

int b takes 4 bytes.

char c takes 1 byte, but the compiler will add 3 bytes of padding after it to align long long a to an address divisible by 8.

long long a takes 8 bytes and requires 8-byte alignment.

bool d takes 1 byte, but the compiler adds 7 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.

Total size:

4 (int) + 1 (char) + 3 (padding) + 8 (long long) + 1 (bool) + 7 (padding) = 24 bytes

Optimizing the Struct

To reduce padding and optimize memory usage, we can rearrange the struct members so that larger types are declared first, minimizing the need for padding. Here’s how we can improve the struct:

#include <bits/stdc++.h>
using namespace std;

struct {
    long long a;  // 8 bytes
    int b;        // 4 bytes
    char c;       // 1 byte
    bool d;       // 1 byte
} st;

signed main() {
    cout << sizeof(st) << endl;  // Output the size of the struct
    return 0;
}

Optimized Result:

In this optimized struct:

long long a takes 8 bytes and requires 8-byte alignment. Placing it at the beginning ensures no padding is added after it.

int b takes 4 bytes and requires 4-byte alignment. Placing it after long long ensures no padding.

char c takes 1 byte and requires 1-byte alignment. Placing it after int b results in no padding.

bool d takes 1 byte and requires 1-byte alignment. It fits perfectly after char c, with no padding needed.

Finally, the compiler adds 2 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.

Alignment of the struct: The overall size of the struct must also be divisible by the alignment of the member with the highest alignment requirement. If a struct contains a member like long long (which requires 8-byte alignment), then the total size of the struct must be a multiple of 8 to comply with this alignment requirement.

Total size:

8 (long long) + 4 (int) + 1 (char) + 1 (bool) + 2 (padding) = 16 bytes

Why This Struct Is More Optimized

1. Proper memory alignment: By placing the largest types (long long) first, we ensure no padding is required between members.

2.Reduced padding: Arranging members in decreasing order of size and alignment minimizes the need for padding, resulting in more efficient memory usage.

3. Optimized memory usage: This struct now has a total size of 16 bytes, saving 8 bytes compared to the original version.

Note

In the case of a struct like this:

struct {
    double x;
    int a[3];
    int b;
};

We will declare double x before int a[3], even though the total memory size of a[3] is 12 bytes and x is 8 bytes. This is because double has a larger alignment requirement than int, and placing x first ensures proper alignment.

Therefore, the declaration order depends on the size of the data type rather than the total memory size of the elements.

Conclusion

When working with structs in C++, it's not just about grouping related variables together but also about optimizing memory usage. By arranging members in decreasing order of size and alignment, you can reduce padding and save memory, which is essential for memory-sensitive applications.

We hope this post helps you understand how memory optimization works with structs in C++ and how to make your programs more efficient!

  • Vote: I like it
  • +153
  • Vote: I do not like it

»
2 months ago, # |
  Vote: I like it +7 Vote: I do not like it

My code was getting memory limit excited at an issue and I accidentally discovered it. :V

  • »
    »
    2 months ago, # ^ |
    Rev. 3   Vote: I like it +6 Vote: I do not like it

    On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.

    Probably the compilers are just being conservative The compilers are required to produce aligned addresses (An object type imposes an alignment requirement on every object of that type in C++). But using unaligned addresses are possible and hardly cause performance differences with modern CPUs . Will you code pass with #pragma pack(1) without the struct change?

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      It has passed, but according to some information I found on the internet, #pragma pack(1) can have a Performance Impact and Lead to Loss of Compiler Optimizations.

      • »
        »
        »
        »
        7 weeks ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        what about using __attribute((packed))__ ? Does it have any performance issues, since using it gives the size of your struct 14 bytes?

        • »
          »
          »
          »
          »
          7 weeks ago, # ^ |
            Vote: I like it +6 Vote: I do not like it

          It certainly could. Misaligned pointers are UB according to C standard so it entirely depends on what your compiler does in your specific case since you're using a compiler extension attribute.

      • »
        »
        »
        »
        7 weeks ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        In the context of CP I would be curious to see if this can cause any TLEs. In a simple test I added -march=skylake to check the case for SIMD, whether with it, the existence of #pragma pack(1) or not produces the same asm except the memory offsets.

    • »
      »
      »
      2 months ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

      Anon Tokyo! (excuse my weird point of attention)

    • »
      »
      »
      7 weeks ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      The compilers are following the standard. It's UB so it can work perfectly fine on most modern hardware, but compilers are made to be compatible with (for a reasonable definition of "all") all hardware.

      • »
        »
        »
        »
        7 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Thanks for pointing out this, I updated my comment.

»
2 months ago, # |
  Vote: I like it +70 Vote: I do not like it

On memory optimization associated with structs, when we use a data type larger than the required range, eg. using int to store a positive integer less than 1e5, there is a useful trick to cut down memory usage:

struct Node {
    ull mx : 18, ls : 23, rs : 23; 
};

Such a struct only takes up 64 bits or 8 bytes of memory, yet it gives us 3 usable variables (on the condition that they don't exceed their respective upper limits).

  • »
    »
    2 months ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    This is useful knowledge

  • »
    »
    2 months ago, # ^ |
      Vote: I like it -10 Vote: I do not like it

    Aliens magic, wow

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it +40 Vote: I do not like it

    And how does memory access to such variables work? You can't just movq those, there needs to be extra overhead. Even more complicated is store.

    • »
      »
      »
      7 weeks ago, # ^ |
      Rev. 2   Vote: I like it +16 Vote: I do not like it

      https://godbolt.org/z/45GPWGdzv

      load adds a shr and and, store adds 4 extra bitwise operations. I'm curious how much this overhead is compared to the cache locality savings, I would think it's basically negligible.

      • »
        »
        »
        »
        7 weeks ago, # ^ |
          Vote: I like it +16 Vote: I do not like it

        The usual rule is negligible with random access, big with sequential access. Exact performance somewhat affected by specific CPU's cache parameters.

»
7 weeks ago, # |
  Vote: I like it +6 Vote: I do not like it

does this also happen in the classes ?!

  • »
    »
    7 weeks ago, # ^ |
      Vote: I like it +6 Vote: I do not like it

    Yes. In C++, classes are just structs with private as the default access specifier.