Блог пользователя kustizus

Автор kustizus, история, 2 месяца назад, По-английски

In C++ programming, structs are commonly used to group related variables together. However, I realize that the order of declaration of struct members can have a significant impact on the memory size that the struct occupies. This is due to memory alignment in C++, which can lead to the insertion of padding (unused memory) between struct members.

In this post, we will explore how to arrange struct members in a way that optimizes memory usage and minimizes unnecessary padding.

What is Memory Alignment and Padding?

Memory alignment refers to how data types are stored in memory at specific byte boundaries. On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.

For example:

int usually requires 4-byte alignment.

long long requires 8-byte alignment.

char and bool can be stored at any memory address.

If struct members are not aligned properly, the compiler may insert padding to ensure each data type adheres to its alignment requirements. This padding results in wasted memory.

Example of a Non-Optimized Struct

Let’s look at a simple struct with members declared in a non-optimal order:

#include <bits/stdc++.h>
using namespace std;

struct {
    int b;        // 4 bytes
    char c;       // 1 byte
    long long a;  // 8 bytes
    bool d;       // 1 byte
} st;

signed main() {
    cout << sizeof(st) << endl;  // Output the size of the struct
    return 0;
}

In this struct:

int b takes 4 bytes.

char c takes 1 byte, but the compiler will add 3 bytes of padding after it to align long long a to an address divisible by 8.

long long a takes 8 bytes and requires 8-byte alignment.

bool d takes 1 byte, but the compiler adds 7 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.

Total size:

4 (int) + 1 (char) + 3 (padding) + 8 (long long) + 1 (bool) + 7 (padding) = 24 bytes

Optimizing the Struct

To reduce padding and optimize memory usage, we can rearrange the struct members so that larger types are declared first, minimizing the need for padding. Here’s how we can improve the struct:

#include <bits/stdc++.h>
using namespace std;

struct {
    long long a;  // 8 bytes
    int b;        // 4 bytes
    char c;       // 1 byte
    bool d;       // 1 byte
} st;

signed main() {
    cout << sizeof(st) << endl;  // Output the size of the struct
    return 0;
}

Optimized Result:

In this optimized struct:

long long a takes 8 bytes and requires 8-byte alignment. Placing it at the beginning ensures no padding is added after it.

int b takes 4 bytes and requires 4-byte alignment. Placing it after long long ensures no padding.

char c takes 1 byte and requires 1-byte alignment. Placing it after int b results in no padding.

bool d takes 1 byte and requires 1-byte alignment. It fits perfectly after char c, with no padding needed.

Finally, the compiler adds 2 bytes of padding after it to ensure the struct’s total size is a multiple of 8 bytes.

Alignment of the struct: The overall size of the struct must also be divisible by the alignment of the member with the highest alignment requirement. If a struct contains a member like long long (which requires 8-byte alignment), then the total size of the struct must be a multiple of 8 to comply with this alignment requirement.

Total size:

8 (long long) + 4 (int) + 1 (char) + 1 (bool) + 2 (padding) = 16 bytes

Why This Struct Is More Optimized

1. Proper memory alignment: By placing the largest types (long long) first, we ensure no padding is required between members.

2.Reduced padding: Arranging members in decreasing order of size and alignment minimizes the need for padding, resulting in more efficient memory usage.

3. Optimized memory usage: This struct now has a total size of 16 bytes, saving 8 bytes compared to the original version.

Note

In the case of a struct like this:

struct {
    double x;
    int a[3];
    int b;
};

We will declare double x before int a[3], even though the total memory size of a[3] is 12 bytes and x is 8 bytes. This is because double has a larger alignment requirement than int, and placing x first ensures proper alignment.

Therefore, the declaration order depends on the size of the data type rather than the total memory size of the elements.

Conclusion

When working with structs in C++, it's not just about grouping related variables together but also about optimizing memory usage. By arranging members in decreasing order of size and alignment, you can reduce padding and save memory, which is essential for memory-sensitive applications.

We hope this post helps you understand how memory optimization works with structs in C++ and how to make your programs more efficient!

  • Проголосовать: нравится
  • +153
  • Проголосовать: не нравится

»
2 месяца назад, # |
  Проголосовать: нравится +7 Проголосовать: не нравится

My code was getting memory limit excited at an issue and I accidentally discovered it. :V

  • »
    »
    2 месяца назад, # ^ |
    Rev. 3   Проголосовать: нравится +6 Проголосовать: не нравится

    On most systems, certain data types such as int, long long, char, and bool are required to be stored at memory addresses that are divisible by their respective sizes.

    Probably the compilers are just being conservative The compilers are required to produce aligned addresses (An object type imposes an alignment requirement on every object of that type in C++). But using unaligned addresses are possible and hardly cause performance differences with modern CPUs . Will you code pass with #pragma pack(1) without the struct change?

    • »
      »
      »
      2 месяца назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      It has passed, but according to some information I found on the internet, #pragma pack(1) can have a Performance Impact and Lead to Loss of Compiler Optimizations.

      • »
        »
        »
        »
        7 недель назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        what about using __attribute((packed))__ ? Does it have any performance issues, since using it gives the size of your struct 14 bytes?

        • »
          »
          »
          »
          »
          7 недель назад, # ^ |
            Проголосовать: нравится +6 Проголосовать: не нравится

          It certainly could. Misaligned pointers are UB according to C standard so it entirely depends on what your compiler does in your specific case since you're using a compiler extension attribute.

      • »
        »
        »
        »
        7 недель назад, # ^ |
        Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

        In the context of CP I would be curious to see if this can cause any TLEs. In a simple test I added -march=skylake to check the case for SIMD, whether with it, the existence of #pragma pack(1) or not produces the same asm except the memory offsets.

    • »
      »
      »
      2 месяца назад, # ^ |
        Проголосовать: нравится +9 Проголосовать: не нравится

      Anon Tokyo! (excuse my weird point of attention)

    • »
      »
      »
      7 недель назад, # ^ |
        Проголосовать: нравится +6 Проголосовать: не нравится

      The compilers are following the standard. It's UB so it can work perfectly fine on most modern hardware, but compilers are made to be compatible with (for a reasonable definition of "all") all hardware.

»
2 месяца назад, # |
  Проголосовать: нравится +70 Проголосовать: не нравится

On memory optimization associated with structs, when we use a data type larger than the required range, eg. using int to store a positive integer less than 1e5, there is a useful trick to cut down memory usage:

struct Node {
    ull mx : 18, ls : 23, rs : 23; 
};

Such a struct only takes up 64 bits or 8 bytes of memory, yet it gives us 3 usable variables (on the condition that they don't exceed their respective upper limits).

  • »
    »
    2 месяца назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    This is useful knowledge

  • »
    »
    2 месяца назад, # ^ |
      Проголосовать: нравится -10 Проголосовать: не нравится

    Aliens magic, wow

  • »
    »
    7 недель назад, # ^ |
      Проголосовать: нравится +40 Проголосовать: не нравится

    And how does memory access to such variables work? You can't just movq those, there needs to be extra overhead. Even more complicated is store.

    • »
      »
      »
      7 недель назад, # ^ |
      Rev. 2   Проголосовать: нравится +16 Проголосовать: не нравится

      https://godbolt.org/z/45GPWGdzv

      load adds a shr and and, store adds 4 extra bitwise operations. I'm curious how much this overhead is compared to the cache locality savings, I would think it's basically negligible.

      • »
        »
        »
        »
        7 недель назад, # ^ |
          Проголосовать: нравится +16 Проголосовать: не нравится

        The usual rule is negligible with random access, big with sequential access. Exact performance somewhat affected by specific CPU's cache parameters.

»
7 недель назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

does this also happen in the classes ?!

  • »
    »
    7 недель назад, # ^ |
      Проголосовать: нравится +6 Проголосовать: не нравится

    Yes. In C++, classes are just structs with private as the default access specifier.