Template Metaprogramming in C++ Part 1 / Infinity

Правка en2, от Everule, 2023-02-16 22:08:00

This (possibly series of) blog(s) should serve as in introduction to template metaprogramming in C++, and how you can use it to write more reusable and efficient snippets of code. You need to know nothing beforehand, and I believe its actually much simpler than most people think it is, looking at all the features you have in it. I will also be mentioning which version of C++ you need to use each mentioned feature, if it is recent enough.

Consider a rudimentary function like std::max. How do you even write std::max?. Well you could do


int max(int a, int b){ if(a > b) return a; else return b; }

Well you might use it on floats too. Why not add


float max(float a, float b){ if(a > b) return a; else return b; }

You very quickly realise this is a horrible idea. There has to be some way to say, I want to do this, but for any types.

Here is where template comes in. template is where you write the list of type parameters of the function. For example, you could consider the first function to be max with a type parameter of int, and the second function to be max with a type parameter of float.

So we could write


template<typename T> T max(T a, T b){ if(a < b) return a; else return b; }

Now writing max<int> tells the compiler, that you want the function max with a type parameter of T = int. It is important to mention, that to the compiler max<int> and max<float> are two completely different functions, just like the previous code. The only difference is that these functions are autogenerated.

Now let us try writing some more utility functions.

A lot of the times, we want to work with a list of elements, rather than just 2 elements like what std::max does. For example find the first occurence of element $$$x$$$ in some list $$$L$$$. C++ opted for generality, and it represents a list as a pair of iterators, the start and end iterator. They work like this — You can dereference the start iterator, which is equivalent to getting the first element of the list. You can also increment the start iterator, which is equivalent to going to the next element of the list. If the start iterator becomes equal to the end iterator, it means that the list is over, and dereferencing an end iterator is usually undefined behavior, and you should avoid it. So std::find works like this (there are lot of optimisations hidden behind it).

//ForwardIterator is the type of the iterator, and T is the type of the value we check equality with
template<typename ForwardIterator, typename T> 
ForwardIterator find_first(ForwardIterator start, ForwardIterator end, T value){
    while(start != end){ //Checks if the iterator hasn't reached the end
        if(*start == value) return start; //"*start" dereferences an iterator, and returns the value
        ++start; //++start increments an iterator to the next value
   }
   return start;
}

The underlying idea is, you increment start if you didn't find match, and once you do, you return start. According to convention, if there was no match, you return end, which is equal to start at the last line.

So we can use it like this


std::vector<int> a; std::vector<int>::iterator it = find_first(a.begin(), a.end(), 0);

It is not very convenient to write std::vector<int>::iterator. The compiler already knows the return type of find_first because of the parameters, so it doesn't make sense for us to rewrite it. This is where the auto keyword comes in useful.

You can use

std::vector<int> a;
auto it = find_first(a.begin(), a.end(), 0);

Where auto is deduced from the return type of find_first.

Let's say now, you wanted to find the first element larger than value in find_first. All of a sudden, the equality check doesn't seem as general as you would like. Maybe instead of checking equality, we should let the user decide the if condition. But you can't pass code into a function. This is where lambdas come in. A lambda is a function that acts as variable. That is a bit abstract, so let's try an example.


auto add_values = [](int a, int b /* parameters */) -> int /* return type */ { return a + b; }; auto mult_values = [](int a, int b) -> long long { return 1ll * a * b; }; assert(add_values(2, 3) == 5); assert(mult_values(2, 3) == 6ll);

add_values and mult_values are lambdas, which can be used as a function using the () operator. Internally, a lambda is just a local struct, with the () operator overloaded.


struct __add_values_impl{ int operator()(int a, int b){ return a + b; } }; struct __mult_values_impl{ long long operator()(int a, int b){ return 1ll * a * b; } }; __add_values_impl add_values; __mult_values_impl mult_values; assert(add_values(2, 3) == 5); assert(mult_values(2, 3) == 6ll);

That's why its important to notice each lambda has a distinct type, and you cannot assign one lambda to another. But they can be passed around in functions, which is what make them particularly useful.


template<typename ForwardIterator, typename F> ForwardIterator find_first(ForwardIterator start, ForwardIterator end, F condition){ while(start != end){ if(condition(*start)) return start; //condition(*start) calls the function condition, with the value in the iterator ++start; } return start; }

Now you can put just about anything in condition. For example, you could do

vector<int> a;
auto it = std::find_first(a.begin(), a.end(), [](int val) -> bool {
    return val > 10; // You don't need to assign a lambda to a variable, you can just write it directly as parameter
});

and it will store an iterator to the first element that is larger than 10.

Let's consider now, what if didn't want 10 specifically, we might want some variable like x instead. This is the second thing lambdas can do. They can capture variables in their surroundings.

vector<int> a;
int x;
auto it = std::find_first(a.begin(), a.end(), [x /* capturing x */ ](int val) -> bool {
    return val > x;
});

The [] is to store the captured variables. [x] represents capturing x by value. This means x will be copied inside the lambda. You can also capture x by reference, by writing [&x] instead. If you capture something by reference, then editing it will change the value of your variable. Keep that in mind.

To prevent needing to capture too many variables, lambdas also come with capture defaults. If you use [&], it will capture all used variables by reference, and if you use [=], it will capture all used variables by value. As an example, you can look at this

int x = 1, y = 2;
auto func = [x, &y](int z) -> int {
    return x + y + z;
};
assert(func(2) == 5);
x = 3;
assert(func(2) == 5);
y = 3;
assert(func(2) == 6);

Internally, you can think it works like this

struct __func_impl{
   int x;
   int& y;
   __func_impl(int _x, int& _y) : x(_x), y(_y) {} // This is notation of initialising x = _x, and y = _y.
   int operator()(int z){
       return x + y + z;
   }
};
int x = 1, y = 2;
__func_impl func(x, y);
assert(func(2) == 5);
x = 3;
assert(func(2) == 5);
y = 3;
assert(func(2) == 6);

You've learnt the basics of template metaprogramming! You can start writing most basic ideas you have now, using the template, auto, typename keywords. This is one of the most powerful features of C++, and that's why its part $$$1$$$ out of $$$\infty$$$.

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en3 Английский Everule 2023-02-16 22:10:34 66
en2 Английский Everule 2023-02-16 22:08:00 235
en1 Английский Everule 2023-02-16 22:04:34 7573 Initial revision (published)