When solving AtCoder Grand Contest 044 Problem B, I wrote some O(n^3) code with n<=500 like:
typedef vector<int> vi;
vector<vi> dirs({{0,1}, {0,-1}, {-1,0}, {1,0}});
...
for (auto dir : dirs){ // run O(n^3) times
...
}
and get TLE. On my pc it runs within 4900ms. During the contest I have no idea why.(dumb me) After the contest I looked at other people's solution and finally get accepted solution, just add an '&' after 'auto':
for (auto& dir : dirs){ // run O(n^3) times
...
}
480 ms on my pc! 10 times faster!
I did more tests:
for (int dd = 0; dd<4; ++dd){ // run O(n^3) times
vi dir = dirs[dd];
4900 ms
for (int dd = 0; dd<4; ++dd){ // run O(n^3) times
vi& dir = dirs[dd];
480 ms
vi dir;
for (int dd = 0; dd<4; ++dd){ // run O(n^3) times
dir = dirs[dd];
1800 ms
Maybe vector is too big, I tried pair:
vector<pair<int, int>> dirs({{0,1}, {0,-1}, {-1,0}, {1,0}});
for (auto dir : dirs){ // run O(n^3) times
370 ms, no difference if I add ‘&’, maybe because pair of int is the same size as 64 bit pointers.
Final conclusion: Seems like
for (auto dir : dirs)
is equivalent to
for (int dd = 0; dd<4; ++dd){ // run O(n^3) times
vi dir = dirs[dd];
that will construct a vector and assignment another vector to it every iteration. which cost much more time than simple data structures or reference.
Auto comment: topic has been updated by iynaur87 (previous revision, new revision, compare).
Change click-bate title to something meaningful like "I should always use const auto& inside for loops".
Thanks, edited according to your suggestion.
the reason why copying a vector is slow is because it needs to allocates space on the heap where as the pair<int, int> can be placed on the stack because it has a constant size.