When solving AtCoder Grand Contest 044 Problem B, I wrote some O(n^3) code with n<=500 like:
typedef vector vi; vector dirs({{0,1}, {0,-1}, {-1,0}, {1,0}}); ... for (auto vi : dirs){ // run O(n^3) times ... }
and get TLE. On my pc it runs within 4900ms. During the contest I have no idea why.(dumb me) After the contest I looked at other people's solution and finally get accepted solution, just add an '&' after 'auto':
for (auto& vi : dirs){ // run O(n^3) times ... } 480 ms on my pc! 10 times faster!
I did more tests: for (int dd = 0; dd<4; ++dd){ // run O(n^3) times vi dir = dirs[dd]; 4900 ms
for (int dd = 0; dd<4; ++dd){ // run O(n^3) times vi& dir = dirs[dd]; 480 ms
vi dir; for (int dd = 0; dd<4; ++dd){ // run O(n^3) times dir = dirs[dd]; 1800 ms
Maybe vector is too big, I tried pair: vector<pair<int, int>> dirs({{0,1}, {0,-1}, {-1,0}, {1,0}}); for (auto dir : dirs){ // run O(n^3) times 370 ms, no difference if I add ‘&’, maybe because pair of int is the same size as 64 bit pointers.
Final couclusion: Seems like for (auto vi : dirs) is equivalent to for (int dd = 0; dd<4; ++dd){ // run O(n^3) times vi dir = dirs[dd]; that will construct a vector and assignment another vector to it every iteration. which cost much more time than simple data structures or referance.