ll res = -oo;
for(int k=1; k<=n; k++)
for(int i=1; i<=n-k+1; i++)
for(int j=1; j<=n-k+1; j++)
{
f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
res = max(res, f[i][j][k]);
}
cout << res;
ll res = -oo;
for(int i=1; i<=n; i++)
for(int j=1; j<=n; j++)
for(int k=1; k<=n; k++)
{
if(k + i > n || k + j > n) break;
f[i][j][k] = f[i][j][k - 1] + a[i + k - 1] * b[j + k - 1];
res = max(res, f[i][j][k]);
}
cout << res;
These code seem not to be differ too much but execution time have much differ. Example in case n = 500, code1 run in 3103ms but code 2 run only 789ms. What is the reason ? Pls explain to me. Thanks.
I didn't read the code but it is probably cache misses https://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra https://en.wikipedia.org/wiki/Loop_interchange
From my little understanding, arrays are arranged row order in continuos blocks of memory. i.e. the first row is in a block of memory, then the second row immediately continues from where the first one left off and so on...
This makes it easier to go to the next row in a loop that is ordered row by row by just going to the next position in memory eveytime making it cache friendly. If ordered the other way around, it takes more time as you have to continuosly make jumps in memory position when switching to different rows which isn't cache friendly
Hey, can you please put the codes inside the block
ok
Auto comment: topic has been updated by ExpectoPatronum (previous revision, new revision, compare).
Auto comment: topic has been updated by ExpectoPatronum (previous revision, new revision, compare).
pro vjp idol
Because of how memory works
Whenever you read 4 bytes, you actually load 64 bytes into cpu caches and basically read other 15 values for free (MUCH cheaper than from memory) if you read contigious ranges of memory
But if you random jump, you lose advantages of caches, because they're limited and always filled with data you currently don't need and you have to read each value from memory directly