Hi everyone, I have just tried to solve problem 161D.
If I use a matrix dp[50010][510], I get a TLE verdict, even though the time complexity of the solution is $$$O(nk)$$$, $$$nk < 10^8$$$ and the constant factors are quite small. But if I use a matrix dp[510][50010] and swap the indices, I get AC with a time of 498 ms (much less than the time limit).
Why does this happen?
Thanks
Submission with tle verdict: 73781168
Submission with ac verdict: 73781989
When the inner dimension of the array is close to a large power of two, accessing a[i][j] for different i but the same j plays badly with the processor cache. Oversimplifying a bit, cache lines are addressed by the last few bits of the actual memory address, so a[i][j] for different i but the same j compete for the same slots in the cache. As a result, the program has to go to the actual RAM instead of the processor cache too often, which is a few times slower. There's an excellent talk by Scott Meyers that discusses this particular issue.
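To make the "same slots" point concrete, here is a rough sketch of the oversimplified model above. Everything in it is an assumption for illustration: the cache geometry (64-byte lines, 64 sets), the array starting at address 0, 4-byte elements, and the helper name distinct_sets. It is not a measurement of any real CPU, only a count of how many different cache sets a walk down one column would touch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical cache model (an assumption, not a specific CPU):
// 64-byte lines, 64 sets, set index = (address / 64) % 64.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kNumSets = 64;

// Counts how many distinct cache sets are touched when reading a[i][0]
// for i = 0 .. rows-1, where a is an array of 4-byte integers whose
// inner dimension is `inner`. The array is assumed to start at address 0.
std::size_t distinct_sets(std::size_t inner, std::size_t rows) {
    assert(inner > 0 && rows > 0);
    std::set<std::size_t> sets;
    for (std::size_t i = 0; i < rows; ++i) {
        // Byte offset of a[i][0] in row-major layout.
        std::size_t address = i * inner * sizeof(std::int32_t);
        sets.insert((address / kLineBytes) % kNumSets);
    }
    return sets.size();
}
```

In this model, distinct_sets(512, 64) is 2, while distinct_sets(528, 64) is 64: padding each row by one cache line spreads the column over every set. An inner dimension of 510 lands in between, which is consistent with the "close to a power of two" remark, though real hardware is more complicated than this sketch.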