Hassan_Fathi's blog

By Hassan_Fathi, history, 7 months ago, In English

Hello guys!

I was solving some problems on dp on subsets and I found this strange one

this got AC, but that gets TLE, and the only difference between them is swaping the inner 2 loops (mask and j).

I'm really curious to know why :D

  • Vote: I like it
  • +13
  • Vote: I do not like it

»
7 months ago, # |
  Vote: I like it +11 Vote: I do not like it

I guess it's a compiler optimization

In your code if $$$g[i][j]$$$ is false, it won't do anything in the loop.

I guess GCC adds if (!g[i][j]) break; to the end of line 19

  • »
    »
    7 months ago, # ^ |
      Vote: I like it +14 Vote: I do not like it

    Great, but this condition will never happen if the entire grid = 1

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it +19 Vote: I do not like it

      You're right, it might be because of random-access and you are defining vectors inside the function (i.e. heap/stack thing).

      But I don't think so because I resubmitted with defining in global and no difference.

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it +13 Vote: I do not like it

        Thanks for your time, I think now you are curious too :D

»
7 months ago, # |
  Vote: I like it +36 Vote: I do not like it

Second code gets TLE'd because of some cache shenanigans.

My thoughts that CPU in first submit prefetches full cache line, so access to $$$newdp[mask]$$$ is almost free, but in the other submit, it can be washed out by accessing $$$g[i][j]$$$.

And AFAIK, arrays with len that are powers of two are quite bad for performance reasons on some CPUs, see this link for reference.

  • »
    »
    7 months ago, # ^ |
      Vote: I like it +23 Vote: I do not like it

    I thought that cache stuff don't affect complexity that much.

    Now I got it thanks :D

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      Technically you're right. It doesn't affect the complexity at all. It does affect the constant factor though, which is what's causing the 2.5x speedup here.

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it +4 Vote: I do not like it

        No words come after yours sir.

»
7 months ago, # |
  Vote: I like it +18 Vote: I do not like it

similar thing has happened with me once on cf. i think it is due to more cache miss rate in second one but can't guess why it's happening