Блог пользователя amirhoseinfar1385

Автор amirhoseinfar1385, история, 3 дня назад, По-английски

Hello everyone! Today, I encountered a very strange problem with question e1. I have one submission which is this: [link] And another submission which is this: [link] The only difference between the two is in line 123 and it's a very small difference, but the first one is accepted in 1.8 seconds, and the second one in 6 seconds. There is a huge difference. I wanted to know if anyone knows the reason for this big difference; it doesn't make any sense at all. That line executes about 100,000,000 times. I know it's a lot, but it doesn't make sense to me that this change would add 4 seconds to my code. I want to know if there could be another reason or if that's really all there is to it.

  • Проголосовать: нравится
  • +10
  • Проголосовать: не нравится

»
3 дня назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).

»
3 дня назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

This looks like cache misses. Accessing dp[u][v][j] for sequential j is cheap. Accessing dp[u][alle[j].u][j], dp[alle[j].v][v][j], dp[u][alle[j].v][j], dp[alle[j].u][v][j] is much more expensive since they're at various places in memory and those places can only be known at runtime, so no optimisation is possible.

  • »
    »
    3 дня назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    An interesting thing happened that I thought I'd share. I had a submission, this link: [link] Instead of taking the minimum, I used || and saw no improvement in time. But then I used:

    #pragma GCC optimize("O3,unroll-loops")
    #pragma GCC target("avx2")
    

    and it Reduced the time by about 1.5 seconds. It's strange because it definitely couldn't vectorize, as you mentioned, it is determined by the inputs and can't optimize this.

    • »
      »
      »
      3 дня назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      You can check assembly with -fverbose-asm. O3 can sometimes make very aggressive, bad optimisations, and it often results in worse runtime from what I found out when I tried to use it all the time.

»
3 дня назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by amirhoseinfar1385 (previous revision, new revision, compare).