Блог пользователя aza-zil

Автор aza-zil, история, 5 лет назад, По-английски

Hey! In the last contest I was getting TLE on a problem, I thought my solution was not optimal, now I tried to solve it but using a loop to clear all elements of the array instead of "memset" -which I was using- and I got AC!! So which one is better ??

UPD: found that the TLE then using "memset" was because there was a mistake with the size of the array (which I don't know why I got TLE not RTE or something like that!!). But anyway, the time for the solution when using a "for" loop is like " 156 ms ", while it is " 1294 ms " when using "memset !!!

  • Проголосовать: нравится
  • 0
  • Проголосовать: не нравится

»
5 лет назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

your loop resets values in range [0, n], but memset one resets the whole array which is very large.

»
5 лет назад, # |
  Проголосовать: нравится +3 Проголосовать: не нравится

Auto comment: topic has been updated by aza-zil (previous revision, new revision, compare).

»
5 лет назад, # |
  Проголосовать: нравится +21 Проголосовать: не нравится

Change

memset ( ctrN , 0 , sizeof ctrN ) ;
memset ( ctrM , 0 , sizeof ctrM ) ;

To:

memset ( ctrN , 0 , n * sizeof(ll) ) ;
memset ( ctrM , 0 , m * sizeof(ll) ) ;
  • »
    »
    5 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Is there any significant difference between i) memset ( ctrN , 0 , sizeof ctrN ) ; and ii) memset ( ctrN , 0 , n * sizeof(ll) ) ; // assuming n is size of the array ctrN

    • »
      »
      »
      5 лет назад, # ^ |
      Rev. 4   Проголосовать: нравится 0 Проголосовать: не нравится

      In ii) you need to know that variable n is a size of array and ll is a type of element. In i) you can know nothing. No difference in runtime because third parameter just number, equal to num of bytes we need to set.

      I'm always using std::fill(all(arr), 0); in code with #define all(x) std::begin(x), std::end(x). For this example, with Ofast runtime is same, but std::fill more generic and can be applied for std::vector with same runtime and for std::array with same runtime.

      For array on stack there is a bug on codeforces, because stack limited by 256 MB even for problems with memory limit 512 MB, 768 MB, 1024 MB, but vector allocates memory on heap and works well.

      Examples of bug with stack limit
»
5 лет назад, # |
Rev. 3   Проголосовать: нравится +3 Проголосовать: не нравится

FYI:

  • The time consumed by memset on $$$n$$$ bytes is $$$a_1 n + c_1$$$, where $$$c_1$$$ is the overhead of calling the function and maybe performing some additional checks and calling internal functions, which is kinda high, and $$$a_1$$$ is kinda low, since it's optimised as far as possible. One big optimisation is that on 64-bit architectures, you can set 64 bits (8 bytes) at once, or 4 bytes for 32-bit architectures.
  • The time consumed by a for loop is $$$a_2 n + c_2$$$, where $$$c_2$$$ is just the overhead of loop initialisation and $$$a_2$$$ is a normal for loop constant. This is already pretty fast, but if you're setting chars (just one byte at once), it will be several times slower. There's no significant difference if you're setting longs.
  • The third alternative, only usable when you're setting a fixed number of bytes, is a for loop inside an inline templated function, where $$$n$$$ and possibly the value you're setting (but this isn't very important) are known at compile time, so it can be massively optimised, unnecessary variables like loop index and movement between registers is eliminated, loop unrolling is done and there's no function call overhead. For $$$n$$$ divisible by $$$8$$$ (detected during compilation) on 64-bit machines, it should take something like $$$2+3n/8$$$ or $$$2+n/2$$$ processor cycles if the only function argument is the starting pointer. It's just copy starting pointer from register r1 to register r2, add n to r2, check if r1 == r2, if not: store the 64-bit value (e.g. 0) we're setting to position [r1] in memory, add 8 to r1, jump to "check".

Basically, use memset if you're calling it on big arrays a few times and you're setting the same value for each byte. Use a for loop if you're not setting the same value for each byte (don't use memset to set an int to 0xFF00FF00) or if you're using it very often on smaller arrays. Use a templated for loop if you can afford a fixed nice value of $$$n$$$.