Блог пользователя generic_placeholder_name

Автор generic_placeholder_name, история, 3 года назад, По-английски

In this contest, while I was browsing through FST submissions, I encountered many TLEs on test 8 in B. Here are some examples: 142848311, 142836654. The solutions appeared to be completely correct. What happened here?

The problem was that these contestants used memset before every test. If you didn't know, memset(a, 0, sizeof(a)) is linear in the size of the array a.

This is why these contestant TLEd. Every testcase, they did $$$1000000$$$ unnecessary operations.

The lesson here? Stop using memset to reset stuff when there are many testcases. Resetting by hand works just fine. Or know how memset actually works, so that you don't TLE.

  • Проголосовать: нравится
  • +197
  • Проголосовать: не нравится

»
3 года назад, # |
  Проголосовать: нравится +106 Проголосовать: не нравится

Just avoid global variables like the plague in problems with multiple test cases.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +104 Проголосовать: не нравится

    Just avoid global variables like the plague in problems with multiple test cases.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +302 Проголосовать: не нравится

      Just avoid global variables like the plague in problems with multiple test cases.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +17 Проголосовать: не нравится

        Problems with multiple test cases are actually very nice, they make pretests stronger and queues shorter. Also, if you write a local brute tester, you can run your code against a brute force faster.

    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +68 Проголосовать: не нравится

      I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +11 Проголосовать: не нравится

        Btw, did you know that you can write vector dp(n, vector(m, 0))? Still $$$O(d)$$$ words "vector", but much better than $$$O(d^2)$$$. However, it is still slower than plain arrays because of multiple pointer jumps.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +79 Проголосовать: не нравится

        I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        Use something like this tensor class by ecnerwala. Now you can completely avoid global variables!

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится -6 Проголосовать: не нравится

    Covid deniers would definitely not follow your advice

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Thank you for the advice.Can you please let me know what you mean by resetting by hand?I mean like if i reset the values using a for loop then i am also going to take linear time.Is there a better way?

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Also, if you use functions like void done() to clear your static arrays, don't forget to call it every time after printing the answer. I made 3 submissions with this mistake in problem E. Don't do that.

»
3 года назад, # |
  Проголосовать: нравится +29 Проголосовать: не нравится

Another simple mistake I often see is not having #define int long long in every submission

»
3 года назад, # |
  Проголосовать: нравится +14 Проголосовать: не нравится

On a more serious note, memset is not the problem here, you can just use memset(a, 0, n * sizeof(int)). It's just having ANY global variables. >99% of problems with multiple test cases don't require having data that is shared between testcases and is supposed to be in some part cleaned. I always have a struct that encompasses everything I need for a particular test case and that is a hard barrier preventing me from getting any kind of WAs resulting from not cleaning the data between test cases. The downside is that I need to resize every container with appropriate size at the beginning of every testcase, but that is not bigger effort than cleaning it. One can of course forget about resizing it appropriately what is a mistake comparable to forgetting about cleaning it, but if you don't resize then you get runtime error locally on sample test, which is fixed in 10 seconds (cause with proper local flags you are explicitly told that the runtime was caused by this particular container being empty), while forgetting about cleaning it leads to WAs on pretests/systests (and it may not be obvious not cleaning the data is its reason)

»
3 года назад, # |
  Проголосовать: нравится +56 Проголосовать: не нравится

Embrace lambdas so you never have to use a global variable ever again!

Example of a submission to a multitest graph problem with no global scope used.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +15 Проголосовать: не нравится

    You can also get rid of that annoying auto &self parameter, by just typing the type explicitly rather than using auto:

    Spoiler
    • »
      »
      »
      3 года назад, # ^ |
        Проголосовать: нравится +46 Проголосовать: не нравится

      So what you're using here is actually something different, std::function, and there's surprisingly a performance difference between the two. std::function tends to have significantly more overhead. Try running this simple example in custom invocation for example:

      Code
      Output

      As an extra note, I've benchmarked all the different styles of recursion before in the past (normal recursion, std::function, passing self, using Y combinators). Normal recursion is of course still the fastest, but passing self has surprisingly competitive performance in exchange for minimal additional code.

      • »
        »
        »
        »
        3 года назад, # ^ |
        Rev. 6   Проголосовать: нравится 0 Проголосовать: не нравится

        A bit off topic,
        But why both the ways gets a runtime error for n=3e6 in custom invocation when I used C++ 17 (64 bit).
        Is there some optimization in C++ 20 for recursion ??
        UPD1 : Actually the self type recursion gives runtime error only for C++17 (64 bit) and works fine on other compilers.
        UPD2 : I think the reason may be that C++ 17 (64 bit) default recursion depth is set very low which doesn't allow this much depth in recursion. Then my question is how to increase recursion depth limit.
        UPD3 : I guess I will have to stop using C++17 (64 bit) because there is no way we can increase stack space for OJ, Sorry for pinging you.

        • »
          »
          »
          »
          »
          3 года назад, # ^ |
          Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

          Yeah sorry I don't know either. All I know is that passing self has less overhead, but I'm not very knowledgeable about the intricacies between different compilers.

          About increasing recursion depth, all C++ compilers on Codeforces, such as C++20 and C++17 compile with 256 mb of stack size, so there shouldn't be a significant difference between max recursion depth in the two.

          • »
            »
            »
            »
            »
            »
            3 года назад, # ^ |
              Проголосовать: нравится 0 Проголосовать: не нравится

            Yeah that's what I didn't understand
            That even if the stack size for all compilers are same why C++ 17 (64 bit) give Runtime Error.

          • »
            »
            »
            »
            »
            »
            3 года назад, # ^ |
              Проголосовать: нравится +40 Проголосовать: не нравится

            The reason behind less overhead with lambdas is the following:

            1. std::function is a type-erased implementation of function objects (and hence it is useless for competitive programming unless you want to do something like make vectors of "lambdas", which is impossible, so you need to use something like std::function instead), and it requires heap allocations and extraneous pointer indirections, which makes it slower.

            2. Lambdas are implemented as anonymous structs with operator(), i.e., they are callable (they're called functors). Using auto in the parameter list makes them a generic lambda, which is analogous to structs with templated operator(). When you call stuff like dfs(dfs, root, color), type deduction can be done at compile-time, and due to the lack of indirections, the compiler is able to optimize it much better.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +22 Проголосовать: не нравится

        Ok, didn't know about this, thanks! I have two questions:

        1. What is behind auto then, if it's not std::function?

        2. Does this higher overhead of std::function really matter in practice? (i.e. can it actually cause my solution to get TLE in a contest problem?)

        • »
          »
          »
          »
          »
          3 года назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится
          1. To my knowledge, it's a unique unnamed class type so you can only declare it with auto.
          2. Probably not, but I don't like writing function parameters in both function and lambda.
        • »
          »
          »
          »
          »
          3 года назад, # ^ |
            Проголосовать: нравится +10 Проголосовать: не нравится

          I guess it might matter in problems like this where they ask you to run DFS up to $$$10^6$$$ recursion depth. Though it shouldn't be a huge difference as long as problem authors are reasonable with time limits.

        • »
          »
          »
          »
          »
          3 года назад, # ^ |
            Проголосовать: нравится +21 Проголосовать: не нравится

          As far as overhead is concerned, using std::function got me TLE once, so I stay away from it as much as possible. Similar things happen when you use std::function in segment tree implementations and so on.

          • »
            »
            »
            »
            »
            »
            3 года назад, # ^ |
              Проголосовать: нравится +12 Проголосовать: не нравится

            So is it safe to use it for simple stuff that only makes $$$O(n)$$$ recursive calls?

            • »
              »
              »
              »
              »
              »
              »
              3 года назад, # ^ |
                Проголосовать: нравится +10 Проголосовать: не нравится

              Personally speaking, I don't use std::function for competitive programming purposes at all, but your mileage may vary. If $$$n$$$ is not too large, $$$O(n)$$$ recursive calls might be fine.

              • »
                »
                »
                »
                »
                »
                »
                »
                3 года назад, # ^ |
                  Проголосовать: нравится -10 Проголосовать: не нравится

                I used to use std::function, but because of some reasons (like not being able to set default arguments), I was told to use y_combinator (Although we need to have some extra code to use it). What are your views on it? Should we use it?

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  3 года назад, # ^ |
                    Проголосовать: нравится +1 Проголосовать: не нравится

                  y_combinator is just a wrapper for another way of getting something similar to the "recursive" lambda trick, so it should be fine. In most cases it will be as fast as the recursive lambdas (the only issue I know of is that sometimes those functions don't get inlined, but I have never heard of anyone getting TLE using that).

»
3 года назад, # |
Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

I think memset is faster than the iterative assign: memset(a, 0, n * sizeof *a);

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +55 Проголосовать: не нравится

    Yes but if that's the difference between AC and TLE and your whole solution is O(N) then you can just curse the problem setter and move on :)

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +37 Проголосовать: не нравится

    Using std::fill is much better than using memset btw.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +23 Проголосовать: не нравится

    The compiler will usually optimize the iterative assign to a memset anyway, unless there is some reason it isn't correct to do so. So it really doesn't matter.