generic_placeholder_name's blog

By generic_placeholder_name, history, 3 years ago, In English

In this contest, while I was browsing through FST submissions, I encountered many TLEs on test 8 in B. Here are some examples: 142848311, 142836654. The solutions appeared to be completely correct. What happened here?

The problem was that these contestants used memset before every test. If you didn't know, memset(a, 0, sizeof(a)) is linear in the size of the array a.

This is why these contestant TLEd. Every testcase, they did $$$1000000$$$ unnecessary operations.

The lesson here? Stop using memset to reset stuff when there are many testcases. Resetting by hand works just fine. Or know how memset actually works, so that you don't TLE.

| Write comment?
»
3 years ago, # |
  Vote: I like it +106 Vote: I do not like it

Just avoid global variables like the plague in problems with multiple test cases.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +104 Vote: I do not like it

    Just avoid global variables like the plague in problems with multiple test cases.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +302 Vote: I do not like it

      Just avoid global variables like the plague in problems with multiple test cases.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +17 Vote: I do not like it

        Problems with multiple test cases are actually very nice, they make pretests stronger and queues shorter. Also, if you write a local brute tester, you can run your code against a brute force faster.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +68 Vote: I do not like it

      I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +11 Vote: I do not like it

        Btw, did you know that you can write vector dp(n, vector(m, 0))? Still $$$O(d)$$$ words "vector", but much better than $$$O(d^2)$$$. However, it is still slower than plain arrays because of multiple pointer jumps.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +79 Vote: I do not like it

        I am not going to write vectors of vectors of vectors of vectors to solve 4-dimensional DPs...

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Use something like this tensor class by ecnerwala. Now you can completely avoid global variables!

  • »
    »
    3 years ago, # ^ |
      Vote: I like it -6 Vote: I do not like it

    Covid deniers would definitely not follow your advice

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Thank you for the advice.Can you please let me know what you mean by resetting by hand?I mean like if i reset the values using a for loop then i am also going to take linear time.Is there a better way?

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Do something like this:

    for(int i = 0; i < n; i++) a[i] = 0; //reset a[i]
    

    This is linear in n and not in the array size.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      You can also memset for definite size Like memset(a,0,n * sizeof(int));

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        Or

        memset(a,0,n * sizeof(a[0]))
        

        for generic types of a[].

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it
          memset(a,0,(n + 100) * sizeof(a[0]))
          

          And allocate the array of size MAXN + 500.

»
3 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Also, if you use functions like void done() to clear your static arrays, don't forget to call it every time after printing the answer. I made 3 submissions with this mistake in problem E. Don't do that.

»
3 years ago, # |
  Vote: I like it +29 Vote: I do not like it

Another simple mistake I often see is not having #define int long long in every submission

»
3 years ago, # |
  Vote: I like it +14 Vote: I do not like it

On a more serious note, memset is not the problem here, you can just use memset(a, 0, n * sizeof(int)). It's just having ANY global variables. >99% of problems with multiple test cases don't require having data that is shared between testcases and is supposed to be in some part cleaned. I always have a struct that encompasses everything I need for a particular test case and that is a hard barrier preventing me from getting any kind of WAs resulting from not cleaning the data between test cases. The downside is that I need to resize every container with appropriate size at the beginning of every testcase, but that is not bigger effort than cleaning it. One can of course forget about resizing it appropriately what is a mistake comparable to forgetting about cleaning it, but if you don't resize then you get runtime error locally on sample test, which is fixed in 10 seconds (cause with proper local flags you are explicitly told that the runtime was caused by this particular container being empty), while forgetting about cleaning it leads to WAs on pretests/systests (and it may not be obvious not cleaning the data is its reason)

»
3 years ago, # |
  Vote: I like it +56 Vote: I do not like it

Embrace lambdas so you never have to use a global variable ever again!

Example of a submission to a multitest graph problem with no global scope used.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +15 Vote: I do not like it

    You can also get rid of that annoying auto &self parameter, by just typing the type explicitly rather than using auto:

    Spoiler
    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it +46 Vote: I do not like it

      So what you're using here is actually something different, std::function, and there's surprisingly a performance difference between the two. std::function tends to have significantly more overhead. Try running this simple example in custom invocation for example:

      Code
      Output

      As an extra note, I've benchmarked all the different styles of recursion before in the past (normal recursion, std::function, passing self, using Y combinators). Normal recursion is of course still the fastest, but passing self has surprisingly competitive performance in exchange for minimal additional code.

      • »
        »
        »
        »
        3 years ago, # ^ |
        Rev. 6   Vote: I like it 0 Vote: I do not like it

        A bit off topic,
        But why both the ways gets a runtime error for n=3e6 in custom invocation when I used C++ 17 (64 bit).
        Is there some optimization in C++ 20 for recursion ??
        UPD1 : Actually the self type recursion gives runtime error only for C++17 (64 bit) and works fine on other compilers.
        UPD2 : I think the reason may be that C++ 17 (64 bit) default recursion depth is set very low which doesn't allow this much depth in recursion. Then my question is how to increase recursion depth limit.
        UPD3 : I guess I will have to stop using C++17 (64 bit) because there is no way we can increase stack space for OJ, Sorry for pinging you.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          Yeah sorry I don't know either. All I know is that passing self has less overhead, but I'm not very knowledgeable about the intricacies between different compilers.

          About increasing recursion depth, all C++ compilers on Codeforces, such as C++20 and C++17 compile with 256 mb of stack size, so there shouldn't be a significant difference between max recursion depth in the two.

          • »
            »
            »
            »
            »
            »
            3 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Yeah that's what I didn't understand
            That even if the stack size for all compilers are same why C++ 17 (64 bit) give Runtime Error.

          • »
            »
            »
            »
            »
            »
            3 years ago, # ^ |
              Vote: I like it +40 Vote: I do not like it

            The reason behind less overhead with lambdas is the following:

            1. std::function is a type-erased implementation of function objects (and hence it is useless for competitive programming unless you want to do something like make vectors of "lambdas", which is impossible, so you need to use something like std::function instead), and it requires heap allocations and extraneous pointer indirections, which makes it slower.

            2. Lambdas are implemented as anonymous structs with operator(), i.e., they are callable (they're called functors). Using auto in the parameter list makes them a generic lambda, which is analogous to structs with templated operator(). When you call stuff like dfs(dfs, root, color), type deduction can be done at compile-time, and due to the lack of indirections, the compiler is able to optimize it much better.

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it +22 Vote: I do not like it

        Ok, didn't know about this, thanks! I have two questions:

        1. What is behind auto then, if it's not std::function?

        2. Does this higher overhead of std::function really matter in practice? (i.e. can it actually cause my solution to get TLE in a contest problem?)

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it
          1. To my knowledge, it's a unique unnamed class type so you can only declare it with auto.
          2. Probably not, but I don't like writing function parameters in both function and lambda.
        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it +10 Vote: I do not like it

          I guess it might matter in problems like this where they ask you to run DFS up to $$$10^6$$$ recursion depth. Though it shouldn't be a huge difference as long as problem authors are reasonable with time limits.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it +21 Vote: I do not like it

          As far as overhead is concerned, using std::function got me TLE once, so I stay away from it as much as possible. Similar things happen when you use std::function in segment tree implementations and so on.

          • »
            »
            »
            »
            »
            »
            3 years ago, # ^ |
              Vote: I like it +12 Vote: I do not like it

            So is it safe to use it for simple stuff that only makes $$$O(n)$$$ recursive calls?

            • »
              »
              »
              »
              »
              »
              »
              3 years ago, # ^ |
                Vote: I like it +10 Vote: I do not like it

              Personally speaking, I don't use std::function for competitive programming purposes at all, but your mileage may vary. If $$$n$$$ is not too large, $$$O(n)$$$ recursive calls might be fine.

              • »
                »
                »
                »
                »
                »
                »
                »
                3 years ago, # ^ |
                  Vote: I like it -10 Vote: I do not like it

                I used to use std::function, but because of some reasons (like not being able to set default arguments), I was told to use y_combinator (Although we need to have some extra code to use it). What are your views on it? Should we use it?

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  3 years ago, # ^ |
                    Vote: I like it +1 Vote: I do not like it

                  y_combinator is just a wrapper for another way of getting something similar to the "recursive" lambda trick, so it should be fine. In most cases it will be as fast as the recursive lambdas (the only issue I know of is that sometimes those functions don't get inlined, but I have never heard of anyone getting TLE using that).

»
3 years ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I think memset is faster than the iterative assign: memset(a, 0, n * sizeof *a);

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +55 Vote: I do not like it

    Yes but if that's the difference between AC and TLE and your whole solution is O(N) then you can just curse the problem setter and move on :)

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +37 Vote: I do not like it

    Using std::fill is much better than using memset btw.

  • »
    »
    3 years ago, # ^ |
      Vote: I like it +23 Vote: I do not like it

    The compiler will usually optimize the iterative assign to a memset anyway, unless there is some reason it isn't correct to do so. So it really doesn't matter.