polosatic's blog

By polosatic, history, 2 weeks ago, translation, In English

Recently I was solving this problem. It obviously called for strongly connected components. I implemented them and got TL on test 14: 277474774.

I knew that push_backing into 1e6 vectors is slow due to dynamic reallocation. The only way I knew to handle it was to precompute the size of each vector and replace the vector<vector> with a flat static array. It was awful to implement, but I did it and got TL on test 38: 277476691.
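
For reference, here is a minimal sketch of that flat-array idea, assuming a directed graph with 1-indexed vertices read from stdin; the names (head, adj) and the input format are illustrative, not taken from the actual submission.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    scanf("%d %d", &n, &m);
    vector<int> from(m), to(m), head(n + 2, 0);
    for (int i = 0; i < m; i++) {
        scanf("%d %d", &from[i], &to[i]);
        head[from[i] + 1]++;                    // first pass: count the out-degree of every vertex
    }
    for (int v = 1; v <= n + 1; v++)
        head[v] += head[v - 1];                 // prefix sums: head[v] = start of v's edge block
    vector<int> adj(m), pos(head);              // pos[v] = write cursor inside v's block
    for (int i = 0; i < m; i++)
        adj[pos[from[i]]++] = to[i];            // second pass: place every edge into the flat array
    // the neighbours of vertex v are adj[head[v]] .. adj[head[v + 1] - 1]
    for (int e = head[1]; e < head[2]; e++)
        printf("%d ", adj[e]);                  // e.g. print the neighbours of vertex 1
    return 0;
}
```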

Then I tried many things, like removing the Euler tour vector and adding pragmas, but still could not pass test 38. So, I have two questions for experienced optimizers:

  1. Is there another, simpler way to speed up vector push_backs, without precomputing each vector's size?

  2. Why does my final solution get TL, and how can I speed it up? I think constraints up to $$$10^6$$$ should be passable for this problem.

UPD: #define vector basic_string is a useful optimization. EDIT: do not use this define if you use vector<vector<int>> vec instead of vector<int> vec[N].
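
For those who want to try it, here is a minimal sketch of how the define is used (the graph array and sizes are illustrative). The caveat exists because basic_string formally requires a trivial, char-like element type, so nesting it the way vector<vector<int>> does is not safe:

```cpp
#include <bits/stdc++.h>
using namespace std;

// The whole trick: every vector<int> below silently becomes basic_string<int>.
// Keep the element type trivial (int, long long, ...) and avoid nesting it.
#define vector basic_string

const int N = 1000000;
vector<int> g[N];                  // fine: an ordinary array of basic_string<int>

int main() {
    g[0].push_back(1);             // push_back, size, operator[], iteration all work the same
    g[0] += 2;                     // basic_string additionally supports += for appending
    for (int v : g[0]) printf("%d ", v);
    return 0;
}
```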

UPD2: I found this blog. It has some explanations.

  • Vote: I like it
  • +46
  • Vote: I do not like it

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

If you know beforehand approximately how much space you need, you can probably call std::vector::reserve
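
A minimal sketch of that suggestion, with purely illustrative sizes: reserving an upper bound once means later push_backs never trigger a reallocation.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 5, m = 20;                      // illustrative: n lists, at most m elements each
    vector<vector<int>> g(n);
    for (auto &row : g) row.reserve(m);     // one allocation per list, done up front
    g[0].push_back(42);                     // no reallocation can happen after the reserve
    printf("%zu\n", g[0].capacity());       // capacity is already >= m
    return 0;
}
```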

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    A long time ago I had the same issue and tried vector.reserve(), but it also gave me TL. Back then, precomputing all the sizes helped my solution pass.

    • »
      »
      »
      2 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Well, vectors are in general slower than static arrays. I'm not completely sure about the specifics, but it seems their memory allocation mechanics are more complicated.

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Probably not the most useful idea, but does emplace_back work?

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I tried it the first time I ran into this issue. It did not help either, same as vector.reserve().

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Aside from the way the object is constructed, push_back actually has the same implementation as emplace_back.

»
2 weeks ago, # |
  Vote: I like it +39 Vote: I do not like it

The mushrooms function works in sqrt(x)/2.

For the vectors, the fastest way is to use something similar to bucket sort, which is to merge all vectors into one array.

  • »
    »
    2 weeks ago, # ^ |
    Rev. 2   Vote: I like it +18 Vote: I do not like it

    Wow, my bad. I did k=sqrt(x*2) and got AC.

    Vectors: 278599566 1625ms

    Array: 278600523 921ms

    Now I know that there is no other way to speed up push_backs. Thank you.

»
2 weeks ago, # |
  Vote: I like it +16 Vote: I do not like it

My vector-filled copy-paste SCC code got AC in around half of the TL, so that probably isn't the cause unless there's some compiler-specific wizardry going on. I'm pretty sure the comment above hit the spot: the mushrooms function is the culprit.

»
2 weeks ago, # |
  Vote: I like it +34 Vote: I do not like it

Use basic_string instead of vector. The interface is essentially the same, but basic_string is faster (though the optimization isn't significant).

  • »
    »
    2 weeks ago, # ^ |
    Rev. 3   Vote: I like it +10 Vote: I do not like it

    Wow! This is exactly what I wanted to know. I added #define vector basic_string and it runs in 1186ms instead of 1625ms! It is 37% faster with just one define! I had never heard of it! Before rewriting vectors into one array, I will try this optimization first.

    278616547

    UPD: do not use this define if you use vector<vector<int>> vec instead of vector<int> vec[N].

    • »
      »
      »
      2 weeks ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      27%

      • »
        »
        »
        »
        2 weeks ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        1625/1186 = 1.37015177065767284992
        1186/1625 = 0.72984615384615384615

        • »
          »
          »
          »
          »
          2 weeks ago, # ^ |
            Vote: I like it +1 Vote: I do not like it

          Yes, but still 27%, not 37%.

          • »
            »
            »
            »
            »
            »
            2 weeks ago, # ^ |
            Rev. 4   Vote: I like it +36 Vote: I do not like it

            "A is $$$p\%$$$ faster than B" means "the speed of A is $$$p\%$$$ greater than one of B", or, equivalently, $$$v_A = \left(1 + \frac{p}{100}\right)v_B$$$ where $$$v_A$$$ is the speed of A and $$$v_B$$$ is the speed of B. The most natural way to define the speed of a solution is, given it's repeated several times, say that it is equal to its execution frequency, thus $$$v_A = \frac{1}{t_A}$$$ and $$$v_B = \frac{1}{t_B}$$$. So we get $$$\frac{1}{t_A} = \left(1 + \frac{p}{100}\right)\frac{1}{t_B}$$$ and $$$p = \left(\frac{t_B}{t_A} - 1\right) \cdot 100$$$, so $$$p = \frac{1625\text{ ms}}{1186\text{ ms}} \approx 37$$$ is the correct approach.

            Taking $$$p = \left(1 - \frac{t_A}{t_B}\right) \cdot 100$$$ does not really make sense here, because, for instance, when we say that A is $$$100\%$$$ faster than B, we mean that A is twice as fast as B, not that A is instantaneous.

            One can say that the $$$1625\text{ ms}$$$ solution is $$$27\%$$$ slower than the $$$1186\text{ ms}$$$ solution; that is fine. And if someone says that B is $$$100\%$$$ slower than A, they really do mean that A is instantaneous (or that B takes infinitely long).

»
2 weeks ago, # |
  Vote: I like it +9 Vote: I do not like it

Use this.

[image: pAAst3Q.png]

  • »
    »
    2 weeks ago, # ^ |
      Vote: I like it +18 Vote: I do not like it

    I used a similar approach in the submission that got TL on test 38.

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

I suggest you also run a profiler on your solution so you know exactly what to optimize. Sometimes things take longer than you expect, even when the asymptotics are fine.

For example, I have used samply several times.

»
2 weeks ago, # |
  Vote: I like it +7 Vote: I do not like it

push_back isn't slow; it is exactly as fast as intended. What is slow is the memory reallocation that your design causes. You needed to redesign it, which is the standard solution to this problem.

»
2 weeks ago, # |
  Vote: I like it +8 Vote: I do not like it

Allocation-friendly way to store graphs: https://codeforces.net/blog/entry/5195
No need for 2d-vectors.
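
If I remember correctly, that blog describes the classic head[]/nxt[] edge-list storage; here is a minimal sketch of the idea (array names and sizes are illustrative):

```cpp
#include <bits/stdc++.h>
using namespace std;

const int N = 1000006, M = 1000006;            // illustrative upper bounds on vertices and edges
int head[N], nxt[M], dest[M], edgeCnt = 0;     // head[v] = index of the last edge out of v, 0 = none

void addEdge(int u, int v) {                   // O(1), no heap allocations at all
    ++edgeCnt;
    dest[edgeCnt] = v;
    nxt[edgeCnt] = head[u];                    // chain the previous edge of u behind this one
    head[u] = edgeCnt;
}

int main() {
    addEdge(1, 2);
    addEdge(1, 3);
    for (int e = head[1]; e; e = nxt[e])       // walk the list of edges out of vertex 1
        printf("%d ", dest[e]);                // prints "3 2" (reverse insertion order)
    return 0;
}
```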

»
2 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

You can check out my submission here, I guess? 278711068