djm03178's blog

By djm03178, 3 weeks ago, In English

I experienced something weird while solving 2050F - Maximum modulo equality. Basically, I tried solving it by decomposing the array into $$$\mathcal{O}(\sqrt{n})$$$ blocks. I submitted several versions, each using a different block size, and they passed in 2s during the contest. Surprisingly, in system tests, all of my submissions got TLE on my own hack test, the one I had used to hack other, similar solutions that had run much slower than mine on the original tests.

It felt really weird, because the most basic gcd implementation we learn is something like this:

int G(int a, int b)
{
    return b ? G(b, a % b) : a; // Euclid: recurse on (b, a mod b)
}

Although a single call to this function takes $$$\mathcal{O}(\log{x})$$$ time, where $$$x$$$ is the value of the input, when we apply this function $$$k$$$ times in a row, where either $$$a$$$ or $$$b$$$ is the result of a previous call, it should only take $$$\mathcal{O}(k+\log{x})$$$ in total: the recursion goes deeper only when the result gets smaller, and the running result can shrink only $$$\mathcal{O}(\log{x})$$$ times. Thus, my solution should only take $$$\mathcal{O}((q+\sqrt{n})(\sqrt{n}+\log{x}))$$$ time, which should be fast enough to run in 5s.
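To make the pattern concrete, here is a sketch of folding G over a block (the array a and the count k are just illustrative names, not my actual code):

// Folding G over k values. The recursion in G goes deeper only when
// the running result shrinks, and it can shrink at most O(log x)
// times in total, so all k calls together cost O(k + log x).
int res = 0;
for (int i = 0; i < k; i++)
    res = G(res, a[i]); // G(0, v) returns v, so the first step works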

However, the results show that this is not the case. Check out these three versions (note that I used using ll = int;, so the divisions aren't that slow):

The first one uses std::gcd, the second one uses __gcd, which is a GCC extension, and the third one uses the G function I wrote above. The first one got TLE, while the other two took only around 2 seconds. So, why did this happen?

I asked some people about this, and the conclusion was that std::gcd may be using the binary GCD algorithm, which takes time proportional to the number of significant bits of the input even when the result remains large. For example, when res = 0 initially, repeatedly calling res = gcd(res, (int)1e9) will keep taking time proportional to $$$\log{10^9}$$$ each time, while the traditional Euclidean algorithm only calls itself recursively once each time. In conclusion, my solution with std::gcd was likely $$$\mathcal{O}((q+\sqrt{n})\sqrt{n}\log{x})$$$, which is indeed too slow.
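I don't know exactly which implementation libstdc++ ships, but a typical binary (Stein's) GCD looks roughly like this, and the issue is already visible in the sketch:

// A typical binary GCD (sketch, assuming a, b >= 0; not necessarily
// what libstdc++ actually does). Each loop iteration strips at least
// one low bit of b, so the iteration count is proportional to the
// bit length of the inputs even when the answer is obvious to
// Euclid: binary_gcd(1, (int)1e9) still loops around 20 times, while
// the Euclidean version finishes immediately since 1e9 % 1 == 0.
int binary_gcd(int a, int b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    int k = __builtin_ctz(a | b); // shared factors of two
    a >>= __builtin_ctz(a);       // make a odd
    for (;;) {
        b >>= __builtin_ctz(b);   // make b odd
        if (a > b) { int t = a; a = b; b = t; }
        b -= a;                   // odd - odd = even, so b loses a bit
        if (b == 0) return a << k;
    }
}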

qwerasdfzxcl suggested using b = gcd(a % b, b) instead of b = gcd(a, b), and it does work: 295441081, and it's a lot faster than the other versions! I was told that this is bounded by $$$\mathcal{O}(k+\log^2{x})$$$. I don't fully understand the bound, so if anyone has an idea about it, please do comment. The theory is this: assuming gcd returns $$$b$$$ immediately when $$$a \bmod b = 0$$$, in any other case the result divides $$$b$$$ and is smaller than $$$b$$$, so $$$b$$$ is at least halved by the call; this can happen only $$$\mathcal{O}(\log{x})$$$ times, with each such call costing $$$\mathcal{O}(\log{x})$$$.
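In code, the trick is just the following (a sketch; the guard for res == 0 is mine, since a % 0 is undefined):

// Hybrid trick (sketch): reduce a[i] modulo the running gcd first.
// If res divides a[i], the first argument becomes 0 and gcd(0, res)
// returns res immediately; otherwise the result divides res and is
// smaller than res, hence at most res / 2.
int res = 0;
for (int i = 0; i < k; i++)
    res = res ? std::gcd(a[i] % res, res) : a[i];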

Lessons learned: Do not naively use std::gcd when we need to repeatedly apply gcd to its own result. Alternatively, using C++17 works, presumably because its std::gcd used the Euclidean algorithm back then. C++23 doesn't work.


»
3 weeks ago, # |

Oh

»
2 weeks ago, # |

Pragmas also help: 295462909.

»
2 weeks ago, # |

Useful knowledge!

»
2 weeks ago, # |

This is insane... It is the first time I have seen a newer C++ version being slower than an older one :) And from the gcd(0, 1e9) case it looks like it's not just a random competitive programming quirk but a real worsening of the time complexity. I actually find it rather odd. One can implement binary gcd to run in $$$O(\log \min(a, b))$$$ time, but from your example it apparently doesn't do that. What we use to bound the time complexity of repeated application of gcd is that the time complexity of Euclid's algorithm is $$$O(\log (\min(a, b) / \gcd(a, b)))$$$. This, I agree, is an obscure time complexity that only competitive programmers could care about. But the difference between $$$O(\log \min(a, b))$$$ and $$$O(\log \max(a, b))$$$ is actually significant...
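Concretely (a quick sketch of the standard argument): if the running gcds are $$$g_0 \ge g_1 \ge \dots \ge g_k$$$, the total cost of the $$$k$$$ applications telescopes: $$$\sum_{i=1}^{k} O\left(1 + \log\frac{g_{i-1}}{g_i}\right) = O\left(k + \log\frac{g_0}{g_k}\right) = O(k + \log x)$$$.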

I would also like to point out that __gcd(x, 0) is undefined behaviour, so this problem is indeed annoying. I guess the only good solution is to always just use your own gcd function. But that's just sad...

»
2 weeks ago, # |

qwerasdfzxcl's trick is a so-called hybrid approach: reference. Based on the benchmarks from the cited blog, it is the fastest approach.

»
2 weeks ago, # |

Why did you use sqrt decomposition?

Isn't something like a segment tree or sparse table much faster?

  • »
    »
    2 weeks ago, # ^ |

    If you ask me why... because of my skill issue? :)

The segment tree solution just didn't come to me at that time for some reason. Later in the contest I realized that a segment tree would be faster, so I rewrote the solution with one, and I was actually able to keep my AK even though 6 of my submissions FST'ed.
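    For reference, this is roughly the kind of gcd segment tree I mean (a minimal sketch, not my actual submission; it assumes the usual <vector> and <numeric> headers):

    // Query-only gcd segment tree (sketch). Build makes O(n) gcd
    // calls; each range query touches O(log n) nodes, one gcd each.
    struct GcdTree {
        int n;
        std::vector<int> t;
        GcdTree(const std::vector<int>& a) : n((int)a.size()), t(2 * a.size()) {
            for (int i = 0; i < n; i++) t[n + i] = a[i];
            for (int i = n - 1; i > 0; i--) t[i] = std::gcd(t[2 * i], t[2 * i + 1]);
        }
        int query(int l, int r) { // gcd over [l, r), 0 for an empty range
            int res = 0;
            for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
                if (l & 1) res = std::gcd(res, t[l++]);
                if (r & 1) res = std::gcd(res, t[--r]);
            }
            return res;
        }
    };

    A query makes only $$$\mathcal{O}(\log{n})$$$ gcd calls, so even a binary std::gcd costs at most an extra $$$\log{x}$$$ factor per query here, which is far less fragile than folding it over a whole block.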

»
2 weeks ago, # |

It's so amazing but how can we use it usefully?