creativegiant's blog

By creativegiant, history, 22 months ago, In English

Hey,

I was just up solving Codeforces Round #837 (Div. 2). And, despite optimizing the solution to problem C to the greatest extent possible in terms of time complexity, my java code still gives TLE on test case 3, whereas the same implementation with the same logic works perfectly in C++.

My solution: here C++ solution: here

Even Magenta Cobra is unable to solve the problem in java: his solution

The worst part is that these scenarios are not uncommon; in fact, I experience them on a regular basis. I would greatly appreciate it if you could advise me on a workaround for this.

Be epic, Ayush

  • Vote: I like it
  • +12
  • Vote: I do not like it

| Write comment?
»
22 months ago, # |
  Vote: I like it 0 Vote: I do not like it

If you think this is common for Java users, just wait until you hear about it from python users. This is a very common problem in competitive programming. In my countries NOI (singapore), after the second question or so (out of 5), the problem setters include the fact, within the problem statement, that THEY THEMSELVES have no solution in JAVA OR PYTHON that can solve the problem within the time constraints. This basically means that if someone uses Java or python in Singapore, it is impossible for them to solve the last 3 questions. In order to use Java, depending on the problem, it can be either easy, hard, or impossible to optimise the constant factor of ur solution enough for it to be accepted. And really, i think its hard to accept this, but the only way to really avoid problems like this is to use another language like c++. But imo, i learned c++ in less than one weeks so it is a very easy and intuitive language to learn. (at least at a basic level.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Yeah, I think I will have to learn C++ as well if I want to do competitive programming long-term.

»
22 months ago, # |
  Vote: I like it +21 Vote: I do not like it

Tangential.

Here is a mostly direct translation of the C++ solution you mention, 184752199, into D language: 186740798. And it already takes 1.7 seconds instead of 2.8 seconds (probably, unordered_set is to blame in C++).

So, a tongue-in-cheek advice: if you want to use a language other than C++, use D, not Java :) !

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    My advice to the OP would be to use the right tool for the job. If you have a hammer in your hand, everything looks like a nail, but at least you don't want to hammer a screw, right? Same story. In my case, Ruby is my secondary language. Ruby is typically a godly language in number theory, especially thanks to its module named "Prime". However, it is not a very fast language. I would rate it as on par with (or sometimes slower than) Python. Truth is, every language has its pros and cons. (Yes, even Java has its pros. For anyone who's curious, ever heard of java.awt.geom?) It's every coder's duty to select the right tool for the job, and the experience would be optimal if and only if you use the right tool.

    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Well said.

      In all seriousness, learning another language also helps to code in one's "main" language as well. Some neat ways to do stuff, best practices, other programming paradigms — they are sometimes applicable in the "main" language.

      My comment was about a funny and rare case when something (D this time) is faster than the coveted C++. And I saw it rarely happen with Java and Python as well.

      • »
        »
        »
        »
        22 months ago, # ^ |
          Vote: I like it +14 Vote: I do not like it

        This particular C++ solution is resistant to Hash DoS because of a custom hash function and your D solution is vulnerable. So it's not apples to apples comparison. Associative arrays in D can be hacked:

        • »
          »
          »
          »
          »
          22 months ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          Sure thing.

          Again, my point was not to produce the best solution. Rather, I translated the actual solution code (not the surrounding hacks) almost line-by-line. My hope was that it would still be mostly readable by the one who understands the original, and works in a similar fashion. Code being faster is actually an unexpected byproduct.

          To take it to an extreme. Suppose I produce a solution in the language which looks like a bunch of Unicode emojis, but it's 0.2 seconds faster yet. That won't help the current discussion in any way, correct?

          • »
            »
            »
            »
            »
            »
            22 months ago, # ^ |
              Vote: I like it -8 Vote: I do not like it

            My point is that your translation is not exact. You are loading the array as 32-bit ints instead of 64-bit ints and your hash is not hardened against hacks. It's a line-by-line conversion, but not exact conversion. These minor differences do matter.

            BTW, your submission seems to be nevertheless difficult or impossible to hack
    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it +3 Vote: I do not like it

      Actually, the thing that worries me the most about Java is not its speed, but rather the unpredictable behavior of TLE, such as how, even in the most complex scenes, it will typically work fine, but that, 1 out of 20 times, it will fail and you won't even realize that Java, and not you, screwed up here. This gives me restless dreams of what would happen if this happened during a contest, I will be screwed :')

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +19 Vote: I do not like it

    Rust does the same in less than 1.4 seconds: 186751469.

    Use Rust, not D :) !

    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      For a really fair comparison D language needs a 64-bit LDC compiler, which is currently missing on the Codeforces platform. That would make it 64-bit vs. 64-bit and LLVM backend vs. LLVM backend.

      The currently available DMD compiler is famous for its very fast compilation time, but it can't generate fast code. Using DMD instead of LDC or GDC is similar to using TinyCC instead of Clang or GCC. I don't think that many people on Codeforces would be happy to have TinyCC as the only available option for C language. Even though TinyCC is famous for various compilation speed records, such as booting the Linux kernel from sources in less than 15 seconds.

      • »
        »
        »
        »
        22 months ago, # ^ |
          Vote: I like it +8 Vote: I do not like it

        In a really fair comparison C++ would probably outperform both D and Rust, while being still almost as readable, although, with Rust it is not 100% clear. Another point is that one has to have about 10 years of experience to be able to write clear, reliable and extendable C++-code without any assistances that other languages may provide.

        Also, it was mostly a joke.

        • »
          »
          »
          »
          »
          22 months ago, # ^ |
            Vote: I like it +13 Vote: I do not like it

          When using exactly the same code generation backend and implementing the same algorithm, the performance of C++, Rust and D is expected to be roughly the same. They all are top tier programming language choices. Some performance differences are surely possible, but the cause of such differences can be usually identified and corrected in each particular case. Or at least explained.

          For example, ABC279_F submissions sorted by execution time show that D code compiled by LDC 1.20.1 (37 ms) is near the top. I just took one of the faster C++ submissions and converted it to D. Something like this is also possible with Rust if anyone bothers to do this. Similar comparisons can be also done for the other problems on AtCoder.

          But what really matters is the syntax of the language itself, safety, the standard library, etc. And of course, the availability on various competitive programming platforms. C++ is available everywhere and is also much less likely to be misconfigured.

          • »
            »
            »
            »
            »
            »
            22 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            When using exactly the same code generation backend and implementing the same algorithm, the performance of C++, Rust and D is expected to be roughly the same.

            It is not necessarily true. Firstly, it is not true because of RAII vs GC. They just shine in different cases and RAII usually fits competitive programming purposes a bit better. That is a reason why Java-TL was a thing for such a long time.

            And even with C++ vs Rust it is not so clear. I know a problem where making one of parameters template makes the code run six times faster. It is just not possible to do in Rust, because it simply does not have integer templates (or generics, for true crustaceans). And even besides that there are differences in how compilers can optimize the code. I don't want to go in-depth on theoretical foundations of those differences, I rather show a practical example of unexpected speed loss with exactly the same algorithm written in two languages (for which I still don't know the reason).

            • »
              »
              »
              »
              »
              »
              »
              22 months ago, # ^ |
                Vote: I like it +8 Vote: I do not like it

              D language supports RAII and can work without GC too, but this is not necessary. GC is generally disliked in interactive applications, such as games. That's because GC tends to postpone memory deallocations and does them as big batches whenever it decides to collect garbage (causing occasional pauses at unpredictable times). But for competitive programming purposes this doesn't really matter (unless the MLE limit is very tight).

              Java/Kotlin/PyPy are slow primarily because of the startup/warmup overhead of JIT. JIT generates native code at runtime and this costs precious CPU cycles. If an application is very short lived (such as competitive programming solutions), then a significant fraction of time is spent on generating the native code (from the sources or from bytecode) rather than actually executing it. If the competition rules were changed to check the combined compilation time + run time against the time limit, then GNU C++ wouldn't be the best choice anymore (losing to PyPy, LuaJIT, TinyCC and DMD in many cases).

              As for your practical example. It just confirms what I mentioned before. C++, Rust and D code can be equally fast when doing the same task. There's a reason for any observed big difference and most of such differences can be corrected/workarounded. Profiling tools can be used to get some stats for your slow vs. fast Rust code and this works better than just making random guesses. You mentioned the "unexpected 60% slowdown which is not present in C++". But it is possible to observe big slowdowns/speedups in C++ solutions too by tweaking really minor things or using a different compiler version. There are many blogs on Codeforces about this.

              • »
                »
                »
                »
                »
                »
                »
                »
                22 months ago, # ^ |
                  Vote: I like it +8 Vote: I do not like it

                D language supports RAII and can work without GC too

                Yes, but then, AFAIK, you have to give up the standard library as well, so it does not really count. Also, you can add GC in C++ too, but who in their full sanity would do that?

                As for your practical example. It just confirms what I mentioned before. C++, Rust and D code can be equally fast when doing the same task.

                My point was not that they can't, but that it does not always happen (also, I would argue that in the example with templates they can't). And although you can question my choice of profiling tool (which was unix time, btw), the effect persists even in your experiments and it gives almost 30% slowdown, that I personally still can't explain (and do you?). The problem is not that it cannot be corrected, but that it has no apparent reason. In C++ you definitely can achieve huge slowdowns/speedups by minor tweaking your code, much bigger than 60%, but slowdowns without visible reasons do not really happen too much. At least I personally never seen an example that I can't explain after reading the code and the assembly.

            • »
              »
              »
              »
              »
              »
              »
              22 months ago, # ^ |
                Vote: I like it +1 Vote: I do not like it

              On the contrary, my impression is that the horrors of GC are greatly exaggerated. For people fearing it too much, D's GC collection cycles can be disabled altogether at program start, making it essentially heap allocation without freeing. But 99%+ of the time, your competitive programming solution won't be able to trigger GC collection anyway.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Ugh, just when I thought I was making progress with C++ tutorials, you made me realize I have to learn D too. When will the learning ever end?

    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it -10 Vote: I do not like it

      You should better understand his actual purpose in saying that, he doesn't really mean you must learn D for CP. Instead, he means that, sometimes, very great languages are outperformed by mostly unpopular choices.

»
22 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Not related to the topic of the blog but related to the problem mentioned in the blog.

Why is 185759665 this code passing and 185759576 this code giving TLE verdict.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    In this code:

    if(m[sieve[j]]==2){
    cout<<"YES\n";
    return;
    }
    

    You are accessing m[sieve[j]] many times which takes much more time than simple O(1) operations.

    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      Thanks. Got it. If I move this check part in the if(a[i]%sieve[j]==0) block then it passes.

»
22 months ago, # |
  Vote: I like it 0 Vote: I do not like it

There is a technical term for this class of problems. We call it financial illiquidity.

»
22 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

Don't use Java unless you are Petr or Egor or uwi or eatmore or SecondThread.(Just kidding)

»
22 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

You loop over too many numbers unnecessarily and in the wrong order leading to bad cache performance. You also unnecessarily use long instead of int in some places.

Here's a Java solution that passes test case #3: https://codeforces.net/contest/1771/submission/186823679 But it gets WA on test case 33. Finding that bug is left as an exercise for the reader. ;)

Edit: And here's the same Java solution that gets AC https://codeforces.net/contest/1771/submission/186824168.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks my man really appreciate it, I will try to improve based on those.

    Can you tell me the logic behind using the prime array size equal to 3041?

    • »
      »
      »
      22 months ago, # ^ |
        Vote: I like it +9 Vote: I do not like it

      Oh yeah I just ran the prime sieve once for $$$\sqrt{10^9}$$$ and noted that there were 3041 primes, and then I fine tuned the size of the array to be as cache friendly as possible. :)

      The most common operation in the solution is looping over all primes (one memory access per prime) and doing the modulo check for each number. So we want that list of primes to be as cache friendly as possible (i.e. we want it to fit in cache memory).

      Some possible optimizations include:

      • Do a Miller-Rabin prime check on the number. If the answer is No, then there can at most be 3401 composite numbers, which means the remaining $$$10^5 - 3401$$$ numbers are all primes. And if the number is a prime we can skip the loop.
      • The above doesn't help us much though if $$$t = 10^5 / 3401$$$ and each test case is just a bunch of composite numbers. In that case, your best optimization bet is to just use Pollard-Rho.