
maomao90's blog

By maomao90, 5 hours ago, In English

Disclaimer: This is a rant about Meta Hacker Cup and may not contain any useful information.

Meta Hacker Cup is one of the biggest annual programming competitions, but it has the strangest submission format, unlike any other online judge. Why does it have to be so different? Let’s take a look at how Meta Hacker Cup 2024 Round 1 went for maomao90.

The time now is 1:00 AM in Singapore. The contest starts, and maomao90 begins solving the problems.

The time now is 1:41 AM. maomao90 has solved problems A, B, and C without much trouble and starts working on problem D. After quickly coming up with a theoretical solution, maomao90 begins coding.

The time now is 2:05 AM. The code is ready and passes the sample tests. maomao90 proceeds to validate the solution.

The time now is 2:06 AM. Validation passes, and the input zip file is downloaded.

The time now is 2:07 AM. maomao90 runs the code on the final test.

Image showing assertion failed

Oh no! How did the code pass validation tests but fail the final test with an assertion error? Panicking, maomao90 scrambles to debug the code.

The time now is 2:11 AM. Five minutes have passed since downloading the zip file. maomao90 fails to debug his code and is no longer allowed to submit problem D. maomao90 wasted 30 minutes of his time and is left frustrated and in tears.

Problem 1: Why is the validation test so weak?

Is the validation test intentionally weak, or is it a mistake by the problem setter?

If it’s intentional, what's the goal? To make participants suffer? Brute-force algorithms often pass validation easily but take far longer than five minutes for the final test. Why is that?

Problem 2: Why are participants allowed only a single 5-minute attempt?

Almost every other online judge allows multiple submissions when your solution is incorrect. Why does Meta Hacker Cup limit participants to just one try?

One possible reason is that, with multiple attempts, someone whose code takes more than 5 minutes to run could simply wait for it to finish and then submit the output on a second attempt, getting AC even though the solution ran far longer than 5 minutes. However, there's an easy fix for this:

  • Instead of one input file, create three strong input files, each worth $$$\frac{1}{3}$$$ of the total points.
  • Allow participants to download each input file individually, with a 5-minute submission window for each file.
  • This way, if a participant fails to submit for the first input file, they can still debug and submit for the second and third, potentially earning $$$\frac{2}{3}$$$ of the total points.
  • This approach would also strengthen the final test with three times more input data.
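To make the proposal above concrete, here is a minimal Python sketch of how such per-file scoring could work. The names (`FileAttempt`, `proposed_score`, `WINDOW_SECONDS`) are hypothetical and not part of any real Meta Hacker Cup system; the sketch simply assumes each file has equal weight and its own 5-minute window that starts when that file is downloaded.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional

WINDOW_SECONDS = 5 * 60  # assumed: each file gets its own 5-minute window, starting at its download


@dataclass
class FileAttempt:
    downloaded_at: int            # seconds since the contest started
    submitted_at: Optional[int]   # None if no output was submitted for this file
    correct: bool                 # whether the submitted output was correct


def proposed_score(attempts: List[FileAttempt], total_points: int) -> Fraction:
    """Award an equal share of the points for each file answered correctly within its window."""
    share = Fraction(total_points, len(attempts))
    earned = Fraction(0)
    for a in attempts:
        on_time = (
            a.submitted_at is not None
            and 0 <= a.submitted_at - a.downloaded_at <= WINDOW_SECONDS
        )
        if on_time and a.correct:
            earned += share
    return earned


# The scenario from the post: the first file is lost to an assertion failure,
# then the bug is fixed and the next two files are solved in time.
attempts = [
    FileAttempt(downloaded_at=0, submitted_at=None, correct=False),
    FileAttempt(downloaded_at=900, submitted_at=1100, correct=True),
    FileAttempt(downloaded_at=1200, submitted_at=1380, correct=True),
]
print(proposed_score(attempts, total_points=30))  # 20, i.e. 2/3 of the points
```

Using `Fraction` keeps the thirds exact, so three correct files always add up to exactly the full score.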

The time now is 2:12 AM. After a brief crying session, maomao90 starts on problem E.

The time now is 3:41 AM. maomao90 validates the solution for problem E but lacks confidence after the disaster with problem D.

The time now is 3:44 AM. After a final check, maomao90 downloads the zip file and runs the code for the final test.

The time now is 3:45 AM. maomao90 submits the output for problem E. There’s nothing else to do now, as problem D can’t be submitted. maomao90 is tired and wants to sleep, but at the same time, maomao90 wants to know whether his final output is correct. Unfortunately, the final verdict will only be released after the contest...

Problem 3: Why is the final verdict delayed until after the contest?

Is it to reduce server load by judging only after the contest ends? The server doesn’t even need to compile or run code; it only has to compare two text files. Is that really too much for the server during the contest?
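As a rough illustration of how little work this judging step involves, here is a minimal Python sketch of an output comparison. It assumes a whitespace-insensitive exact match, which is not necessarily what Meta's grader does; problems with floating-point answers or multiple valid outputs would need a custom checker instead.

```python
def outputs_match(expected: str, actual: str) -> bool:
    """Token-by-token comparison that ignores line endings and extra whitespace."""
    return expected.split() == actual.split()


def judge(expected_path: str, actual_path: str) -> str:
    """Read both output files and return a verdict string."""
    with open(expected_path, encoding="utf-8") as f:
        expected = f.read()
    with open(actual_path, encoding="utf-8") as f:
        actual = f.read()
    return "Accepted" if outputs_match(expected, actual) else "Wrong Answer"


print(outputs_match("Case #1: 3\nCase #2: 5\n", "Case #1: 3\r\nCase #2: 5"))  # True
print(outputs_match("Case #1: 3\n", "Case #1: 4\n"))  # False
```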

If the final verdict were provided immediately, along with the solution proposed in Problem 2, the contest experience would be far more pleasant. Yet, after 14 years, there’s still no improvement in the grading system. Why is that? Even Codeforces is experimenting with pretest=system test to prevent "Fail System Test" issues.

The time now is 4:00 AM. The contest finally ends, and maomao90 can check if he solved problem E correctly. Thankfully, it was accepted and he celebrates.

The time now is 4:01 AM. Looking at the leaderboard, maomao90 sees the number of WAs on problem D.

Image showing the leaderboard results for problem D

So many red crosses! maomao90 laughs, realizing many others faced the same weak validation issues on problem D.

Problem 4: Why doesn’t Meta Hacker Cup follow other online judges and run the code for us?

The ultimate solution to all these problems is simple: adopt the standard system used by most online judges, where participants submit their code, and the platform compiles and runs it. Why hasn’t Meta Hacker Cup implemented this?

Codeforces held its first round in 2010, using the current code submission system, and Meta Hacker Cup started in 2011. Why did Meta Hacker Cup opt for this convoluted system of downloading password-protected zip files instead of following the code submission system that Codeforces uses?

Please upvote this blog if you faced similar issues or agree with the solutions mentioned. Hopefully, Meta will consider these suggestions and improve the system in the future. :(


»
3 hours ago

I can do nothing but upvote. I solved B (or at least I thought so) but missed n = 3 and n = 4, which weren't even in the validation cases. I went to sleep 30 minutes before the end of the contest, and in the morning, BOOM! I didn't even qualify for Round 2 :(

I mean, life does give surprises and I'm OK with it. It taught me to stress test more, but a little of the fault lies with the system as well... I hope they improve their system so that no other contestant faces what I faced.
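Stress testing, mentioned above, usually means comparing a fast solution against an obviously correct brute force on many small random inputs. A minimal Python sketch follows; the maximum-subarray problem and the function names are stand-ins chosen for illustration, not the actual Round 1 problem B.

```python
import random


def max_subarray_fast(a):
    """Kadane's algorithm: maximum sum of a non-empty contiguous subarray in O(n)."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)  # either extend the previous subarray or start fresh at x
        best = max(best, cur)
    return best


def max_subarray_brute(a):
    """Obviously-correct O(n^2) reference that tries every non-empty subarray."""
    best = a[0]
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s += a[j]
            best = max(best, s)
    return best


def stress_test(fast, brute, iterations=1000, seed=0):
    """Return the first random small input where fast and brute disagree, or None."""
    rng = random.Random(seed)
    for _ in range(iterations):
        n = rng.randint(1, 8)  # tiny sizes, so small cases such as n = 3 and n = 4 come up often
        a = [rng.randint(-10, 10) for _ in range(n)]
        if fast(a) != brute(a):
            return a
    return None


print(stress_test(max_subarray_fast, max_subarray_brute))  # None: no counterexample found
print(stress_test(sum, max_subarray_brute))  # a counterexample for the wrong "sum" answer
```

Keeping the random inputs tiny is the design choice that matters: small cases are exactly the ones weak validation tends to miss, and they are easy to debug by hand once found.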

»
3 hours ago

They called me a hater. It's 2024 and a CP contest is still using such a silly submission format.

»
3 hours ago

I don't agree with the point about the weak validation test, since it's supposed to be just a sanity check for the format I guess, but everything else is 100% true.

The idea of several input files is especially great, since it opens up the possibility of making 3 inputs of different difficulty (easy/medium/hard) and encourages participants to write solutions for hard problems even if they are not optimal, so that they can at least get points for the easy input.

Another minor issue is the requirement to have a Facebook account, which makes no sense. Meta already has standalone sites, for example metacareers, which have a separate and simpler account system; I'm not sure why that can't be done here.

  • »
    »
    3 hours ago
    since it's supposed to be just a sanity check for the format I guess

    That's stupid. You could as well use the samples for "sanity check".

    • »
      »
      »
      3 hours ago

      Making the validation input almost equal to the final test, as is done on CF, makes life easier for those who can't test their solutions properly. Instead of complaining about it, maybe just git gud.

»
3 hours ago

Hi, I agree with the "Why doesn’t Meta Hacker Cup follow other online judges and run the code for us?" part. I set up my entire system for the competition, including code to increase the stack size, etc. I coded the solution for problem A, and upon downloading and opening the final input file (which was very large), my system crashed. By the time I could figure out a solution or switch to another device, the timer had already ended.
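The commenter doesn't say which language they used; as one example, a common Python idiom for deep recursion is to raise the recursion limit and run the solver in a thread created with a larger stack. The sketch below assumes a 256 MiB stack is acceptable on the local machine (platforms may cap it), and `run_with_big_stack` and `path_depth` are hypothetical names. In C++ on Linux, people often raise the limit in the shell with `ulimit -s` instead.

```python
import sys
import threading


def run_with_big_stack(main, stack_mb=256, recursion_limit=1 << 20):
    """Run main() in a new thread with a larger stack and a raised recursion limit."""
    sys.setrecursionlimit(recursion_limit)  # note: left raised after the call
    old_size = threading.stack_size(stack_mb * 1024 * 1024)  # only affects threads created afterwards
    result = []
    try:
        worker = threading.Thread(target=lambda: result.append(main()))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(old_size)  # restore the previous setting for any later threads
    if not result:
        raise RuntimeError("main() raised an exception inside the worker thread")
    return result[0]


def path_depth(n):
    """Deliberately deep recursion, like a DFS down a path graph with n vertices."""
    return 0 if n == 0 else 1 + path_depth(n - 1)


print(run_with_big_stack(lambda: path_depth(100_000)))  # 100000
```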

»
3 hours ago

Are you familiar with how Google Code Jam used to operate?

If MHC were to change from the current format, I would prefer MHC to go the way of old Google Code Jam (with small and large input cases, and you still run your code locally on your own machine) instead of new Google Code Jam (where you submitted code for evaluation).

  • »
    »
    3 hours ago

    Hmm why do you prefer running locally rather than submitting code?

    • »
      »
      »
      2 hours ago

      It's fun and unusual. It's also double fun when you get an assertion failure while running on the final test, and have to fix it within a few minutes.

      Additionally, I believe there's an important educational aspect.

    • »
      »
      »
      110 minutes ago

      In addition to what KAN said, there's the rather unique fact that you have access to the test cases.

      This enables you, for example, to verify for yourself exactly how strong D's validation tests are (it should be obvious they are weak if you read them), and then test yourself any edge cases that you think might be missing in that coverage.

      Also, you get to look at the full input, which helps a lot during those 5-minute debug scrambles (an "RTE" verdict is not all you have to go off). You also get to check whether the input contains any edge cases you ignored: this is not often helpful, but there has been at least one occasion on which I noticed a case with n=0 in the input, realized I handled it incorrectly, and edited a "2" to a "1" in my output file before submitting for the AC.
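Because the full input is available locally, a short script can summarize it before submitting, for example to spot an n=0 case like the one described above. The sketch below assumes a hypothetical input format (T first, then each case as n followed by n integers); a real problem's format would need its own parser.

```python
def summarize_input(text):
    """Report the extremes of n in an input of an ASSUMED format:
    the first token is T, and each test case is n followed by n integers."""
    tokens = iter(text.split())
    t = int(next(tokens))
    sizes = []
    for _ in range(t):
        n = int(next(tokens))
        for _ in range(n):
            next(tokens)  # skip the array itself; only n is of interest here
        sizes.append(n)
    return {
        "cases": t,
        "min_n": min(sizes),
        "max_n": max(sizes),
        "cases_with_n_0": sizes.count(0),
    }


sample = "3\n2\n5 7\n0\n\n1\n4\n"
print(summarize_input(sample))
# {'cases': 3, 'min_n': 0, 'max_n': 2, 'cases_with_n_0': 1}
```

On a real download one would pass the file's contents instead, e.g. `summarize_input(open("input.txt").read())`, where the filename is a placeholder.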

»
2 hours ago

I very much enjoy the 6-minute timer and sometimes scrambling to fix the solution. It's my favorite part of Hacker Cup.

  • »
    »
    2 hours ago

    It's comments like yours that make them keep this submission format. Just because you're high-rated, probably have a better PC, and are likely able to solve more problems even if your submission timer expires doesn't mean everyone will feel the same.

»
2 hours ago

If you look at the TopCoder format, it is not that different: only sample tests to test your code on, and you only learn the verdict at the end of the contest. You can submit multiple times, but the only reason you would resubmit is if you have found a hacking test on your own. Still, people enjoyed TopCoder for a long time. It's different from CF, true, but not bad. What makes it bad is approaching it as if it's CF and submitting immediately once you pass samples/validation. In this format, testing your code pays off, or you can gamble and submit for a faster solve time with less certainty. There's strategy to this.

In problem D, it was obvious that validation was very weak: only two cases were added besides the samples, there were no big cases, and D could have lots of casework and edge cases. I was also gambling on D and lost, but I was not mad about not solving it.

  • »
    »
    96 minutes ago

    Hmm I guess it was my bad for not taking a look at the validation test and assuming that it would be strong enough.

»
98 minutes ago

Seriously, what is with all the loser mentalities around here? It’s a fair format, which has advantages and disadvantages (like any other format). Why are you blaming the contest for your subpar performance?

  • »
    »
    97 minutes ago

    It is far from a fair format, but go on we are losers.

    • »
      »
      »
      88 minutes ago

      Once you stop blaming external factors and start improving yourself, only then you'll have a chance of being good at anything. Pretests were weak? You should have implemented a more careful solution. Assertion failed with 5 minutes left on the clock? You better fix it.

      Just like OP showed in his blog post, it's not only him that had this issue (nor would it have been a problem if he was the only one with it). The tests were correct, his solution was incorrect, and he should be frustrated at himself for not solving the problem correctly instead of the contest for not babying him into solving the problem.

      • »
        »
        »
        »
        81 minutes ago

        I didn't read any of that because I was referring specifically to the submission format.

        There are a lot of factors that make it different from regular submission: decompressing huge files, the timer, huge input files, heavy dependence on computer specs, etc. Even though I no longer participate, I don't see any of the advantages this submission format gives that you mentioned in the comment I replied to.

      • »
        »
        »
        »
        48 minutes ago

        What about me having a worse computer? Do I spend thousands of dollars to fix that too?

        • »
          »
          »
          »
          »
          29 minutes ago

          Although I'd be more than happy to take this debate, my comment and this blog is not about that.

»
76 minutes ago

I don't think we should depend on the test cases to tell whether our solution is correct or not.

From my perspective, in programming contests, an online judge cannot look at the code and prove its correctness, so the least bad option we have is to run it against thousands of test cases. That does not mean it is the ideal way of checking correctness.

(I missed my B only for $$$n=4$$$, I'm sad)

One thing I don't like about this format is that people with powerful machines can use parallel programming/multithreading, and not every place has access to fast internet, so downloading the files might be very slow for some people.

»
33 minutes ago

Some other issues with this format:

1) You cannot tell whether your code will actually run in time or not... by this I mean you LITERALLY can't. It depends completely on whether they try to fuck you by giving 100 tests of the same type intended to TLE your solution, or they are benevolent and only give 2 such tests.

2) There is a big difference depending on how powerful your computer is, especially when multithreading, which depends on the number of cores you have. Take a look at the comments about E from the last round.

3) The contest format is very knowledge-heavy if you want to avoid having any issues. You have to know how to safely increase your stack size, how to multithread different test cases, how to run big files locally, etc. Some of you might think it's not an issue and that participants should be capable to deal with such issues. I vehemently disagree. It's like setting a prefix sum problem and then adding updates to it so that you now have to copy-paste a segment tree. It adds nothing except preventing people without that useless piece of knowledge from solving the problem.
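Multithreading different test cases, as mentioned in point 3, relies on the test cases being independent. A minimal Python sketch with `concurrent.futures` is below; `solve` and its "sum of the k largest" problem are hypothetical stand-ins for a real solver, and the output uses the usual `Case #i:` line format. Processes are used rather than threads because CPU-bound Python threads do not run in parallel under the GIL.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional, Tuple

Case = Tuple[int, List[int]]  # hypothetical per-case input: (k, array)


def solve(case: Case) -> int:
    """Stand-in for a real solver: the sum of the k largest values."""
    k, a = case
    return sum(sorted(a, reverse=True)[:k])


def solve_all(cases: List[Case], executor: Optional[Executor] = None) -> str:
    """Solve independent cases, in parallel if an executor is given; map keeps input order."""
    answers = map(solve, cases) if executor is None else executor.map(solve, cases)
    return "\n".join(f"Case #{i}: {ans}" for i, ans in enumerate(answers, start=1))


if __name__ == "__main__":
    cases = [(2, [5, 1, 9]), (1, [3, 3]), (3, [4, 8, 1, 2])]
    print(solve_all(cases))  # serial run, handy while debugging
    with ProcessPoolExecutor() as pool:  # one worker process per CPU core by default
        print(solve_all(cases, pool))
```

`Executor.map` yields results in input order even when cases finish out of order, so the `Case #i:` numbering stays correct. The `if __name__ == "__main__":` guard is required on platforms that start worker processes with spawn.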

This contest format is very outdated... Last year I refused to take part in R3 because the problem quality was so bad, and I did not want to bother myself with a bad judging format along with bad problems. MHC really isn't worth it unless you are a top 25 participant.

  • »
    »
    5 minutes ago

    1) Just assume the worst. Your solution should be able to handle the case with "100 tests of the same type intended to TLE your solution". If you opened the test case and you are not confident about that, you're taking a gamble. You might win or you might not. Same when submitting a squeezed $$$O(n \sqrt n)$$$ solution for a problem with $$$n = 500000$$$.

    2) If you could multithread your code and you didn't (and you realized that you're in the situation I described in 1), then you should reflect on how to mitigate that in future contests. I somehow doubt that E couldn't be solved on a mid-level computer with multithreading. And if you think multithreading code is not a skill worth investing in, then I'm sorry for you. It is one of the most important aspects of writing performant code in real life. As a side note, I was really sad to see Distributed Code Jam go away; I think it was so unique in its format and style that it was one of the most exciting events to invest my time and learning into.

    3) I understand your point, but you have to realize that that's your opinion. I am on the other side of this. I do think "participants should be capable to deal with such issues". I don't think running big inputs, setting stack sizes, making code parallelizable, and even using data structures to solve problems are "useless pieces of knowledge". You do. This is okay. What is not okay is advocating that all programming contests should reflect your opinion. There should always be room for variety.
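To put numbers on point 1, here is a back-of-the-envelope estimate for the $$$O(n \sqrt n)$$$, $$$n = 500000$$$ example combined with the parent comment's "100 tests of the same type" scenario. The throughput of about $$$10^9$$$ simple operations per second and the constant factor of 10 are rough assumptions, not figures from the thread.

```latex
\begin{aligned}
\sqrt{500000} &\approx 707.1, \qquad n\sqrt{n} \approx 500000 \times 707.1 \approx 3.54 \times 10^{8} \text{ operations per test},\\
100 \times 3.54 \times 10^{8} &\approx 3.54 \times 10^{10} \text{ operations for 100 worst-case tests},\\
\frac{3.54 \times 10^{10}}{10^{9}\ \text{operations/s}} &\approx 35\ \text{s}, \qquad
\text{or} \approx 354\ \text{s} \approx 5.9\ \text{min with a constant factor of } 10.
\end{aligned}
```

So a solution that finishes comfortably when only 2 such tests appear can run close to a 5 or 6 minute submission window when 100 do.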

    Your last paragraph sums it up pretty nicely. If you don't like the format, and you don't feel like it's worth it to improve yourself on it, you can always not participate. But do understand that it has its place in the community, and a lot of people would be sad if next year's contest would be just another CF round.