Petr's blog

By Petr, 12 years ago, In English

In the discussion at http://codeforces.net/blog/entry/6870#comment-125383 it became apparent that Jacob submitted a solution for Round 172 Problem E that passed all system tests and was not challenged, but in the end he didn't get the score for it because a tester decided to stress-test it against a correct solution and add the tests where it fails to the system tests.

I think this is an awful, awful decision, as it goes against the spirit of programming competitions: automatic judging and the complete objectivity it gives.

There are so many ways this can discriminate against Jacob:

  • Had I submitted a solution for problem E, the tester might not suspect it would be a wrong greedy (because I'm in top 10 by rating) and would not add tests, and I'd get the points.

  • Maybe some people who opted for solving problem D instead also submitted wrong solutions, but since there was more than one of them, the testers did not think to stress-test them.

  • Had he submitted his solution right before the end of the contest, the tester might not decide to read it because he'd have other things to do.

But these points are minor compared to the main point, which I want to reiterate:

  • This breaks everyone's faith that the competition is fair and objective. The beauty and appeal of programming competitions, in my opinion, relies on this one aspect: automatic judging. Please don't take it away from us.

In this particular situation, I propose to remove the tests in question and give Jacob the score for this problem. In the future, I propose to keep to the formal procedure — use the tests prepared before the contest plus successful hacks. This is the only way to make sure the competition is not subjective.

I don't blame the particular author or tester for this problem — I actually applaud them for bringing this problem to light and explaining their actions clearly. Thanks a lot! Hopefully we can learn from this situation and avoid similar issues in the future.

»
12 years ago, # |
Rev. 18   Vote: I like it +18 Vote: I do not like it

Yes, I stand with Petr on this point ... Looking back, it would have been better to hack the solution as soon as possible rather than add tests! But we lacked experience with this issue, and didn't know that we, as managers, can also hack just like other participants.

The tests were so weak: I didn't consider the properties of random data. The testers also share responsibility; in fact, a few of them could solve this one. I hope this teaches all of us a lesson about such issues; managers must be aware of this in the future.

But in this situation, I'm afraid it would be improper to mark his solution as correct, because his code doesn't maintain the derivative and only works when every delta is close to a or b. It would be unfair to the other participants and irresponsible toward him if we gave him the score for this problem.

We are trying to discuss this point in more detail in the editorial.

Anyway, I feel so guilty about this. And I really appreciate his first-E last-A strategy during the contest. It shows not only courage but also confidence in his own strength. After all, without him, the contest would have been much more boring ...

What can I do to compensate you ... Jacob, please tell me ~~

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +109 Vote: I do not like it

    Actually, I think that hacking him as a manager is just as wrong :) Judges should not make any decisions based on submitted solutions; judging should be fully automatic.

»
12 years ago, # |
  Vote: I like it +48 Vote: I do not like it

No matter what is decided, I suggest these tests shouldn't be removed from practice. Practice affects only your solving abilities, and the fewer wrong solutions pass during practice, the better.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +11 Vote: I do not like it

    Sure, I agree completely here.

»
12 years ago, # |
Rev. 2   Vote: I like it +40 Vote: I do not like it

It's my fault. I truly apologize for this, and promise I won't do this any more. Although I don't think Jacob should get the score..

I made 4 generators: gen, gen6000, genab, and gen6000ab. The first three can only produce weak data.

I don't have any experience. I put only 4 "gen6000ab" tests among all 100 tests!! However, I put in 80 "gen" tests, from the weakest generator...

His solution passed these 4 "gen6000ab" tests. But when I added 6 more "gen6000ab" tests, his solution got 3 WA. Test 101 is one of them.

It's my fault because my data was too weak, but actually I didn't make his solution WA on purpose (I don't mean that I didn't want to do it, just... for example, I didn't run the generators 1000 times or craft "anti" data. "gen6000ab" was added before the contest).

And why did I do it? Because the solution to this problem is so magical, and I'm so selfish that I really didn't want to see a wrong solution get AC..

Finally, I apologize for this again. I hope you can all forgive me.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it -37 Vote: I do not like it

    "And why did I do it? Because the solution to this problem is so magical, and I'm so selfish that I really didn't want to see a wrong solution get AC.."

    You should have thought about that before the contest!

    Let's take my case — I solved problem C, and made a stupid mistake in calculating the depth of a tree node. It was only a small issue, the main idea was something completely different and my whole work is lost because of this mistake.

    The same situation is here: you have prepared a beautiful problem and a magical solution, but you failed to prepare good tests. Even though the tests aren't the main part, you failed at it and you should suffer the consequences!

    You had time before the contest, just as I had during it. We both failed, and I don't understand who gave you the right to change reality. In this case I also ask to remove test 52 (which was in fact not an official test, but a successful hack made at 1:58:14) and rejudge the contest; it will change the score of 3 people in div 1 and 1 in div 2.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +19 Vote: I do not like it

      Your case is different. It's a tradition that in algorithm competitions participants are supposed to get everything right, not just the idea. Your solution is supposed to fail.

      • »
        »
        »
        »
        12 years ago, # ^ |
        Rev. 2   Vote: I like it +15 Vote: I do not like it

        Problem setters are supposed to let those who solved the problem properly get the score. IMO it does not matter when and how they accomplish the goal.

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it -35 Vote: I do not like it

        It's exactly the same: the tester had his time before the contest and he fixed his mistake during it. I had my time during the contest and want to fix my mistake after it. He added the test after he saw the problem with his work, whilst I want to remove the test after I saw the problem with my work. We are all equal as members of the community, so if he is able to do such things then I want to be able to as well.

        BTW, I don't think it's consistent with the rules — I would like to see them. Could you give me the link to the current version?

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +22 Vote: I do not like it

          So you're saying that every competitor can remove the test that kills their solution and get accepted? If so, that would be a really good contest system!

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it -56 Vote: I do not like it

            Simply writing sarcastic comments here is of no use. I don't think anyone said that, so please mind your words.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
          Rev. 2   Vote: I like it +3 Vote: I do not like it

          Firstly, I don't seem to have mentioned anything about the rules. If it is not consistent with the current rules, personally I suggest we consider modifying them.

          Secondly, the testers added the tests to make it fair for the other contestants. If you remove a test just because your solution fails on it, that's not fair to the others. Anyway, those who get the implementation right should rank higher than those who make mistakes during the contest.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +17 Vote: I do not like it

          You can't simply say that they are exactly the same. The competitor and the jury have different goals and different responsibilities; they are not the same at all.

          Competitors should write correct code, and the jury should ONLY LET CORRECT CODE PASS. That's it.

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it -9 Vote: I do not like it

            Then they should add the test before the contest.

            Wrong solutions passing is also part of the art of programming contests. Writing wrong code that passes all the tests isn't easy; it must exploit weaknesses in the problem writers' thinking. And I find these solutions magical.

            When you go to university you will take courses on approximation algorithms, and they all have their own uses. ACM-style contests, where only AC gets a score, aren't the most scientific at all, especially on CF, where unlike ACM you can't know whether you got it right when submitting. It's a mixture of OI and ACM, and a clumsy one.

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it -14 Vote: I do not like it

        And one more thing: my solution wasn't supposed to fail. I would say that in this case the writer should specify constraints that rule out such stupid test cases.

        Actually, the test doesn't present a tree on which my program fails. If you reorder the edges, then it passes. You can't find any tree on which my program fails, and that was the goal of the task. This was shown by all the test cases prepared by the testers. The program should work for the tree! The only problem is when you give some strange order of edges, whilst the authors didn't intend to fail solutions because of different orders of tree edges. I checked the other 3 solutions and they probably don't work on this tree either, but in my case I had extremely bad luck: firstly because the hack appeared at 1:58:14, and secondly because the normal ordering of edges (from root to leaf; it's a path) would make the hacked program and the other (probably wrong) solutions fail, but not mine. My program should basically pass in terms of common sense.

        It's a similar situation to SQL injection. Of course a portal should be protected against this kind of attack, but the world would be much better if we didn't have hackers trying to do this.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          I viewed your code. The input is a chain, which is obviously a tree. And your code seems to be wrong.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          I've viewed your code ... Your code is extremely wrong because it assumes that the father will be first in the adjacency list.

          I admit the test-data is too weak, I am so sorry about it.
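To make the point about the adjacency list concrete, here is a minimal sketch in Python. It is not the submitted code: `depths_parent_first` is a hypothetical reconstruction of the kind of shortcut described above (treating the first neighbour as the father), and `depths_bfs` is a standard breadth-first traversal that does not depend on edge order. All function names are assumptions made for illustration.

```python
from collections import deque


def build_adjacency(n, edges):
    """Build 1-indexed adjacency lists in the order the edges are given."""
    adj = [[] for _ in range(n + 1)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def depths_parent_first(n, edges):
    """Hypothetical buggy version: assumes adj[v][0] is the father of v
    and that the father's depth is already known when v is processed."""
    adj = build_adjacency(n, edges)
    depth = [0] * (n + 1)
    for v in range(2, n + 1):
        depth[v] = depth[adj[v][0]] + 1  # wrong if the father is not listed first
    return depth


def depths_bfs(n, edges, root=1):
    """Order-independent version: breadth-first search from the root."""
    adj = build_adjacency(n, edges)
    depth = [-1] * (n + 1)  # -1 marks "not visited yet"; index 0 is unused
    depth[root] = 0
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if depth[w] == -1:
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth


# The same chain 1-2-3-4, with edges listed root-to-leaf and leaf-to-root.
chain_in_order = [(1, 2), (2, 3), (3, 4)]
chain_reversed = [(3, 4), (2, 3), (1, 2)]
```

On `chain_in_order` both functions agree, but on `chain_reversed` only `depths_bfs` still reports depths 0, 1, 2, 3. That is the sense in which a chain is a perfectly valid tree that such code can still get wrong.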

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Man, extremely wrong...?

            This is my mistake in calculating the depth of a tree node. It's a basic algorithm, and the task wasn't about calculating the depth of tree nodes...

            For each tree you can find a permutation of the tree edges for which this code passes.

            If you gave the tree in the other form:

            "Line 1: int n — the number of nodes in the tree. Lines 2..n, in line i there will be an int pi (1 <= pi < i) which denotes the parent of the vertex i in the tree."

            Then my program would pass all possible inputs.
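For comparison, here is a minimal Python sketch of why the parent-array format quoted above is easier to handle. Because every parent index is smaller than the vertex index, a single left-to-right pass always sees the parent's depth before the child's. The function name and the list layout (`parents[i - 2]` holds the parent of vertex `i`) are assumptions made for this illustration, not part of the original problem.

```python
def depths_from_parents(n, parents):
    """Depths for a tree given as parents of vertices 2..n, with p_i < i.

    parents[i - 2] is the parent of vertex i (vertex 1 is the root).
    Returns a list where depth[v] is the depth of vertex v; index 0 is unused.
    """
    if len(parents) != n - 1:
        raise ValueError("expected exactly n - 1 parent entries")
    depth = [0] * (n + 1)
    for i in range(2, n + 1):
        p = parents[i - 2]
        if not 1 <= p < i:
            raise ValueError("parent index must satisfy 1 <= p_i < i")
        # p < i, so depth[p] was already computed earlier in this loop.
        depth[i] = depth[p] + 1
    return depth
```

This format rules out the "strange edge order" inputs by construction, which is exactly why it is a different (and easier) input specification than an arbitrary edge list.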

            • »
              »
              »
              »
              »
              »
              »
              12 years ago, # ^ |
                Vote: I like it +11 Vote: I do not like it

              But the input data really is any kind of tree in any ordering, and your program can only deal with this special form of tree. The idea is right but the implementation is wrong.

            • »
              »
              »
              »
              »
              »
              »
              12 years ago, # ^ |
                Vote: I like it +43 Vote: I do not like it

              Nobody cares about the reason a satellite crashed. A nuclear war could have started 30 years ago because of an early-warning system malfunction. The developers didn't think about sunlight reflections, but it happened nevertheless.

              IMHO your program has to work on all possible test cases that fall under the constraints, not only on some 'good ones'. Even if a meteor is possible, your program should handle it correctly.

            • »
              »
              »
              »
              »
              »
              »
              12 years ago, # ^ |
                Vote: I like it +40 Vote: I do not like it

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +29 Vote: I do not like it

    Maybe I'm not eligible to say what I'm going to say, because I'm just an Expert (1501), but hope you all will consider my words.

    After what Seter has said (s/he wants us to forgive him/her), I feel that we must all do so. Codeforces is not, in my opinion, something 'do or die'. I feel it's for fun and it's some kind of sport. After so much discussion, I feel that such events will not occur in the future. But please think from Seter's viewpoint. S/he is asking for our forgiveness, so let us all forgive him/her, to make Codeforces a long-lasting and happy community again. Even Jacob himself forgives Seter (please have a look at the comments on xiaodao's blog post announcing round #172). I hope you will all consider my words.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +1 Vote: I do not like it

    As I wrote in my original post, I'm not blaming you at all!

    My post was written so that we can discuss this issue and get a common denominator for the future :)

    Thanks a lot for preparing the contest!

  • »
    »
    12 years ago, # ^ |
      Vote: I like it -24 Vote: I do not like it

    I think that we are all missing the problem.

    The real problem is the generator.

    Seter used "gen6000ab" as name, which is BELOW 9000.

    He should have used "gen9001ab" and it would be generating test cases which are OVER 9000 and all this would not have happened.

»
12 years ago, # |
  Vote: I like it +21 Vote: I do not like it

Here's a similar question that MikeMirzayanov has brought up offline:

What about bugs in tests that are discovered because some team submits a correct solution but gets a WA, then the judges look at the team's output and realize it's actually correct? Should we fix those?

Apparently my above reasoning leads to a "no" answer — since this is subjective, too (a lower-rated team getting a WA would probably not attract attention). However, the real-world practice has always been a "yes", for example my contests in Petrozavodsk did have rejudges caused by teams submitting correct solutions while my solutions were wrong.

TopCoder's answer to this question has usually been "yes, but the contest becomes unrated", although that was not true in a recent match.

I don't have a good answer here. Somehow a "yes" here doesn't sound as awful as today's situation to me, but I can't explain why.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +7 Vote: I do not like it

    Well, this is a significantly different question. "A lower-rated team getting a WA would probably not attract attention": even if it does not attract attention during the contest, the bug will be found later, and the contest will become unrated (the only sensible outcome for a contest where a problem was bugged and this significantly influenced the results of some teams). So IMO it is objective enough.

    Besides, an incorrect jury solution is fatal for the contest, and fixing it quickly is usually the only way to save the day, while weak tests are not that terrible.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +43 Vote: I do not like it

    In my opinion it's wrong to compare a contest in Petrozavodsk and a CF Round, because the rules are different. The main point in a CF Round is that if you get AC during the round, nobody guarantees that you will get the score in the end.

    In an ACM-ICPC contest, all solutions are tested on the same tests the whole time; it's a part of the rules. But CF rules differ from that. Nobody guarantees or promises you anything before the system test.

    Adding tests during a round is bad style, but no more, I think.

    • »
      »
      »
      12 years ago, # ^ |
      Rev. 2   Vote: I like it +27 Vote: I do not like it

      Adding a test based on the knowledge that some participant who is not top-ranked passed the current system tests of a hard problem is unfair in general, not just bad style.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +13 Vote: I do not like it

      I think it could even be good to add tests, if stress-testing were applied to all solutions passing the main tests, and each newly found test (one that makes a solution fail) were added to the systests.

      Unfortunately, it looks impossible to do all this with each solution. (BTW, it's more a question of "we want to see our results right now" than any other reason...) So I'm forced to agree with the topic starter Petr: it's unfair to change the systests depending on stress-testing of some solutions.
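For readers who have not seen stress-testing before, here is a minimal sketch in Python of the loop being discussed: generate random inputs, run a trusted reference solution and a candidate solution on each, and report the first input where they disagree. Everything here is illustrative. The toy problem (maximum subarray sum), the deliberately wrong greedy `candidate`, and all function names are assumptions, not anything taken from Round 172.

```python
import random


def reference(nums):
    """Trusted but slow: maximum sum of a non-empty contiguous subarray, O(n^2)."""
    best = nums[0]
    for i in range(len(nums)):
        total = 0
        for j in range(i, len(nums)):
            total += nums[j]
            best = max(best, total)
    return best


def candidate(nums):
    """Deliberately wrong greedy: adds up all positive elements,
    ignoring that the chosen elements must be contiguous."""
    positives = [x for x in nums if x > 0]
    return sum(positives) if positives else max(nums)


def gen(rng, max_len=6, max_abs=5):
    """Small random inputs make counterexamples easy to read."""
    n = rng.randint(1, max_len)
    return [rng.randint(-max_abs, max_abs) for _ in range(n)]


def stress(candidate, reference, gen, rounds=1000, seed=0):
    """Return the first generated input where the two solutions disagree,
    or None if no disagreement is found within the given number of rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        case = gen(rng)
        if candidate(case) != reference(case):
            return case
    return None


failing_case = stress(candidate, reference, gen)
```

The debate in this thread is not about the technique itself but about when it is run and against whom: applied by the setters to their own solutions before the contest it is ordinary test preparation, while applied to one contestant's submission after it was seen it becomes the situation Petr objects to.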

  • »
    »
    12 years ago, # ^ |
      Vote: I like it -10 Vote: I do not like it

    Agree. Contests must follow their own rules. For example, when someone does something wrong, but because of weaknesses in the law his action doesn't violate it, he should go unpunished. This is procedural justice.

»
12 years ago, # |
  Vote: I like it +37 Vote: I do not like it

I believe adding test cases against a specific approach to be acceptable (however, it's better to do so early enough) — in fact, such practice is not uncommon among problem setters of serious informatics competitions in Slovakia.

I do not, however, agree with adding test cases in such a way that a specific solution would be sure to get a WA. The goal of programming competitions is not to get a correct solution, but to get a solution which passes all tests, and is therefore "correct enough".

So, were it up to me, feel free to add some test cases against Jacob's idea, but do it before looking at his program :D

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Very nice advice! I think it would have been better if I had decided to just change some weak generator "gen.exe" to the strong generator "gen6000ab.exe" instead of adding one manual test after looking at his solution, even if it was also generated by "gen6000ab.exe" :)

  • »
    »
    12 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Right. I think it's very common to prepare testcases against different wrong solutions before the contest — it's only adding tests on the basis of a submitted solution that I have a problem with.

»
12 years ago, # |
  Vote: I like it -18 Vote: I do not like it

I absolutely agree with you. A situation where you can cheaply change your tests during the contest allows (and sometimes forces) testers not to care properly about the tests before the contest. That is not good, and it results in situations like the one with Jacob's problem E.

»
12 years ago, # |
Rev. 2   Vote: I like it +53 Vote: I do not like it
First, I should apologize, because as a tester I didn't test Problem E carefully and so didn't point out that the test data was somewhat weak.

But I should still make my point here. Suppose there's a problem whose test data is too weak, and some simple wrong greedy passes. Is that really fair? A good coder would quickly realize that the simple greedy is wrong, and of course wouldn't try it. But a normal coder may simply try it, having nothing to lose. What's more, maybe one good coder types 200~300 lines of code for the right solution, but a wrong greedy with 30 lines also passes. Is that really fair?

You are right that a programming contest should be objective. BUT ROCK-PAPER-SCISSORS IS THE MOST OBJECTIVE ONE! Why don't we just throw a die to decide the winner? How objective would that be?

I just mean that it should not only be objective but also fair, which means letting only correct solutions pass.
  • »
    »
    12 years ago, # ^ |
      Vote: I like it +48 Vote: I do not like it

    To avoid dragging philosophy in here, I should make it clear that

    I understand objective as "Everyone is treated the same",

    and fair as "Everyone gets the result they deserve".

    Sometimes you can't have both at once, so which one do you think is more important in a programming contest?

    I still think fair is more important, simply because if we really want the most objective game, why don't we throw a die? The goal of any contest is simply to let the best competitor get the best place.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it -14 Vote: I do not like it

      Notice that "Everyone gets the result they deserve" is SUBJECTIVE.

      Without procedural justice, how can you determine what result someone deserves? By your own judgment?

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +9 Vote: I do not like it

    First of all, thanks for this awesome contest! You do not have to apologize, this was a high-quality contest, the only issue was this test-adding for E.

    Concerning objectiveness and 'fairness'. While weak tests only affect results of a single contest, non-objectiveness tends to grow over time: if it becomes common, we will have situations like "we don't like this guy, let's look at his solutions and add tests against him". So I'd say objectiveness takes priority.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    Procedural justice should be considered. Adding a test DURING THE CONTEST shouldn't be allowed. The contest organizers should consider every aspect of the problems properly BEFORE THE CONTEST, rather than trying to hack someone's solutions.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +23 Vote: I do not like it

    That's a good point — just throwing the dice would certainly be a very bad programming contest :)

    I agree that we should pursue both goals — only good enough solutions should pass, and judging should be automatic. And I agree that sometimes it's hard to achieve both goals.

    I think an instructive example is: suppose we can't separate the intended solution written 'normally' from a highly-optimized asymptotically-slow solution. Sometimes, we just don't use such problem for a contest. But in case we do, we have to live with the fact that highly-optimized asymptotically-slow solutions will pass. An alternative would be some kind of manual review that solutions have the desired complexity, instead of a hard time limit — but I don't like that alternative, because it will inevitably lead to borderline situations and reasons for contestants to feel they've been treated unfairly. I think it's very beautiful that we simply say 'your program has to run under 2 seconds on each testcase we've prepared' and no human judgement is involved.

    Of course, sometimes we do have to prefer fair to automatic, and one example is when we have an incorrect reference solution — I think most people would agree that we do need to fix the reference solution if we find out that it's incorrect, independent of the way we found that out.

    So our disagreement seems to boil down to whether the situation with Jacob's solution falls to the first kind or the second kind. I think that in an official competition, like a Codeforces round, letting an incorrect solution pass (that still passes all testcases that were prepared before the contest and all hacks) brings less harm than resorting to manual judging (even if it's just running an existing test generator several more times). Had this been a training, I would be fine with not letting his solution pass.

»
12 years ago, # |
Rev. 3   Vote: I like it +27 Vote: I do not like it

It's my (and Seter's and roosephu's) first CF round. We wanted to make it perfect. CF is not OI, which encourages one to cheat for scores. I don't think an ACTUALLY WRONG solution should get accepted in the system test. Contestants hack others' submissions to point out others' mistakes and get points as a reward. When there is a wrong submission but nobody can hack it, we problem setters do have the right to do it ourselves.

Anyway, Jacob is the hero of the contest..

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +12 Vote: I do not like it

    OI is not an algorithm competition but a strategy competition. That's why I dislike it.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      Do you talk about IOI-like contests? What's your point then?

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it +5 Vote: I do not like it

        Guessing the difficulty of problems is really important in OIs. That might not be so necessary in IOI, since at most one problem cannot be solved by top contestants like you and me. But if the problems are more challenging, as in China's Selection Contest or even China's NOI, working on a hard problem can be a fatal mistake.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +11 Vote: I do not like it

          Strategy is never negligible in any contest. For example, in CF and TC contests, if one finds A/250pt harder than usual, s/he might prefer to have a look at B/500pt and so on. In fact, in this round, many IGMs solved problems in the order CBA. One cannot always get a good rank without thinking about strategy at all.

          P.S. I think China TSC and China NOI are not that hard, especially in recent years.

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it +16 Vote: I do not like it

            I know, they are not hard for you. You can solve at least 5 such problems in a 5-hour contest.

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            CTSC is really hard anyway. NOI is easier, but it's still hard for most people. Even though CTSC and NOI last 5 hours, some problems are really hard to code because of the sophisticated program logic.

            Anyway, I agree with you: strategy matters in every contest. Sports, esports, problem-solving, they're all the same.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
          Rev. 2   Vote: I like it +32 Vote: I do not like it

          I agree that CTSC is simply a !@#@!%#!#$!

»
12 years ago, # |
  Vote: I like it -58 Vote: I do not like it

I think the only fair way is to test every possible input.

»
12 years ago, # |
  Vote: I like it +56 Vote: I do not like it

In my opinion, adding tests during the contest is a bad thing. But when the original tests are not strong enough, it is worth doing that bad thing. Ideally, the tests should accept a solution if and only if it is perfectly correct. A solution that fails on the additional tests fails because of its own incorrectness, not for some subjective reason.

»
12 years ago, # |
Rev. 5   Vote: I like it -26 Vote: I do not like it

I agree with you, Petr. One of the main reasons why programming competitions are so popular is their objectivity, which ensures fairness. In most cases, you will never find an objective way to check every possible solution. So what should we do about the imperfect solutions? Since we don't have the ability to discover all of them, we should at least treat them equally.

Btw: if this happens again, someone might add a lot of meaningless information to protect his "score" (only a joke, don't take it seriously)

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +19 Vote: I do not like it

    OK, by your logic, let us suppose an extreme case.

    We don't have the ability to discover every crime, so we should at least treat them equally and let all crimes go unpunished?

    It might sound too extreme, but my point here is this: yes, we can't be perfect and discover every wrong solution, but if we find one then we should do what we can. It is still better than doing nothing.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it -11 Vote: I do not like it

      I mean treating them by the same rule we set beforehand (the test cases?)

      There are real cases where someone does something wrong without being punished because of defects in the law. Then most of us agree that we really should let it go and then revise the law.

      Moreover, under ACM/ICPC rules, a judge cannot rejudge a YES to a NO, and they can't add more test cases during the contest either.

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it +16 Vote: I do not like it

        Many people talk about ACM/ICPC. But CF rule is different from ACM/ICPC. You can see this comment.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
          Rev. 2   Vote: I like it -6 Vote: I do not like it

          Yes, they are different. However, there must be some reason why ACM adopted such rules. (Actually my main point is not the last two lines and you can ignore them; I agree they're not very supportive.)

          And I think the judges have too much power; no one gave them the right to change the test data after seeing someone's code. At least, they should open a topic to discuss it. (I know they intend to make the match fair, but their fairness is somewhat superficial.)

          Or, if we can ignore efficiency, we could create a "free hacking period" of about one week after the match, and only the solutions that survive would get the score.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +17 Vote: I do not like it

      It's part of the contest. Like I said before, the challenge work is up to the contestants. And if you're willing to accept only perfect solutions, then you should probably go back through many past contests and change a lot of "random" solutions that got AC to WA. Don't let this awesome kind of contest turn into a pragmatic one. I mean, the main quality that makes CF, TC and so many others so enjoyable is this freedom to earn points by 'hacking' wrong solutions, or even the 'fuck yeah' moment when you get AC using a 'random' or an almost-correct solution. Competition is not only about solving a lot of problems, finding the best solutions or something like that. It's also about doing something enjoyable and fun for yourself.

»
12 years ago, # |
  Vote: I like it +31 Vote: I do not like it

When a mistake with the test data does happen, what should be done? Hacking one program is unfair to a single competitor. Letting the program get AC is unfair to the other competitors. Neither seems like a good idea.

»
12 years ago, # |
  Vote: I like it +42 Vote: I do not like it

While I don't have a “42” answer on the matter, I'd like to point out an associated issue.

This thread may well have hundreds of comments and seem to come to a conclusion eventually. But the Codeforces team does not usually cover such cases publicly. Instead, some internal conversations take place and the decision is silently made. To an external observer, it is unclear whether such a thread influenced the final decision at all, and hard even to find out what the decision was in the first place.

In such interesting cases (contestants' appeals in general), I would like to have them publicly recorded along with the judges' answers and motivations. Generally, it is a good thing to do; to check that it is so in more common sports, I just googled for appeals in football and instantly found some public decisions.

When the judges' decisions and motivations are publicly available, we can expect them to act and decide in a similar way when a similar issue arises. For a contestant, it is a good thing to know in advance what to do and what to expect in some corner case of the contest rules.

When the judges' decisions and motivations are kept private to the judges' team, we have no clue what the judges' reaction will be next time. Even worse, ironically, contestants who engage in private conversations with judges on the matter gain a slight unfair advantage of knowing their motivations and knowing what to expect next time. So, it turns out that keeping the appeals private leads to more unfairness: the very same unfairness the appeals are designed to decrease!

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +38 Vote: I do not like it

    Agreed. By making the judges' decisions public, a contestant who thinks he was treated unfairly gets the chance to appeal. And the final decision can be made in public by a third party.

    In this case, I think the tester was right, because Jacob's solution can be challenged simply by adding some big test data not specially designed against his solution. But sometimes it's hard to say whether a solution is perfect. For example, suppose the standard solution and the contestant's solution both use hashing with different seeds, and the tester adds test data specially designed against the contestant's seed. (Just an example; I don't think anyone is that evil.) I think we would all consider this behavior illegitimate. So making the judges' behavior public helps us judge each case on its merits instead of drawing a general conclusion.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    I couldn't agree more. Of course the admins will still have a final say in each situation, but it would be great to see the motivation for their decisions made public. Summoning MikeMirzayanov and Gerald to the thread :)

    I guess what might stop them from revealing their motivation is that people will start to argue with it. So maybe it's better to reveal the motivation only after the community has expressed its opinion, and all the pros and cons have already been pointed out. For example, now :)

»
12 years ago, # |
  Vote: I like it -56 Vote: I do not like it

I think it was appropriate to add the test data and hack Petr's submission for E as soon as possible, before the contest ended.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    and the organizers were not supposed to intervene in the contest once it had started

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +12 Vote: I do not like it

    Petr does not have any submissions for problem E. Petr is the author of the post, and the submission in consideration was made by Jacob. You'd better read at least the first sentence of the post before commenting.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      It was a slip of the pen; I meant to say Jacob's submission.

»
12 years ago, # |
  Vote: I like it +5 Vote: I do not like it

Well, I don't know about all of you guys, but I hope this kind of thing will not happen again.

»
12 years ago, # |
  Vote: I like it +1 Vote: I do not like it

I think it is necessary to prepare all the data for a contest before it finishes. While solutions that look not quite right may pass all the test data, right solutions may fail on some special data. So it is fair to give the coder the points he deserves, and it is just to make his solution fail. Of course, I think the mistake is understandable, and adding the last test makes the problem more complete.

»
12 years ago, # |
  Vote: I like it +11 Vote: I do not like it

In fact the answer is simple. Some people try to design test cases that no wrong program can pass, and this is just impossible. For example, if a qsort doesn't use a randomized pivot, should we design tests to make it TLE?

So the only fair way is to design tests before the contest; then, for all participants, the probability of a wrong solution passing the tests is the same. Designing tests based on an existing solution is just immoral.

I think some people who excel at programming are geniuses, but they don't know what is fair. They just think stronger is better. Trying to make only "right" solutions pass the tests is just childish thinking. Stop your chuunibyou, pls.
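
To make the qsort example above concrete, here is a minimal sketch (not taken from any contest solution or library) of a quicksort that always picks the first element as its pivot, next to one with a random pivot. The function names, the input size `n = 2000`, and the fixed RNG seed are all assumptions chosen for illustration. On already-sorted input the fixed pivot makes n(n-1)/2 comparisons, which is the kind of anti-test the comment has in mind.

```python
import random


def quicksort_comparisons(values, pick_pivot):
    """Sort a copy of values; return (sorted_list, comparison_count)."""
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        p = pick_pivot(lo, hi)
        a[lo], a[p] = a[p], a[lo]  # move the chosen pivot to the front
        pivot = a[lo]
        store = lo + 1
        for i in range(lo + 1, hi + 1):
            comparisons += 1
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[lo], a[store - 1] = a[store - 1], a[lo]  # pivot to its final slot
        stack.append((lo, store - 2))
        stack.append((store, hi))
    return a, comparisons


def first_element(lo, hi):
    # Hypothetical "wrong" choice: deterministic, so a test can target it.
    return lo


rng = random.Random(12345)  # fixed seed only to keep the demo repeatable


def random_element(lo, hi):
    return rng.randint(lo, hi)


n = 2000
anti_test = list(range(n))  # sorted input is the worst case for first_element
sorted_fixed, cost_fixed = quicksort_comparisons(anti_test, first_element)
sorted_random, cost_random = quicksort_comparisons(anti_test, random_element)
print("fixed pivot comparisons: ", cost_fixed)
print("random pivot comparisons:", cost_random)
```

Whether a test set should include such an input is exactly the question under debate; the sketch only shows that the input is generic (plain sorted data) rather than tailored to one person's code.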

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    As I have mentioned several times: letting a wrong solution pass is unfair to ALL OTHER COMPETITORS. Can you imagine it? One day you participate in a contest and write 200 lines or 5 KB of code to pass a hard problem, but others simply use a greedy or even cheat to get accepted! Is that fair to you?

    The CF competition rules don't guarantee anything about passing pretests, so it is not like the case in an ACM contest. I think that is indeed a good point of CF.

    I know that it is unfair to him to write an anti-test against his solution, but in this situation we have no perfect choice: "let the wrong solution pass" or "add a test to prevent it". I still think the first choice is worse, because it betrays everyone who thought carefully and found that the greedy is not correct.

    And still, I totally understand that many people say this could expand the jury's power and let subjectivity creep in, but we can do something about it, like making the jury's actions more public to users.

    In the end, things are not as perfect as you think, and fairness is not that simple. I just wonder: you say that the probability of a wrong solution passing the tests is the same for everyone, so is that enough? I can't understand it. Facing the same situation does not make it fair. Suppose the test-set maker kills one greedy but didn't think of another one, so one person passes while the other gets a zero score, and all you do is say "oh, how lucky I am!". Is this the fairness you want?

»
12 years ago, # |
  Vote: I like it +38 Vote: I do not like it

Anyway, I think Codeforces needs to make a rule that makes clear what a jury can do and what it can't.

  • »
    »
    12 years ago, # ^ |
    Rev. 2   Vote: I like it +21 Vote: I do not like it

    I think, the jury can do everything if:

    1. It touches all participants identically;

    2. It doesn't affect participants' tactics during the coding phase.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +6 Vote: I do not like it

      And how do you want to formalize the concept of "identically touching"?)

      • »
        »
        »
        »
        12 years ago, # ^ |
          Vote: I like it +27 Vote: I do not like it

        Why do I need to formalize it? In my opinion making the rules for judges is a strange idea anyway :)

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +14 Vote: I do not like it

          I think YuukaKazami wants to know whether adding tests based on a participant's solution is considered normal or is an exception.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +14 Vote: I do not like it

          Do you know that judges in the real world usually follow some code of laws (such as a constitution)? I think that is "rules for judges" :)

          Of course, the final decision does not always conform to the "code of laws", but that is an exception.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    "Rules are for idiots"

    House MD

»
12 years ago, # |
Rev. 2   Vote: I like it -27 Vote: I do not like it

This round should be unrated for Jacob (_maybe_ for his room too).

At least he gained some contribution :)

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Then it is unfair to the other people in his room.

»
12 years ago, # |
  Vote: I like it +65 Vote: I do not like it

I agree with Petr. The reason I have participated in programming contests for many years is that they are auto-judged and the judgement process is public (maybe not true for ACM). That's different from some math contests, where the jury needs to understand the proof statements written by contestants, and sometimes that's unfair.

And I know it's very hard to make a good contest under these conditions, because I have some experience writing problems (contests):

  1. Sometimes making test cases is hard. For example, I proposed this problem, MagicMolecule, in late 2011, but it was used in SRM571, just about 3 weeks before today. One reason is that it's not easy to make test cases that defeat some naive search algorithms. So I think the author (and admin) shouldn't use a problem until he (she) thinks the test cases are well prepared.

  2. Sometimes an unintended solution will pass the system test. In the Div-I hard problem of SRM557, XorAndSum, this short solution passed the system test, which was totally unexpected. I think it should pass even if it is indeed incorrect, because it passed all the test cases we prepared before the contest. I think we should never add test cases once the contest has begun, even if it would make the contest look more 'fair'. If the test cases are very weak, then they are. It will push the writer to do better next time, but he (she) shouldn't have the right to modify test cases during the contest.

  3. Sometimes we find that the reference solution is wrong. For example, in SRM571, my solution for the Div-I Hard, CandyOnDisk, was wrong. During the challenge phase, Petr submitted a challenge and found that the reference solution gave a wrong output. The story after that: the admins fixed the reference solution and a rejudge was made. (Is that fair?) It's hard to make a decision when this kind of thing happens.

So I think we should make some rules before the contest: Does the writer have permission to modify the test case? What should we do if the reference solution goes wrong? What should we do if the system goes down during the last 5 minutes of challenge phase? ...

And I agree with Gassa's suggestion: at least we should publish all modifications made during the contest. (By the way, I heard that: test cases of ACM/ICPC Finals will be destroyed after the contest, is that true? If so, then why don't we publish them instead?)

I'm happy to see people discussing these kinds of issues here.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Thanks for your support!

    I think creating hard rules for what happens when we discover that the reference solution is wrong might not be easy. So I think that publishing the motivation after the decision is made, like Gassa suggests, is a good proxy for having the rules without actually having them :)

»
12 years ago, # |
  Vote: I like it +32 Vote: I do not like it

CF is an interesting game, isn't it? Without rules we can't fully enjoy it. Whether or not the outcome is that judges can add tests against submitted solutions during the contest, as long as the rules are definite, it's fair, and I think we can all accept that. Don't take it too seriously, just have fun.

»
12 years ago, # |
  Vote: I like it +29 Vote: I do not like it

This is the most interesting event I have seen in CF...

Looking at it from the perspective of both Petr and WJMZBMR, I think what they both said is reasonable. So I think the most important thing is not to argue about the result of this game. What has passed is history; CF should give a clear rule for handling the problem so that this kind of thing does not happen any more.

This is only a game. If it happened to me, I wouldn't care about it as long as it had nothing to do with my money...

BTW, reading this blog and all of the replies is one of the ways to improve your English :)

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +43 Vote: I do not like it

    This is the most interesting event I have seen in CF...

    That's just because you've never seen the Russian part of CF ;) There are lots of interesting events and lively discussions there (sometimes they are too lively).

»
12 years ago, # |
Rev. 3   Vote: I like it +99 Vote: I do not like it

Our whole Codeforces team has been tracking the discussion of the described case. We've considered the community opinions and we've added our own ideas and we now want to say:

  1. In a perfect world we would use your ideas, and we completely agree with you. Unfortunately, the world tends to be imperfect. When a group of authors is really motivated to prepare the contest, no such need arises. For instance, nothing like this has ever come up at the Saratov ACM-ICPC subregionals, as everything is ready much earlier and in full. But the rounds are regular and the authors vary. Not all of them are super skilled or super motivated. In this case the problem was weak tests.

  2. The jury's goal is to face every situation deliberately and reasonably and act, whenever needed, to improve the contest as much as possible. In this case we are sure that the added test didn't contain any specific patterns against this solution. Most probably, the point was fixing a bug in the tests after viewing this solution.

  3. We consider it okay to fix a bug in a problem after some participant's feedback. In this case it was a solution attempt. Don't worry about biased jury. Trusting the jury is an important part in programming contests. What will become of these contests if we stop trusting Andrew Stankevich to work in the NEERC jury? Let's trust ACM-ICPC World Finals jury members. You shouldn’t ask “who knows who they are and how they choose problems?”

  4. Petr has offered this principle because it guarantees impartiality and objectivity. But the human factor in preparing and holding rounds always exists. Authors make problems and problem sets according to their preferences. Maybe the author consciously chooses problems that are easy for his friends? Should we allow his friends to participate in his round? Should we get the problems for a round from 5 different writers from 5 different countries? Or come up with some kind of limitation that reduces the possibility of such partiality? Petr proposed a principle of the same kind. We should trust the jury as we trust participants.

  5. It is also important to understand that we try to act objectively and impartially. Jacob submitted a wrong solution and received WA. The test was of a general character, and his solution simply pointed out the test set's defect. This is of course an isolated case, and we hope there won't be any reason for it to happen again.

  6. I want to repeat: this situation is an exception, a rare, atypical case.

  • »
    »
    12 years ago, # ^ |
      Vote: I like it +15 Vote: I do not like it

    Thank you for sharing your motivation with us!

    I believe the effect of that is more important than the particular decision.