chubakueno's blog

By chubakueno, history, 9 years ago, In English

I have always wondered what is the philosophy behind those ICPC problems that don't specify the number of test cases. What are the problem setters trying to evaluate, and how should I respond? Obviously, enough test cases will TLE out my solution.

One of the great things about programming contests is choosing the right algo for the right job, say, I wouldn't go full Fast Fourier Transform if simple convolution gets the job done. So, should I respond to those problems with the best algo I can think of? Should I respond with a provably optimal algo? Or, should I reverse engineer the problem-setter expectations and choose something along those lines?

This blog is intended for discussion, please don't understand this as a rant :)

  • Vote: I like it
  • +92
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it +12 Vote: I do not like it

There are some cases, when specyfing he number of tests can mislead participants.

If you are problem setter and you set number of test cases low, you narrow space of all test cases to only big ones. If you set it high, you can come up with a lot of small test cases, but it can be confusing (no. testcases * max. constraints can exceed any "feasible" number). If you tried specify e.g. sum of input length, you would mislead participants completely, if a solution is not linear.

So, in my opinion, sometimes it's just better and safer for problem solvers not to specify number of testcases.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +14 Vote: I do not like it

    I don't see why the sum of input length is ever bad, since the bounds on each case should tell you what time complexity you need. Moreover, leaving out the number of testcases could very easily lead to a suboptimal solution being submitted and TLEing.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      In CERC, we were told that we should assume there's one test case per file when assessing the chances of passing with given complexity. IIRC, it worked well that way.

      If there's a lot of large tests per file, then specifying input size or their number is in order; otherwise, it's not necessary. That's up to problemsetters' judgement.

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it +26 Vote: I do not like it

        Yeah, I remember some of your CERC problems, with an easy solution in n·2n (n ≤ 20) and 1000 test cases. I suggest firing such problemsetters.

        • »
          »
          »
          »
          »
          9 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Note that I was only talking about CERC 2013-2014 ("we were told"); I checked back, there was no such problem there.

          Actually, I checked back longer. Was it 2011? Apparently, CTU in Prague organised it back then. That's a technical university, and their subregionals are pretty bad. I'd totally expect a fail like this from them.

          Also, http://codeforces.net/blog/entry/19122?#comment-240233 — the venue (and organisers) changes occassionally.

          • »
            »
            »
            »
            »
            »
            9 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            I think it was 2012, maybe not CERC, but some subregional contest from that region.

            • »
              »
              »
              »
              »
              »
              »
              9 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              That was exactly problem A from CERC 2012 organized by people we are talking about.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    You have an interesting point, since it may be necessary to include a decent amount of "hack tests" together with the time complexity tests. Those quadratic expected time problems are the ones I was talking about, since it is quite easy to TLE an N^2 with a hundred or so test cases when N=1000. In this regard, Xellos comment is a good advice in how to tackle those problems, I think I will take his advice!

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    It's nearly always better and safer to specify a single test case per execution and set time limits accordingly (i.e. how Codeforces does it).

    The effect of multiple test cases is forcing the contestant to effectively guess at how many of those are big and how many are small, and it makes it difficult to test runtime for the worst case to see if your code has chances of passing or not.

    I frankly don't know why multiple test cases are still a thing.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      To my knowledge, ICPC regional problems are (nearly)always judged against a single test file that almost always has several test cases(sometimes you know how many, sometimes you dont). I don't think that this will change. It is quite frustrating not to know if your solution will TLE or not, even more when the whole problem set has that "several test cases" style. I think the best(nut necessarily safer) way to go is to assume that just 5-10 will be big, because else it wouldn't make much sense not to say how many test cases has the input.

      • »
        »
        »
        »
        9 years ago, # ^ |
        Rev. 3   Vote: I like it +5 Vote: I do not like it

        I wouldn't be so hasty as to say this won't change. World Finals changed it a few years ago, and it would be natural for most regional organizers to follow suit... eventually.

        • »
          »
          »
          »
          »
          9 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          quantum computing comes before. JK, its a possiblity..

»
9 years ago, # |
  Vote: I like it +13 Vote: I do not like it

Yeah, that is really frustrating. I know one of main organizers of CERC and I have ranted about that habit many times. He assured me the thing that Xellos said, but it's not always as it. I once saw one problem when there were <=100 testcases, and I thought that it should be like in CERC, but it wasn't and all testases within one file where maxtests >_>. It is frustrating and shameful that such systems still exists.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    There is no "main organizer of CERC" and there is no such habit in general. You probably have a specific CERC venue in mind. The look of CERC depends very much on the organizing university, and only a few of those had these bad habits.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Yes, yes, of course I meant last three CERCs in Cracow (only ones I have participated in :P).

»
9 years ago, # |
Rev. 2   Vote: I like it +52 Vote: I do not like it

Multiple test cases in a single input is OK in one specific setting: when a single test case is solvable very quickly and you want to, and can reasonably, throw 10,000 test cases at the contestant's solution. (Interactive problems with 10,000 consecutive queries can also be seen as falling into this category.) In all other situations there are no good arguments to have multiple test cases in the same run.

Multiple test cases in a single input are absolutely not OK under circumstances that make it impossible to determine whether a specific solution is good enough to pass. History has shown us time and time again that there will be plenty of cases when this misguides the contestants in one of two ways: sometimes a solution that seems too slow still passes, and sometimes a solution that seems good enough doesn't. I have even seen contests where both of these happened at once.

In both cases, this turns an algorithmic contest into a "guess what the organizers mean" contest. Already one such problem in an entire problem set can significantly damage the validity of results of the entire contest: some teams are lucky when guessing, some teams are not, and an unlucky guess will cost you an awful lot of time and effort. Using such problems is extremely bad and it should be avoided.

So, how should you respond when problemsetters do that? During the contest, take your best guess. After the contest, complain as much as you can. Inexperienced problemsetters often don't even realize that what they are doing is wrong.

»
9 years ago, # |
  Vote: I like it +83 Vote: I do not like it

I believe that programming contests are not about "guess which tests jury has". The reason why problems have strict constraints for all variables and sizes is to give participants an opportunity to be sure that invented solution is efficient enough to pass all the tests. And the opposite: that easier but not so efficient solution will not pass the tests.

I think that constraints for multitest inputs should be completely formal as well. Jury should have a solution which can pass any multitest input which satisfies constraints. It is good way to think about it in term of Codeforces rules: problem statements should be enough to write formal validator. A test (miltitest) can be included to testset if and only if it passes the validator. It should be impossible that someone can invent a test satisfying validator, but not suitable for a jury solution.

There are two (connected) reasons why multitest cases can be used:

  • To speedup testing (reduce chance of judging queue). Example: easy problems on Russian Code Cup always use multitests;
  • Multitests give jury a way to test participant solution on really great number of tests, it is something like stress-testing. It is about 10000 small testcases which nearly for sure cover all possible small cases.

I like an idea to specify strict constraints for single test case, for number of test cases in input and guarantee that sums of sizes/values do not exceed bounds for single test.

Good example is:

The input contains one or multiple sets of test data separated by single line breaks.

Each set starts with a line that contains two integers n and m (2  ≤  n  ≤  2·105, 0  ≤  m  ≤  2·105), where n is the number of cities in Berland and m is the number of unpaved roads... The rest of graph description follow...

It is guaranteed that the total number of roads in input doesn't exceed 2·105, and the total number of roads also doesn't exceed 2·105. The number of sets in input doesn't exceed 10000.

Such input description says: "OK, it is enough to write solution which will fit in TL just for single test, because many small tests will require not greater time than one huge test.". Also it gives jury the way to test a solution on 10000 small graphs and well as on a single huge graph.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    I agree very much with general message — this message and misof's one perfectly express all my complaints for previous 3 years :P. However unfortunately giving a constraint on a sum won't solve that, because sum of sizes can tell not that much about runtime. Consider such example "n ≤ 20, t ≤ 1000, sum of n ≤ 10000". If I saw such constraints on a Cracow's CERC then I would think "ok, so probably vast majority of testcases will have n<=10 and few n=20", but it can also mean 500 testcases with n=20 and that is a big difference if intended solution is O(2n).

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      I like an idea to specify strict constraints for single test case, for number of test cases in input and guarantee that sums of sizes/values do not exceed bounds for single test.

    • »
      »
      »
      9 years ago, # ^ |
        Vote: I like it +5 Vote: I do not like it

      But main idea is: jury should have a solution working on any test satisfying constraints + jury should do their best to have all principle tests satisfying constraints. I.e. if a participant see that some test is formally valid, he should be sure that jury has solution working on it and it is expected that the test (or a similar test) presents in the testset.

  • »
    »
    9 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Agreed, just wanted to note that jury also should not forget degenerate testsuites that formally correct, for example 1e5 tests with 2 vertices in your example.