Suggestions about pretests and systests

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	djm03178	151
7	adamant	151
9	luogu_official	150
10	awoo	147

The purpose of pretests is so that the judge queue is not be overloaded during the contest. At first I thought this means that the judge never runs the systests until the end of the contest. However, after setting a few contests, I realize that the judge actually grades the systests during contest time, and as problemsetters we can even see in real time who will FST by the end of the contest.

I assume this means that the grader prioritizes grading pretests and when it has "free time" it grades the systests. Of course, this is a good system to prevent the judge queue from clogging if it ever becomes too long. However, for the four contests I've got the chance to observe the grading process, the systests does not seem to have any backlog, and almost all have been graded except those which just came in. This is probably because almost everyone uses multitests now and because of that we can usually fit the systest into the pretest.

Which brings me to my suggestion: if the grader is free enough to grade systests, should it report the verdict to the contestants? If I were to imagine, it would go something like this. First the grader prioritizes grading pretests and reports the result to the user. If it is free it grades systests and reports the result to the user sometime afterwards. If the queue is bad, it could mean that you would get your systest verdict much later than your pretest verdict.

I think the biggest benefit is preventing FST, which are not only very frustrating and to some extent luck based (whether the problemsetter decided to put a specific case in pretests or not). However, some problems I can think of:

There will be no guarantees how soon you might get your systest verdict, and is it fair that some will get know they FST after 1 minute and some will know they FST after 15 minutes?
How do we handle hacks, do we grade new testcases immediately? And do we FST all who fail?
This probably is not good for those who enjoy hacking

What are your thoughts on this?

Comments (35)

Show archived | Write comment?

oolimry

3 years ago, # |

Auto comment: topic has been updated by oolimry (previous revision, new revision, compare).

→ Reply

Olympia

+99

I know several people who are quite fond of the pretest/system testing systems: they think it’s exciting to not know for sure whether the solution is right or wrong. Personally, FSTs frustrate me. I think it’s silly to fail someone who basically had the right idea but forgot to use ‘long long‘ and used ‘int’ instead (and thereby FSTed).

Furthermore, people who are hacked during contest have a competitive advantage because they have time to fix their mistakes.

It’s quite fascinating to hear that system tests get ran more or less in ‘spare time.’

jakubd

3 years ago, # ^ |

-10

that's the point. to make this more interesting. if you were doing some contest, where you don't have pretests and you get a WA, 2 things can happen:

you lose time finding the bug, therefore likely solving less tasks
you jump immediately to the next problem, essentially same result as getting a FST

so majority of time pretests are actually better

timreizin

+81

I believe that hacks during contest are a vestige and should be removed. This way we deal with the second problem of "in-contest systesting".

CelioPassos

+100

Unpopular opinion, but I think pretests should be made weaker, not stronger, in order to discourage guessing and proof by AC. Pretests should be there just to catch uninteresting and annoying bugs like swapping $$$N$$$ and $$$M$$$ by mistake, corner cases, etc. They shouldn't check if your solution is $$$O(N^2)$$$ instead of $$$O(N)$$$ or if your greedy is actually correct.

Everule

+72

I would completely agree with you, if it was possible. But usually weak pretests also don't catch silly bugs, and its just frustrating playing the lottery of does the bug trigger in pretest or systests.

Then pretests should be just strong enough to catch silly bugs, and nothing else.

kal013

+140

There should either be strong pretests or no pretests at all.
Why should uninteresting bugs be treated as 'less wrong' especially corner cases? Also, who would define a bug to be interesting/uninteresting -- it seems very arbitrary.

← Rev. 2 →

-74

Why should uninteresting bugs be treated as 'less wrong' especially corner cases?

I didn't say they were 'less wrong'. Technically they are as wrong as any other bug, they are just uninteresting and annoying in comparison to other stuff like whether or not your solution has the correct complexity.

Also, who would define a bug to be interesting/uninteresting -- it seems very arbitrary.

Bruh, just use common sense. Any decision when it comes to organizing a contest is arbitrary, or do we have exact rules to decide if a problem set is good or bad?

+39

I didn't say they were 'less wrong'.

Giving information about some bugs over others implies that the uninteresting ones were 'less wrong' and deserve a better chance at AC.

Bruh, just use common sense.

I seem to lack common sense. Hopefully, you can help me with this example:

Consider a problem requiring a dp optimization and two submissions, one without said optimization and another with a slightly wrong implementation such that the complexity analysis no longer holds. What would you do?

Even if a 'correct' answer does exist, I'd rather not have to hope for the setter having as much common sense as you do.

Any decision when it comes to organizing a contest is arbitrary, or do we have exact rules to decide if a problem set is good or bad?

There do exist general guidelines such as not having a spike in difficulty, not easily google-able etc. Anyways, it proves my point -- why add an extra layer of subjectivity to the contests?

← Rev. 3 →

-42

I'd be in favor of adding pretests to catch the slightly wrong implementation, though it's unlikely problem setters would even come up with all possible slightly wrong implementations.

To make hacking great again and, as I already said, avoid proofs by AC.

My point was that such submissions are not differentiable by test cases (i.e. both would either fit the time limit or not). So you have to either include an extreme test in pretest which would fail both solutions or both of them would pass pretests.

To promote that, there could be a no pretest system as in TopCoder. Trying to get some sort of ill-defined middle ground would just create more problems than it solves.

-38

I never said I was against extreme tests in pretests (ok, maybe I wasn't clear enough). I am in favor of it actually. What I am against are tests specifically designed to catch outright wrong solutions. For example, let's say there is some quantity $$$X$$$ in a problem (that is not part of the input) which some people are likely to assume is $$$O(N)$$$ when actually it is $$$O(N^2)$$$. I don't think setters should create a test case where $$$X \approx N^2$$$, unless the expected time complexity of a solution is $$$\Omega(N^2)$$$.

-39

Yes, everything should be either black or white like this. No middle ground... Let's apply such reasoning to other things. What's your opinion about DP problems? Do you like them? How many problems in a contest should be about DP? Do you have an answer or some ill-defined middle ground? Should we choose between all DP or no DP then?

You don't need a well defined middle ground as long as you use common sense, ffs.

+12

I am in favor of making things as objective as possible.

The example you give does not follow my statement, "Trying to get some sort of ill-defined middle ground would just create more problems than it solves." In your example, having an extreme take would have an obvious detriment to contest quality.

The example I gave (a real-life one, mind you) is one of the many pitfalls of the middle ground you wish to implement in pretests. The comparison between them and the benefits you mention is not so obvious.

Often, the intended solutions can require slight optimizations ($$$O(N^ 2 \times 2^N)$$$ to $$$O(N \times 2^N)$$$ say) to have the intended complexity. Do you expect participants to be able to judge the constant factors and estimate whether the solution fits the time limit without a max-test?

The only real 'benefit' of your proposal would be cases where a participant would have usually been able to come up with the correct solution within the contest duration, given they get the information of (TLE/WA etc.) on their unproven submission, would FST instead.

Why do you consider this to be the correct approach given the participant has the ability to come up with the correct solution and gets a penalty for the incorrect attempt?

You do seem to use common sense as an argument a lot, especially in a discussion involving subjectivity and a little ambiguity.

I have spent enough time in this discussion and have hopefully put my point across; everyone is welcome to their opinions. I will not be partaking in this discussion any further.

-32

It only creates problems for people who guess solutions. You haven't stated any other problem until now.

Do you expect participants to be able to judge the constant factors and estimate whether the solution fits the time limit without a max-test?

I just said above that I am in favor of max-tests.

The only real 'benefit' of your proposal would be cases where a participant would have usually been able to come up with the correct solution within the contest duration, given they get the information of (TLE/WA etc.) on their unproven submission, would FST instead.
Why do you consider this to be the correct approach given the participant has the ability to come up with the correct solution and gets a penalty for the incorrect attempt?

Is this participant able to come up with a proven correct solution in contest time? Or an unproven correct solution? If it's the former, my approach does not affect them. If it's the latter, then they are just guessing. Should they be rewarded for guessing correctly, as in a casino?

We're not robots following instructions to launch a rocket. Human beings can work with loosely defined concepts. Although you may be an exception.

+26

If your arguments had any merit, you wouldn't have to resort to ad hominem attacks. Please keep the discussion civil.

-40

Sorry for calling you a robot, sir.

-is-this-fft-

It may be hard to define formally but intuitively, there is a pretty big mistake between random mistakes and creating a wrong algorithm by guessing.

bIeah

This is what happens when you Come to Brazil!.

Geothermal

+61

I completely agree with the prior comments by kal013. In particular, I don’t think what you’re describing is “common sense” because this discussion indicates that your sense of what “should” FST/get rejected on pretests is certainly not common/shared between participants. This means that even in your ideal world, the set of pretests will be influenced by the author’s subjective judgment about what should FST.

A few additional thoughts on this:

There’s been a lot of discussion about the element of subjectivity that this would introduce, and you’ve made the argument that subjectivity is not a problem because problemsetting is already a subjective process in numerous ways. However, the difference between existing forms of subjectivity and subjectivity in the choice of pretests is that the latter is hidden from the contestants. You suggest that your proposal would reduce the element of guessing in programming contests, but in fact, it would force contestants to make more guesses, rather than fewer, about things like “is there a maximal test in pretests?,” “would the pretests fail me if I implemented my optimization wrong?,” “would pretests have caught me if this observation was incorrect?,” and so on.
Outside of having no pretests, there is no way to be perfectly consistent with the principles you have given. In practice, almost any pretests will fail a bad greedy, so under the system you’ve proposed, the most likely situation is that greedy solutions that are very incorrect (i.e., pass only a small proportion of test cases) will fail pretests while greedy solutions that are almost correct (i.e., only fail on a handful of edge cases) will FST. It’s unclear to me why you should be penalized more for writing an almost correct solution than for submitting a solution that is clearly wrong.
You suggest that implementing this system would reduce the element of guessing in competitive programming. I can guarantee that people will still attempt proofs by AC under the new system; the change would just make the stakes of this guessing game much higher.
This might be a hot take, but I don’t actually think the existence of proofs by AC is a problem in the status quo. First, I think there’s inherent value to being able to intuit a correct solution faster than you can prove it formally. Second, and more importantly, you are already rewarded for justifying your solution ideas because writing an incorrect greedy wastes a large amount of time and costs you the penalty points for your wrong submission. In any individual contest, one might be rewarded for guessing a correct solution without any proof, but in the long run, competitors who are more able to analyze their solutions in order to determine whether they are correct before writing any code will be more successful/earn higher ratings. As a result, it’s not as if there’s no penalty for blind submitting without thinking about the correctness of your solution. The only change weakening pretests would make is vastly increasing the penalty for writing slightly wrong solutions while simultaneously leaving the penalty for writing very wrong solutions fixed. Again, this makes little intuitive sense to me—why is submitting a greedy that fails on a particular edge case worse than submitting a solution that is the wrong time complexity or that fails on a wide range of possible tests?

-19

You suggest that your proposal would reduce the element of guessing in programming contests, but in fact, it would force contestants to make more guesses, rather than fewer, about things like “is there a maximal test in pretests?,” “would the pretests fail me if I implemented my optimization wrong?,” “would pretests have caught me if this observation was incorrect?,” and so on.

In my proposal there would no such questions:

I already stated that I am in favor of having maximal tests in pretests.
I already stated that I am in favor of having pretests to catch slightly wrong correct optimization.
It's very clear that I am against pretests to catch wrong assumptions, so the contestant should assume no, that there are no such pretests. Also, it's part of whole purpose of making guessing riskier.

Outside of having no pretests, there is no way to be perfectly consistent with the principles you have given. In practice, almost any pretests will fail a bad greedy, so under the system you’ve proposed, the most likely situation is that greedy solutions that are very incorrect (i.e., pass only a small proportion of test cases) will fail pretests while greedy solutions that are almost correct (i.e., only fail on a handful of edge cases) will FST. It’s unclear to me why you should be penalized more for writing an almost correct solution than for submitting a solution that is clearly wrong.

Is it a huge issue if we're not perfectly consistent? I never set a contest, but I was under the impression that creating tests were like: 1) generate a bunch of random small tests, 2) generate some random big tests, 3) add special tests to catch specific things; and what differentiates weak tests from strong tests is basically part 3). I am only suggesting a different approach to that part, it doesn't seem like it would make a big difference for the case of an almost correct solution compared to current approach (it should fail something on part 3 if the setters foresee it, or something on part 1 by chance). How is the current approach better in this case? This seems more like argument against system tests in general.

You suggest that implementing this system would reduce the element of guessing in competitive programming. I can guarantee that people will still attempt proofs by AC under the new system; the change would just make the stakes of this guessing game much higher.

True, but that's part of my point. Making it riskier.

Again, this makes little intuitive sense to me—why is submitting a greedy that fails on a particular edge case worse than submitting a solution that is the wrong time complexity or that fails on a wide range of possible tests?

This shouldn't happen, if I'm not completely wrong in what I just wrote.

-16

I referred to "common sense" when I was talking about differentiating a correct solution with a random bug from an outright wrong solution originated from a guess, not on what should FST (in this I was clear, I want the former to fail pretests and the latter to fail system tests).

But yeah, I guess I am more ok with some extra subjectivity from the setters than others. :(

Xellos

They should, but only marginally. Put in one handcrafted maxtest and one random maxtest instead of making pretests 5-20 be maxtests. Multitest 50 small cases instead of 10000. That kind of stuff.

The problem with your idea is that if you're making very weak pretests like that, it's very easy to make very weak pretests that check even less and can pass complete nonsense rather than filtering it out.

spookywooky

The whole concept of pretest and systest is questionable. For perfect user experience we would have all test as pretest, and they would run very fast.

It is obvious that this limits the number of tests.

So there is a need to compromise a solution, where borders are: Number of submissions, number of tests per submission, complexity of tests and available calculation power, expected tolerance on queue time.

As we see at atcoder, it is possible to get rid of systests if willing to compromise the other parameters.

Atcoder mostly seems to compromise by asking us to count stuff which is usually fairly easy to test.

Um_nik

+17

The slowest part almost always is running the solution, not running the checker.

+45

What I mean is that in certain counting tasks (especially 1 int as input, 1 int as output), there is almost no possibility of a WA if your solution correctly solves one test case of the form "random large integer", so you can get away with having very few tests.

+14

Fair point, I misunderstood you.

intrusiv

Funnily enough, every respectable olympiad employs a system of full feedback-ness. Although, tehnically, codeforces and a CMS hosted olympiad are totally different, I yet cannot-- it is beyond my comprehention to understand why we wouldn't recurr to a system of almost to none systests.

I mean, how many times have you seen a counter example that spans thousands of characters, running it clogging up the queue harshly? And, fair enough, for each problem there may be one or two, but honestly, as far as I have encountered and written, maximal tests cannot be as many as to provoke a catastrophe (unless one overlyexagerates). I absolutely beg someone to give a counterexample to this statement.

TL;DR, why are there people promoting this system, it is beyond me. Please explain :(

+25

It's more fun.

I thought gambling your generally percieved value by trying to do often enough shit math puzzles or sorting out casework minutia or hoping to be mentally able to solve the problems that are in your preffered difficulty range that would propell you standing in a short span of time was fun enough ._. Apology, this didn't fit my view.

zghtyarecrenj

CNOI: I have no feedback. :D

djm03178

There will be no guarantees how soon you might get your systest verdict, and is it fair that some will get know they FST after 1 minute and some will know they FST after 15 minutes?

I think this should be avoided, like, if we're to implement such feature, the judge system must finish the whole system test process of a problem for all submissions and then report the result to all contestants at once, and then always do the system test right away for all submissions that are made afterwards. I'm not sure if CF judge system can handle it fast enough for this to be actually useful, though.

dXqwq

At least i hate pt!=st(without hacks and different seeds)

The purpose of pretests is so that the judge queue is not be overloaded during the contest.

Perhaps someone disagrees.

oolimry's blog