A Type of Tests That Might Be Effective for Many Multi-Test Problems

I've always loved hacking in CF rounds, especially open hacking and uphacking. Recently, I've been focusing on TLE hacks, because plenty of solutions pass every randomly generated test yet still have a broken worst-case time complexity, and the tests that expose it can only be found by investigating the code closely.

One thing I noticed while hacking in recent rounds is that there is a fairly common mistake that many people make, but most problems don't have tests against it. The conditions for this mistake are:

1. The problem must be a multi-test problem (around $$$10^4$$$ test cases or more if possible).
2. The constraint on $$$n$$$ in a single test should be as large as the constraint on the sum of $$$n$$$ over all tests.
3. The solution has both of the slow aspects described below, but neither of them is slow enough to get TLE when only one of them is attacked.

Conditions #1 and #2 are very common in recent rounds, so it's not hard to find two or more such problems in every round. Condition #3 can usually be found by simply sorting accepted solutions by execution time.

So, the slow aspects are:

1. Slow operations for every test case: most commonly, using endl (or just alternating cin and cout without cin.tie(0)), initializing a whole array with memset, defining a vector of maximum $$$n$$$ size, etc.
2. Bad time complexity or a bad constant factor that still fits in the TL.
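To make the endl part of slow aspect #1 concrete, here is a minimal sketch that counts how often an output stream is flushed. `CountingBuf`, `print_with_endl` and `print_with_newline` are hypothetical names made up for this illustration; the only standard behavior relied on is that `std::endl` writes a newline and then flushes the stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper: a string buffer that counts how many times it is flushed.
struct CountingBuf : std::stringbuf {
    int flushes = 0;
    int sync() override {  // called whenever the owning stream is flushed
        ++flushes;
        return 0;
    }
};

// One answer per test case, terminated with std::endl (newline + flush).
void print_with_endl(std::ostream& out, const std::vector<long long>& answers) {
    for (long long x : answers) out << x << std::endl;
}

// Same output, terminated with '\n' only, so nothing is flushed inside the loop.
void print_with_newline(std::ostream& out, const std::vector<long long>& answers) {
    for (long long x : answers) out << x << '\n';
}
```

With $$$10^4$$$ answers, the first version flushes $$$10^4$$$ times and the second one never flushes inside the loop, while both produce identical output. On a real judge each flush of cout is typically a system call, which is why this only starts to hurt when the number of test cases is large.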

Now you can see what I'm talking about. Most such problems have tests against #1 or #2, but not both. #1 is easily attacked by any random test with the maximum number of test cases. Attacking #2 may require further preparation from the writer to come up with specific edge cases, but it is usually done pretty well. However, I have never seen a round where the writers prepared a case that counters both at once.

For example, let's look at a problem from a recent round: 1923D - Slimes. The constraint on the number of test cases is $$$10^4$$$, so condition #1 is satisfied, and we have $$$n \le 3 \cdot 10^5$$$ for a single test, the same as the limit on the sum of $$$n$$$, so condition #2 is also satisfied.

Now let's take a look at one of the hacks I made for it: https://codeforces.net/contest/1923/hacks/998199. If you look at the submission (247991883), you'll see that the solution was accepted before the hack, even though the original test set already had tests with the maximum number of test cases (tests #2-4) and various maximum $$$n$$$ tests (tests #8 onward). The maximum $$$t$$$ test took 1794 ms (test #3) while the maximum $$$n$$$ test took 639 ms (test #18).

The reason why it took so long on maximum $$$t$$$ is simple: it calls con() for every test, which does some calculations that take $$$O(N)$$$ time, where $$$N$$$ is the maximum possible $$$n$$$. Therefore, just by having $$$10^4$$$ tests, the code performs about $$$10^4 \cdot 3 \cdot 10^5$$$ lightweight operations, but this still fits in the TL. Tests #3 and #4 probably also come close to the limit on the sum of $$$n$$$ over all test cases, but that didn't add much to the running time, because each individual $$$n$$$ is too small.
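The arithmetic above can be sketched as a rough cost model. This is not the hacked submission's code: the function names are hypothetical, and "one operation per element" is a simplifying assumption. Only the value $$$3 \cdot 10^5$$$ for the maximum $$$n$$$ comes from the problem.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::int64_t MAX_N = 300000;  // maximum n, as in 1923D

// Cost model: every test repeats an O(MAX_N) setup (like calling con() per test),
// then does O(n) real work. Returns the total number of modeled operations.
std::int64_t ops_setup_every_test(const std::vector<int>& ns) {
    std::int64_t ops = 0;
    for (int n : ns) {
        assert(1 <= n && n <= MAX_N);
        ops += MAX_N + n;
    }
    return ops;
}

// Cost model: the O(MAX_N) setup runs once before the tests, then O(n) per test.
std::int64_t ops_setup_once(const std::vector<int>& ns) {
    std::int64_t ops = MAX_N;
    for (int n : ns) {
        assert(1 <= n && n <= MAX_N);
        ops += n;
    }
    return ops;
}
```

For $$$10^4$$$ tests whose sizes sum to $$$3 \cdot 10^5$$$, the first model gives $$$10^4 \cdot 3 \cdot 10^5 + 3 \cdot 10^5 = 3\,000\,300\,000$$$ operations regardless of how $$$n$$$ is split between the tests, while the second gives only $$$6 \cdot 10^5$$$. In this model the per-test setup cost depends only on $$$t$$$, so it stays the same even when almost all of the sum of $$$n$$$ is concentrated in one test.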

For the maximum $$$n$$$ part, I didn't even try to find anything special about the worst case, though judging by the existing tests, something with a series of small answers would probably have been more effective.

So, here's what I did: https://codeforces.net/contest/1923/hacks/998199/test. If you look at the generator, it simply makes $$$9999$$$ test cases with $$$n=1$$$ and a single test with $$$n=300000-9999$$$, which wasn't even aimed at this solution's weakness (it was made for another submission before). An $$$n=290001$$$ test shouldn't behave much differently from an $$$n=300000$$$ test, but having $$$t=10000$$$ by itself caused a huge slowdown. So you know what happened: Successful hacking attempt.
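A generator of this shape could look roughly like the sketch below. It is not the generator from the hack: `make_combined_test` is a hypothetical name, the input format (first $$$t$$$, then for each test $$$n$$$ followed by $$$n$$$ integers) is assumed, and every array value is a placeholder 1, whereas a real hack would put an adversarial array into the large test.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds t - 1 tests with n = 1 followed by one test that uses up the rest of the
// allowed sum of n. All array values are the placeholder 1 (an assumption).
std::string make_combined_test(int t, int sum_limit) {
    assert(t >= 1 && sum_limit >= t);
    std::ostringstream out;
    out << t << '\n';
    for (int i = 0; i < t - 1; ++i) {
        out << "1\n1\n";  // n = 1, then a single element
    }
    int big = sum_limit - (t - 1);  // 300000 - 9999 = 290001 for the limits in the post
    out << big << '\n';
    for (int i = 0; i < big; ++i) {
        out << 1 << (i + 1 < big ? ' ' : '\n');
    }
    return out.str();
}
```

Calling `make_combined_test(10000, 300000)` yields $$$10^4$$$ test cases whose sizes sum to exactly $$$300000$$$, with a single large test of $$$n=290001$$$.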

Similar situations can occur in almost every problem with similar constraints. Therefore, I think this is something future writers should consider: add several tests of this type for such problems. I hope this blog will help strengthen main tests against these solutions, which really should not pass even without my hacks.

Comments (5)

Bungmint

djm ORZ

vstiff

-31

I don't think this kind of failure contributes anything interesting to the algorithms themselves, which is what CP is all about. Instead it will add frustration for those who solved the problem but missed some minor implementation aspect, and it may restrict the use of languages where it's hard to control such things precisely. I'd prefer problems to have a limited number of test cases per launch, so that actually solving the problem itself is valued more than knowing low-level stuff like the nuances of input-output in your language.

I mean, low level shuffling is of course fun, but still the focus should be on algorithmic aspect of the solution.

2147483648

12 months ago, # ^ |

+16

Then let's remove all tests, lol.

Sure

Btw, for existing problems with huge inputs I agree with djm03178 that it makes sense to include tests against it, so that new people will discover it during practice and not in a tournament.

djm03178's blog