Possible bug in checker when running PyPy3

#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	adamant	151
7	djm03178	151
9	luogu_official	150
10	awoo	146

Is the checker occasionally failing to detect that a test has completed, and so giving a TLE?

In Codeforces round 727 problem C, when the system tests were run, I got TLE on my submission on test 18. Looking at the results my code printed the result before getting the TLE, and my code should exit immediately after printing the result. At first I assumed that I was just unlucky, and my code had taken precisely one second to run, so had been stopped just before exiting.

After the contest I tried modifying my code in various ways to speed it up, for example https://codeforces.net/contest/1539/submission/120193428 and https://codeforces.net/contest/1539/submission/120195587. In all cases I got exactly the same result; my code printed an answer but still got a TLE. It seems extremely unlikely that every single version of my code would take precisely 1 second to run.

To confirm this, I tried running my most recent version under Python3 instead of PyPy3. This is normally slower, but on this occasion passed all tests without any TLEs. See https://codeforces.net/contest/1539/submission/120195637

Comments (6)

Show archived | Write comment?

aberent

4 years ago, # |

Auto comment: topic has been updated by aberent (previous revision, new revision, compare).

→ Reply

Shantanu

Exactly same happened to me :(

ssvb

It's possible that your code finished running, but it was measured to take more than 1 second of CPU time to complete the job.

Killing processes that are running too long can't be done immediately and with perfect timing accuracy, so there may be some safety headroom (for example, forced killing may kick in after some delay). And if your code finished on its own without getting killed, then its time may be still compared with the time limit for deciding whether it was AC or TLE. Wallclock time vs. CPU time is another thing that may make everything a little bit more complicated.

4 years ago, # ^ |

See my reply to @bjy. What is surprising is not that this happened once, but that it kept happening identically even with significant changes to my code. These changes should have significantly changed the performance for either better or worse.

bjy

To your question about 'precisely 1 second' -- all that indicates is that submitted programs get cut off once they hit/exceed the time limit ("go home, yer drunk, closing time is NOW" etc.)

Otherwise, there are already a number of individual posts and comments in the contest threads about the cause of the TLE itself concluding that the 32-bit version of pypy here at codeforces gets slow with large numbers (and sorting piles of them, for instance)... see https://codeforces.net/blog/entry/90184 -- there are some other posts/comments about i/o too, but my extremely limited (local, lazy) testing hit a good-enough-for-me point of diminishing returns whenever I switched to sys.stdin.readline a while back... ymmv! (or at least, there's the custom invocation feature to do better tests)

Fwiw, searching a few related terms pulls up posts from about 2 years ago referencing their approach to sandboxing user-submitted code on windows: said approach works on 32-bit but the technique/vector they use to alter/block system calls is itself blocked/unavailable on 64-bit.

tl;dr it's related to a site-specific system thing, and without any grand announcements of roadmaps and timelines (or linux boxes? or maybe a centralized faq of versions and intended upgrades...), I don't suspect it'll change any time soon, which is a bummer, but understandable.

Yes, the TLE could be caused by the installed version of PyPy being 32 bit, and "precisely one second" might not actually be 1000ms. My impression in the past, however, has been execution is normally cut off within a few ms of the time limit (otherwise one would see printed results combined with a TLE far more often).

What was surprising is that every single version I tried gave the same behaviour of printing the result and then failing with a TLE. There were some significant differences between the different versions (e.g. changing between loops and list comprehension and switching to using itertools.accumulate and bisect to find the final answer). These changes should have significantly altered the performance, and did on the previous tests. Given this I would have expected at least one of my 6 versions to either pass test 18 or fail with a TLE before printing the result.

aberent's blog