In yesterdays ACL1 I got WA on problem B. It turns out that compiler ignored the function call inside assert
, the same code with function call not in assert works fine. Moreover, Merkurev explained that the reason for that is -DNDEBUG
flag, and my submission works on GCC.
I don't understand computers, I just memorized some stuff that's enough for solving puzzles. I want to ask why the -DNDEBUG
flag used in Clang, but not in GCC. rng_58?
Me neither, so I'll ask engineers.
It seems there's no strong reason (someone suggested these options so we followed the suggestion). Do you suggest to change something (like removing the option)?
Sorry for the unlucky situation in yesterday's contest (luckily it wasn't rated for you).
I don't know. I just like to solve puzzles, and if we can set some options which will be used by most platforms by default that should be beneficial for me as I don't have to think which compiler options are used in this particular platform. Looks like some people (now including me due to influence of my teammate) believe that asserts should be executed. It feels like this is true for most platforms, but I'm not sure as I don't usually look at the compiler flags until something unexpected happens. If this actually a standard I think it might be better to remove the option, but I should be the last one to ask if this is actually a standard or not.
"Compilator" — Um_nik, 2020
Yeah, I also don't know what's the right English word for many things, I just used Russian word because I thought that the Russian word came from English directly.
He doesn't understand English, he just memorized some stuff that's enough for solving puzzles.
Does anybody understand English?
I'm not sure you understand the meaning of the word "understand".
Um_nik by puzzles do you mean tasks?
I just memorized some stuff that's enough for solving puzzles
Weird flex but ok.
this is the exact opposite of flex
I'd say
NDEBUG
shouldn't be defined by an online judge.The flag turns assertions into no-ops. The standard caveat of an assertion is that, when you disable assertions,
assert(fun(n))
discards thefun(n)
part as well.In contests, people tend to put all sorts of checks inside assertions. More importantly, often the contestants make debug submissions relying on the
assert
working, and as far as I know, it's considered a good practice in contests. DefiningNDEBUG
system-wide is therefore counter-productive. The user can always define it explicitly if they want, maybe even as part of their#ifdef LOCAL
or#ifdef ONLINE_JUDGE
template.Why would it be considered good practice to rely on something that is supposed to be disabled in the the final build – at least in a professional context? Assertions should be used to check pre- and postconditions, not to change the state by calling a mutating function. This seems like a misuse of a language feature.
You can also reenable
assert
by putting#undef NDEBUG
before you include the headercassert
, so your last argument is not very convincing.PS: Needless to say that offering compilers with different flags is a poor decision if it had been deliberate at all.
Some common uses of
assert
by contestants may indeed be considered misuses. Problem is, what would you write instead, in vanilla C++, to check validity of statements about your program? This way is short and concise, and it's better than nothing, and better than rolling up your own. This is what matters a lot in a contest program.I agree that
assert
shouldn't call functions with side effects, and when people learn it, it goes the painful way. But it's beside the point of definingNDEBUG
in a judge environment.I don't agree that contest judge is a production environment which should have assertions disabled. One of a judge's valid and agreed uses in the contest is helping debug your solution.
And as we can see, requiring users to write
#undef NDEBUG
is against the principle of least astonishment.A statement is by definition non-mutating. Whenever we want to execute a function with side-effects and check the return value, we should write it exactly like that, i. e.
auto result = change(); assert(result);
. Typing speed is hardly a hindrance to a good performance (maybe unless you are in the Top 20 of users) so these extra keystrokes shouldn't hurt.I did not mean to say judges should only run release builds. I just wanted to make the misconception about the intended behavior of
assert
clear. On the other hand, I wasn't aware you could check the output of judge's run during a contest.Usually debug submitting goes like this: put a couple assertions, and a Wrong Answer turns into a Runtime Error. So you narrow the bug down to what you checked in these assertions.
What are you talking about "a statement is by definition non-mutating"? Maybe you are used to writing it like that, but that doesn't mean everyone is.
How do you know the intended behavior of
assert
? Are you the developer of C++ who added asserts?Outside of competitive programming, release builds are built with
-DNDEBUG
(asserts are skipped).You say it like it's predetermined, it was not someone in your company who wrote that compilation line
It's not only in my company, so I believe it's one of the best practices in software industry
That's not the point of his post. Treat it like "a company" or "one's company".
I don't have good source for statistics, but I would bet on 50/50 on enabling and not. And a lot of projects have there own assert implementation to avoid this.
Probably it's more frequent on Windows, because it's default behavior for MSVC (-DNDEBUG in Release and no -DNDEBUG in Debug).
Seconding this, the C++ spec (https://en.cppreference.com/w/cpp/error/assert) says nothing about "by definition non-mutating", it only says that it evaluates to 0 if NDEBUG is defined and checks the boolean if NDEBUG is not defined.
Also, it's completely untrue that release builds are always built with NDEBUG. I work on a "real production codebase" at a "real software company" and our builds to not use NDEBUG. In C++, there's no such thing as "a standard release build", there are only options that the compiler/language gives you, e.g. the option to set NDEBUG to disable all asserts.
The point here is to be simple, transparent, and accessible by using essentially the default set of flags. If anything, I think
-DNDEBUG
is a super niche and weird feature, and it's only a step away from-Dassert=printf("HAHAHAHA\n")
.Uh, assert is definitely intended to be non-mutating... there is a reason that there exists a flag to disable it. There's no flag to disable sort(), and nobody would ever dream of making one, because it is intended to be mutating. One does not have to be the developer of C++ to know this is good practice, it's been trained into anyone with any idea of how to do decent software development
This is a good practice, I understand pros of that, and I don't doubt that you have been trained that. But please don't say anyone, and don't say what people were intending with asserts. You have to be the developer of C++ (and the exact one who made asserts) to know the intent.
Just to go to a bit of an extreme: there totally is a flag to disable sort, it's
-Dsort=""
. More reasonably, there are flags to set your heap size to 0 (force all allocations to fail), there are flags to disable the standard library entirely (-nostdinc++
), and many more configurations that people do use in practice, but we obviously shouldn't care about.The point is that no code has any obligation to work with all possible (intended) flags, especially in a production environment. For many many C++ projects, the build system and compile flags are as much part of the source code as the .cpp files, and they won't work if you change random flags. Don't take "the spec writers intended for this to be something you can choose to use" as "you should/must use this feature".
Too be formally correct, -Dsort="" will just not compile, because it will try to define name function with empty name in algorithm.
Something like -D_GLIBCXX_ALGORITHM would work better ;)
Wait, there is a flag to disable sort??? What the... Congratulations, you have just won the argument.
Return codes. In vanilla C, too, in fact. Both show up as RE instead of WA. It has an added benefit — if an online judge shows you just the return code, it's already telling you what "assertion" you failed.
I agree that
-DNDEBUG
shouldn't be in an online judge, it's useful for things like production builds which isn't the case here.Return codes, yes, actually true to some extent. However, when you are in a function other than
main
and, for example, want to literally check preconditions and postconditions, return codes get cumbersome.Depending on the specific case, return codes for the program instead of function work as well:
if(condition) exit(1)
is RTE just likeassert(condition)
.Although I would never dare to mutate the state with an
assert
, I think principles from software engineering do not need to be blindly implemented in competitive programming. Best practices in software engineering exist because they make something better: if those things don't make anything better in competitive programming then they are not necessary here.I think "assert is supposed to be disabled in the final build" is not true. It can be disabled as an optimization if you choose, but I don't think it's true that all "professional contexts" use
-DNDEBUG
.I think it's reasonable to expect the judge to be an unurprising and unmodified base environment with base flags (additive changes like -Iboost only), and having the
-DNDEBUG
flag is certainly surprising and not additive.This topic is related to a discussion I had the other day. Are there cases in which the use of assert is ilegal and should result in disqualification of the team (in lets say ICPC)?
This may sound like an absurd question at first because one may think that anything you can code should be acceptable, but that is not true: if someone uses the judge result to discover the hidden test cases (by doing a binary search for example — if n>=x WA, else TLE) this is ilegal (or isnt it?). I believe that one rule is that every submission should be an actual attempt of solving the problem. So, suppose I submit a code and I got WA. Can I add only an assert to the code to check if the error is in a certain supposition of mine? Maybe not because that is not an attempt of solving the problem. But what if the first submission contained the assert already and I got RTE? I get free information that the error is in the suposition. But this is just weird, maybe this is a loophole. Anyone has any thoughts on that?
No, this is explicitly an allowed use of assert; that's the whole point of having a separate RTE verdict from a WA verdict. The judge telling you RTE from failing an assert is similar to the judge telling you RTE from an out-of-bounds. You should be using asserts for this reason, it's one the the tricks that really helped me.
However, there is a question of whether we're giving too much feedback, which is obviously a balance. Overall, I think allowing this feedback is good for the contests, because it makes them more fun for the contestants (less stress from unknowns/debugging), and is still reasonably skill-based.
There are several arguments for witholding feedback:
The main argument for giving more feedback is that it makes the contest a better test of skill, rather than having too much luck from random bugs. (This is similar to how competitive Super Smash Bros is played without items; just because a better player "should" be better with items, doesn't mean it's good for the competition.) Secondarily, it also makes the contest more fun for contestants because you're able to solve/debug more problems.
How much feedback you give is also a spectrum. Here are a few points on the spectrum:
sleep(n/1000)
), and memory (a = new int[k*1000]
). There are many variants of this; AtCoder randomizes the order of test cases, many judges stop after the first failed case. If leaking data is not a concern (e.g. because penalty or because the contest is too short for it to be practical), this is also decent.Overall, I think we've hit a happy medium with per-problem verdict, maybe with index-of-failed-case and max runtime/memory.
None of these directly address your question of whether giving different verdicts for RTE/WA/TLE/MLE is good. I think these are small and helpful hints to give. I think not distinguishing TLE/WA would be very annoying to debug, and at that point, giving a little more feedback about RTE seems totally fine (if you insist, I can and will write
void tle_assert(bool b) { while(!b); }
).I think this is impractical and unenforceable. If I resubmit some code which failed before because I think the issue might be nondeterministic, is that illegal? What if I accidentally introduce some bug which guarantees an MLE? What if I intentionally introduce some bug which guarantees an MLE? Most scoring schemes also do more than enough to penalize extraneous submissions, so I also don't think this is a problem.
I like your number 2 in spectrum for ICPC indeed. But going back to my question, you said that making the use of assert in some cases(and other ways of exploiting the judge veredict) is unenforceable, but I dont agree with that. What this rule may be is subjective, that is, it depends on the judge looking at your code and thinking that you are intentionally trying to get information from the hidden test cases. I believe that this is the current rule in ICPC, but in my opinion a better rule would be allowing you to do anything and limiting the number of submissions to 50 per problem per team so you cannot “binary search” stuff. That way there is no subjective factor and it is easier to judge.
Most serious contests don't have such rules and instead have strong/numerous test cases to make sure that this isn't an issue.
I should have started by saying the following: an icpc organizer from Latin America said that an example of an ilegal use of assert is checking whether the input graph is always bipartite (because this may be the case even though the statement does not say that directly, this should be a deduction the contestant has). This use is ilegal because you are trying to check something from the hidden test cases. This rule is BS imo.
Yeah I agree that's pretty BS. I guess the ICPC rules are pretty loose (they pretty much just give judges final say), so it's kinda unfortunate that they're not becoming more objective; at least they haven't codified subjective grading.
Sorry, I got a little confused, I didn't realize you were just stating that such a rule exists and not that you were advocating for it. I think the subjectivity is precisely why rules like "all submissions should be real attempts" are starting to die off in favor of more test data with carefully designed tests. Right now, I don't believe ICPC includes any such rule. ICPC rules prohibit interfering with the network or the judge machines, but there are no rules against using valid submissions to gain information.
I want to point out that there's a spectrum of possibilities of how you can "hack" a problem: the most egregious is to get all test data locally and hard-code the answers in a big if statement. Then, there's identifying/hardcoding particular special cases to fix 1-2 test cases that you failed. Beyond that, one could identify some patterns in 1-2 test cases that get TLE and try to write special heuristics optimize those cases. Finally, there's a whole range of heuristic performance optimizations that could also be considered hacking. I think the consensus is that hard-coding all cases is definitely bad and we should prevent it, but case-specific heuristic performance optimizations are fine but sometimes frowned upon (some ICPC search problems even are designed to be solved with heuristic performance optimizations).
In practice, we usually don't need a submission limit to prevent hacking because (a) there are too many distinct test cases, which totally prevents extracting all data and (b) because 20 minutes of penalty is a big deal. Penalty gives a clear ordering of solve normally > hacking with many submits > not solving, which I think is definitely the right ordering. (FWIW, serious contests without penalty often do have a generous submission limit. I believe IOI's is around 100 submissions per problem.)
Finally, I think there's this assumption that hacking problems is automatically bad; I think it definitely shows some relevant skill and it can be a fun part of contests. There's also a very a fine line solving heuristic search problems and squeezing solutions via hacking. For many heuristic problems, there's a balance between WA-AC-TLE, and you often need to "binary search" on the judge data to tune your parameters. These are totally acceptable and a good skill to encourage.
It's a rule for the Latin American ICPC regional.
I agree with pretty much all that you said (well, except for heuristic problems being acceptable, but that's a different discussion...): being able to trade penalty time for only a few bits of information is not necessarily a bad thing and adds an interesting dimension to strategy. This rule being hard to enforce means it's very rarely actually invoked (for the times I can remember, a team was attempting to interfere with the judge machines), so while I would prefer to see the rule gone, it doesn't affect much in practice.
Has anyone ever been disqualified upon this rule?
Personally all I know is more than one team has been disqualified for submitting fork bombs. You'd need someone older than me to know what happened prior to 2011, though.
Actually, I don't fully understand if a believing in any hypothesis violates the rule. Like imagine a problem where the graph is not said to be bipartite, but it follows from the statement, the hardest part of the problem is to discover this fact, and the bipartite case is trivial to implement. If I submit the following (pseudo)code:
then it seems that it's illegal. Is it legal to comment the assert here and then submit? I mean, on one hand, it's still "trying to check something from the hidden test cases", the only difference is that the verdict I get in response does not have to be RE. On the other hand, as far as I know, a contestant doesn't have to prove that their solution is correct, this is what the testing system is for. So please, if I miss any crucial difference between these, anyone tell me what it is.
If your thought process was sth like "maybe the graph is not necessarily bipartite, but is nearly bipartite (whatever that means) so solve should still work, or maybe I just made a bug in testing if the graph is bipartite" then it's definitely "an actual attempt of solving the problem", but what if Jury says "we don't believe you that's what you thought"
EDIT: I slightly misunderstood what you said, my response refers to case: "you submitted the code with assert, got RE and resubmitted without it (or submitted without got some verdict other than AC, and resubmitted with assert)". If you made only one submission it's obviously "an actual attempt of solving the problem"
As written, your code is an attempt to solve the problem and therefore should be OK according to my interpretation of the rule.
However, if you submit without the assert and later submit the same code with the assert, then you know the second submission is not going to pass so it would be an illegal submission.
So this is a very interesting case, because to me it sounds like the rule would make the exact same code either legal or illegal, depending on the circumstances.
If the only change in the code is adding an assertion, then yes, but what if I also make some change like: in dfs before iterating through all vertices I shuffle them/sort them by degree/indices, starf dfs from vertex with minimum/maximum degree/random vertex or even start from vertex no 2 instead of 1 or anything like this? It is entirely possible that with a change like this I can slip some random bullshit/solution with a very subtle bug past weak testcases and it is less likely but still possible that sorting all vertices by sth seemingly reasonable like degree/subtree size etc. will actually turn my solution into a provable greedy. Is it still illegal to submit it?
And what's the point of this whole thread? To learn in perfection trashy Latin American RC rules?
One of the cool things about competitive programming is automated judgement. And this rule makes it manual. In my opinion, it makes no sense at all.
I don't understand computers, I just memorized some stuff that's enough for solving puzzles.
Funny, how the guy at #1 in top rated describes competitive programming. Can I get even remotely close to where you are if I memorize stuff/problems/ideas/tricks ? Please tell how you "memorize" ?
I think he means to say he memorized basic things like how to compile programs, terminal I/O redirection, and useful compiler flags etc. These, together with basics of C++/Java/other languages, is mostly what he needs to convert his solutions of puzzles (= contest tasks) to code and test it.
Fair enough. You just thwarted my plans to take over Codeforces.
That is what I meant, yes. Example: I used vim for 5 years because my team were using vim, I just copied .vimrc and makefile from my teammate, I didn't even try to understand them.
But, The_Dark_Knight_Rises, you are not far off. Practice = "memorize stuff/problems/ideas/tricks" (and how to apply them). So yeah, I'm sure you can overtake me with enough practice. You might not have enough patience (or time in your life) to do enough practice though...