Checkers with testlib.h

Introduction

Checker is a program that should be written when task allows more than one correct solution. Although it seems that it is very easy to write checker, there are lots of important technical details that are easy to forget if you don't use a special library like testlib.h.

A common convention is that a checker should be a program taking three command-line arguments: the testdata filename, the participant output filename and the jury answer filename. Checker should read the contents of the input, output and answer, decide whether participant answer is correct (and optimal being compared to the jury's answer if there can be unoptimal answer in this task) and return one of several pre-defined verdicts:

Verdict	testlib macro	meaning
Ok	`_ok`	The output is correct, follows the output format and represents an optimal answer (if applicable in this problem).
Wrong Answer	`_wa`	The output is wrong or incorrect.
Presentation Error	`_pe`	The output doesn't follow output format specification of the task. Although, on Codeforces this verdict is being replaced by Wrong Answer since it's usually hard to distinguish Presentation Error and Wrong Answer
Partially Correct	`_pc(score)`	If there is a partial scoring in the task, this verdict should be used in situation when solution is partially correct or to give solution some number of points. Here `score` should be an integer between 0 and 200 where 0 represents the lowest mark (no points) and 200 represents the highest mark (maximum possible number of points)
Fail	`_fail`	This verdict means that checker has encountered a critical internal error or that the jury's answer is incorrect or that contestant found the more optimal answer than jury. This verdict means that something wrong has happened and it requires a special investigation from jury.

Usually the verdict returned by checker is indicated by the return code of its executable, but it may possibly be transfered to the testing system by many other ways: by creating a special xml-file with checker outcome, by writing to stdout or somehow else. When using testlib.h for writing a checker all those system-dependent ways are combined in a single expression quitf(VERDICT, "comment", ...).

Simplest checker example

Problem statement: You are given two integers a and b ( - 1000 ≤ a, b ≤ 1000). Find their sum and output it.

Let's write a checker for this problem. It will be very simple:

#include "testlib.h"

int main(int argc, char* argv[]) {
    // This command initializes checker environment.
    registerTestlibCmd(argc, argv);
    // Now there are three global variables specifying testlib streams:
    // inf - stream with the testdata.
    // ouf - stream with the contestant output.
    // ans - stream with the jury answer.
    // All those streams provide the similar interface for reading data.

    // This function reads a single integer from the participant output that 
    // should be between -2000 and 2000. If it doesn't belong to the specified
    // range, checker finishes with verdict _pe and comment saying that [sum of numbers]
    // is outside of the specified range.
    int pans = ouf.readInt(-2000, 2000, "sum of numbers");
    
    // This function reads a single integer from the jury output. Here we suppose
    // that jury's answer is correct and we do not need to additionally verify it.
    int jans = ans.readInt(); // We suppose that jury's answer is correct
    
    if (pans == jans)
        quitf(_ok, "The sum is correct."); // This finishes checker with verdit OK.
    else
        // quitf handles a comment like printf, i. e. you may use specifiers like
        // %d, %s etc in the comment.
        quitf(_wa, "The sum is wrong: expected = %d, found = %d", jans, pans);
}

Available methods

There are lots of methods useful for writing checkers.

Method	Description
`stream.readXXX`	All methods of form readXXX (like readInt, readLong, readDouble, readToken etc) are common for all testlib uses: checkers, validators and interactors. TODO: put all such methods on the separate page.
`void quit(TResult verdict, string message);` `void quit(TResult verdict, const char* message);` `void quitf(TResult verdict, const char* message, ...);`	Finishes the checker with a given verdict and comment.
`void quitif(bool condition, TResult verdict, const char* message, ...);`	if condition is true then performs quitf(verdict, message, ...)
`void ensuref(bool condition, const char* message, ...);`	An equivalent of assert. Checks that condition is true, otherwise finishes with _fail verdict. Useful for debugging checkers.

TODO: finish this list.

readAns paradigm

Suppose you have a task that asks contestant to find a complex composite answer that is not just a single number. Example:

Problem statement You are given a connected undirected weighted graph witn n vertices and m edges. Find a simple path between vertex s and vertex t (s ≠ t) of maximum weight and output it. Samples (input and output format is clarified in square brackets):

Sample input	Sample output
4 5 [n, m] 1 2 4 [edges] 2 4 2 1 4 4 1 3 5 3 4 3 1 4 [s, t]	3 [number of vertices in path] 1 3 4 [path]

Here is an example of bad checker implementation for this task.

Bad checker implementation

#include "testlib.h"
#include <map>
#include <vector>
using namespace std;

map<pair<int, int>, int> edges;

int main(int argc, char* argv[]) {
    registerTestlibCmd(argc, argv);
    int n = inf.readInt(); // no need to additionally call readSpace() or readEoln() since
    int m = inf.readInt(); // there is no need to validate input file in the checker
    for (int i = 0; i < m; i++) {
        int a = inf.readInt();
        int b = inf.readInt();
        int w = inf.readInt();
        edges[make_pair(a, b)] = edges[make_pair(b, a)] = w;
    }
    int s = inf.readInt();
    int t = inf.readInt();

    // reading jury answer
    int jvalue = 0;
    vector<int> jpath;
    int jlen = ans.readInt();
    for (int i = 0; i < jlen; i++) {
        jpath.push_back(ans.readInt());
    }
    for (int i = 0; i < jlen - 1; i++) {
        jvalue += edges[make_pair(jpath[i], jpath[i + 1])];
    }
    
    // reading participant answer
    int pvalue = 0;
    vector<int> ppath;
    vector<bool> used(n, false);
    int plen = ouf.readInt(2, n, "number of vertices"); // path should at least contain s and t
    for (int i = 0; i < plen; i++) {
        int v = ouf.readInt(1, n, format("path[%d]", i + 1).c_str());
        if (used[v - 1]) // checking that no vertex is used twice
            quitf(_wa, "vertex %d was used twice", v);
        used[v - 1] = true;
        ppath.push_back(v);
    }
    // checking that path is actually between s and t
    if (ppath.front() != s) 
        quitf(_wa, "path doesn't start in s: expected s = %d, found %d", s, ppath.front());
    if (ppath.back() != t)
        quitf(_wa, "path doesn't finish in t: expected t = %d, found %d", t, ppath.back());
    // checking that each pair of adjacent vertices in the path is indeed connected by an edge
    for (int i = 0; i < plen - 1; i++) {
        if (edges.find(make_pair(ppath[i], ppath[i + 1])) == edges.end())
            quitf(_wa, "there is no edge (%d, %d) in the graph", ppath[i], ppath[i + 1]);
        pvalue += edges[make_pair(ppath[i], ppath[i + 1])];
    }
    
    if (jvalue != pvalue)
        quitf(_wa, "jury has answer %d, participant has answer %d", jvalue, pvalue);
    else
        quitf(_ok, "answer = %d", pvalue);
}

Here are the main two issues that appear in this checker.

It believes that the jury's answer is absolutely correct. In case when jury's answer is unoptimal and contestant have really found the better answer, he will get the verdict WA that is not fair. There is a special verdict Fail exactly for this situation.
It contains the duplicating code for extracting the answer value for jury and contestant. In this case extracting the value is just one "for" cycle but for a harder task it may be a very complicated subroutine, and as usual, using the duplicated code makes twice harder to fix it, rewrite it or change it when output format changes.

In fact, reading an answer value is a subroutine that works exactly the same for contestant and jury. That's why it is usually being put into a separate function receiving the input stream as a parameter.

Good checker implementation

#include "testlib.h"
#include <map>
#include <vector>
using namespace std;

map<pair<int, int>, int> edges;
int n, m, s, t;

// This function receives stream as an argument, reads an answer from it,
// checks its correctness (i. e. that it is indeed a correct path from s to t in the graph),
// calculates its value and returns it. If the path is incorrect, it stops the execution
// with _wa outcome if stream = ouf (contestant) or with _fail outcome if stream = ans (jury).
int readAns(InStream& stream) {
// reading participant answer
    int value = 0;
    vector<int> path;
    vector<bool> used(n, false);
    int len = stream.readInt(2, n, "number of vertices"); // path should at least contain s and t
    for (int i = 0; i < len; i++) {
        int v = stream.readInt(1, n, format("path[%d]", i + 1).c_str());
        if (used[v - 1]) { // checking that no vertex is used twice
            // stream.quitf works as quitf but it modifies the verdict according
            // to what stream it is being invoked from. If stream == ouf then
            // it works exactly like quitf, otherwise if stream == ans then
            // any verdict will work like _fail (because it's bad when jury's answer is incorrect)
            stream.quitf(_wa, "vertex %d was used twice", v);
        }
        used[v - 1] = true;
        path.push_back(v);
    }
    // checking that path is actually between s and t
    if (path.front() != s) 
        stream.quitf(_wa, "path doesn't start in s: expected s = %d, found %d", s, path.front());
    if (path.back() != t)
        stream.quitf(_wa, "path doesn't finish in t: expected t = %d, found %d", t, path.back());
    // checking that each pair of adjacent vertices in the path is indeed connected by an edge
    for (int i = 0; i < len - 1; i++) {
        if (edges.find(make_pair(path[i], path[i + 1])) == edges.end())
            stream.quitf(_wa, "there is no edge (%d, %d) in the graph", path[i], path[i + 1]);
        value += edges[make_pair(path[i], path[i + 1])];
    }
    return value;
}

int main(int argc, char* argv[]) {
    registerTestlibCmd(argc, argv);
    n = inf.readInt(); // no need to additionally call readSpace() or readEoln() since
    m = inf.readInt(); // there is no need to validate input file in the checker
    for (int i = 0; i < m; i++) {
        int a = inf.readInt();
        int b = inf.readInt();
        int w = inf.readInt();
        edges[make_pair(a, b)] = edges[make_pair(b, a)] = w;
    }
    int s = inf.readInt();
    int t = inf.readInt();

    int jans = readAns(ans);
    int pans = readAns(ouf);
    if (jans > pans)
        quitf(_wa, "jury has the better answer: jans = %d, pans = %d\n", jans, pans);
    else if (jans == pans)
        quitf(_ok, "answer = %d\n", pans);
    else // (jans < pans)
        quitf(_fail, ":( participant has the better answer: jans = %d, pans = %d\n", jans, pans);
}

Notice that by using this paradigm we also check the jury answer for being correct. Checkers written in such form are usually shorter and easier to understand and fix. It is also applicable when task output is NO/(YES+certificate).

Notes, advices and common mistakes

Use readAns paradigm. It really makes easier to understand and work with your checker.
Always use optional arguments for methods like readInt(), readLong() etc. If you forget to check range for some variable, your checker may work incorrectly or even face runtime-error that will result in Check Failed in testing system.

Bad:

// ....
int k = ouf.readInt();
vector<int> lst;
for (int i = 0; i < k; i++)       // This will behave similarly for k = 0 as well as for k = -5.
    lst.push_back(ouf.readInt()); // But we don't want to accept list of length -5, don't we?
// ....
int pos = ouf.readInt();
int x = A[pos]; // 100% place for runtime-error. There will surely be a contestant who will output
                // -42, 2147483456 or some other garbage instead of correct value.
// ....

Good:

// ....
int k = ouf.readInt(0, n); // Now negative values of k will cause PE.
vector<int> lst;
for (int i = 0; i < k; i++)
    lst.push_back(ouf.readInt());
// ....
int pos = ouf.readInt(0, (int)A.size() - 1); // That's how we prevent index out of range.
int x = A[pos]; 
// ....

Use optional comments in readXXX methods. With them it is much easier to understand where did checker stop its execution (if not you don't specify it, the comment will be like "integer doesn't belong to range [23, 45]" and it's not clear what integer caused such behavior). Also write informative comments in quitf calls; remember that your comments will be probably available to the contestants after the competition (or even during the competition like while hacking on Codeforces).

Комментарии (26)

Показать архивные | Написать комментарий?

Zlobober

9 лет назад, # |

+28

Auto comment: topic has been updated by Zlobober (previous revision, new revision, compare).

→ Ответить

CMaster

← Rev. 2 →

can we use readString () ?

Or readString () = readWord () ?

MikeMirzayanov

9 лет назад, # ^ |

From testlib.h:

/* As readLine(). */
std::string readString();

kaushal02

-10

Website shows C++5.1 compiler yet somehow gives Compilation error on using "auto" command in for loops to iterate over containers. Is the command included in the tester repositories?

riadwaw

you should choose compiler that supports c++11

bluemmb

← Rev. 3 →

suppose input is :

5
Hello

which is correct for these I/O streams in checker ?

int a = stream.readInt();
string s = stream.readLine();

int a = stream.readInt();
stream.readEoln();
string s = stream.readLine();

The second option is correct. Also you can consider (if string expected to be a single token):

int n = inf.readInt();
string s = inf.readToken();

Azret

8 лет назад, # |

+10

А как я могу правильно загрузить IOI-Style задачи? Как делать scoring according to groups?

Miteusz

7 лет назад, # |

Why some readXXX methods (for example readInt and readDouble) return Presentation Error, and some other readXXX methods (for example readWord and readToken with regex) return Wrong Answer? Which exactly readXXX methods return Presentation Error?

xzm2024

And how to write a checker including partially correct ?

jamienguyen

5 лет назад, # ^ |

Same question here. I tried partial scoring using checker. When I want to say that "you got 25% for this test" and did

quitf(_pc(50), "p2 is correct, p1 is not: received %d, expected %d. scored 25/100", p1, a1);

What I received is this error:

Checker 'checker.cpp' returns exit code 50 [partially correct (50) p2 is correct, p1 is not: received 7, expected 6. scored 25/100]

I guess this feature is not yet implemented :<

dorijanlendvaj

When using partial scoring you need to use quitp, not quitf. Also, keep in mind that that way of partial scoring always adds 16(unless that was changed), so you should use 34, not 50.

Thanks for your help! Btw where did you find the documentation for these commands? In the blog, they say nothing about the quitp function.

I think I found that out from https://codeforces.net/blog/entry/51019

dcordb

5 лет назад, # |

How to use multigen generators in the script part of the generator. I mean I can't do generator_name [params], it gives an error because I didnt added the correct file name of the index of each tests.

So the question is, how can I add multigen generators to a problem?

johutha

4 года назад, # |

+23

Is it possible to run the contestant's code multiple times? E. g. In a problem, where the participant needs to en- / decode information in some structure, we can't let the participant en- and decode in the same run because they could just store the information in an array. And we obviously need to make sure that the $$$decode(encode(x)) = x$$$. How to handle this type of problem?

dalex

4 года назад, # ^ |

+1, I need this feature too. Now it is not supported in testlib and on Codeforces.

Maybe other testing systems that allow low-level settings support it. I think it must be possible on Yandex Contest (but not sure).

SecondThread

What does the auto-update checkbox do next to the dropdown to select a prebuilt checker or make your own?

SimB4

3 года назад, # ^ |

+, it produces the warnings.

cry

3 года назад, # |

im getting "Validator 'testvalidator.exe' returns exit code 3 [FAIL Expected EOF (stdin, line 1)]" Why is that?

Muaath_5

21 месяц назад, # |

-16

What should I put in the checker in intractive problems?

The intractor is enough in my case.

Palestinian_Dream

11 месяцев назад, # |

I was wondering for multiple-answer problems and to be more specific Permutations-Generating Problems how we can read the permutation?

Serikbay777

10 месяцев назад, # |

If I want to check whether my checker is correct how I have to compile the code?

MuhammadSawalhy

4 месяца назад, # ^ |

-8

Here's the official documentation of what you want to do, you can get the docs from "help" link in Polygon.

Checker docs

donovilia2007

8 месяцев назад, # |

Hello from 2024

This article was very useful, I am glad that I started my journey in developing problemsets with it