[testlib] Checkers

Правка en5, от Zlobober, 2015-06-09 19:51:19

Introduction

Checker is a program that should be written when task allows more than one correct solution. Although it seems that it is very easy to write checker, there are lots of important technical details that are easy to forget if you don't use a special library like testlib.h.

A common convention is that a checker should be a program taking three command-line arguments: the testdata filename, the participant output filename and the jury answer filename. Checker should read the contents of the input, output and answer, decide whether participant answer is correct (and optimal being compared to the jury's answer if there can be unoptimal answer in this task) and return one of several pre-defined verdicts:

Verdict testlib macro meaning
Ok _ok The output is correct, follows the output format and represents an optimal answer (if applicable in this problem).
Wrong Answer _wa The output is wrong or incorrect.
Presentation Error _pe The output doesn't follow output format specification of the task. Although, on Codeforces this verdict is being replaced by Wrong Answer since it's usually hard to distinguish Presentation Error and Wrong Answer
Partially Correct _pc(score) If there is a partial scoring in the task, this verdict should be used in situation when solution is partially correct or to give solution some number of points. Here score should be an integer between 0 and 200 where 0 represents the lowest mark (no points) and 200 represents the highest mark (maximum possible number of points)
Fail _fail This verdict means that checker has encountered a critical internal error or that the jury's answer is incorrect or that contestant found the more optimal answer than jury. This verdict means that something wrong has happened and it requires a special investigation from jury.

Usually the verdict returned by checker is indicated by the return code of its executable, but it may possibly be transfered to the testing system by many other ways: by creating a special xml-file with checker outcome, by writing to stdout or somehow else. When using testlib.h for writing a checker all those system-dependent ways are combined in a single expression quitf(VERDICT, "comment", ...).

Simplest checker example

Problem statement: You are given two integers a and b ( - 1000 ≤ a, b ≤ 1000). Find their sum and output it.

Let's write a checker for this problem. It will be very simple:

#include "testlib.h"

int main(int argc, char* argv[]) {
    // This command initializes checker environment.
    registerTestlibCmd(argc, argv);
    // Now there are three global variables specifying testlib streams:
    // inf - stream with the testdata.
    // ouf - stream with the contestant output.
    // ans - stream with the jury answer.
    // All those streams provide the similar interface for reading data.

    // This function reads a single integer from the participant output that 
    // should be between -2000 and 2000. If it doesn't belong to the specified
    // range, checker finishes with verdict _pe and comment saying that [sum of numbers]
    // is outside of the specified range.
    int pans = ouf.readInt(-2000, 2000, "sum of numbers");
    
    // This function reads a single integer from the jury output. Here we suppose
    // that jury's answer is correct and we do not need to additionally verify it.
    int jans = ans.readInt(); // We suppose that jury's answer is correct
    
    if (pans == jans)
        quitf(_ok, "The sum is correct."); // This finishes checker with verdit OK.
    else
        // quitf handles a comment like printf, i. e. you may use specifiers like
        // %d, %s etc in the comment.
        quitf(_wa, "The sum is wrong: expected = %d, found = %d", jans, pans);
}

Available methods

There are lots of methods useful for writing checkers.

Method Description
stream.readXXX All methods of form readXXX (like readInt, readLong, readDouble, readToken etc) are common for all testlib uses: checkers, validators and interactors. TODO: put all such methods on the separate page.
void quit(TResult verdict, string message);
void quit(TResult verdict, const char* message);
void quitf(TResult verdict, const char* message, ...);
Finishes the checker with a given verdict and comment.
void quitif(bool condition, TResult verdict, const char* message, ...); if condition is true then performs quitf(verdict, message, ...)
void ensuref(bool condition, const char* message, ...); An equivalent of assert. Checks that condition is true, otherwise finishes with _fail verdict. Useful for debugging checkers.

TODO: finish this list.

readAns paradigm

Suppose you have a task that asks contestant to find a complex composite answer that is not just a single number. Example:

Problem statement You are given a connected undirected weighted graph witn n vertices and m edges. Find a simple path between vertex s and vertex t (s ≠ t) of maximum weight and output it. Samples (input and output format is clarified in square brackets):

Sample input Sample output
4 5 [n, m]
1 2 4 [edges]
2 4 2
1 4 4
1 3 5
3 4 3
1 4 [s, t]
3 [number of vertices in path]
1 3 4 [path]

Here is an example of bad checker implementation for this task.

#include "testlib.h"
#include <map>
#include <vector>
using namespace std;

map<pair<int, int>, int> edges;

int main(int argc, char* argv[]) {
    registerTestlibCmd(argc, argv);
    int n = inf.readInt(); // no need to additionally call readSpace() or readEoln() since
    int m = inf.readInt(); // there is no need to validate input file in the checker
    for (int i = 0; i < m; i++) {
        int a = inf.readInt();
        int b = inf.readInt();
        int w = inf.readInt();
        edges[make_pair(a, b)] = edges[make_pair(b, a)] = w;
    }
    int s = inf.readInt();
    int t = inf.readInt();

    // reading jury answer
    int jvalue = 0;
    vector<int> jpath;
    int jlen = ans.readInt();
    for (int i = 0; i < jlen; i++) {
        jpath.push_back(ans.readInt());
    }
    for (int i = 0; i < jlen - 1; i++) {
        jvalue += edges[make_pair(jpath[i], jpath[i + 1])];
    }
    
    // reading participant answer
    int pvalue = 0;
    vector<int> ppath;
    vector<bool> used(n, false);
    int plen = ouf.readInt(2, n, "number of vertices"); // path should at least contain s and t
    for (int i = 0; i < plen; i++) {
        int v = ouf.readInt(1, n, format("path[%d]", i + 1).c_str());
        if (used[v - 1]) // checking that no vertex is used twice
            quitf(_wa, "vertex %d was used twice", v);
        used[v - 1] = true;
        ppath.push_back(v);
    }
    // checking that path is actually between s and t
    if (ppath.front() != s) 
        quitf(_wa, "path doesn't start in s: expected s = %d, found %d", s, ppath.front());
    if (ppath.back() != t)
        quitf(_wa, "path doesn't finish in t: expected t = %d, found %d", t, ppath.back());
    // checking that each pair of adjacent vertices in the path is indeed connected by an edge
    for (int i = 0; i < plen - 1; i++) {
        if (edges.find(make_pair(ppath[i], ppath[i + 1])) == edges.end())
            quitf(_wa, "there is no edge (%d, %d) in the graph", ppath[i], ppath[i + 1]);
        pvalue += edges[make_pair(ppath[i], ppath[i + 1])];
    }
    
    if (jvalue != pvalue)
        quitf(_wa, "jury has answer %d, participant has answer %d", jvalue, pvalue);
    else
        quitf(_ok, "answer = %d", pvalue);
}

Here are the main two issues that appear in this checker.

  1. It believes that the jury's answer is absolutely correct. In case when jury's answer is unoptimal and contestant have really found the better answer, he will get the verdict WA that is not fair. There is a special verdict Fail exactly for this situation.
  2. It contains the duplicating code for extracting the answer value for jury and contestant. In this case extracting the value is just one "for" cycle but for a harder task it may be a very complicated subroutine, and as usual, using the duplicated code makes twice harder to fix it, rewrite it or change it when output format changes.

In fact, reading an answer value is a subroutine that works exactly the same for contestant and jury. That's why it is usually being put into a separate function receiving the input stream as a parameter.

#include "testlib.h"
#include <map>
#include <vector>
using namespace std;

map<pair<int, int>, int> edges;
int n, m, s, t;

// This function receives stream as an argument, reads an answer from it,
// checks its correctness (i. e. that it is indeed a correct path from s to t in the graph),
// calculates its value and returns it. If the path is incorrect, it stops the execution
// with _wa outcome if stream = ouf (contestant) or with _fail outcome if stream = ans (jury).
int readAns(InStream& stream) {
// reading participant answer
    int value = 0;
    vector<int> path;
    vector<bool> used(n, false);
    int plen = stream.readInt(2, n, "number of vertices"); // path should at least contain s and t
    for (int i = 0; i < plen; i++) {
        int v = stream.readInt(1, n, format("path[%d]", i + 1).c_str());
        if (used[v - 1]) { // checking that no vertex is used twice
            // stream.quitf works as quitf but it modifies the verdict according
            // to what stream it is being invoked from. If stream == ouf then
            // it works exactly like quitf, otherwise if stream == ans then
            // any verdict will work like _fail (because it's bad when jury's answer is incorrect)
            stream.quitf(_wa, "vertex %d was used twice", v);
        }
        used[v - 1] = true;
        path.push_back(v);
    }
    // checking that path is actually between s and t
    if (path.front() != s) 
        stream.quitf(_wa, "path doesn't start in s: expected s = %d, found %d", s, path.front());
    if (path.back() != t)
        stream.quitf(_wa, "path doesn't finish in t: expected t = %d, found %d", t, path.back());
    // checking that each pair of adjacent vertices in the path is indeed connected by an edge
    for (int i = 0; i < plen - 1; i++) {
        if (edges.find(make_pair(path[i], path[i + 1])) == edges.end())
            stream.quitf(_wa, "there is no edge (%d, %d) in the graph", path[i], path[i + 1]);
        value += edges[make_pair(path[i], path[i + 1])];
    }
    return value;
}

int main(int argc, char* argv[]) {
    registerTestlibCmd(argc, argv);
    n = inf.readInt(); // no need to additionally call readSpace() or readEoln() since
    m = inf.readInt(); // there is no need to validate input file in the checker
    for (int i = 0; i < m; i++) {
        int a = inf.readInt();
        int b = inf.readInt();
        int w = inf.readInt();
        edges[make_pair(a, b)] = edges[make_pair(b, a)] = w;
    }
    int s = inf.readInt();
    int t = inf.readInt();

    int jans = readAns(ans);
    int pans = readAns(ouf);
    if (jans > pans)
        quitf(_wa, "jury has the better answer: jans = %d, pans = %d\n", jans, pans);
    else if (jans == pans)
        quitf(_ok, "answer = %d\n", pans);
    else // (jans < pans)
        quitf(_fail, ":( participant has the better answer: jans = %d, pans = %d\n", jans, pans);
}

Notice that by using this paradigm we also check the jury answer for being correct. Checkers written in such form are usually shorter and easier to understand and fix.

История

 
 
 
 
Правки
 
 
  Rev. Язык Кто Когда Δ Комментарий
en10 Английский Zlobober 2015-06-09 20:25:45 25
en9 Английский Zlobober 2015-06-09 20:24:31 0 (published)
en8 Английский Zlobober 2015-06-09 20:22:02 644
en7 Английский Zlobober 2015-06-09 20:03:43 1134
en6 Английский Zlobober 2015-06-09 19:54:32 72
en5 Английский Zlobober 2015-06-09 19:51:19 7104
en4 Английский Zlobober 2015-06-09 17:34:25 4655 Tiny change: 'blem). |\n| Wrong ' -
en3 Английский Zlobober 2015-06-09 16:37:33 198
en2 Английский Zlobober 2015-06-09 16:36:43 120
en1 Английский Zlobober 2015-06-09 16:36:29 730 Initial revision (saved to drafts)