jamjury's blog

By jamjury, history, 5 months ago, In English

Memory-mapped input file is the fastest possible input method I know of. I used it for last ICPC Challenge.

Unfortunately, it is not possible — without exploiting Windows kernel bugs — to also map output file into memory on CF, because the handle lacks GENERIC_READ, which is required for FILE_MAP_WRITE mapping. So the best we can do for output is to write the whole buffer at once (but it doesn't really speed things up).

#include <Windows.h> // GetStdHandle, GetFileSize, CreateFileMapping, MapViewOfFile, UnmapViewOfFile, WriteFile, CloseHandle
#include <charconv> // from_chars, to_chars
#include <cstring> // memcpy
#include <cstdlib> // malloc, free
#include <cstddef> // size_t

// #define FREE_IO_ON_EXIT
 
struct input {
    LPVOID view;
    const char *first, *last;
 
    input() noexcept {
        HANDLE input_handle = GetStdHandle(STD_INPUT_HANDLE);
        DWORD input_size = GetFileSize(
            input_handle, // [in]            HANDLE  hFile,
            NULL          // [out, optional] LPDWORD lpFileSizeHigh
        );
        HANDLE mapping_object = CreateFileMapping(
            input_handle,  // [in]           HANDLE                hFile
            NULL,          // [in, optional] LPSECURITY_ATTRIBUTES lpFileMappingAttributes
            PAGE_READONLY, // [in]           DWORD                 flProtect
            0, /* whole */ // [in]           DWORD                 dwMaximumSizeHigh
            0, /*  file */ // [in]           DWORD                 dwMaximumSizeLow
            NULL           // [in, optional] LPCSTR                lpName
        );
        view = MapViewOfFile(
            mapping_object,  // [in] HANDLE hFileMappingObject
            FILE_MAP_READ,   // [in] DWORD  dwDesiredAccess
            0,               // [in] DWORD  dwFileOffsetHigh
            0,               // [in] DWORD  dwFileOffsetLow
            0 /*whole file*/ // [in] SIZE_T dwNumberOfBytesToMap
        );

        first = (char*) view;
        last = first + input_size;
        #ifdef FREE_IO_ON_EXIT
            CloseHandle(input_handle);
            CloseHandle(mapping_object);
        #endif
    }
 
    int take_int() noexcept {
        int result;
        first = std::from_chars(first, last, result).ptr + 1;
        return result;
    }
 
    double take_double() noexcept {
        double result;
        first = std::from_chars(first, last, result).ptr + 1;
        return result;
    }
 
    #ifdef FREE_IO_ON_EXIT
        ~input() noexcept {
            UnmapViewOfFile(view);
        }
    #endif
} in;

struct output {
    constexpr static std::size_t buf_size = 32*1024*1024; // 32MB
    char *buf, *first, *last;
 
    output() noexcept
        : buf((char*) std::malloc(buf_size))
        , first(buf)
        , last(buf + buf_size)
    {}
 
    void put_int(int value) noexcept {
        first = std::to_chars(first, last, value).ptr;
    }
 
    template <std::size_t size>
    void put_str(const char (&str)[size]) noexcept {
        std::memcpy(first, str, size - 1);
        first += size - 1;
    }
 
    void put_char(char c) noexcept {
        *first++ = c;
    }
 
    ~output() noexcept {
        HANDLE output_handle = GetStdHandle(STD_OUTPUT_HANDLE);
        WriteFile(output_handle, buf, first - buf, NULL, NULL);
        #ifdef FREE_IO_ON_EXIT
            CloseHandle(output_handle);
            std::free(buf);
        #endif
    }
} out;

int main() {
    out.put_int(in.take_int());
}

The benchmark (webarchive) on my machine gcc 13.2.0 (MinGW-W64), compiled with -std=c++20 -O2

int, scanf         4.01   3.93   4.05
int, cin           9.31   0.64   9.21
int, mmap in       0.06   0.06   0.06

int, printf        2.25   2.27   2.22
int, cout          0.39   0.40   0.40
int, mmap out      0.39   0.38   0.39

*mmap in/out is this implementation

Full text and comments »

  • Vote: I like it
  • +9
  • Vote: I do not like it

By jamjury, history, 6 months ago, In English

In the problem from currently running Huawei contest (Accuracy-Preserving Summation Algorithm) a part of the task is to choose between IEEE-754 binary64, binary32 and binary16 floating point formats to use for number summation.

Apparently, people in charge of the contest don't know that both Intel and AMD support fp16 since 2011-2012 (AMD Bulldozer and Intel Ivy Bridge), it's supported by GCC 12+ and Clang 15+ as _Float16 and since C++23 as std::float16_t.

So for the checker they wrote their own implementation, which differs from IEEE in two places:

  1. the exponent range is [-16; +16] instead of [-15; +15];

  2. during conversion from fp64 least significant bits are just thrown away instead of being rounded.

Considering it's importance in the problem, I decided to check what's the range of fp64 that doesn't overflow when converted to fp16 and the difference that aforementioned differences make. Therefore I wrote a little program which prints the ranges and decided to share results here with anyone interested. It shows ranges for both "Huawei FP16", IEEE and "Corrected Huawei" — without rounding, but with correct exponent range.

Here are the results:

Huawei FP16
  MAX
    fp64: 131071.99999999999 (0x1.fffffffffffffp+16)
    fp16: 131008 (0x1.ffcp+16)
  OVERFLOW
    fp64: 131072 (0x1p+17)
    fp16: inf
  RANGE
    fp64: (-131072; 131072)
    fp16: [-131008; 131008]

IEEE-754 FP16
  MAX
    fp64: 65519.999999999993 (0x1.ffdffffffffffp+15)
    fp16: 65504 (0x1.ffcp+15)
  OVERFLOW
    fp64: 65520 (0x1.ffep+15)
    fp16: inf
  RANGE
    fp64: (-65520; 65520)
    fp16: [-65504; 65504]

Huawei FP16 with correct range
  MAX
    fp64: 65535.999999999993 (0x1.fffffffffffffp+15)
    fp16: 65504 (0x1.ffcp+15)
  OVERFLOW
    fp64: 65536 (0x1p+16)
    fp16: inf
  RANGE
    fp64: (-65536; 65536)
    fp16: [-65504; 65504]

https://godbolt.org/z/3s8GoKe8j

Source

I think it's kinda sloppy that the problem statement is inconsistent with the checker, but at least they've shared the checker's code.

Full text and comments »

  • Vote: I like it
  • +19
  • Vote: I do not like it