For a long time I've been upset with C++ standard input/output. First of all, I heard that fread
/fwrite
are much faster than everything else, and it's impossible to get good times on problems with huge input or output without using those. Secondly, it's really annoying to write formatting string and ampersands in scanf
, especially with many variables to read. Thirdly, the only way to expand I/O to custom types is by overloading <<
and >>
operators on streams, but they are the slowest. I tried to tackle all these issues in my implementation. Remember, that it's targeted for the common use case in programming contests, so it's not as flexible as one might wish.
The code is here Doesn't compile with MSVS.
I apologize in advance to everyone, who will be scrolling through this 500 lines trying to read my solutions. Also it's not advised for people without broad experience with C++ to try to understand the entirety of it (dangerous for your mental health). There are 3 major components.
1. File management
This is done by classes InputFile
and OutputFile
. Indeed, I use fread
/fwrite
with a buffer of size 212, but there is one catch. fread
doesn't work for interactive problems, because input isn't fully buffered there, it's line buffered. I use fgets
in this case, but make sure to call correct constructor, it should look like this (input and output are smart pointers):
#ifdef ONLINE_JUDGE
input.reset(new InputFile(stdin, false)); // usual problem
input.reset(new InputFile()); // interactive problem
output.reset(new OutputFile());
#else
input.reset(new InputFile()); // local testing using a console
input.reset(new InputFile("input.txt")); // local testing using a file
output.reset(new OutputFile()); // output to console
output.reset(new OutputFile("output.txt")); // output to a file
#endif
Also it's possible to read or write to a string using InputString
and OutputString
classes. They are used similarly:
string s;
input.reset(new InputString(s));
output.reset(new OutputString(s));
2. String parsing
Next step is to convert standard types to/from a buffer of characters. To my deepest disappointment, even this problem isn't solved efficiently in standard C++ library. Just look at this benchmarks (situation with atoi and atod is similar):
https://github.com/miloyip/itoa-benchmark
https://github.com/miloyip/dtoa-benchmark
I didn't want to go too deep, so I used the simplest code to read/write integer types similar to this one 16792284. Then, I cheated with double treating it as two integers separated by a decimal point. This approach won't work for huge numbers or precision of more than 18 digits after the decimal point, but I've never seen such cases in the programming contests. All related methods are located inside InputDevice
and OutputDevice
classes.
3. User interface
Everything above was about being efficient. But the main goal during a contest is to code fast, so I wrapped all the efficiency into functions read
/write
, similar to scanf
/printf
. Of course, they offer much more. You don't need to write format strings. Range input and output are supported. You can read a string until the character satisfies some termination criterion. Write will insert delimiters for you automatically and can be configured with setPrecision
, setFill
and other modifiers similar to the ones in iomanip header. Full list of supported parameters is given in the comment after the code. Here's an example of reading and writing n, m, then n by m character grid (without whitespaces), then array a of length n, then array b of length m:
const int N = 1001;
int n, m;
char s[N][N];
int a[N], b[N];
read(n, m, s, n, a, n, b, m);
writeln(n, m, '\n', setDelimiter("\n"), s, n);
writeln(setDelimiter(", "), a, n, '\n', b, m);
Another feature is extensibility, for example you can create your own Point
class, implement read
and write
methods for it, and use it later combined with all other features, for example read an array and then 2 more points.
template <class T>
struct Point {
T x, y;
bool read(InputDevice& input) { return input.read(x, y); }
int write(OutputDevice& output) const { return output.write(x, ',', y); }
};
// later in the code
const int N = 1001;
int n;
Point<int> p[N], s, t;
read(n, p, n, s, t);
writeln(n, setDelimiter("\n"), p, n, s, setDelimiter(), t);
Speed
I spent a lot of time optimizing the code. To prove it, I included my methods in this benchmark. Below are the results with clang compiler (my methods are named fast).
Update 01/29/2017
- Custom types must implement member functions
read
andwrite
to be supported. read
now returns bool success flag instead of number of arguments read.- Floating point write bug was fixed. It caused errors on numbers like 1 - ε because of rounding.
is_iterator
trait was implemented and SFINAE was improved. This fixed ambiguous overload resolution that happened in some cases.write
andread
methods were moved inside classes to improve flexibility. Out of class methods now simply forward the parameters.- Oversight was fixed which made calls like
write(vector.begin(), vector.size())
impossible, because onlyint
size was supported. complex
andstring
are now fully supported.- Some erroneous calls could compile with unexpected results instead of giving CE. This was fixed.
I think (if it's not implemented yet), output buffering (not static size (like yours 1<<12), but all buffer) could be useful for the kind of offline problems (output answer in one time), at least it works for me — reduces the number of fwrite calls. Like:
fwrite calls at the current buffer size are already heavily dominated by conversion procedures. Reducing the number of calls should improve speed a bit, but at the cost of more memory consumption. In my opinion, the gain is too small to be worth it.
What do you think about things like __getchar_nolock or getchar_unlocked?
I tested
getc_unlocked
andputc_unlocked
, they are 10% faster than my methods to read/write files character by character. This is to be expected, because I have some overhead.However, they become slower when we need more than one character to parse a value, like integer or any other type. Since this is the dominant case, in my opinion, it's not worth using unlocked functions.
Also they look like a hack. Did you know that gcc also has
fread_unlocked
andfwrite_unlocked
? I would consider using them, but clang doesn't have them. This convinces me even more that they are a hack.I tried to call
fread_unlocked
andfwrite_unlocked
but did not manage to do that.Also I found somewhere that in MinGW they're called
_fread_nolock
and_fwrite_nolock
but then linking (!) failes with the following message:These functions are well-defined and standardized: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getc_unlocked.html.
getchar()
is required to be thread-safe. If you call this function many times in a row, the added cost of locking/unlocking on each call accumulates. To remove the unnecessary overhead, you can lock the stream once withflockfile()
and use the fastgetchar_unlocked()
instead.Nice results!
But IMHO there's not much practical need for this nowadays.
For the last few years I've been using only cin/cout for contests, and it was always fast enough. Not even once I had to resort to printf/scanf/gets/whatever. (For floating-point numbers too.) The following well-known tweaks at the top of main() are sufficient:
ios_base::sync_with_stdio(false);cin.tie(0);cout.precision(20);
For this problem, cin/cout doesn't pass even with these optimisations.
It does, tested it now. 22392545
There wasn't c++ 14 5 months ago at codeforces.
Recently saw a blog about c++14 being slow for scanf,printf. Didn't expect it to be faster in case of cin,cout!!
And about c++11: putting cin.tie(0), cout.tie(0) passes some more cases, still TL at test 29.
You are a happy man. I definitely prefer iostream rather than cstdio, but once in few months it happens for me that I need to rewrite my IO. Locally there doesn't seem to be any difference, but on codeforces iostream is slower and that saddens me. Not by a very large factor, but when handling with really big inputs it becomes significant (problem linked by _index was one of those when I needed to rewrite it). (ofc I use all optimizations you mentioned)
Few comments about code style and correctness. And, yes, I think all of them make sense in competitive programming as well. I'm sorry if I missed something or my point is invalid — that may happen.
isdigit
/isalpha
fromcctype
and write your own code instead? Is it because you do not want problems with non-ASCII characters?static inline bool isDigit(char c) { return static_cast<unsigned char>(c - '0') < 10; }
— I believe you have undefined behavior right there because you are subtracting two chars and it's potential overflow. Which may be signed on some platforms (like, all that I know :), and it's undefined behavior. Convertc
or'0'
tounsigned char
beforehand. UPD: riadwaw pointed out that two expressions in subtraction are subject to promotion toint
first, so no signed overflow happens as long asint
andchar
have different sizes.Detail
namespace) — you can use bracket-list initialization if there are no constructors specified. E.g.return { 1 }
orreturn Width({ 1 })
readFloatingPoint
andwriteFloatingPoint
do not work with big numbers (which does not fit in 64 bits). That can easily happen.__builtin_expect
really necessary? Does it really help? Have you measured it? It clutters code and makes it harder to read and debug.*--last = i["fnI"];
— are you kidding me? Is there any rational reason for writing this instead of"fnI"[i]
?return read(arg.first) & read(arg.second);
— I believe that order of evaluation is not guaranteed for&
or overloaded&&
, is it? That can read second element of pair first which would we very fun to debug.readChar
andreadString
functions return int when they actually return boolean values of 0 or 1. It's misleading and can make people think that they return number of characters read, like in libc.static_cast
everywhere instead of C-style cast? You do not cast anything else, only numbers, no pointers or objects, and you can't do anything else thanstatic_cast
with numbers.input
andoutput
to be pointers instead of static variables which can be statically initialized?That you for the valuable input. Now the answers.
isSpace
was 10 times faster, other methods were 3-5 times faster.char
is an integer type and integer arithmetic is well defined by the standart.&
operator, but for+
as well. Fixed.static_cast
.2) Isn't this true only for unsigned integer type?
2) It's well-defined for unsigned overflows only. Say,
(1 << 31) + (1 << 31)
is undefined behavior if we have 32-bit ints. See here.8) Conversion from
bool
toint
is implicit, isn't it? Is it correct that you don't want it because you don't like implicit conversions and will have to throw in somestatic_cast<int>(bool)
later in code (inread
functions only, as I believe?). Well, right now only two functions return actual length — it'sreadUnsignedIntGeneral
andreadUnsignedInt
, which is very misleading and looks inconsistent at the first glance. Later I understood that it's because these are only functions used inreadFloatingPoint
, but that's still very weird-looking for me.10) Got it. Just in case you'll use C++14 in the future,
= make_unique(...)
is preferable over.reset(new T(...))
because it offers exception safety.2) it's subject to promotion first
I missed that. But even in this case I think they get promoted to
int
, notunsigned int
..?If we forget about case when sizeif int = size of char, it won't overthrow int
Oh, yes. Absolutely, thanks. Because we had two chars in the beginning, promoted them to int and now we have two very small ints.
Today I got a message from timus online judge that my solution has been rejudged, and it got TLE. I opened that problem and changed scanf/printf to FastIo. And my solution got accepted in 100 ms, while scanf/printf worked > 2500ms.
UPD Submitting solution without changes in msvc gets accepted in 1500 ms.
deleted
Never checked this but I think this works
It's been a very long time but I want to ask, how do you read/write 1D char array ? 2D char array works, but 1D char array gives:
In instantiation of 'bool read(Ts&& ...) [with Ts = {char (&)[4098], int&}]':| |488|error: call of overloaded 'read(char [4098], int&)' is ambiguous|
What i'm trying to do is reading bytes from a binary file then write bytes to a binary file. My code is: char buff[N]; ... read(buff,N); write(buff,N);
It's been a very long time but I want to ask, how do you handle input like __int128 ? I tried this code in a school online judge, everything worked out fine, but __int128 problems results in INTEGER_DIVIDE_BY_ZERO.