Explanation to weird/strange floating point behaviors in C++

№	Пользователь	Рейтинг
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

№	Пользователь	Вклад
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

I'm writing this blog because of the large number of blogs asking why about they get strange floating arithmetic behavior in C++. For example:

"WA using GNU C++17 (64) and AC using GNU C++17" https://codeforces.net/blog/entry/78094

"Why does this happen?" https://codeforces.net/blog/entry/51884

"Why can this code work strangely?" https://codeforces.net/blog/entry/18005

and many many more.

The issue is caused by something called excess precision. In C and C++ there are multiple different modes (referred to as methods) of how floating point arithmetic is done, see (https://en.wikipedia.org/wiki/C99#IEEE_754_floating-point_support). You can detect which one is being used by the value of FLT_EVAL_METHOD found in float.h. In mode 2 (which is what 32 bit g++ uses by default) all floating point arithmetic is done using long double. In mode 0 (which is what 64 bit g++ uses by default) the arithmetic is done using each corresponding type, so there is no excess precision.

Here is a simple example of how to detect excess precision (partly taken from https://stackoverflow.com/a/20870774)


#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main() {
  printf("This is compiled in mode %d\n", FLT_EVAL_METHOD);
  printf("0 means no excess precision.\n");
  printf("2 means it has excess precision.\n\n");
  
  printf("The following test detects excess precision\n");
  printf("0 if no excess precision, or 8e-17 if excess precision is used.\n");
  double a = atof("1.2345678");
  double b = a*a;
  printf("%.20e\n", b - 1.52415765279683990130);
  return 0;
}

If b is rounded (as one would "expect" since it is a double), then the result is zero. Otherwise it is something like 8e-17 because of excess precision. I tried running this in custom invocation. MSVC, Clang and g++17(64bit) all use mode 0 and round b to 0, while g++11, g++14 and g++17 as expected all use mode 2 and b = 8e-17.

The culprit behind all of this is misery is the old x87 instruction set, which only supports (80 bit) long double arithmetic. The modern solution is to on top of this use the SSE (version 2 or later) instruction set, which supports both float and double arithmetic. GCC got a flag to turn this on -mfpmath=sse -msse2. This will not change the value of FLT_EVAL_METHOD, but it will effectively turn off excess precision, see 81988982.

It is also possible to turn on excess precision with -mfpmath=387, see 81989001.

Rev.	Кто	Когда	Δ	Комментарий
en8	pajenegod	2020-06-02 01:30:31	571
en7	pajenegod	2020-05-31 19:26:56	420
en6	pajenegod	2020-05-31 18:14:03	1551	Tiny change: 'ostream>\n#include <cstdlib>\nusing n' -> 'ostream>\nusing n'
en5	pajenegod	2020-05-31 16:03:51	82
en4	pajenegod	2020-05-30 23:33:06	1	(published)
en3	pajenegod	2020-05-30 23:24:33	626	Tiny change: 'there are multiple differen' -> 'there are differen'
en2	pajenegod	2020-05-30 21:45:15	28
en1	pajenegod	2020-05-30 21:40:28	2584	Initial revision (saved to drafts)

Rev.

Язык

Кто

Когда

Комментарий

en8