Xiaohuba's blog

By Xiaohuba, history, 36 hours ago, In English

UPD: I've reduced the code size.

I've recently found that the following code will generate wrong output.

#include <bitset>
#include <iostream>
const int N = 105;
std::bitset<N> ok[N][N];
int n = 5;
int main() {
  ok[2][2].set(2);
  for (int i = n; i; i--)
    for (int j = i; j <= n; j++) {
      ok[i][j] = ok[i][j] | ok[i + 1][j] | ok[i][j - 1];
    }
  std::cout << ok[2][5][2] << '\n';
  return 0;
}

Compiled with -O3 -mtune=skylake -march=skylake, the code outputs 0.

However if you simulate the code you will know that the correct answer should be 1.

Note that the compiler seems to generate wrong sse instruction.

Godbolt link

Again, I believe this code is ub-free, and has nothing to do with implementation-defined stuff.

  • Vote: I like it
  • +49
  • Vote: I do not like it

»
29 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Xiaohuba (previous revision, new revision, compare).

»
28 hours ago, # |
Rev. 8   Vote: I like it +13 Vote: I do not like it

Further reduced:

typedef struct {
  unsigned long words[2];
} Child;

typedef struct {
  Child child;
} Parent;

Parent my_or(Parent x, const Parent *y) {
  const Child *y_child = &y->child;
  for (int i = 0; i < 2; i++) {
    x.child.words[i] |= y_child->words[i];
  }
  return x;
}

int main() {
  Parent bs[4];
  __builtin_memset(bs, 0, sizeof(bs));

  bs[0].child.words[0] = 1;
  for (int i = 1; i <= 3; i++) {
    bs[i] = my_or(bs[i], &bs[i - 1]);
  }
  return bs[2].child.words[0];
}
»
28 hours ago, # |
  Vote: I like it -16 Vote: I do not like it

I bet AI couldn't do that

»
27 hours ago, # |
Rev. 2   Vote: I like it +1 Vote: I do not like it

It's good with O2. I am always scared to use O3, now I have an additional reason to keep being scared. While searching I found something related to O3 and AVX probably not being addressed for a long time (>10 years) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49001 . This may not be this issue though, as I tried to adjust alignas and that didn't help.