Why are these 8 byte-writes not optimized into a MOV?

Time: 2017-10-23 20:48:49

Tags: c++ gcc optimization x86 micro-optimization

My colleague and I have been unable to explain why GCC, ICC, and Clang do not optimize this function

#include <cstdint>

void f(std::uint64_t a, void * p) {
    std::uint8_t *x = reinterpret_cast<std::uint8_t *>(p);
    x[7] = a >> 56;
    x[6] = a >> 48;
    x[5] = a >> 40;
    x[4] = a >> 32;
    x[3] = a >> 24;
    x[2] = a >> 16;
    x[1] = a >> 8;
    x[0] = a;
}

into this:

mov     QWORD PTR [rsi], rdi

If we write f in terms of memcpy, the compilers emit just that mov. Why does the same not happen for the seemingly trivial sequence of byte writes?
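For reference, the memcpy formulation mentioned above might look like the sketch below. The name f_memcpy is made up for illustration, and the two versions only write the same bytes on a little-endian target such as x86-64.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical memcpy-based counterpart of f (name chosen for illustration).
// It copies the object representation of `a` to `p`, which matches the
// byte-by-byte stores in f only on a little-endian target.
void f_memcpy(std::uint64_t a, void *p) {
    std::memcpy(p, &a, sizeof a);
}
```

Comparing the optimized assembly of f and f_memcpy (for example with -O2 -S) shows whether a given compiler version treats the two the same way.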

1 answer:

Answer 0 (score: 6)

I'm not an expert, but GCC only gained the ability to merge adjacent stores in GCC 7, and only for immediate constants:

If I had to guess from the second bug, the wait might not be too long.
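To make the distinction concrete, here is a small sketch; the function names are invented for illustration and are not from the question. The first function stores adjacent immediate constants, which is the kind of pattern that store merging for constants targets. The second has the same shape as the question's code, with each byte taken from a runtime value. Whether a particular compiler version merges either one is best checked by reading the generated assembly.

```cpp
#include <cstdint>

// Adjacent stores of immediate constants: candidates for being merged
// into a single wider store by a constant store-merging pass.
void store_constants(std::uint8_t *x) {
    x[0] = 0x11;
    x[1] = 0x22;
    x[2] = 0x33;
    x[3] = 0x44;
}

// Adjacent stores of pieces of a runtime value, as in the question.
// Merging these requires recognizing that the shifted bytes together
// reassemble `a` in little-endian order.
void store_variable(std::uint32_t a, std::uint8_t *x) {
    x[0] = static_cast<std::uint8_t>(a);
    x[1] = static_cast<std::uint8_t>(a >> 8);
    x[2] = static_cast<std::uint8_t>(a >> 16);
    x[3] = static_cast<std::uint8_t>(a >> 24);
}
```

Both functions write the same four bytes when store_variable is called with 0x44332211, so any difference in the emitted code comes from the optimizer rather than from the semantics.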