Question

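Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful for writing micro-benchmarks that the optimizer won't simply reduce to something meaningless.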

void clobber() {
  asm volatile("" : : : "memory");
}

void escape(void* p) {
  asm volatile("" : : "g"(p) : "memory");
}

These use inline assembly statements to change the assumptions of the optimizer.

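The assembly statement in clobber states that the assembly code in it can read and write anywhere in memory. The actual assembly code is empty, but the optimizer won't look into it because it is asm volatile. It believes us when we tell it that the code might read and write anywhere in memory. This effectively prevents the optimizer from reordering or discarding memory writes made before the call to clobber, and forces memory reads after the call to clobber†.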

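The one in escape additionally makes the pointer p visible to the assembly block. Again, because the optimizer won't look into the actual inline assembly code, that code can be empty, and the optimizer will still assume that the block uses the address held in the pointer p. This effectively forces whatever p points to to be in memory rather than in a register, because the assembly block might perform a read from that address.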

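(This is important, because the clobber function won't force reads or writes for anything that the compiler decides to put in a register, since the assembly statement in clobber doesn't state that anything in particular must be visible to the assembly.)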

All of this happens without any additional code being generated directly by these "barriers". They are purely compile-time artifacts.
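To make the roles of the two barriers concrete, here is a minimal sketch of how they might be combined in a benchmark body on GCC or Clang. The function name bench_push_back and the iteration counter are hypothetical and only there for illustration; a real benchmark would be driven by a timing harness.

```cpp
#include <cstddef>
#include <vector>

// The two helpers from the question (GCC/Clang extended asm only).
static void clobber() { asm volatile("" : : : "memory"); }
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }

// Hypothetical benchmark body: each iteration allocates a vector,
// lets its buffer "escape", writes one element, then clobbers memory
// so the write cannot be discarded as dead.
std::size_t bench_push_back(std::size_t iterations) {
    std::size_t done = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());  // the buffer is now assumed reachable by the asm
        v.push_back(42);   // a write into that buffer
        clobber();         // the asm "might read" the buffer, so the write stays
        ++done;
    }
    return done;
}
```

Without escape, the optimizer may be able to prove that nothing else can observe the buffer and drop the store despite the clobber; without clobber, the store into the escaped buffer could still be treated as dead once the vector is destroyed.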

These use language extensions supported by GCC and Clang, though. Is there a way to get similar behaviour when using MSVC?

† To understand why the optimizer has to think this way, imagine that the assembly block were a loop adding 1 to every byte in memory.

Answer 1

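Given your approximation of escape(), you should also be fine with the following approximation of clobber() (note that this is a draft idea, deferring part of the solution to the implementation of the function nextLocationToClobber()):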

// always returns false, but in an undeducible way
bool isClobberingEnabled();

// The challenge is to implement this function in a way,
// that will make even the smartest optimizer believe that
// it can deliver a valid pointer pointing anywhere in the heap,
// stack or the static memory.
volatile char* nextLocationToClobber();

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if ( clobberingIsEnabled ) {
        // This will never be executed, but the compiler
        // cannot know about it.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}

Update

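Question: How would you ensure that isClobberingEnabled returns false "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?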

Answer: We can take advantage of a hard-to-prove property from number theory, for example Fermat's Last Theorem:

#include <cstdint>
#include <cstdlib>
#include <ctime>

bool undeducible_false() {
    // It took mathematicians more than 3 centuries to prove Fermat's
    // last theorem in its most general form. Hardly that knowledge
    // has been put into compilers (nor will the compiler try hard
    // enough to check all one million possible combinations below).

    // Caveat: avoid integer overflow (Fermat's theorem
    //         doesn't hold for modulo arithmetic)
    // The cast keeps a >= 1 even if std::clock() fails and returns -1.
    std::uint32_t a = static_cast<std::uint32_t>(std::clock()) % 100 + 1;
    std::uint32_t b = std::rand() % 100 + 1;
    std::uint32_t c = reinterpret_cast<std::uintptr_t>(&a) % 100 + 1;

    // a, b, c are all in [1, 100], so the cubes fit easily in 32 bits.
    return a*a*a + b*b*b == c*c*c;
}
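For reference, the pieces of this answer might be wired together roughly as follows. This is only a sketch: the body of nextLocationToClobber() here is a placeholder of my own that returns the address of a static byte, which does not solve the challenge described in the comments above (a smart optimizer could see through it). It is there only so that the snippet links and runs.

```cpp
#include <cstdint>
#include <cstdlib>
#include <ctime>

// Always false (a^3 + b^3 == c^3 has no solution in positive integers),
// but the compiler is unlikely to prove it.
bool undeducible_false() {
    std::uint32_t a = static_cast<std::uint32_t>(std::clock()) % 100 + 1;
    std::uint32_t b = std::rand() % 100 + 1;
    std::uint32_t c = reinterpret_cast<std::uintptr_t>(&a) % 100 + 1;
    return a*a*a + b*b*b == c*c*c;
}

bool isClobberingEnabled() { return undeducible_false(); }

// PLACEHOLDER (assumption, not part of the answer): a real version would
// have to make the optimizer believe the pointer can point anywhere.
volatile char* nextLocationToClobber() {
    static volatile char dummy = 0;
    return &dummy;
}

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if (clobberingIsEnabled) {
        // Never executed at run time, but the compiler cannot know that.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}
```

Because the branch is never taken, the run-time cost is one test of a global flag per call, which is not quite the zero-cost behaviour of the inline-asm version.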

Answer 2

I use the following in place of escape.

#ifdef _MSC_VER
#pragma optimize("", off)
template <typename T>
inline void escape(T* p) {
    *reinterpret_cast<char volatile*>(p) =
        *reinterpret_cast<char const volatile*>(p); // thanks, @milleniumbug
}
#pragma optimize("", on)
#endif

It's not perfect, but I think it's close enough.

Sadly, I don't have a way to emulate clobber.
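A usage sketch for the escape above, under some assumptions: the workload sum_squares is hypothetical, and the pragma is guarded so that the same snippet also compiles on other compilers, where only the volatile read and write remain.

```cpp
#ifdef _MSC_VER
#pragma optimize("", off)
#endif
template <typename T>
inline void escape(T* p) {
    // Volatile read followed by a volatile write of the first byte:
    // the compiler must materialize *p in memory to perform them.
    *reinterpret_cast<char volatile*>(p) =
        *reinterpret_cast<char const volatile*>(p);
}
#ifdef _MSC_VER
#pragma optimize("", on)
#endif

// Hypothetical workload: the running total is escaped on every
// iteration so the loop cannot be folded into a constant.
unsigned sum_squares(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i) {
        total += i * i;
        escape(&total);
    }
    return total;
}
```

Unlike the inline-asm version, this one does generate code (a byte load and a byte store per call), so it perturbs the measurement slightly.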

Optimization barrier for microbenchmarks in MSVC: tell the optimizer you clobber memory?
