Question

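On finding a bug that made everything turn into NaNs when running the optimized version of my code (compiled with g++ 4.8.2 and 4.9.3), I identified the problem as the -Ofast option, specifically the -ffinite-math-only flag it includes.

One part of the code reads floats from a FILE* using fscanf and then replaces all NaNs with a numeric value. As might be expected, however, -ffinite-math-only kicks in and removes these checks, leaving the NaNs in place.

While trying to solve this, I stumbled upon another SO question, which suggested adding -fno-finite-math-only as a function attribute to disable the optimization for a specific function. The following illustrates the problem and the attempted fix (which doesn't actually fix it):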

#include <cstdio>
#include <cmath>

__attribute__((optimize("-fno-finite-math-only")))
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

int main(void){
    const size_t cnt = 10;
    float val[cnt];
    for(int i = 0; i < cnt; i++) scanf("%f", val + i);
    replaceNaN(val, cnt, -1.0f);
    for(int i = 0; i < cnt; i++) printf("%f ", val[i]);
    return 0;
}

If the code is compiled and run using

echo 1 2 3 4 5 6 7 8 nan 10 | (g++ -ffinite-math-only test.cpp -o test && ./test)

it does not behave correctly: it outputs a nan (which should have been replaced by -1.0f). It behaves fine if the -ffinite-math-only flag is omitted. Shouldn't this work? Am I missing something in the syntax for attributes in GCC, or is this one of the aforementioned cases of "some trouble with some version of GCC related to this" (from the linked SO question)?

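A few solutions I'm aware of, though I would prefer something cleaner and more portable:

Compile the code with -fno-finite-math-only (my interim solution): I suspect that this optimization may be rather useful for the rest of the program in my context.
Manually look for the string "nan" in the input stream and replace the value there: the input reader is in an unrelated part of the library, so including this test there would be poor design.
Assume a specific floating-point architecture and write my own isNaN: I may do this, but it's a bit hackish and non-portable.
Prefilter the data with a separately compiled program built without the -ffinite-math-only flag, then feed the result into the main program: the added complexity of maintaining two binaries and getting them to talk to each other just isn't worth it.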
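The second option, catching the text "nan" before it is ever converted to a float, might look roughly like the sketch below. The names tokenIsNan and parseFloatOrFallback are hypothetical and not part of the original code, and the sketch assumes the input is read one whitespace-separated token at a time. Because the check is done on characters, it does not depend on any floating-point comparison that -ffinite-math-only could remove.

```cpp
#include <cctype>   // std::tolower
#include <cstdlib>  // std::strtof

// Hypothetical helper: true when the token starts with an optional sign
// followed by "nan" in any letter case (the prefix strtof treats as NaN).
bool tokenIsNan(const char* token) {
    if (*token == '+' || *token == '-') ++token;
    const char* expected = "nan";
    for (int i = 0; i < 3; ++i) {
        // Stops at the first mismatch, so it never reads past the terminator.
        if (std::tolower(static_cast<unsigned char>(token[i])) != expected[i])
            return false;
    }
    return true;
}

// Hypothetical helper: parse one token, substituting `fallback` for NaN
// without performing any floating-point NaN test.
float parseFloatOrFallback(const char* token, float fallback) {
    if (tokenIsNan(token)) return fallback;
    return std::strtof(token, nullptr);
}
```

A reader loop would then call parseFloatOrFallback on each token instead of relying on scanf("%f") followed by a NaN check.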

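Edit: As the accepted answer indicates, this appears to be a compiler "bug" in older versions of g++, such as 4.8.2 and 4.9.3, which is fixed in newer versions, such as 5.1 and 6.1.1.

If updating the compiler is not a reasonably easy option for some reason (for example, no root access), or if adding this attribute to a single function still doesn't completely solve the NaN check problem, there is an alternative: provided you can be certain the code will always run in an IEEE754 floating-point environment, manually check the bits of the float for a NaN signature.

The accepted answer suggests doing this with a bit field. However, the order in which the compiler places the elements of a bit field is non-standard and in fact changes between older and newer versions of g++. The older versions (4.8.2 and 4.9.3) even refuse to adhere to the desired positioning and always place the mantissa first, regardless of the order in which the fields appear in the code.

A solution using bit manipulation, however, is guaranteed to work on all IEEE754-compliant compilers. Below is my implementation of it, which I ultimately used to solve my problem. It checks for IEEE754 compliance, and I've extended it to handle doubles as well as other, more routine floating-point bit manipulations.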

#include <cstddef>     // size_t
#include <cstdint>     // uint32_t, uint64_t
#include <limits>      // IEEE754 compliance test
#include <type_traits> // enable_if

template<
    typename T,
    typename = typename std::enable_if<std::is_floating_point<T>::value>::type,
    typename = typename std::enable_if<std::numeric_limits<T>::is_iec559>::type,
    typename u_t = typename std::conditional<std::is_same<T, float>::value, uint32_t, uint64_t>::type
>
struct IEEE754 {

    // Field widths in bits for float (8/23) or double (11/52).
    enum class WIDTH : size_t {
        SIGN = 1,
        EXPONENT = std::is_same<T, float>::value ? 8 : 11,
        MANTISSA = std::is_same<T, float>::value ? 23 : 52
    };
    // Bit masks selecting each field in the unsigned representation.
    enum class MASK : u_t {
        SIGN = (u_t)1 << (sizeof(u_t) * 8 - 1),
        EXPONENT = ((~(u_t)0) << (size_t)WIDTH::MANTISSA) ^ (u_t)MASK::SIGN,
        MANTISSA = (~(u_t)0) >> ((size_t)WIDTH::SIGN + (size_t)WIDTH::EXPONENT)
    };
    union {
        T f;
        u_t u;
    };

    IEEE754(T f) : f(f) {}

    // The mask must be applied before the shift: >> binds tighter than &.
    inline u_t sign() const { return (u & (u_t)MASK::SIGN) >> ((size_t)WIDTH::EXPONENT + (size_t)WIDTH::MANTISSA); }
    inline u_t exponent() const { return (u & (u_t)MASK::EXPONENT) >> (size_t)WIDTH::MANTISSA; }
    inline u_t mantissa() const { return u & (u_t)MASK::MANTISSA; }

    // NaN: exponent bits all set and a non-zero mantissa.
    inline bool isNan() const {
        return (mantissa() != 0) && ((u & ((u_t)MASK::EXPONENT)) == (u_t)MASK::EXPONENT);
    }
};
template<typename T>
inline IEEE754<T> toIEEE754(T val) { return IEEE754<T>(val); }

With this in place, the replaceNaN function only needs to call toIEEE754(arr[i]).isNan() where it previously called std::isnan(arr[i]).

Inspecting the assembly of these functions shows that, as expected, all masks become compile-time constants, which leads to (seemingly) efficient code.

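This is one instruction fewer than a working bit-field solution (no shift), and it uses the same number of registers. It is tempting to say this alone makes it more efficient, but other concerns, such as pipelining, may make one solution more or less efficient than the other.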
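For illustration, the bit-manipulation version of replaceNaN can be sketched as follows. This is a simplified stand-in rather than the IEEE754 struct itself: it handles only float, copies the bits with std::memcpy instead of reading through a union, and isNanBits is a hypothetical name.

```cpp
#include <cstdint>  // std::uint32_t
#include <cstring>  // std::memcpy
#include <limits>   // std::numeric_limits

static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float required");
static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float required");

// Hypothetical helper: NaN test on the raw bits of a float, so there is
// no floating-point comparison for -ffinite-math-only to optimize away.
bool isNanBits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t exponentMask = 0x7F800000u; // 8 exponent bits
    const std::uint32_t mantissaMask = 0x007FFFFFu; // 23 mantissa bits
    return (bits & exponentMask) == exponentMask && (bits & mantissaMask) != 0;
}

// Same signature as the original replaceNaN, minus the attribute.
void replaceNaN(float* arr, int size, float newValue) {
    for (int i = 0; i < size; i++)
        if (isNanBits(arr[i])) arr[i] = newValue;
}
```

Infinities have an all-ones exponent but a zero mantissa, so they are left untouched; only NaN bit patterns are replaced.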

Answer 1

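Looks like a compiler bug to me. Up through GCC 4.9.2, the attribute is completely ignored. GCC 5.1 and later pay attention to it. Perhaps it's time to upgrade your compiler?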

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

Compiled with -ffinite-math-only on GCC 4.9.2:

replaceNaN(float*, int, float):
        rep ret

But with exactly the same settings on GCC 5.1:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L26
        sub     rsp, 8
        call    std::isnan(float) [clone .isra.0]
        test    al, al
        je      .L2
        mov     rax, rdi
        and     eax, 15
        shr     rax, 2
        neg     rax
        and     eax, 3
        cmp     eax, esi
        cmova   eax, esi
        cmp     esi, 6
        jg      .L28
        mov     eax, esi
.L5:
        cmp     eax, 1
        movss   DWORD PTR [rdi], xmm0
        je      .L16
        cmp     eax, 2
        movss   DWORD PTR [rdi+4], xmm0
        je      .L17
        cmp     eax, 3
        movss   DWORD PTR [rdi+8], xmm0
        je      .L18
        cmp     eax, 4
        movss   DWORD PTR [rdi+12], xmm0
        je      .L19
        cmp     eax, 5
        movss   DWORD PTR [rdi+16], xmm0
        je      .L20
        movss   DWORD PTR [rdi+20], xmm0
        mov     edx, 6
.L7:
        cmp     esi, eax
        je      .L2
.L6:
        mov     r9d, esi
        lea     r8d, [rsi-1]
        mov     r11d, eax
        sub     r9d, eax
        lea     ecx, [r9-4]
        sub     r8d, eax
        shr     ecx, 2
        add     ecx, 1
        cmp     r8d, 2
        lea     r10d, [0+rcx*4]
        jbe     .L9
        movaps  xmm1, xmm0
        lea     r8, [rdi+r11*4]
        xor     eax, eax
        shufps  xmm1, xmm1, 0
.L11:
        add     eax, 1
        add     r8, 16
        movaps  XMMWORD PTR [r8-16], xmm1
        cmp     ecx, eax
        ja      .L11
        add     edx, r10d
        cmp     r9d, r10d
        je      .L2
.L9:
        movsx   rax, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        lea     eax, [rdx+1]
        cmp     eax, esi
        jge     .L2
        add     edx, 2
        cdqe
        cmp     esi, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        jle     .L2
        movsx   rdx, edx
        movss   DWORD PTR [rdi+rdx*4], xmm0
.L2:
        add     rsp, 8
.L26:
        rep ret
.L28:
        test    eax, eax
        jne     .L5
        xor     edx, edx
        jmp     .L6
.L20:
        mov     edx, 5
        jmp     .L7
.L19:
        mov     edx, 4
        jmp     .L7
.L18:
        mov     edx, 3
        jmp     .L7
.L17:
        mov     edx, 2
        jmp     .L7
.L16:
        mov     edx, 1
        jmp     .L7

On GCC 6.1, the output is similar, although not quite identical.

Replacing the attribute with

#pragma GCC push_options
#pragma GCC optimize ("-fno-finite-math-only")
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}
#pragma GCC pop_options

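makes absolutely no difference, so this is not simply a matter of the attribute being ignored. These older versions of the compiler clearly do not support controlling floating-point optimization behavior at function-level granularity.

Note, however, that the code generated by GCC 5.1 and later is still significantly worse than what you get when compiling the function without the -ffinite-math-only switch: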

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L1
        lea     eax, [rsi-1]
        lea     rax, [rdi+4+rax*4]
.L5:
        movss   xmm1, DWORD PTR [rdi]
        ucomiss xmm1, xmm1
        jnp     .L6
        movss   DWORD PTR [rdi], xmm0
.L6:
        add     rdi, 4
        cmp     rdi, rax
        jne     .L5
        rep ret
.L1:
        rep ret

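I have no idea why there is such a discrepancy. Something is badly throwing the compiler off its game; this is even worse code than you get with optimizations completely disabled. If I had to guess, I'd speculate that it is the implementation of std::isnan. If this replaceNaN function is not speed-critical, then it probably doesn't matter. If you need to repeatedly parse values from a file, you might prefer a reasonably efficient implementation.

Personally, I would write my own non-portable implementation of std::isnan. The IEEE 754 formats are all well documented, and assuming you thoroughly test and comment the code, I can't see the harm in it unless you absolutely need the code to be portable to all different architectures. It will drive purists up the wall, but so should using non-standard options like -ffinite-math-only. For a single-precision float, something like: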

#include <cstdint>  // uint32_t

bool my_isnan(float value)
{
    union IEEE754_Single
    {
        float f;
        struct
        {
        // BIG_ENDIAN is assumed to be defined by the build for big-endian targets.
        #if BIG_ENDIAN
            uint32_t sign     : 1;
            uint32_t exponent : 8;
            uint32_t mantissa : 23;
        #else
            uint32_t mantissa : 23;
            uint32_t exponent : 8;
            uint32_t sign     : 1;
        #endif
        } bits;
    } u = { value };

    // In the IEEE 754 representation, a float is NaN when
    // the mantissa is non-zero, and the exponent is all ones
    // (2^8 - 1 == 255).
    return (u.bits.mantissa != 0) && (u.bits.exponent == 255);
}

Now there is no need for the attribute; just use my_isnan instead of std::isnan. Compiling with -ffinite-math-only enabled produces the following object code:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L6
        lea     eax, [rsi-1]
        lea     rdx, [rdi+4+rax*4]
.L13:
        mov     eax, DWORD PTR [rdi]        ; get original floating-point value
        test    eax, 8388607                ; test if mantissa != 0
        je      .L9
        shr     eax, 16                     ; test if exponent has all bits set
        and     ax, 32640
        cmp     ax, 32640
        jne     .L9
        movss   DWORD PTR [rdi], xmm0       ; set newValue if original was NaN
.L9:
        add     rdi, 4
        cmp     rdx, rdi
        jne     .L13
        rep ret
.L6:
        rep ret

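The NaN check is slightly more complicated than a simple ucomiss followed by a test of the parity flag, but it is guaranteed to be correct as long as your compiler adheres to the IEEE 754 standard. This works on all versions of GCC and on any other compiler.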
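The same idea carries over to double precision, where the exponent has 11 bits and the mantissa 52. The following is only a sketch under that assumption: my_isnan_double and doubleFromBits are hypothetical names, and the bits are copied with std::memcpy rather than read through a bit field, which sidesteps any dependence on bit-field ordering.

```cpp
#include <cstdint>  // std::uint64_t
#include <cstring>  // std::memcpy
#include <limits>   // std::numeric_limits

static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double required");
static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double required");

// Hypothetical double-precision counterpart of my_isnan.
bool my_isnan_double(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint64_t exponentMask = 0x7FF0000000000000ull; // 11 exponent bits
    const std::uint64_t mantissaMask = 0x000FFFFFFFFFFFFFull; // 52 mantissa bits
    // NaN: exponent all ones and mantissa non-zero.
    return (bits & exponentMask) == exponentMask && (bits & mantissaMask) != 0;
}

// Hypothetical helper for building test values from raw bit patterns,
// which avoids relying on NaN-producing arithmetic.
double doubleFromBits(std::uint64_t bits) {
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Constructing inputs from bit patterns is useful when testing such a check, since a build using -ffinite-math-only gives no guarantees about expressions that are supposed to evaluate to NaN.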
