Question

How can I implement the following using the fewest Intel instructions, with no branches or conditional moves:

unsigned compare(unsigned x, unsigned y) {
    return (x == y)? ~0 : 0;
}

This is on a hot code path, and I need to squeeze out as much performance as possible.

Answer 1

GCC handles this well: when compiling with -O2 and up, it already knows the negation trick:

unsigned compare(unsigned x, unsigned y) {
    return (x == y)? ~0 : 0;
}

unsigned compare2(unsigned x, unsigned y) {
    return -static_cast<unsigned>(x == y);
}

compare(unsigned int, unsigned int):
        xor     eax, eax
        cmp     edi, esi
        sete    al
        neg     eax
        ret
compare2(unsigned int, unsigned int):
        xor     eax, eax
        cmp     edi, esi
        sete    al
        neg     eax
        ret

Visual Studio generates the following code:

compare2, COMDAT PROC
        xor      eax, eax
        or       r8d, -1                    ; ffffffffH
        cmp      ecx, edx
        cmove    eax, r8d
        ret      0
compare2 ENDP
compare, COMDAT PROC
        xor      eax, eax
        cmp      ecx, edx
        setne    al
        dec      eax
        ret      0
compare ENDP

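Here the first version (compare) seems to avoid the conditional move, while compare2 uses cmove. Note that the order of the functions in the listing has changed.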
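The setne/dec sequence in the Visual Studio output for compare corresponds to computing (x != y) - 1: the comparison yields 0 when the values are equal, and subtracting 1 from an unsigned 0 wraps around to all bits set. A minimal sketch of that form written directly in source follows. The name compare3 is hypothetical (it is not in the listings above), and there is no guarantee that any particular compiler emits setne/dec for it:

```cpp
// Hypothetical third variant that mirrors the setne/dec sequence in source.
// (x != y) is 0 when the values are equal and 1 otherwise; subtracting 1 in
// unsigned arithmetic then gives all bits set (wraparound) or 0.
unsigned compare3(unsigned x, unsigned y) {
    return static_cast<unsigned>(x != y) - 1u;
}
```

Like the other two versions, this one should be checked in the compiler's actual output before relying on it.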

To see what other compilers produce, try pasting the code into https://gcc.godbolt.org/ (with optimization flags added).

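Interestingly, the first version generates shorter code on icc. Basically, you have to measure the actual performance of each version with your compiler and pick the best one.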
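One rough way to do that measurement is a small timing harness like the sketch below. Everything in it is an assumption for illustration: the names compare_ternary, compare_negate, make_input, run_bench and BenchResult are hypothetical, the input distribution is arbitrary, and the compiler may inline or vectorize the loop, so the numbers only mean something when the harness resembles the real hot path.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// The two source-level variants under test (names are hypothetical).
unsigned compare_ternary(unsigned x, unsigned y) { return (x == y) ? ~0u : 0u; }
unsigned compare_negate(unsigned x, unsigned y) { return -static_cast<unsigned>(x == y); }

// Fill a vector with small pseudo-random values so that equal pairs are common.
std::vector<unsigned> make_input(std::size_t n, unsigned seed) {
    std::vector<unsigned> v(n);
    unsigned state = seed;
    for (std::size_t i = 0; i < n; ++i) {
        state = state * 1664525u + 1013904223u;  // linear congruential step
        v[i] = (state >> 16) & 3u;
    }
    return v;
}

struct BenchResult {
    double seconds;          // wall-clock time of the timed loop
    std::uint64_t checksum;  // number of equal pairs seen, keeps the work observable
};

// Times f over all pairs (a[i], b[i]); a and b are assumed to have equal size.
template <typename F>
BenchResult run_bench(F f, const std::vector<unsigned>& a,
                      const std::vector<unsigned>& b, int repeats) {
    std::uint64_t sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < a.size(); ++i)
            sum += f(a[i], b[i]) & 1u;  // low bit of the mask: 1 if equal
    auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), sum};
}
```

Calling run_bench(compare_ternary, a, b, repeats) and run_bench(compare_negate, a, b, repeats) on the same inputs should give identical checksums; only the seconds field is expected to differ, if at all.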

Also, I'm not so sure that a conditional register move is slower than the other operations.

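I assume you wrote this as a function only to show us the relevant part of the code, but a function like this is an ideal candidate for inlining. Inlining may let the compiler perform more useful optimizations that take into account the code where the function is actually used. It may also allow the compiler/CPU to parallelize this computation with other code, or to merge some operations.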

So, assuming this really is a function in your code, write it with the inline keyword and put it in a header.
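A minimal sketch of what that could look like, assuming a header named compare.h and a caller named select_if_equal (both hypothetical, chosen only to show the mask being consumed at a call site):

```cpp
// compare.h (hypothetical header name)
#ifndef COMPARE_H
#define COMPARE_H

// All bits set if x == y, otherwise 0. Defined inline in the header so the
// compiler can see the body at every call site.
inline unsigned compare(unsigned x, unsigned y) {
    return -static_cast<unsigned>(x == y);
}

// Hypothetical caller: a branchless select built on the mask.
// Returns a if x == y, otherwise b.
inline unsigned select_if_equal(unsigned x, unsigned y, unsigned a, unsigned b) {
    unsigned mask = compare(x, y);
    return (a & mask) | (b & ~mask);
}

#endif  // COMPARE_H
```

Once compare is inlined into a caller like this, the compiler is free to fold the mask computation into the surrounding code, which may or may not end up as the same instruction sequence shown for the standalone function.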

Answer 2

return -int(x==y) is pretty terse C++. Of course, it is still up to the compiler to turn that into efficient assembly.

It works because int(true)==1 and unsigned(-1)==~0U.
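Those two facts can be checked at compile time. The sketch below wraps the one-liner in a constexpr function so the result can be asserted as well; the name compare_terse is hypothetical.

```cpp
// Compile-time checks of the two facts the one-liner relies on.
static_assert(int(true) == 1, "true converts to 1");
static_assert(int(false) == 0, "false converts to 0");
static_assert(unsigned(-1) == ~0U, "-1 converts to all bits set (modulo 2^N)");

// The one-liner wrapped in a function (name is hypothetical).
constexpr unsigned compare_terse(unsigned x, unsigned y) {
    return -int(x == y);  // int -1 or 0, converted to unsigned on return
}

static_assert(compare_terse(7u, 7u) == ~0U, "equal -> all bits set");
static_assert(compare_terse(7u, 8u) == 0U, "not equal -> zero");
```

If any of these assumptions did not hold, the code would fail to compile rather than misbehave at run time.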

Set all bits of an unsigned to 1 if equal, otherwise to 0
