Question

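I'm trying to write some code to learn more about assembly and things like JIT compilers. So far I've come up with an XOR function that should, in theory, work on x86 or x64 machines in both Windows and Linux environments.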

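Assuming I've understood things correctly, the [RE]AX register holds integer return values, while [RE]DX is one of the registers available for passing integers between functions. I chose not to follow the ABI strictly and to pass the first argument in [RE]AX, because that saves a MOV instruction without affecting the result.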

Is there a better (more elegant or more efficient) way to emit cross-platform assembly, or have I made any mistakes while developing this?

#include <cstdint>
#include <cstdio>       // getchar
#include <iostream>
#include <type_traits>  // std::is_same

template<typename TInput>
static auto Xor(TInput const highPart, TInput const lowPart) {
    constexpr bool is16Bit = (std::is_same<TInput, int16_t>::value || std::is_same<TInput, uint16_t>::value);
    constexpr bool is32Bit = (std::is_same<TInput, int32_t>::value || std::is_same<TInput, uint32_t>::value);
    static_assert(is16Bit || is32Bit, "type must be a member of the type family: [u]int{16, 32}_t");

    if constexpr (is16Bit) {
        uint16_t result;

        #if (defined(__linux__) || defined(__unix__) || defined(_WIN32))
            asm volatile ("xorw %%dx, %%ax;" : "=a" (result) : "a" (highPart), "d" (lowPart));
        #else
            #error "Unsupported platform detected."
        #endif

        return result;
    }
    else if constexpr (is32Bit) {
        uint32_t result;

        #if (defined(__linux__) || defined(__unix__) || defined(_WIN32))
            asm volatile ("xorl %%edx, %%eax;" : "=a" (result) : "a" (highPart), "d" (lowPart));
        #else
            #error "Unsupported platform detected."
        #endif

        return result;
    }
}

#define HIGH_PART 4
#define LOW_PART 8

int main() {
    int16_t const a = HIGH_PART;
    int16_t const b = LOW_PART;
    int16_t const c = Xor(a, b);

    uint32_t const x = HIGH_PART;
    uint32_t const y = LOW_PART;
    uint32_t const z = Xor(x, y);

    std::cout << c << "\n";
    std::cout << z << "\n";
    getchar();

    return 0;
}

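Below is an example of how this could be improved. By "hoisting" the result variable and the #if defined(...) check above the constexpr checks, we can make things more generic.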

template<typename T>
static auto Xor(T const highPart, T const lowPart) {
    constexpr bool is16Bit = (std::is_same<T, int16_t>::value || std::is_same<T, uint16_t>::value);
    constexpr bool is32Bit = (std::is_same<T, int32_t>::value || std::is_same<T, uint32_t>::value);
    static_assert(is16Bit || is32Bit, "type must be a member of the type family: [u]int{16, 32}_t");

    #if !(defined(__linux__) || defined(__unix__) || defined(_WIN32))
        #error "Unsupported platform detected."
    #endif

    T result;

    if constexpr (is16Bit) {
        asm volatile ("xorw %%dx, %%ax;" : "=a" (result) : "a" (highPart), "d" (lowPart));
    }
    else if constexpr (is32Bit) {
        asm volatile ("xorl %%edx, %%eax;" : "=a" (result) : "a" (highPart), "d" (lowPart));
    }

    return result;
}

Answer 1

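You can't make compilers pass function args in EAX / RAX in 64-bit mode. In 32-bit mode, you can use the gcc "regparm" calling convention, e.g. __attribute__((regparm(3))) int my_func(int,int);, to pass args in EAX, EDX, ECX in that order. (Otherwise the compiler needs a mov ahead of any inline asm that wants a function arg in EAX.)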
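As a rough illustration of the regparm idea, the sketch below declares a register-argument function for 32-bit x86 builds. The macro XOR_REGPARM and the function xor_regparm are hypothetical names, not from the question, and the attribute is only applied when the target is i386 with GCC or clang; on any other target the function falls back to the default calling convention.

```cpp
#include <cstdint>

// Hypothetical helper macro: only 32-bit x86 builds with GCC/clang get regparm.
#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
#define XOR_REGPARM __attribute__((regparm(3)))
#else
#define XOR_REGPARM  // default calling convention elsewhere
#endif

// With regparm(3) on i386, `a` arrives in EAX and `b` in EDX,
// and the result is returned in EAX as usual.
XOR_REGPARM std::uint32_t xor_regparm(std::uint32_t a, std::uint32_t b);

XOR_REGPARM std::uint32_t xor_regparm(std::uint32_t a, std::uint32_t b) {
    return a ^ b;
}
```

A call such as xor_regparm(4u, 8u) looks the same to the caller either way; only the generated argument-passing code differs.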

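Or you could declare the function with __attribute__((sysv_abi)) so that it always uses the SysV ABI, even when compiling on Windows. But that only works if all the callers are compiled by GCC / clang / ICC, not MSVC. And it's worse in 32-bit mode: the i386 System V calling convention is garbage. It passes all args on the stack, and only returns int64_t in edx:eax, not 2-member 64-bit structs.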
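A minimal sketch of the sysv_abi approach, assuming GCC or clang targeting x86-64. XOR_SYSV_ABI and xor_sysv are hypothetical names introduced here for illustration; on other targets the macro expands to nothing.

```cpp
#include <cstdint>

// Hypothetical helper macro: request the System V calling convention
// on x86-64 GCC/clang builds, including when targeting Windows.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define XOR_SYSV_ABI __attribute__((sysv_abi))
#else
#define XOR_SYSV_ABI
#endif

// Under the x86-64 System V convention, `a` arrives in EDI and `b` in ESI,
// regardless of the platform's default ABI.
XOR_SYSV_ABI std::uint32_t xor_sysv(std::uint32_t a, std::uint32_t b);

XOR_SYSV_ABI std::uint32_t xor_sysv(std::uint32_t a, std::uint32_t b) {
    return a ^ b;
}
```

Every caller has to see the attributed declaration, which is why this only works when all callers are built by a compiler that understands the attribute.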

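Calling a sysv_abi function may also force an ms_abi function to save/restore all of xmm6..15, unless the sysv_abi function call can inline and optimize away. So overall, it's probably a bad plan if the calling function isn't already using XMM regs heavily and saving/restoring most of them.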

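Using fixed-register input/output constraints is generally not useful, unless you're using instructions with implicit register operands (e.g. a shift count in cl, if you can't use BMI2 shlx / shrx).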
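To illustrate the one case where a fixed-register constraint is warranted, here is a sketch of a variable-count shift whose count must be in cl. The function name shift_left is hypothetical; the asm assumes GCC or clang on x86 with the default AT&T syntax, and other targets use a plain C++ fallback with the same 5-bit count masking.

```cpp
#include <cstdint>

// Hypothetical example: left shift by a variable count.
inline std::uint32_t shift_left(std::uint32_t value, std::uint8_t count) {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    // "c" pins the count to (E)CX because `shl` reads it implicitly from CL.
    // The value itself uses "+r", so the compiler picks any register for it.
    asm("shll %%cl, %0" : "+r"(value) : "c"(count) : "cc");
    return value;
#else
    // Portable fallback; x86 masks a 32-bit shift count to 5 bits.
    return value << (count & 31);
#endif
}
```

Only the operand that the instruction forces into a specific register is pinned; everything else is left to the register allocator.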

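Let the compiler do register allocation by using "r" and "+r" constraints (or "=r" and a "0" matching constraint), so your function can inline efficiently regardless of where the values are. You can also use "re" for an input that can be a register or a 32-bit immediate, or even "rem" to allow a memory input as well. But if you use an input repeatedly, it's better to have the compiler load it for you ahead of the asm.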
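One way the question's Xor could look with compiler-chosen registers is sketched below. The name xor_any_reg is hypothetical, the asm path assumes GCC or clang on x86 with AT&T syntax, and a plain C++ fallback covers other targets. Because the mnemonic has no size suffix, the assembler infers the operand size from whichever registers the compiler substitutes.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical rewrite of the question's Xor using "+r" / "r" constraints.
template <typename T>
inline T xor_any_reg(T a, T b) {
    static_assert(std::is_integral<T>::value &&
                      (sizeof(T) == 2 || sizeof(T) == 4),
                  "type must be a 16-bit or 32-bit integer");
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    // %0 is read and written ("+r"); %1 is a read-only register input.
    // The compiler picks 16-bit or 32-bit register names to match T.
    asm("xor %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return static_cast<T>(a ^ b);  // portable fallback
#endif
}
```

No per-size branches are needed here, and the asm statement is not volatile because it has no side effects beyond its output, so the compiler may drop it when the result is unused.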

See also https://stackoverflow.com/tags/inline-assembly/info

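Hard-coding register allocation partially defeats the purpose of using inline asm rather than a stand-alone asm function, which the compiler has to call instead of inlining.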

Look at the compiler-generated asm for your code to see what surrounding code it produced, and how it filled in the template by choosing operands.

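Also note that "r" picks a 16-bit register for 16-bit types and a 32-bit register for 32-bit types, so all of this type-size dispatching is basically unnecessary. (Although depending on how the inputs were written, a 32-bit xor may be better than a 16-bit xor: it avoids a partial-register stall if you later read the full 32-bit or 64-bit register. But if the input regs were written with 16-bit operand size, then a 32-bit xor would create a partial-register stall on P6-family CPUs.) You can override the size that gets filled in for a template operand such as "xor %0" with a modifier like %k0 (32-bit size), etc. See x86 Operand Modifiers in the GCC manual.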
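As a sketch of the operand-modifier point, the function below takes 16-bit operands but emits a 32-bit xor by printing the 32-bit register names with the k modifier. The name xor16_wide is hypothetical; the asm path assumes GCC or clang on x86 with AT&T syntax, with a plain C++ fallback elsewhere.

```cpp
#include <cstdint>

// Hypothetical example: 16-bit values, 32-bit xor instruction.
inline std::uint16_t xor16_wide(std::uint16_t a, std::uint16_t b) {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    // %k0 / %k1 print the 32-bit names of the registers chosen for the
    // 16-bit operands (e.g. %eax instead of %ax). The upper 16 bits hold
    // unspecified data and are discarded with the 16-bit result type.
    asm("xorl %k1, %k0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return static_cast<std::uint16_t>(a ^ b);  // portable fallback
#endif
}
```

Whether the 32-bit form is actually faster depends on how the inputs were produced, as described above, so it is worth checking the generated asm for the real call sites.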

Cross-platform assembly ((x64 || x86) && (Microsoft x64 || SystemV))
