Question

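I'm looking to optimize a piece of code that uses popcnt to compute the differences between uint8_t values. I figured it would be faster to combine 8 uint8_t values into a single uintmax_t and use popcnt64 instead, so that the popcnt operation isn't called 8 times more often than necessary. What is the fastest way to get 8 uint8_t values into popcnt64? Could I use some kind of cast? Should I use bit operations? I don't know enough about the inner workings of C++, so I'm not sure what the fastest way of doing this conversion would be.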

Answer 1

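Assuming you don't care about byte order (you just want to view the uint8_t values as a uint64_t, and the order of the individual uint8_t values doesn't matter), you can use std::memcpy for type punning: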

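#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

std::uint64_t combine(std::array<std::uint8_t, 8> b) {
    // Both types have the same size and are trivially copyable,
    // so copying the bytes across with memcpy is well-defined.
    static_assert(sizeof(b) == sizeof(std::uint64_t));
    static_assert(std::is_trivially_copyable_v<std::uint64_t>);
    static_assert(std::is_trivially_copyable_v<decltype(b)>);

    std::uint64_t result;
    std::memcpy(&result, b.data(), sizeof(result));
    return result;
}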

The generated assembly simply returns the argument:

combine(std::array<unsigned char, 8ul>): # @combine(std::array<unsigned char, 8ul>)
  mov rax, rdi
  ret

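Any other approach to type punning means you have to worry about the strict aliasing rules or type alignment. Just using std::memcpy and letting the compiler deal with it is good enough.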
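To illustrate the alignment point, here is a minimal sketch that loads 8 bytes from an arbitrary (possibly unaligned) position in a byte buffer. The name load_u64 is hypothetical and not part of the question; the approach is the same std::memcpy idea as above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read 8 bytes starting at p into a uint64_t.
// p does not need to be 8-byte aligned, because memcpy copies byte by byte
// as far as the language is concerned. The caller must guarantee that at
// least 8 readable bytes exist at p.
inline std::uint64_t load_u64(const std::uint8_t* p) {
    std::uint64_t result;
    std::memcpy(&result, p, sizeof(result));
    return result;
}
```

Casting p to const std::uint64_t* and dereferencing it instead would run into exactly the aliasing and alignment concerns mentioned above, whereas compilers typically turn this small fixed-size memcpy into a single load.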

Note that the easiest way to call any variant of popcnt from C++ is std::bitset::count. So instead of __builtin_popcountll(my_u64) or __popcnt64(my_u64), you can just write std::bitset<64>{my_u64}.count() and get portable code right away.
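Putting the two pieces together for the use case in the question, a rough sketch might look like the following. It assumes that "differences" means the number of differing bits (the Hamming distance) between two blocks of 8 bytes; the function name hamming_distance8 is made up for illustration.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: count the bits that differ between two 8-byte blocks.
inline std::size_t hamming_distance8(const std::array<std::uint8_t, 8>& a,
                                     const std::array<std::uint8_t, 8>& b) {
    std::uint64_t x;
    std::uint64_t y;
    std::memcpy(&x, a.data(), sizeof(x));  // pack the 8 bytes of a
    std::memcpy(&y, b.data(), sizeof(y));  // pack the 8 bytes of b
    // XOR leaves a 1 in every position where the inputs differ;
    // one 64-bit popcount then replaces eight 8-bit ones.
    return std::bitset<64>{x ^ y}.count();
}
```

Byte order does not matter here, since XOR and popcount give the same total regardless of where each byte ends up inside the 64-bit word.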

What's the fastest way to combine 8 uint8_t values into a single uintmax_t?
