Question

Guys, imagine I have a template function:

template <typename T> Vector<T>* Vector<T>::overwrite(const Vector<T>* copy) {
    this->_normalized = copy->_normalized;

    this->_data[0] = copy->_data[0];
    this->_data[1] = copy->_data[1];
    this->_data[2] = copy->_data[2];
    this->_data[3] = copy->_data[3];

    return this;
}

and its specialization:

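#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_store_ps

template <> Vector<float>* Vector<float>::overwrite(const Vector<float>* copy) {
    this->_normalized = copy->_normalized;

    // Both _data arrays must be 16-byte aligned for _mm_load_ps/_mm_store_ps.
    __m128 data = _mm_load_ps(copy->_data);

    _mm_store_ps(this->_data, data);

    return this;
}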

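Now I want to make sure the processor supports SSE, specifically that it has XMM registers, so that four floats can be copied with a single instruction. I also want the same function for double, and for that I need YMM registers.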

So I'd like to know whether there is a way to determine at runtime if XMM and YMM registers are available.
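For the runtime question, one option on x86 with GCC or Clang is the builtin __builtin_cpu_supports, which queries CPUID for you. The sketch below assumes one of those compilers; the helper names cpu_has_sse and cpu_has_avx are hypothetical. "sse" corresponds to XMM registers (four floats each) and "avx" to YMM registers (four doubles each). If you query CPUID by hand instead, note that using YMM registers also requires the OS to save them, which you check via the OSXSAVE bit and XGETBV.

```cpp
// Runtime feature detection with the GCC/Clang builtin __builtin_cpu_supports.
// The builtin is only available when targeting x86, hence the guards.

// Hypothetical helper: true if the running CPU supports SSE (XMM registers).
bool cpu_has_sse() {
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();  // strictly needed only if called before static constructors
    return __builtin_cpu_supports("sse") != 0;
#else
    return false;
#endif
}

// Hypothetical helper: true if the running CPU supports AVX (YMM registers).
bool cpu_has_avx() {
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;
#endif
}
```

The result could then select between the generic template and an SSE/AVX code path at startup. Keep in mind that the intrinsic code itself still has to be compiled with the corresponding instruction set enabled (for example -mavx, or a per-function target attribute).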

Another, preferable option would be to resolve this in the preprocessor somehow, i.e. so that I could write something like:

template <typename T> Vector<T>* Vector<T>::overwrite(const Vector<T>* copy) {
    this->_normalized = copy->_normalized;

    this->_data[0] = copy->_data[0];
    this->_data[1] = copy->_data[1];
    this->_data[2] = copy->_data[2];
    this->_data[3] = copy->_data[3];

    return this;
}

#ifdef XMM_ARE_AVAILABLE
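template <> Vector<float>* Vector<float>::overwrite(const Vector<float>* copy) {
    this->_normalized = copy->_normalized;

    // Both _data arrays must be 16-byte aligned for _mm_load_ps/_mm_store_ps.
    __m128 data = _mm_load_ps(copy->_data);

    _mm_store_ps(this->_data, data);

    return this;
}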
#endif

#ifdef YMM_ARE_AVAILABLE
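template <> Vector<double>* Vector<double>::overwrite(const Vector<double>* copy) {
    this->_normalized = copy->_normalized;

    // Copy four doubles through one YMM register (AVX, <immintrin.h>);
    // _data must be 32-byte aligned for _mm256_load_pd/_mm256_store_pd.
    __m256d data = _mm256_load_pd(copy->_data);
    _mm256_store_pd(this->_data, data);

    return this;
}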
#endif
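For the preprocessor variant, the compiler's predefined target macros can stand in for the hypothetical XMM_ARE_AVAILABLE and YMM_ARE_AVAILABLE names. A sketch, assuming GCC/Clang (which define __SSE__ and __AVX__ when those instruction sets are enabled, e.g. via -msse, -mavx or -march=native) or MSVC (which defines _M_X64 on x64, _M_IX86_FP on 32-bit x86, and __AVX__ under /arch:AVX). The constexpr flags are only there to make the result inspectable. Note that these macros describe what the code is compiled for, not what the CPU running the binary actually supports.

```cpp
// Derive the question's macro names from compiler-predefined target macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define XMM_ARE_AVAILABLE 1
#endif

#if defined(__AVX__)
#define YMM_ARE_AVAILABLE 1
#endif

// Expose the decision as compile-time constants (hypothetical names).
#ifdef XMM_ARE_AVAILABLE
constexpr bool kXmmAvailable = true;
#else
constexpr bool kXmmAvailable = false;
#endif

#ifdef YMM_ARE_AVAILABLE
constexpr bool kYmmAvailable = true;
#else
constexpr bool kYmmAvailable = false;
#endif
```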

Thanks!

Answer 1

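You're better off convincing the compiler to vectorize this than writing specialized code with intrinsics or asm. The troublesome part is the pointers (or references, if those had been used), namely the Vector argument pointer and "this": unless told otherwise, the compiler has to assume they may refer to overlapping memory, which can keep it from vectorizing the copy.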

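Unfortunately, C++ does not support "restrict" directly (it is a C99 keyword), but most compilers have some implementation-specific extension for it. With g++, use __restrict__, which can also be applied to a member function's type to qualify "this":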

http://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html
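Applied to the question's function, that could look like the sketch below. It assumes GCC or Clang (both accept __restrict__) and uses a hypothetical minimal Vector<T> reduced to the two members the question touches. The trailing __restrict__ on the member function qualifies "this", and the one on the parameter covers "copy"; the body is left exactly as in the question.

```cpp
// Hypothetical minimal Vector<T>, reduced to the members used in the question.
template <typename T>
class Vector {
public:
    T _data[4];
    bool _normalized;

    // Trailing __restrict__ (GCC/Clang extension) qualifies the `this` pointer.
    Vector<T>* overwrite(const Vector<T>* __restrict__ copy) __restrict__;
};

template <typename T>
Vector<T>* Vector<T>::overwrite(const Vector<T>* __restrict__ copy) __restrict__ {
    // The compiler may now assume *this and *copy do not overlap.
    this->_normalized = copy->_normalized;

    this->_data[0] = copy->_data[0];
    this->_data[1] = copy->_data[1];
    this->_data[2] = copy->_data[2];
    this->_data[3] = copy->_data[3];

    return this;
}
```

Calling a.overwrite(&a) would break the no-overlap promise, so callers must not pass the object itself. Whether the copy is actually vectorized depends on the compiler and optimization level; checking the generated assembly is the reliable way to find out.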

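Some compilers (e.g. LLVM) can vectorize hand-unrolled straight-line code like that in the example; others need it rolled back into a loop. In general, for auto-vectorization it is better to write a loop than hand-unrolled code.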
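Re-rolled, the element copy becomes a simple fixed-length loop. The sketch below is a hypothetical free-function helper (copy_elems is not from the original) that assumes GCC or Clang for __restrict__; inside overwrite it would be called as copy_elems<T, 4>(this->_data, copy->_data).

```cpp
#include <cstddef>

// Hypothetical helper: copies N elements of type T from src to dst.
// __restrict__ promises the two arrays do not overlap, and the fixed trip
// count N gives the auto-vectorizer a simple loop to work with.
template <typename T, std::size_t N>
void copy_elems(T* __restrict__ dst, const T* __restrict__ src) {
    for (std::size_t i = 0; i < N; ++i) {
        dst[i] = src[i];
    }
}
```

The same template serves both float and double, so no separate SSE and AVX specializations are needed; the compiler picks the widest vector instructions the build target allows.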

This example is a simple case for auto-vectorization, and you will end up with more portable code.

How do I know whether the processor has XMM or YMM registers?
