Question

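The program below (well, the lines after "from here") is a construct I have to use a lot. I would like to know whether it is possible (possibly using functions from the Eigen library) to vectorize it or otherwise make it run faster.

Essentially, given a vector of float x, this construct recovers the indices of the sorted elements of x in an int vector SIndex. For example, if the first entry of SIndex is 10, then x[10] is the smallest element of x.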

#include <algorithm>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>

using std::vector;
using namespace std;

typedef pair<int, float> sortData;
bool sortDataLess(const sortData& left, const sortData& right){
    return left.second<right.second;
}

int main(){
    int n=20,i;
    float LO=-1.0,HI=1.0;
    srand (time(NULL));
    vector<float> x(n);
    vector<float> y(n);
    vector<int> SIndex(n);  
    vector<sortData> foo(n);
    for(i=0;i<n;i++) x[i]=LO+(float)rand()/((float)RAND_MAX/(HI-LO));
    //from here:
    for(i=0;i<n;i++) foo[i]=sortData(i,x[i]);
    sort(foo.begin(),foo.end(),sortDataLess);
    for(i=0;i<n;i++){
        sortData bar=foo[i];
        y[i]=x[bar.first];
        SIndex[i]=bar.first;
    }
    for(i=0;i<n;i++) std::cout << SIndex[i] << std::endl;

    return 0;
}

Answer 1

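This is a sorting problem, and vectorization doesn't necessarily improve sorts very much. For example, the partition step of quicksort can do the comparisons in parallel, but it then needs to select and store the 0 to n values that passed the comparison. This can absolutely be done, but it starts eating into the advantages you get from vectorization: you need to convert from a comparison mask to a shuffle mask, which is probably a lookup table (bad), and you need a variable-sized store, which means no alignment (bad, although maybe not that bad). Mergesort needs to merge two sorted lists, which in some cases could be improved by vectorization, but in the worst case (I think) needs the same number of steps as the scalar case.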
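To make the partition argument concrete, here is a minimal scalar sketch of one partition pass split into the two steps described above. It contains no real SIMD: the 4-lane width, the `keepLessThan` name, and the block layout are assumptions chosen purely for illustration. It only shows which step is lane-independent and which one is not.

```cpp
#include <cstddef>
#include <vector>

// Scalar emulation of a block-wise partition pass (hypothetical helper).
// Keeps every element of `in` that is smaller than `pivot`, in order.
std::vector<float> keepLessThan(const std::vector<float>& in, float pivot) {
    const std::size_t kLanes = 4;  // assumed SIMD width, for illustration only
    std::vector<float> out;
    out.reserve(in.size());
    for (std::size_t base = 0; base < in.size(); base += kLanes) {
        bool mask[kLanes] = {false, false, false, false};
        // Step 1: lane-wise comparison. No lane depends on another,
        // so this part maps naturally onto one vector compare.
        for (std::size_t lane = 0; lane < kLanes && base + lane < in.size(); ++lane)
            mask[lane] = in[base + lane] < pivot;
        // Step 2: compaction. Between 0 and kLanes values are written, and
        // each output position depends on all earlier lanes of the mask.
        // This is the mask-to-shuffle conversion plus variable-sized store.
        for (std::size_t lane = 0; lane < kLanes && base + lane < in.size(); ++lane)
            if (mask[lane]) out.push_back(in[base + lane]);
    }
    return out;
}
```

With a pivot of 1.0 and the input {3, -1, 2, 0.5, -4, 7}, the first block keeps two values and the second block keeps one, so the amount written per block varies with the data.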

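Of course, there is a decent chance that any major speed boost you would get from vectorization has already been built into your standard library's std::sort implementation. To benefit from it, though, you would need to be sorting primitive types with the default comparison operator.

If you are worried about performance, you can easily avoid the last loop, though. Just sort a list of indices, using your float array for the comparison: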

struct IndirectLess {
    template <typename T>
    IndirectLess(T iter) : values(&*iter) {}

    bool operator()(int left, int right) const
    {
        return values[left] < values[right];
    }

    float const* values;
};

int main() {
    // ...
    std::vector<int> SIndex;
    SIndex.reserve(n);
    for (int i = 0; i < n; ++i)
        SIndex.push_back(i);  // start from the identity permutation 0..n-1

    std::sort(SIndex.begin(), SIndex.end(), IndirectLess(x.begin()));
    // ...
}

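Now you have only produced your list of sorted indices. You may lose some cache locality, so for really big lists it might be slower. At that point it might be possible to vectorize your last loop, depending on the architecture. It is just data manipulation, though (read four values, store the 1st and 3rd in one place and the 2nd and 4th in another), so I wouldn't expect Eigen to help much at that point.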
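For reference, the same index sort can be written more compactly if a C++11 compiler is available (an assumption; the code in the question is written in C++03 style). The helper names `sortedIndices` and `gather` below are hypothetical. The second function is only needed when the sorted values themselves (y in the question) are required in addition to the indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices that order `values` ascending (hypothetical helper).
std::vector<int> sortedIndices(const std::vector<float>& values) {
    std::vector<int> idx(values.size());
    std::iota(idx.begin(), idx.end(), 0);  // fill with 0, 1, ..., n-1
    // Sort the indices, comparing the floats they refer to.
    std::sort(idx.begin(), idx.end(),
              [&values](int left, int right) { return values[left] < values[right]; });
    return idx;
}

// Builds the sorted copy by reading values[idx[i]] (hypothetical helper).
// This is the pure data-movement loop the answer refers to.
std::vector<float> gather(const std::vector<float>& values, const std::vector<int>& idx) {
    std::vector<float> out(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) out[i] = values[idx[i]];
    return out;
}
```

Calling `sortedIndices(x)` replaces the pair-building loop and the sort in the question, and `gather(x, SIndex)` reproduces y. Note that std::sort is not stable, so the relative order of indices for equal values is unspecified.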
