Question

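Using VexCL in C++, I am trying to count all values in a vector that are at or above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide MIN, MAX and SUM, and the examples are not clear on how to perform such an operation. This code is slow, probably because it executes on the host instead of the device: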

int amount = 0;
int minimum = 5;

for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}

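The vector I use will contain a large number of values, say millions, most of them zero. Besides the number of values that reach the minimum, I would also like to retrieve a list of the vector indices that hold these values. Is this possible?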

Answer 1

If you only need to count the elements that reach the minimum, this is as simple as

vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );

The vec >= minimum expression yields a sequence of ones and zeros, and sum then counts the ones.
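To make the mask-then-sum idea concrete, here is a minimal host-only sketch in standard C++. It is not VexCL code and does not run on the device; count_at_least is a hypothetical name used only for illustration. It mirrors what the reduction computes: each element contributes 1 if it is at least the minimum and 0 otherwise.

```cpp
#include <numeric>
#include <vector>

// Host-only model of sum(vec >= minimum). Hypothetical helper, not part of VexCL.
int count_at_least(const std::vector<int>& values, int minimum) {
    return std::accumulate(
        values.begin(), values.end(), 0,
        [minimum](int acc, int v) {
            // (v >= minimum) plays the role of the 0/1 mask; adding it is the SUM.
            return acc + (v >= minimum ? 1 : 0);
        });
}
```

For the ten-element input used in the longer example in this answer, count_at_least(values, 5) counts the elements 5, 6, 8 and 7.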

Now, since you also need the positions of the elements that reach the minimum, it gets a bit more complicated:

#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);

    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // The first 'amount' elements in pos now hold the indices of the
    // interesting elements. Let's use a slicer to extract them:
    vex::vector<int> indices(ctx, amount);

    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}

This gives the following output:

indices: {
    0:      2      4      5      9
}
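The same steps (build a 0/1 key, record element positions, sum the key, sort positions by key in descending order, keep the first 'amount' entries) can be modelled on the host in standard C++. This is only a sketch of the logic, not VexCL code; indices_at_least is a hypothetical name. It uses std::stable_sort so the indices come out in ascending order; whether the device sort preserves that order is not something the answer above states.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Host-only model of the key/pos/sort_by_key/slice pipeline. Hypothetical helper.
std::vector<int> indices_at_least(const std::vector<int>& values, int minimum) {
    const int n = static_cast<int>(values.size());

    // key[i] = 1 if values[i] reaches the minimum, else 0; pos[i] = i.
    std::vector<int> key(n), pos(n);
    for (int i = 0; i < n; ++i) key[i] = values[i] >= minimum ? 1 : 0;
    std::iota(pos.begin(), pos.end(), 0);

    // Number of interesting elements.
    const int amount = std::accumulate(key.begin(), key.end(), 0);

    // Order positions so that those with key == 1 come first.
    std::stable_sort(pos.begin(), pos.end(),
                     [&key](int a, int b) { return key[a] > key[b]; });

    // Keep only the first 'amount' positions.
    pos.resize(amount);
    return pos;
}
```

With the input {1, 3, 5, 2, 6, 8, 0, 2, 4, 7} and a minimum of 5, this returns the indices 2, 4, 5 and 9, matching the output shown above.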

Answer 2

@ddemidov

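Thanks for your help, it is working. However, it is much slower than my original code, which copies the device vector to the host and sorts with Boost. Here is sample code with some timings: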

#include <iostream>
#include <cstdlib>   // rand
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC
#include <vector>
#include <vexcl/vexcl.hpp>
#include <boost/range/algorithm.hpp>

int main()
{
    clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (std::size_t i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort results on CPU
    start = clock();
    boost::sort(hostVector);
    end = clock();
    std::cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort results on OpenCL
    start = clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    start = clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    return 0;
}

This results in:

C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms

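using an i7 3770 CPU and a (slow) HD4650 graphics card. As I read it, OpenCL should be able to sort large vectors quickly. Do you have any suggestions on how to perform a fast sort using OpenCL and VexCL?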
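One thing that may be worth checking before comparing these numbers: clock() reports processor time rather than wall-clock time, and in an OpenCL program the first call of an operation can include one-time costs such as kernel compilation. Whether that explains the figures above is an assumption, not something measured here. A minimal sketch of a timing helper that does one untimed warm-up run and then measures a second run with std::chrono is shown below; timed_ms_after_warmup is a hypothetical name, and the callable is assumed to restore its own input so both runs do the same work.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once untimed (warm-up), then returns the
// wall-clock duration of a second run in milliseconds.
template <class F>
double timed_ms_after_warmup(F&& work) {
    work();  // warm-up run, not timed

    const auto start = std::chrono::steady_clock::now();
    work();  // timed run
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be used by wrapping each sort, together with a fresh copy of the unsorted data, in a lambda and passing that lambda to the helper. Note that the copy would then be part of the measured time unless it is timed separately.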

VexCL: count the number of values in a vector above a minimum
