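Using VexCL in C++, I am trying to count all values in a vector that are at or above a certain minimum, and I would like to perform this count on the device. The default Reductors only provide MIN, MAX and SUM, and the examples do not make it clear how to perform an operation like this. The following code is slow, probably because it executes on the host rather than on the device: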
int amount = 0;
int minimum = 5;
for (vex::vector<int>::iterator i = vector.begin(); i != vector.end(); ++i)
{
    if (*i >= minimum)
    {
        amount++;
    }
}
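The vector I am using will contain a large number of values, say millions, most of them zero. Besides the number of values at or above the minimum, I would also like to retrieve a list of the vector indices that hold these values. Is this possible?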
Answer 0 (score: 1)
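If you only needed to count the elements that are at least the minimum, this would be as simple as
vex::Reductor<int, vex::SUM> sum(ctx);
int amount = sum( vec >= minimum );
The vec >= minimum expression yields a sequence of ones and zeros, and sum then counts the ones.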
Now, since you also need the positions of the elements that are at least the minimum, it gets a bit more complicated:
#include <iostream>
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));

    // Input vector
    vex::vector<int> vec(ctx, {1, 3, 5, 2, 6, 8, 0, 2, 4, 7});
    int n = vec.size();
    int minimum = 5;

    // Put result of (vec >= minimum) into key, and element indices into pos:
    vex::vector<int> key(ctx, n);
    vex::vector<int> pos(ctx, n);
    key = (vec >= minimum);
    pos = vex::element_index();

    // Get number of interesting elements in vec.
    vex::Reductor<int, vex::SUM> sum(ctx);
    int amount = sum(key);

    // Sort pos by key in descending order.
    vex::sort_by_key(key, pos, vex::greater<int>());

    // First 'amount' of elements in pos now hold indices of interesting
    // elements. Let's use slicer to extract them:
    vex::vector<int> indices(ctx, amount);
    vex::slicer<1> slice(vex::extents[n]);
    indices = slice[vex::range(0, amount)](pos);

    std::cout << "indices: " << indices << std::endl;
}
This gives the following output:
indices: {
0: 2 4 5 9
}
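To check the device result, a plain host-side reference can help. The sketch below mirrors the three steps above (build the 0/1 key, sum it, extract the positions) in standard C++ only. The function name indices_at_or_above is a hypothetical helper and not part of VexCL, and it is intended for validating small inputs rather than as a replacement for the device code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of VexCL): host-side reference for the
// "count and locate elements >= minimum" steps shown above.
std::vector<int> indices_at_or_above(const std::vector<int>& values, int minimum) {
    // Step 1: build the 0/1 key, mirroring key = (vec >= minimum).
    std::vector<int> key(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        key[i] = values[i] >= minimum ? 1 : 0;

    // Step 2: sum the key to get the count, mirroring sum(key).
    int amount = 0;
    for (int k : key)
        amount += k;

    // Step 3: collect the positions whose key is 1. The device version
    // obtains these with sort_by_key followed by a slice instead.
    std::vector<int> indices;
    indices.reserve(amount);
    for (std::size_t i = 0; i < key.size(); ++i)
        if (key[i])
            indices.push_back(static_cast<int>(i));

    // The number of collected positions must equal the summed key.
    assert(indices.size() == static_cast<std::size_t>(amount));
    return indices;
}
```

For the input {1, 3, 5, 2, 6, 8, 0, 2, 4, 7} with minimum 5, this returns the indices 2, 4, 5, 9, matching the output above. The host version always returns the indices in ascending order; I would not assume the same ordering from the device version for arbitrary input, so sorting both lists before comparing them is the safer check.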
Answer 1 (score: 0)
@ddemidov
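Thanks for your help, it is working. However, it is a lot slower than my original code, which copies the device vector to the host and sorts it with Boost. Below is sample code with some timings: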
#include <iostream>
#include <cstdio>
#include <cstdlib> // rand
#include <ctime>   // clock, CLOCKS_PER_SEC
#include <vector>
#include <vexcl/vexcl.hpp>
#include <boost/range/algorithm.hpp>

int main()
{
    clock_t start, end;

    // initialize vector with random numbers
    std::vector<int> hostVector(1000000);
    for (size_t i = 0; i < hostVector.size(); ++i)
    {
        hostVector[i] = rand() % 20 + 1;
    }

    // copy to device
    vex::Context cpu(vex::Filter::Type(CL_DEVICE_TYPE_CPU) && vex::Filter::Any);
    vex::Context gpu(vex::Filter::Type(CL_DEVICE_TYPE_GPU) && vex::Filter::Any);
    vex::vector<int> vectorCPU(cpu, 1000000);
    vex::vector<int> vectorGPU(gpu, 1000000);
    vex::copy(hostVector, vectorCPU);
    vex::copy(hostVector, vectorGPU);

    // sort results on CPU
    start = clock();
    boost::sort(hostVector);
    end = clock();
    std::cout << "C++: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    // sort results on OpenCL
    start = clock();
    vex::sort(vectorCPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl CPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    start = clock();
    vex::sort(vectorGPU, vex::greater<int>());
    end = clock();
    std::cout << "vexcl GPU: " << (end - start) / (CLOCKS_PER_SEC / 1000) << " ms" << std::endl;

    return 0;
}
This results in:
C++: 17 ms
vexcl CPU: 737 ms
vexcl GPU: 1670 ms
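This was measured on an i7 3770 CPU and a (slow) HD4650 graphics card. From what I have read, OpenCL should be able to sort large vectors quickly. Do you have any advice on how to perform a fast sort using OpenCL and VexCL?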
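One thing that may be worth ruling out before comparing these numbers is the measurement itself. clock() reports processor time rather than wall-clock time, and each VexCL sort above is timed on its very first call, which as far as I know includes one-time costs such as compiling the OpenCL kernels. The sketch below is a minimal timing helper in standard C++ that runs the work once untimed and then averages several timed runs. The name average_ms is hypothetical, and whether this explains the gap on this particular hardware is an assumption, not something the timings above confirm.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once as an untimed warm-up, then returns
// the average wall-clock time in milliseconds over `repeats` timed runs.
template <class F>
double average_ms(F&& work, int repeats) {
    assert(repeats > 0);

    // Warm-up run: absorbs one-time costs (for example kernel compilation)
    // so that they do not end up in the measured time.
    work();

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        work();
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

With the benchmark above, the work passed in would be something like a lambda that calls vex::copy(hostVector, vectorGPU), then vex::sort(vectorGPU, vex::greater<int>()), and finally waits for the device queue (gpu.finish(), assuming that call is available in your VexCL version). Re-copying the data on each run keeps the sort from operating on already sorted input, although it also adds the transfer time to the measurement.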