Question

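I'm trying to use Thrust to detect whether each element of one array can be found in another array, and if so where (both arrays are sorted). I came across the vectorized search routines (lower_bound and binary_search).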

For each value, lower_bound returns the index at which that value could be inserted into the list without breaking its ordering.

I also need to know whether the value was found (which binary_search can tell me), not just its position.

Is it possible to get both results efficiently without doing two searches (calling binary_search and then lower_bound)?

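I know that in the scalar case lower_bound returns a pointer to the end of the array if the value cannot be found, but this does not happen in the vectorized version.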
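For reference, the two-search baseline that the question wants to avoid can be sketched on the host. This is a minimal sketch that assumes plain `std::vector<int>` inputs. It uses the standard-library counterparts `std::lower_bound` and `std::binary_search` in place of Thrust device vectors, and the helper name `two_pass_search` is hypothetical. With `a = {1,3,5}` it also shows how missing values are reported. A query larger than every element gets index `a.size()`, which is the index form of an end iterator. A missing value that falls inside the range gets the index of the next larger element instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the "two searches" baseline from the question,
// written with std:: algorithms instead of Thrust.
//   idx[i]   = insertion index of b[i] in sorted a (what lower_bound gives)
//   found[i] = whether b[i] occurs in a           (what binary_search gives)
void two_pass_search(const std::vector<int>& a, const std::vector<int>& b,
                     std::vector<std::size_t>& idx, std::vector<bool>& found) {
    assert(std::is_sorted(a.begin(), a.end()));  // precondition: a is sorted
    idx.clear();
    found.clear();
    // First pass: one lower_bound per query.
    for (int v : b)
        idx.push_back(static_cast<std::size_t>(
            std::lower_bound(a.begin(), a.end(), v) - a.begin()));
    // Second pass: one binary_search per query (a second O(log n) search).
    for (int v : b)
        found.push_back(std::binary_search(a.begin(), a.end(), v));
}
```

With `a = {1,3,5}` and `b = {1,4,9}` this yields `idx = {0,2,3}` and `found = {true,false,false}`.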

Thanks!

Answer 1

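You can check whether the element at the index returned by lower_bound is the same as the element you searched for. For example, given a = {1,3,5} and searching for b = {1,4}, the result is c = {0,2}. We have a[c[0]] == b[0], so b[0] is in a, but a[c[1]] != b[1], so b[1] is not in a.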

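(Note that you need to make sure you don't make any out-of-bounds memory accesses, because lower_bound can return an index one past the end of the array.)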
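The check above can be sketched on the host with `std::lower_bound` standing in for `thrust::lower_bound`. This is a minimal sketch assuming `std::vector<int>` inputs, and the helper name `lower_bound_with_found` is hypothetical. It performs one search per query followed by a comparison. The bounds check guards the case where the returned index equals `a.size()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper sketching this answer on the host: one lower_bound per
// query, then a comparison instead of a second binary_search.
void lower_bound_with_found(const std::vector<int>& a,
                            const std::vector<int>& b,
                            std::vector<std::size_t>& idx,
                            std::vector<bool>& found) {
    assert(std::is_sorted(a.begin(), a.end()));  // precondition: a is sorted
    idx.assign(b.size(), 0);
    found.assign(b.size(), false);
    for (std::size_t i = 0; i < b.size(); ++i) {
        std::size_t p = static_cast<std::size_t>(
            std::lower_bound(a.begin(), a.end(), b[i]) - a.begin());
        idx[i] = p;
        // p == a.size() means b[i] is larger than every element of a,
        // so check the bound before reading a[p].
        found[i] = (p < a.size()) && (a[p] == b[i]);
    }
}
```

On the GPU, the comparison could be done as a separate element-wise pass over the lower_bound output (for example with `thrust::transform` and a functor that applies the same bounds check). That pass reads `a` once more per query but avoids a second binary search.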

Answer 2

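@tat0: You could also try ArrayFire. Vectorized search with lower_bound() does not give you the answer directly, but with setintersect() in ArrayFire you can get the "intersection" of two arrays directly: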

float A_host[] = {3,22,4,5,2,9,234,11,6,17,7,873,23,45,454};
int szA = sizeof(A_host) / sizeof(float); 

float B_host[] = {345,5,55,6,7,8,19,2,63}; 
int szB = sizeof(B_host) / sizeof(float); 

// initialize arrays from host data
array A(szA, 1, A_host);
array B(szB, 1, B_host);

array U = setintersect(A, B); // compute intersection of 2 arrays

int n_common = U.elements();
std::cout << "common: ";     
print(U);

The output is: common: U = 2.0000 5.0000 6.0000 7.0000

To get the actual positions of these elements in array A, you can use the following construct (assuming the elements of A are unique):

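// n_common = U.elements() was already computed above
array loc = zeros(n_common); // result array, initialized to 0

// parallel for loop: for each common value U(i), (A == U(i)) is a 0/1 mask
// over A; multiplying it by seq(szA) = 0, 1, ..., szA-1 and summing yields
// the index of the (unique) matching element
gfor(array i, n_common)
    loc(i) = sum((A == U(i)) * seq(szA));
print(loc);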

This gives: loc = 4.0000 3.0000 8.0000 10.0000
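The result of the two ArrayFire snippets can be reproduced on the host with `std::set_intersection` plus a linear scan. This is a minimal sketch of the same computation, not ArrayFire's implementation, and the helper name `common_with_positions` is hypothetical. Like the `gfor` loop, it assumes the elements of `A` are unique, and it uses the sample data above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: U = sorted intersection of A and B, and
// loc[k] = index of U[k] in the original (unsorted) A.
void common_with_positions(const std::vector<float>& A,
                           const std::vector<float>& B,
                           std::vector<float>& U,
                           std::vector<std::size_t>& loc) {
    // set_intersection needs sorted inputs, so sort copies of A and B.
    std::vector<float> sa(A), sb(B);
    std::sort(sa.begin(), sa.end());
    std::sort(sb.begin(), sb.end());
    U.clear();
    std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                          std::back_inserter(U));
    loc.assign(U.size(), 0);
    for (std::size_t k = 0; k < U.size(); ++k) {
        // Same idea as sum((A == U(i)) * seq(szA)): with unique elements,
        // exactly one index matches, so the "sum" is that index.
        for (std::size_t j = 0; j < A.size(); ++j)
            if (A[j] == U[k]) loc[k] += j;
    }
}
```

For the arrays above this gives `U = {2, 5, 6, 7}` and `loc = {4, 3, 8, 10}`, matching the ArrayFire output.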

Also, thrust::lower_bound() seems to be slower than setintersect(). I benchmarked them with the following program:

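#include <arrayfire.h>
#include <thrust/binary_search.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <cstdio>
#include <iostream>

using namespace af;

// A and B are stored back to back in one device buffer:
// A = g_data[0, g_N), B = g_data[g_N, 2*g_N); both are sorted in main().
int *g_data = 0;
int g_N = 0;

void thrust_test() {
    thrust::device_ptr<int> A = thrust::device_pointer_cast(g_data);
    thrust::device_ptr<int> B = thrust::device_pointer_cast(g_data + g_N);
    thrust::device_vector<int> output(g_N);
    // For each B[i], write the first index in A at which B[i] could be inserted
    thrust::lower_bound(A, A + g_N, B, B + g_N,
                        output.begin(),
                        thrust::less<int>());
    std::cout << "thrust: " << output.size() << "\n";
}

void af_test() {
    // Create ArrayFire arrays from the device pointers
    // (afDevicePointer marks them as GPU memory)
    array A(g_N, 1, g_data, afDevicePointer);
    array B(g_N, 1, g_data + g_N, afDevicePointer);
    array U = setintersect(A, B);
    std::cout << "intersection sz: " << U.elements() << "\n";
}

int main() {
    g_N = 3000000; // 3M entries per array
    thrust::host_vector<int> input(g_N * 2);
    for (int i = 0; i < g_N * 2; i++) { // generate some input
        // Use 64-bit arithmetic and reduce early so i*i and i*i*i
        // cannot overflow int (signed overflow is undefined behavior)
        long long li = i;
        if (i & 1)
            input[i] = static_cast<int>(li * li % 1131);
        else
            input[i] = static_cast<int>(((li * li % 1223) * li - 1) % 1223);
    }
    thrust::device_vector<int> dev_input = input;
    // sort the vector A
    thrust::sort(dev_input.begin(), dev_input.begin() + g_N);
    // sort the vector B
    thrust::sort(dev_input.begin() + g_N, dev_input.end());
    g_data = thrust::raw_pointer_cast(dev_input.data());
    try {
        info(); // print device information
        printf("thrust:  %.5f seconds\n", timeit(thrust_test));
        printf("af:  %.5f seconds\n", timeit(af_test));
    } catch (af::exception &e) {
        fprintf(stderr, "%s\n", e.what());
    }
    return 0;
}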

And the results:

CUDA toolkit 4.2, driver 295.59
GPU0 GeForce GT 650M, 2048 MB, Compute 3.0 (single,double)
Memory usage: 1937 MB free (2048 MB total)
thrust:  0.13008 seconds
af:  0.06702 seconds

Thrust vectorized search: efficiently combining lower_bound and binary_search to find both position and existence
