Question

I am implementing a simple naive string search in CUDA.

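I am new to CUDA. It works fine for smaller files (approximately 1 MB). After I make these files bigger (Ctrl+A, Ctrl+C a few times in Notepad++), my program's result is higher (by about 1%) than the result of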

 grep -o text file_name | wc -l

This is a very simple function, so I don't know what could be causing this. I need it to work with larger files (~500 MB).

Kernel code (gpuCount is a __device__ int global variable):

__global__ void stringSearchGpu(char *data, int dataLength, char *input, int inputLength){ 
     int id = blockDim.x*blockIdx.x + threadIdx.x;
     if (id < dataLength)
     {
         int fMatch = 1;
         for (int j = 0; j < inputLength; j++)
         {
            if (data[id + j] != input[j]) fMatch = 0;
         }
         if (fMatch)
         {
             atomicAdd(&gpuCount, 1);
         }
     }
 }

Here is the kernel call in the main function:

    int blocks = 1, threads = fileSize;

    if (fileSize > 1024)
    {
        blocks = (fileSize / 1024) + 1;
        threads = 1024;
    }

    clock_t cpu_start = clock();
    // kernel call
    stringSearchGpu<<<blocks, threads>>>(cudaBuffer, strlen(buffer), cudaInput, strlen(input));
    cudaDeviceSynchronize();

After this, I just copy the result to the host and print it.

Can anyone help me with this?

Answer 1

First, you should always check the return values of CUDA functions for errors. The best way to do this is:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess) 
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

Wrap your CUDA calls with it, for example:

gpuErrchk(cudaDeviceSynchronize());
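
A kernel launch itself returns no value, so it cannot be wrapped directly. A common pattern, sketched below, is to check cudaGetLastError() right after the launch and then check cudaDeviceSynchronize(). The kernel doubleElements and the helper doubleOnGpu are hypothetical names used only for illustration; they are not part of the question's code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void doubleElements(int *values, int n)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if (id < n) values[id] *= 2;
}

// Hypothetical host helper: every CUDA runtime call is checked.
// Assumes n > 0.
void doubleOnGpu(int *hostValues, int n)
{
    int *devValues = nullptr;
    size_t bytes = n * sizeof(int);
    gpuErrchk(cudaMalloc(&devValues, bytes));
    gpuErrchk(cudaMemcpy(devValues, hostValues, bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    doubleElements<<<blocks, threads>>>(devValues, n);
    gpuErrchk(cudaGetLastError());       // reports launch/configuration errors
    gpuErrchk(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    gpuErrchk(cudaMemcpy(hostValues, devValues, bytes, cudaMemcpyDeviceToHost));
    gpuErrchk(cudaFree(devValues));
}
```

With checks like these in place, an illegal memory access inside a kernel would normally surface as an error from cudaDeviceSynchronize() instead of passing silently.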

Second, your kernel accesses memory out of bounds. Suppose dataLength=100, inputLength=7 and id=98. In your kernel code:

if (id < dataLength) // 98 is less than 100, so condition true
     {
         int fMatch = 1;
         for (int j = 0; j < inputLength; j++) // j runs from [0 - 6]
         {
            // if j>1 then id+j>=100, which is out of bounds, illegal operation
            if (data[id + j] != input[j]) fMatch = 0;
         }

Change the condition to:

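if (id <= dataLength - inputLength)

With dataLength=100 and inputLength=7, the last valid start index is 93, because 93 + 6 = 99 is the last element of data. A strict < would skip a match that ends exactly at the end of the data.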
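
Putting the bounds check into the whole kernel might look like the sketch below. It guards with id <= dataLength - inputLength, the last start position at which the whole pattern still fits. The host wrapper countMatches and the CHECK macro are hypothetical additions for illustration, and the block size of 256 is an arbitrary choice; only the kernel and gpuCount come from the question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Minimal error check (hypothetical helper, shortened for the example).
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); exit(1); } } while (0)

__device__ int gpuCount;  // match counter, as in the question

__global__ void stringSearchGpu(const char *data, int dataLength,
                                const char *input, int inputLength)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    // Only start positions where the whole pattern fits inside data.
    if (id <= dataLength - inputLength)
    {
        int fMatch = 1;
        for (int j = 0; j < inputLength; j++)
        {
            if (data[id + j] != input[j]) { fMatch = 0; break; }
        }
        if (fMatch) atomicAdd(&gpuCount, 1);
    }
}

// Hypothetical host wrapper: copies inputs, launches, returns the count.
int countMatches(const char *text, const char *pattern)
{
    int dataLength = (int)strlen(text);
    int inputLength = (int)strlen(pattern);
    if (inputLength == 0 || inputLength > dataLength) return 0;

    char *cudaBuffer = nullptr, *cudaInput = nullptr;
    CHECK(cudaMalloc(&cudaBuffer, dataLength));
    CHECK(cudaMalloc(&cudaInput, inputLength));
    CHECK(cudaMemcpy(cudaBuffer, text, dataLength, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(cudaInput, pattern, inputLength, cudaMemcpyHostToDevice));

    int zero = 0;  // reset the counter before every launch
    CHECK(cudaMemcpyToSymbol(gpuCount, &zero, sizeof(int)));

    int threads = 256;
    int blocks = (dataLength + threads - 1) / threads;
    stringSearchGpu<<<blocks, threads>>>(cudaBuffer, dataLength, cudaInput, inputLength);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    int count = 0;  // read the counter back from the device symbol
    CHECK(cudaMemcpyFromSymbol(&count, gpuCount, sizeof(int)));
    CHECK(cudaFree(cudaBuffer));
    CHECK(cudaFree(cudaInput));
    return count;
}
```

This version counts overlapping occurrences, whereas grep -o reports non-overlapping matches, so the two counts can still differ for patterns that overlap with themselves (for example "aa" in "aaaa").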
