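I am programming with CUDA:
Checking step by step, everything works as expected until the cumulative sum. The value computed in software is 104347, but from CUDA I sometimes get a NaN and sometimes an arbitrary number such as 2425. The strangest part is that if I keep launching the kernel 20 or 30 times, the value becomes the expected 104347 :S.
For each matrix I use: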
h_Data = (float *)malloc(data_size);
h_diff = (float *)malloc(data_size);
h_A = (float *)malloc(data_size);
and
cudaFree(d_A);
cudaFree(d_diff);
cudaFree(d_Av);
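So I don't understand why the result gets closer to the correct one when I run the code enough times. By the way, once the correct value is reached, it no longer changes no matter how many times I run the code.
Code: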
__global__ void spam(float *d_Data, float *d_diff, float *d_A, int dw, int dh, float *d_Av){
    long bx = blockIdx.x; long by = blockIdx.y;
    long tx = threadIdx.x; long ty = threadIdx.y;
    // Identify the row and column of the Pd element to work on
    long Row = by * TILE_WIDTH + ty;
    long Col = bx * TILE_WIDTH + tx;
    long tid = Row*dw+Col;
    long i=512*512;
    long r = MASK_DIM/2;
    long s = 0;
    __shared__ int tile[BLOCK_WIDTH][BLOCK_WIDTH];
    for (int k=0; k<=8; k++)
        d_Av[k]=0;
    if(tid < dw*dh)
    {
        // to shared memory.
        tile[ty + r][tx + r]=d_Data[Row*dw+Col];
        if (Col-r >=0) tile[ty + r] [tx] = d_Data[Row*dw+Col-r];
        if (Col+r <dw) tile[ty + r] [tx + 2*r] = d_Data[Row*dw+Col+r];
        if (Row-r >=0) tile[ty] [tx + r] = d_Data[(Row - r)*dw + Col];
        if (Row+r <dw) tile[ty + 2*r][tx + r] = d_Data[(Row + r)*dw + Col];
        if (Row - r >= 0 && Col - r >= 0) tile[ty] [tx] = d_Data[(Row-r)*dw+Col-r];
        if(Row - r >= 0 && Col + r < dw) tile[ty] [tx + 2*r] = d_Data[(Row-r)*dw+Col+r];
        if (Row + r < dw && Col - r >= 0) tile[ty + 2*r][tx] = d_Data[(Row+r)*dw+Col-r];
        if(Row + r <dw && Col + r < dw) tile[ty + 2*r][tx + 2*r] = d_Data[(Row-r)*dw+Col+r];
        // Calculates the difference matrix
        d_diff[tid] = (tile[ty + r][tx +r] - tile[ty + r][tx + r + 1]);
        d_A[tid]=0;
        // Set a 1 in each position in d_A where 0 was found in d_diff.
        if (d_diff[tid] == 0)
        { d_A[tid]=1;}
        __syncthreads();
        // Cumulative sum to get the frequency of value 0 in d_diff. // The error is HERE
        for (s = (i/2); s>=1; s=s/2) {
            if (tid < s)
            { d_A[tid] += d_A[tid+s];
            }
        }
        // Set the frequency value in the frequencies vector.
        d_Av[0] = d_A[0];
    } // END IF tid < dw*dh
}
Any ideas are welcome :D
Answer 0 (score: 1)
You can try replacing the if statement with the following code:
d_A[tid] += d_A[tid+s] * (tid < s);
Also make sure this code does not cause a race condition, which is a common problem in parallel summation.
MK
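For reference, here is a minimal sketch of one way to count the zeros without a race. It is not the asker's kernel and not necessarily what the answer had in mind. The reduction loop in the question has no barrier between its steps, and `__syncthreads()` only synchronizes threads within a single block, so a sum over all of `d_A` in global memory cannot be ordered that way. The sketch reduces inside each block in shared memory, with a barrier after every step, and then combines the per-block partial sums with `atomicAdd`. The names `countZeros`, `countZerosOnDevice` and `THREADS` are hypothetical, and the 1D launch layout is an assumption made for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // assumed block size; must be a power of two

// Adds the number of zeros found in diff[0..n) to *count.
__global__ void countZeros(const float *diff, int n, unsigned int *count) {
    __shared__ unsigned int partial[THREADS];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Every thread writes its own slot, so no shared entry stays uninitialized.
    partial[tid] = (gid < n && diff[gid] == 0.0f) ? 1u : 0u;
    __syncthreads();

    // Tree reduction within the block. The barrier sits outside the if,
    // so all threads of the block reach it after every step.
    for (int s = blockDim.x / 2; s >= 1; s /= 2) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }

    // One atomic update per block combines the partial sums across blocks.
    if (tid == 0) atomicAdd(count, partial[0]);
}

// Hypothetical host helper: copies the data, zeroes the counter, launches.
unsigned int countZerosOnDevice(const std::vector<float> &h_diff) {
    int n = static_cast<int>(h_diff.size());
    if (n == 0) return 0;

    float *d_diff = nullptr;
    unsigned int *d_count = nullptr;
    unsigned int h_count = 0;

    cudaMalloc(&d_diff, n * sizeof(float));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemcpy(d_diff, h_diff.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    // The counter is reset on the host before the launch, not inside the kernel.
    cudaMemset(d_count, 0, sizeof(unsigned int));

    int blocks = (n + THREADS - 1) / THREADS;
    countZeros<<<blocks, THREADS>>>(d_diff, n, d_count);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_count, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_diff);
    cudaFree(d_count);
    return h_count;
}
```

Resetting the counter with `cudaMemset` before the launch avoids having every thread write to the output, as the `d_Av[k]=0` loop in the question does. Error checking on the CUDA calls is left out to keep the sketch short.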