Question

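The CPU OpenCL version produces correct results, while the GPU OpenCL version produces slightly different values in some places, and these differences break the correctness of the result. I debugged it with the Intel OpenCL SDK and got correct results. I did not find any race conditions or concurrent memory accesses. The problem appeared after I introduced the pown function into the kernel (a one-line change).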

    kernel void knapsack(global int *input_f, global int *output_f, global uint *m_d,
                         int cmax, int weightk, int pk, int maxelem, int i) {

        int c = get_global_id(0) + cmax;

        if (get_global_id(0) < maxelem) {
            if (input_f[c] < input_f[c - weightk] + pk) {
                output_f[c] = input_f[c - weightk] + pk;
                m_d[c-1] = pown(2.0, i); // previous version: m_d[c-1] = 1;
            }
            else {
                output_f[c] = input_f[c];
            }
        }
    }
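To compare the GPU output element by element, a serial C reference of the kernel body can help. The sketch below is an assumption, not the full code mentioned in this question: `knapsack_ref` is a hypothetical name, each loop iteration stands in for the work-item whose `get_global_id(0)` equals `gid`, and the decision is stored as 1 (the kernel's previous version) so the dynamic-programming update is checked separately from the pown change.

```c
#include <assert.h>

/* Hypothetical serial reference for one launch of the knapsack kernel.
 * Iteration `gid` mirrors the work-item with get_global_id(0) == gid;
 * work-items with gid >= maxelem do nothing in the kernel, so the loop
 * simply stops at maxelem. Decisions are recorded as 1, as in the
 * kernel's previous version, not via pown(2.0, i).
 * Assumes output_f already holds input_f for the indices below cmax. */
void knapsack_ref(const int *input_f, int *output_f, unsigned *m_d,
                  int cmax, int weightk, int pk, int maxelem)
{
    assert(cmax >= weightk); /* keeps c - weightk a valid index */
    for (int gid = 0; gid < maxelem; ++gid) {
        int c = gid + cmax;
        if (input_f[c] < input_f[c - weightk] + pk) {
            output_f[c] = input_f[c - weightk] + pk; /* take the item */
            m_d[c - 1] = 1;
        } else {
            output_f[c] = input_f[c];                /* skip the item */
        }
    }
}
```

Calling it once per item on the host, with the same `cmax`, `weightk`, `pk` and `maxelem` that are passed to the kernel, yields an expected `output_f` that can be diffed against the buffer read back from the GPU.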

The purpose of pown is to compress the m_d buffer, which stores the results.

For example:

    1 0 1 0
    0 1 0 1   =>   2^0+2^2, 2^1, 2^0, 2^1
    1 0 0 0
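The compression above sets bit r of a column's word when row r of that column holds a 1. A minimal C sketch of that packing follows, assuming integer shifts (`1u << r`) in place of `pown(2.0, i)`: the shift is exact integer arithmetic, whereas `pown` returns a floating-point value that is then converted to `uint` when stored into `m_d`. The names `pack_columns` and `unpack_bit` are hypothetical.

```c
#include <stddef.h>

/* Packs a rows x cols 0/1 matrix (row-major) into one word per column:
 * bit r of words[col] is set when matrix[r][col] == 1, i.e. 2^r is added.
 * Assumes rows <= 32 and a 32-bit unsigned int (uint is 32 bits in OpenCL C). */
void pack_columns(const int *matrix, size_t rows, size_t cols, unsigned *words)
{
    for (size_t col = 0; col < cols; ++col) {
        words[col] = 0u;
        for (size_t r = 0; r < rows; ++r) {
            if (matrix[r * cols + col]) {
                words[col] |= 1u << r; /* exact 2^r, no floating point */
            }
        }
    }
}

/* Reads back the decision stored for row r of column col. */
int unpack_bit(const unsigned *words, size_t col, size_t r)
{
    return (int)((words[col] >> r) & 1u);
}
```

For the example matrix the column words come out as 5, 2, 1, 2, which is 2^0+2^2, 2^1, 2^0, 2^1. Because `uint` is also 32 bits in OpenCL C, the same shift expression can be written on the device side.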

On the GPU I get something like this:

    2^0+2^2, 2^1, 2^0+2^2, 2^1

In the 3rd column, pown is applied once more, when it is not supposed to be.

This gives me "slightly" different results. Here you can find the full code.

This work is based on this article:

Knapsack algorithm: strange behavior of pown() on the GPU
