
Question

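While implementing matrix multiplication in OpenCL, I tried to write my own method, but it looks as if the work of some work-items is being overwritten by other work-items, and I really don't know how to handle this.

The one thing I am sure of is that the problem is in the OpenCL program (the kernel).

My host code is C/C++.

The program builds and produces output (wrong output, but the program exits successfully).

Here is my method: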

__kernel void matrixMultiplication(
         __global double* matrix1,
         __global double* matrix2,
         __global double* output,
         const unsigned int ROWS_M1, // ROWS_M1 = 3
         const unsigned int COLS_M1, // COLS_M1 = 2
         const unsigned int ROWS_M2, // ROWS_M2 = 2
         const unsigned int COLS_M2, // COLS_M2 = 4
         const unsigned int ROWS_M3, // ROWS_M3 = 3
         const unsigned int COLS_M3) { // COLS_M3 = 4

    int i = get_global_id(0);
    int j = get_global_id(1);

    // for each value in the matrix1 (for each work-item)
    // and for each value in the "jth" row in the second matrix...
    // multiply the values and then add them according to the right offset.

    for(int k =0; k < COLS_M2; k++){
        int offsetM1 = (i*COLS_M1)+j;
        int offsetM2 = (j*COLS_M2)+k;
        int offsetM3 = (i*COLS_M3)+k;

        //output[i][k] += matrix1[i][j]*matrix2[j][k];
        output[offsetM3] += matrix1[offsetM1]*matrix2[offsetM2];
    }

}

The value passed for each const unsigned int argument is given in the comment next to it in the code.
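One way to see why values get overwritten is to replay the kernel's indexing on the CPU. The C sketch below assumes the kernel is launched with a global size of (ROWS_M1, COLS_M1) = (3, 2), i.e. one work-item per element of matrix1, which is what the loop's design implies (the launch configuration itself is not shown in the question). The helper name count_writers is hypothetical. It counts how many work-items write to each element of output.

```c
#include <assert.h>

/* Dimensions from the comments in the question's kernel. */
#define ROWS_M1 3
#define COLS_M1 2
#define COLS_M2 4
#define ROWS_M3 3
#define COLS_M3 4

/* Hypothetical helper: replays the question's kernel sequentially over an
 * assumed NDRange of ROWS_M1 x COLS_M1 work-items (one per element of
 * matrix1) and counts how many work-items write each element of output. */
void count_writers(int writers[ROWS_M3 * COLS_M3]) {
    for (int n = 0; n < ROWS_M3 * COLS_M3; n++)
        writers[n] = 0;

    for (int i = 0; i < ROWS_M1; i++) {         /* get_global_id(0) */
        for (int j = 0; j < COLS_M1; j++) {     /* get_global_id(1) */
            for (int k = 0; k < COLS_M2; k++) { /* the kernel's own loop */
                int offsetM3 = (i * COLS_M3) + k;
                assert(offsetM3 < ROWS_M3 * COLS_M3); /* stays in bounds */
                writers[offsetM3]++;
            }
        }
    }
}
```

With these dimensions every entry of writers ends up as 2: work-items (i, 0) and (i, 1) both update output[i][k], and nothing orders their read-modify-write.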

The values of the matrices are:

Matrix 1:

1 2
3 4
5 6

Matrix 2:

2 3 4 5
6 7 8 9

Actual output:

12 14 16 18
24 28 32 36
36 42 48 54

Expected output:

14 17 20 23
30 37 44 51
46 57 68 79
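Working through a couple of entries by hand, using the indices from the kernel's commented line output[i][k] += matrix1[i][j]*matrix2[j][k], shows what went wrong: each actual value is exactly the j = 1 term on its own, so the j = 0 contribution was overwritten instead of added.

$$
\text{output}[i][k] = \sum_{j=0}^{\text{COLS\_M1}-1} \text{matrix1}[i][j]\,\text{matrix2}[j][k]
$$

$$
\begin{aligned}
\text{expected output}[0][0] &= 1\cdot 2 + 2\cdot 6 = 2 + 12 = 14,\\
\text{actual output}[0][0] &= 12 = 2\cdot 6 \quad (\text{the } j = 1 \text{ term only}),\\
\text{expected output}[2][3] &= 5\cdot 5 + 6\cdot 9 = 25 + 54 = 79,\\
\text{actual output}[2][3] &= 54 = 6\cdot 9 .
\end{aligned}
$$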

Answer 1

I think you got the indexing wrong. *offsetM3* should be *i\*COLS_M3+j*, *offsetM1* should be *i\*COLS_M1+k*, and *offsetM2* should be *k\*COLS_M2+j*.

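Write the matrices on paper and do the math by hand, then lay the matrices out as arrays in memory and multiply them; you will see the indexing pattern. Remember that each thread (work-item) computes one element of the new matrix. If you change the index into the new matrix inside the for loop, you are no longer following the logic of one work-item per matrix element; if that is what you want, you should consider a different approach. Hope this helps.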
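A sequential C sketch of the scheme this answer describes: one work-item per element of the output matrix, using the answer's three offsets. The two outer loops stand in for get_global_id(0) and get_global_id(1) over an NDRange of (ROWS_M3, COLS_M3). The loop bound for k (the shared dimension, COLS_M1 == ROWS_M2) and the function name matmul_per_output are assumptions, since the answer does not state them. Each output element is accumulated in a private variable and written exactly once, so there is nothing for work-items to race on.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for a kernel launched over an NDRange of
 * ROWS_M3 x COLS_M3 work-items, one per element of the output matrix. */
void matmul_per_output(const double *matrix1, const double *matrix2,
                       double *output,
                       unsigned ROWS_M1, unsigned COLS_M1,
                       unsigned ROWS_M2, unsigned COLS_M2) {
    assert(COLS_M1 == ROWS_M2);  /* inner dimensions must agree */
    unsigned ROWS_M3 = ROWS_M1;
    unsigned COLS_M3 = COLS_M2;

    for (unsigned i = 0; i < ROWS_M3; i++) {      /* get_global_id(0) */
        for (unsigned j = 0; j < COLS_M3; j++) {  /* get_global_id(1) */
            double sum = 0.0;  /* private to this work-item */
            for (unsigned k = 0; k < COLS_M1; k++) {
                unsigned offsetM1 = i * COLS_M1 + k;  /* matrix1[i][k] */
                unsigned offsetM2 = k * COLS_M2 + j;  /* matrix2[k][j] */
                sum += matrix1[offsetM1] * matrix2[offsetM2];
            }
            unsigned offsetM3 = i * COLS_M3 + j;      /* output[i][j] */
            output[offsetM3] = sum;  /* the only write to this element */
        }
    }
}
```

Called with the question's matrices (ROWS_M1 = 3, COLS_M1 = ROWS_M2 = 2, COLS_M2 = 4), it produces the expected output 14 17 20 23 / 30 37 44 51 / 46 57 68 79.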

Answer 2

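TL;DR

The problem was my loop. Don't do it that way; it's bad.

Now that I've finished university, I'm taking the time to write up the correct answer to my own question, so that anyone else who stumbles on the same problem can find it.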

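The way I had written the loop, different work-items overlapped with one another (several of them performed the read-modify-write += on the same output element), which produced different results between test runs; basically a mutual-exclusion problem, the kind you would normally handle with semaphores.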
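The lost update can be reproduced deterministically in C. The sketch below assumes one particular, hypothetical schedule of the question's kernel: every work-item performs the read half of output[offsetM3] += ... before any work-item performs the write half, so the store issued last wins. On real hardware the interleaving varies from run to run, which fits the differing results described above. The function name emulate_lost_updates is made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical worst-case schedule for the question's kernel
 * (NDRange ROWS_M1 x COLS_M1): every work-item (i, j) first loads the
 * output values it is about to update, and only then do all work-items
 * store load + product. The store issued last wins. */
void emulate_lost_updates(const double *matrix1, const double *matrix2,
                          double *output,
                          unsigned ROWS_M1, unsigned COLS_M1,
                          unsigned COLS_M2, unsigned COLS_M3) {
    assert(COLS_M3 == COLS_M2);  /* output has as many columns as matrix2 */

    size_t n = (size_t)ROWS_M1 * COLS_M1 * COLS_M2;
    double *loaded = malloc(n * sizeof *loaded);  /* per-work-item "registers" */
    if (loaded == NULL)
        return;  /* allocation failed: leave output untouched */

    /* Phase 1: every work-item reads output[i][k] for each k. */
    for (unsigned i = 0; i < ROWS_M1; i++)
        for (unsigned j = 0; j < COLS_M1; j++)
            for (unsigned k = 0; k < COLS_M2; k++)
                loaded[((size_t)i * COLS_M1 + j) * COLS_M2 + k] =
                    output[i * COLS_M3 + k];

    /* Phase 2: every work-item writes back its stale value plus its product. */
    for (unsigned i = 0; i < ROWS_M1; i++)
        for (unsigned j = 0; j < COLS_M1; j++)
            for (unsigned k = 0; k < COLS_M2; k++)
                output[i * COLS_M3 + k] =
                    loaded[((size_t)i * COLS_M1 + j) * COLS_M2 + k] +
                    matrix1[i * COLS_M1 + j] * matrix2[j * COLS_M2 + k];

    free(loaded);
}
```

Starting from an all-zero output and the question's matrices, this schedule yields exactly the wrong output from the question, 12 14 16 18 / 24 28 32 36 / 36 42 48 54: only the j = 1 contribution survives.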

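The solution was to rewrite the whole loop, computing the offsets in a different way.

Here is the source that solved my problem, for anyone who may find it interesting or useful: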

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void multiplyMatrix(
    __global double* matrix1,
    __global double* matrix2,
    __global double* output,
    const unsigned int ROWS_M1,
    const unsigned int COLS_M1,
    const unsigned int ROWS_M2,
    const unsigned int COLS_M2,
    const unsigned int ROWS_M3,
    const unsigned int COLS_M3) {

    // One work-item per element of the output matrix,
    // so the NDRange global size must be (ROWS_M3, COLS_M3).
    int i = get_global_id(0);   // row of output and of matrix1
    int j = get_global_id(1);   // column of output and of matrix2
    if (i >= ROWS_M3 || j >= COLS_M3) return;   // skip padding work-items

    double aux = 0.0;           // private accumulator, shared with no one
    int offsetM1;
    int offsetM2;
    int offsetM3;
    // output[i][j] = sum over k of matrix1[i][k]*matrix2[k][j],
    // where k runs over the shared dimension (COLS_M1 == ROWS_M2)
    for(int k=0; k < COLS_M1; k++){
        offsetM1 = (i*COLS_M1)+k;   // matrix1[i][k]
        offsetM2 = (k*COLS_M2)+j;   // matrix2[k][j]
        aux += matrix1[offsetM1]*matrix2[offsetM2];
    }
    // exactly one work-item writes each output element
    offsetM3 = (i*COLS_M3)+j;
    output[offsetM3] = aux;
}

OpenCL: matrix multiplication bypasses some work-items
