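A few weeks ago I posted about a bug in my OpenCL implementation, but it seems I have to start from the beginning. So, how should the following algorithm be implemented in OpenCL?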
int m = 10;
int n = 10;
// arrA[] has m elements
// arrB[] has n elements
// arrC[] has m x n elements, stored row-major
for (int i = 0; i < m; i++)
{
    for (int j = 0; j < n; j++)
    {
        // Element (i, j) of the m x n result lives at i * n + j
        arrC[i * n + j] = arrA[i] * arrB[j];
    }
}
For this case I just need to know how to use the global and local IDs to handle it, because I'm a bit lost. Thanks a lot.
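A minimal sketch of the usual mapping, assuming a 2D NDRange launched with global size {m, n}: each work-item takes its row from get_global_id(0) and its column from get_global_id(1) and writes exactly one element of arrC. Local IDs are not needed for this computation, because every output element is independent. There is no OpenCL runtime in the sketch, so the launch is emulated on the CPU with two loops; the helper names `outer_product_item` and `emulate_ndrange_2d` are hypothetical, and on a real device the work-items run in no guaranteed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the body one work-item would run for global ID
 * (gid0, gid1). In a real kernel, gid0 = get_global_id(0) and
 * gid1 = get_global_id(1); here they are plain parameters so the index
 * mapping can be tested on the CPU. */
static void outer_product_item(size_t gid0, size_t gid1,
                               const double *arrA, const double *arrB,
                               double *arrC, size_t n)
{
    /* Row-major flattening: row gid0, column gid1 of an m x n matrix. */
    arrC[gid0 * n + gid1] = arrA[gid0] * arrB[gid1];
}

/* Hypothetical helper: emulates a 2D launch with global size {m, n} by
 * visiting every (gid0, gid1) pair exactly once. */
static void emulate_ndrange_2d(size_t m, size_t n,
                               const double *arrA, const double *arrB,
                               double *arrC)
{
    assert(arrA != NULL && arrB != NULL && arrC != NULL);
    for (size_t gid0 = 0; gid0 < m; gid0++)
        for (size_t gid1 = 0; gid1 < n; gid1++)
            outer_product_item(gid0, gid1, arrA, arrB, arrC, n);
}
```

On the host, the matching launch would call `clEnqueueNDRangeKernel` with `work_dim = 2` and a global work size of `{m, n}`; passing NULL as the local work size lets the runtime choose the work-group size.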
Answer 0 (score: 0)
This is the code I have so far (it is an extract of the real code, because I also need to get the maximum value).
"__kernel void sampleKernel(__global const double *bufferX,"
"                           __global const double *bufferY,"
"                           __global double *result,"
"                           const int lengthX,"
"                           const int lengthY){"
"    const int index_a = get_global_id(0);" // Get the global indexes for 2D reference
"    const int index_b = get_global_id(1);"
"    const int local_index = get_local_id(0);" // Current thread id -> Should be the same as index_a * lengthY + index_b;
"    if (local_index < (lengthX * lengthY)) {" // Load data into local memory
"        if (index_a < lengthX && index_b < lengthY)"
"        {"
"            result[local_index] = bufferX[index_a] * bufferY[index_b];"
"        }"
"    }"
"}";
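In the kernel above, `get_local_id(0)` is only the work-item's position inside its work-group along dimension 0, so it repeats in every group and ignores dimension 1; many work-items therefore write to the same `result` slots. A sketch of a corrected version, assuming the device supports double precision, builds the index from both global IDs as `index_a * lengthY + index_b`, which is the value the code comment says `local_index` should have. It keeps the bounds guard, which matters when the host rounds the global size up to a multiple of the local size (OpenCL 1.x requires the global size to be divisible by an explicitly given local size). The source-string name `kSampleKernelSrc` and the CPU helpers `round_up`, `sample_kernel_item` and `emulate_padded_launch` are hypothetical; the helpers mirror the kernel body so the indexing can be tested without a device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name for the corrected kernel source (assumes fp64 support). */
const char *const kSampleKernelSrc =
    "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
    "__kernel void sampleKernel(__global const double *bufferX,\n"
    "                           __global const double *bufferY,\n"
    "                           __global double *result,\n"
    "                           const int lengthX,\n"
    "                           const int lengthY)\n"
    "{\n"
    "    const int index_a = get_global_id(0);\n"
    "    const int index_b = get_global_id(1);\n"
    "    /* Skip padding work-items outside the data. */\n"
    "    if (index_a < lengthX && index_b < lengthY)\n"
    "        result[index_a * lengthY + index_b] =\n"
    "            bufferX[index_a] * bufferY[index_b];\n"
    "}\n";

/* Rounds value up to the next multiple of `multiple`, as host code would do
 * for the global size before launching with a fixed local size. */
static size_t round_up(size_t value, size_t multiple)
{
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

/* CPU mirror of the kernel body for one work-item. */
static void sample_kernel_item(int index_a, int index_b,
                               const double *bufferX, const double *bufferY,
                               double *result, int lengthX, int lengthY)
{
    if (index_a < lengthX && index_b < lengthY)
        result[index_a * lengthY + index_b] = bufferX[index_a] * bufferY[index_b];
}

/* Emulates a 2D launch whose global size is padded up to a multiple of
 * local_size in each dimension; the guard skips the extra work-items. */
static void emulate_padded_launch(const double *bufferX, const double *bufferY,
                                  double *result, int lengthX, int lengthY,
                                  size_t local_size)
{
    size_t global_x = round_up((size_t)lengthX, local_size);
    size_t global_y = round_up((size_t)lengthY, local_size);
    for (size_t a = 0; a < global_x; a++)
        for (size_t b = 0; b < global_y; b++)
            sample_kernel_item((int)a, (int)b, bufferX, bufferY, result,
                               lengthX, lengthY);
}
```

For example, with lengthX = 3, lengthY = 5 and a local size of 4, the padded global size is 4 x 8 = 32 work-items, but only the 15 that fall inside the data write to `result`.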
Maybe I should also use get_local_id(1) and compute the thread ID as local_id_1 * N + local_id_2, where N is the number of values local_id_2 can take (its maximum plus one).
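Flattening local IDs as in the last paragraph gives an index that is unique only within one work-group: every group sees the same local IDs, so the same flattened values repeat in each group. A sketch, assuming no global offset, of the relation `get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d)` and of how many writes collide under each indexing scheme; all function names are hypothetical and the launch is emulated on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index from local IDs only (lsz1 = local size of dimension 1). */
static size_t flat_local(size_t lid0, size_t lid1, size_t lsz1)
{
    return lid0 * lsz1 + lid1;
}

/* Global ID rebuilt from group and local IDs (no global offset assumed). */
static size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Flattened index from global IDs (gsz1 = global size of dimension 1). */
static size_t flat_global(size_t gid0, size_t gid1, size_t gsz1)
{
    return gid0 * gsz1 + gid1;
}

/* Counts work-items whose output index was already written by an earlier
 * work-item in an emulated gsz0 x gsz1 launch with lsz0 x lsz1 groups. */
static size_t count_collisions(size_t gsz0, size_t gsz1,
                               size_t lsz0, size_t lsz1, int use_global)
{
    unsigned char seen[256] = {0};
    size_t collisions = 0;
    assert(gsz0 * gsz1 <= sizeof seen);
    assert(gsz0 % lsz0 == 0 && gsz1 % lsz1 == 0);
    for (size_t g0 = 0; g0 < gsz0 / lsz0; g0++)
        for (size_t g1 = 0; g1 < gsz1 / lsz1; g1++)
            for (size_t l0 = 0; l0 < lsz0; l0++)
                for (size_t l1 = 0; l1 < lsz1; l1++) {
                    size_t idx = use_global
                        ? flat_global(global_id(g0, lsz0, l0),
                                      global_id(g1, lsz1, l1), gsz1)
                        : flat_local(l0, l1, lsz1);
                    if (seen[idx])
                        collisions++;
                    else
                        seen[idx] = 1;
                }
    return collisions;
}
```

With a 10 x 10 problem split into 5 x 5 work-groups, the local scheme reuses indices 0 to 24 in all four groups, so 75 of the 100 writes land on an index that is already taken, while the global scheme writes each of the 100 elements exactly once.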