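I'd like to know whether this loop can be accelerated with OpenMP or CUDA. It currently works fine with sequential processing, but I'm trying to optimize my code: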
#pragma omp parallel for private(curCol) shared(curIndex)
To parallelize the processing I tried the following, but it had no effect:
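numbers = [2, 3, 4, 6, 8, 9, 10, 11, 20]  # must be sorted in ascending order
left = 0
right = len(numbers)  # exclusive upper bound

def searchNumber(left, right, number):
    """Binary search in numbers[left:right]; returns the index or None."""
    while left < right:
        mid = (right - left) // 2 + left  # integer division keeps mid an int
        if numbers[mid] == number:
            print("the number is at index %d" % mid)
            return mid
        elif numbers[mid] > number:
            right = mid  # right is exclusive, so mid itself is dropped
        else:
            left = mid + 1
    return None  # not found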
I suspect the use of .push_back is the problem, but I could be wrong...
How can I improve this code?
Answer 0 (score: 0)
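First, pre-reserve storage for all of the inner vectors:
for (int curCol = 0; curCol < numRows; ++curCol)
{
    // reserve() only allocates capacity; size() stays 0, so the
    // push_back calls in the main loop still append from index 0.
    vec_L_val[curCol].reserve(SIZE_OF_THE_INNER_VECTOR);
    vec_L_indices[curCol].reserve(SIZE_OF_THE_INNER_VECTOR);
    vec_U_val[curCol].reserve(SIZE_OF_THE_INNER_VECTOR);
    vec_U_indices[curCol].reserve(SIZE_OF_THE_INNER_VECTOR);
}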
Then your inner loop will probably be faster, because the inner vectors no longer need to reallocate as they grow.
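To illustrate the point about reallocation, here is a minimal sketch. It relies on the standard guarantee that `push_back` does not reallocate while the new size stays within the reserved capacity. The helper names `appendsWithoutRealloc` and `sizeAfterResizeThenPush` are hypothetical and exist only for this example. The second helper shows why `reserve` rather than `resize` is the call to pair with `push_back`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `count` values to a vector whose capacity was
// reserved up front and reports whether the underlying buffer stayed put.
bool appendsWithoutRealloc(std::size_t count) {
    std::vector<float> values;
    values.reserve(count);               // capacity >= count, size() is still 0
    assert(values.empty());
    const float* before = values.data();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<float>(i));  // fits in reserved storage
    }
    return values.data() == before && values.size() == count;
}

// For contrast: resize() creates `count` zero-valued elements first, so a
// later push_back lands at index `count` instead of index 0.
std::size_t sizeAfterResizeThenPush(std::size_t count) {
    std::vector<float> values;
    values.resize(count);
    values.push_back(1.0f);
    return values.size();                // count + 1
}
```

If the number of entries per column is not known in advance, an upper bound such as the number of stored entries in that column is a reasonable value to reserve.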
for (int curCol = 0; curCol < numRows; ++curCol){ //Long Loop
    int lb = csc_colIndices[curCol];
    int ub = csc_colIndices[curCol + 1];
    // push back the diagonal value to L matrix
    vec_L_val[curCol].push_back(1.0f);
    vec_L_indices[curCol].push_back(curCol);
    for (int curIndex = lb; curIndex < ub; ++curIndex){
        int curRow = csc_indices[curIndex];
        float curVal = csc_val[curIndex];
        if (!Equal(curVal, 0) && curRow <= curCol){// U entry
            vec_U_val[curCol].push_back(curVal);
            vec_U_indices[curCol].push_back(curRow);
        }
        else if (!Equal(curVal, 0) && curRow > curCol){// L entry
            vec_L_val[curCol].push_back(curVal);
            vec_L_indices[curCol].push_back(curRow);
        }
    }
}
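Each iteration of the outer loop writes only to the vectors for column `curCol`, so the iterations look independent. One possible OpenMP version is sketched below. It is a sketch under assumptions, not a tested drop-in replacement. `LUParts`, `splitColumns`, and this version of `Equal` are hypothetical stand-ins, and the inputs are assumed to be a valid CSC matrix. All per-iteration variables are declared inside the loop, which makes them private to each thread without extra clauses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Equal() helper used in the question.
static bool Equal(float a, float b) { return std::fabs(a - b) < 1e-6f; }

// Hypothetical container for the four per-column output arrays.
struct LUParts {
    std::vector<std::vector<float>> L_val, U_val;
    std::vector<std::vector<int>> L_indices, U_indices;
};

LUParts splitColumns(const std::vector<int>& csc_colIndices,
                     const std::vector<int>& csc_indices,
                     const std::vector<float>& csc_val) {
    assert(!csc_colIndices.empty());
    const int numCols = static_cast<int>(csc_colIndices.size()) - 1;
    LUParts out;
    out.L_val.resize(numCols);
    out.U_val.resize(numCols);
    out.L_indices.resize(numCols);
    out.U_indices.resize(numCols);

    // Column curCol only touches out.*[curCol], so threads never share an
    // inner vector. Dynamic scheduling helps when column sizes vary.
    #pragma omp parallel for schedule(dynamic)
    for (int curCol = 0; curCol < numCols; ++curCol) {
        const int lb = csc_colIndices[curCol];
        const int ub = csc_colIndices[curCol + 1];
        const std::size_t nnz = static_cast<std::size_t>(ub - lb);

        // Upper bounds on the entries this column can contribute.
        out.L_val[curCol].reserve(nnz + 1);
        out.L_indices[curCol].reserve(nnz + 1);
        out.U_val[curCol].reserve(nnz);
        out.U_indices[curCol].reserve(nnz);

        // Unit diagonal of L.
        out.L_val[curCol].push_back(1.0f);
        out.L_indices[curCol].push_back(curCol);

        for (int curIndex = lb; curIndex < ub; ++curIndex) {
            const int curRow = csc_indices[curIndex];
            const float curVal = csc_val[curIndex];
            if (Equal(curVal, 0.0f)) continue;   // skip explicit zeros
            if (curRow <= curCol) {              // U entry
                out.U_val[curCol].push_back(curVal);
                out.U_indices[curCol].push_back(curRow);
            } else {                             // L entry
                out.L_val[curCol].push_back(curVal);
                out.L_indices[curCol].push_back(curRow);
            }
        }
    }
    return out;
}
```

Build with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without it the pragma is ignored and the loop runs sequentially. Whether this gives a speedup depends on the matrix: the loop does little arithmetic per element, so memory bandwidth and allocator contention may limit the gain.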