
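I am writing code that processes images from a camera in real time. I am using Python 3.5 with the Anaconda Accelerate / Numba packages to do most of the computation on the GPU. I am having trouble implementing a function that finds the position of the largest element in a float32 2D array. The array is already in GPU memory. The problem is that it is very slow; it is the bottleneck of my whole code. The code: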

@n_cuda.jit('void(float32[:,:], float32, float32, float32)')
def d_findcarpeak(temp_mat, height, width, peak_flat):
    row, col = cuda.grid(2)
    if row < height and col < width:
        peak_flat = temp_mat.argmax()

This is how I call it:

d_findcarpeak[number_of_blocks, threads_per_block](
            d_temp_mat, height, width, d_peak_flat)

How can I rewrite this code?
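For context, one common direction for this kind of rewrite is a two-stage reduction: each thread block finds the maximum of its own slice, and a second, much smaller pass reduces those partial results. In the kernel above every thread asks for the argmax of the whole array, and the result is assigned to a scalar argument, which is passed by value and so never reaches the host. The sketch below only illustrates the reduction pattern in plain Python on the CPU, so it runs without a GPU. It is not a Numba kernel, and `block_argmax`, `two_stage_argmax` and `chunk_size` are hypothetical names, with `chunk_size` standing in for the number of elements one block would handle.

```python
# CPU-only sketch of a two-stage argmax reduction.
# This mirrors the pattern a GPU kernel could follow; it is NOT Numba/CUDA code.
# All function and parameter names are hypothetical, chosen for illustration.

def block_argmax(flat, start, stop):
    """Stage 1: one 'block' scans its own slice, returns (max value, flat index)."""
    best_idx = start
    for i in range(start + 1, stop):
        if flat[i] > flat[best_idx]:  # strict '>' keeps the first occurrence
            best_idx = i
    return flat[best_idx], best_idx


def two_stage_argmax(mat, chunk_size=4):
    """Return (row, col) of the largest element of a rectangular 2D list."""
    width = len(mat[0])
    flat = [v for row in mat for v in row]  # row-major flattening
    # Stage 1: independent partial results, one per chunk ("block")
    partials = [block_argmax(flat, s, min(s + chunk_size, len(flat)))
                for s in range(0, len(flat), chunk_size)]
    # Stage 2: reduce the short list of partial results to a single winner
    _, flat_idx = max(partials, key=lambda p: p[0])
    # Convert the flat index back to 2D coordinates
    return divmod(flat_idx, width)


temp_mat = [[0.1, 0.5, 0.2],
            [0.9, 0.3, 0.4],
            [0.7, 2.5, 0.6]]
peak = two_stage_argmax(temp_mat)
print(peak)
```

On a GPU, stage 1 would typically run inside the kernel and write one (value, index) pair per block into a small device array, so only that small array needs a final reduction.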

Python numba: how to find the array

0 answers: