Question

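I'm currently trying to implement a function in Python that finds the occurrences of each color value in an image, in order to determine the bounding box of each color region. It seems to work, but it is very slow: iterating over a single 1920x1080 image takes about 30 seconds. I tried converting it to Cython, which only improved performance by about 2 seconds per image. That is still well short of what I'm looking for. Since I'm new to Cython, I hope you can give me some tips for improving it. You can see my code below; thanks a lot!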

cimport cython

import numpy as np
cimport numpy as np

@cython.wraparound(False)
@cython.boundscheck(False)
cdef _cget_bboxes_(img):

    cdef int y_lim = img.shape[0]
    cdef int x_lim = img.shape[1]

    cdef np.ndarray img_array = img

    color_dict = {}


    cdef int y, x

    for y in range(y_lim):
        for x in range(x_lim):

            pix = img_array[y][x]
            pix = tuple(pix)

            if np.any(pix >= (10, 10, 10)):
                if pix not in color_dict:

                    color_dict[pix] = {"min_x": x, "max_x": x, "min_y": y, "max_y": y, "count": 1}

                else:

                    if color_dict[pix]["min_x"] >= x:
                        color_dict[pix]["min_x"] = x

                    if color_dict[pix]["max_x"] <= x:
                        color_dict[pix]["max_x"] = x

                    if color_dict[pix]["min_y"] >= y:
                        color_dict[pix]["min_y"] = y

                    if color_dict[pix]["max_y"] <= y:
                        color_dict[pix]["max_y"] = y

                color_dict[pix]["count"] += 1

    return color_dict
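Two details of the loop above are worth checking against the intended logic, independent of speed.

- **Threshold test.** At the point of the test, `pix` is a Python tuple, so `pix >= (10, 10, 10)` is a lexicographic tuple comparison, not a per-channel one. `np.any` of that single bool just returns it.
- **Pixel count.** Because `count += 1` sits outside the `else`, the first pixel of each color is counted twice.

The minimal pure-Python illustration below demonstrates both. The helper names are hypothetical.

```python
def is_bright_lexicographic(pix):
    # What the original test does: tuple >= tuple compares lexicographically,
    # so the first channel that differs from 10 decides the result.
    return pix >= (10, 10, 10)


def is_bright_per_channel(pix):
    # Per-channel version: true if ANY channel is at least 10
    # (the condition Answer 2 uses: r >= 10 or g >= 10 or b >= 10).
    return any(c >= 10 for c in pix)


def count_like_original(pixels):
    # Mirrors the original control flow: the increment also runs
    # right after a new entry has been created with count 1.
    counts = {}
    for pix in pixels:
        if pix not in counts:
            counts[pix] = 1
        counts[pix] += 1
    return counts


for pix in [(5, 200, 200), (10, 0, 0), (200, 0, 0), (0, 0, 9)]:
    print(pix, is_bright_lexicographic(pix), is_bright_per_channel(pix))
# (5, 200, 200) False True
# (10, 0, 0) False True
# (200, 0, 0) True True
# (0, 0, 9) False False

print(count_like_original([(1, 2, 3)]))   # {(1, 2, 3): 2}
```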

Answer 1

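Using a dictionary to look up color triplets is a very bad idea. The triplet values have a fixed range (I assume 0..255). Replacing the dictionary with a 3D array of size 256x256x256 would speed up the code enormously, because a lookup becomes simple array indexing.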
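Below is a minimal NumPy sketch of the table idea, shown for pixel counts only. The function name is hypothetical.

- The three channel values index a preallocated 256x256x256 array directly, so there is no hashing and no tuple is created per pixel.
- Per-color min/max tables for the bounding box could be filled the same way with `np.minimum.at` and `np.maximum.at`.

```python
import numpy as np


def color_count_table(img):
    """Count pixels per exact (r, g, b) value using a 256x256x256 array.

    The channel values are used directly as array indices, so a lookup is
    an index computation instead of hashing a Python tuple.
    """
    img = np.asarray(img, dtype=np.uint8)
    r = img[..., 0].ravel()
    g = img[..., 1].ravel()
    b = img[..., 2].ravel()
    counts = np.zeros((256, 256, 256), dtype=np.uint32)  # 64 MiB
    # np.add.at is unbuffered, so repeated (r, g, b) indices are all counted.
    np.add.at(counts, (r, g, b), 1)
    return counts


img = np.array([[[1, 2, 3], [1, 2, 3]],
                [[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
counts = color_count_table(img)
print(counts[1, 2, 3], counts[255, 255, 255])   # 2 1
```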

Note that what you are computing is a color histogram. I would be surprised if this did not already exist somewhere and were not available in Python.
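NumPy can already compute this kind of color histogram. The minimal sketch below uses `np.unique` with `axis=0`, which returns only the colors that actually occur together with their counts. The wrapper function name is hypothetical.

```python
import numpy as np


def color_histogram(img):
    """Return (colors, counts) for every distinct RGB value in img.

    colors has shape (k, 3), sorted lexicographically; counts has shape (k,).
    """
    pixels = np.asarray(img, dtype=np.uint8).reshape(-1, 3)
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    return colors, counts


img = np.array([[[1, 2, 3], [1, 2, 3]],
                [[4, 5, 6], [1, 2, 3]]], dtype=np.uint8)
colors, counts = color_histogram(img)
print(colors.tolist(), counts.tolist())   # [[1, 2, 3], [4, 5, 6]] [3, 1]
```

For large images, a faster route is typically to pack each pixel into a single integer first (for example `r << 16 | g << 8 | b`). Calling `np.unique` or `np.bincount` on that 1-D array avoids the row-wise sort that `axis=0` performs.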

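Also, color histograms are usually computed on more coarsely quantized color values, for example 64 bins per dimension. This reduces memory use and increases speed, and in most applications the loss of precision matters little.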
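Below is a minimal NumPy sketch of a quantized histogram. It assumes an (H, W, 3) uint8 image and a bin count that divides 256; the function name is hypothetical.

1. Each channel is divided down to a bin index.
2. The three indices are combined into one integer.
3. `np.bincount` counts the combined integers.

With 64 bins per channel the result has 262,144 cells instead of 16,777,216.

```python
import numpy as np


def coarse_histogram(img, bins_per_channel=64):
    """3-D color histogram with bins_per_channel bins per RGB channel."""
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    img = np.asarray(img, dtype=np.uint8)
    step = 256 // bins_per_channel            # 4 when using 64 bins
    q = (img // step).astype(np.int64)        # per-channel bin index
    n = bins_per_channel
    # Combine (r_bin, g_bin, b_bin) into one flat bin number.
    keys = (q[..., 0] * n + q[..., 1]) * n + q[..., 2]
    hist = np.bincount(keys.ravel(), minlength=n ** 3)
    return hist.reshape(n, n, n)


img = np.array([[[0, 0, 0], [3, 3, 3], [4, 0, 255]]], dtype=np.uint8)
hist = coarse_histogram(img)
print(hist[0, 0, 0], hist[1, 0, 63])   # 2 1
```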

Answer 2

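I see that you managed to get your code running in about 1 second and are happy with the performance. However, you can make it even faster by using the power of numpy structured arrays!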

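Following the suggestions of @chrisb and @CrisLuengo, you should not only add type information to your variables but also choose appropriate data structures. I suggest you take a look at this blog post, but in short: Python containers like dict do not store their data contiguously in memory. Instead, accessing a specific element requires "unboxing" pointers to Python objects, which is slow and hurts CPU cache performance.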
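The contiguity point can be made concrete with NumPy alone: a structured dtype stores each record's fields back to back in a single buffer. The dtype below is a hand-written stand-in, assumed to match the `ColorData` struct in this answer's code. The sketch also shows what a full 256x256x256 table of such records costs in memory.

```python
import numpy as np

# Hand-written dtype assumed to mirror the packed ColorData struct:
# four uint16 coordinates followed by a uint32 counter.
color_dtype = np.dtype([
    ("min_x", np.uint16), ("max_x", np.uint16),
    ("min_y", np.uint16), ("max_y", np.uint16),
    ("count", np.uint32),
])

# Without align=True NumPy inserts no padding, so each record is 4*2 + 4 bytes.
print(color_dtype.itemsize)                                        # 12

# Records share one contiguous buffer; a single field is a strided view of it.
records = np.zeros(3, dtype=color_dtype)
records["count"] += 1
print(records.flags["C_CONTIGUOUS"], records["count"].strides)     # True (12,)

# Memory for one record per possible 24-bit color.
table_bytes = 256 ** 3 * color_dtype.itemsize
print(table_bytes // 2 ** 20, "MiB")                               # 192 MiB
```

At 192 MiB the full table is large, which is one reason the coarser quantization suggested in Answer 1 can help.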

Here is my version of your _cget_bboxes_ function:

cimport cython
from libc.stdint cimport uint8_t
import numpy as np
cimport numpy as np

cdef packed struct ColorData:
    np.uint16_t min_x, max_x, min_y, max_y
    np.uint32_t count

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef get_histogram(np.uint8_t[:, :, :] img):
    cdef int y_lim = img.shape[0]
    cdef int x_lim = img.shape[1]
    cdef int y, x
    cdef uint8_t r, g, b

    # You can define a numpy structured array dtype by hand using tuples...
    # cdef np.dtype color_dtype = np.dtype([
    #     ("min_x", np.uint16),
    #     ("max_x", np.uint16),
    #     ("min_y", np.uint16),
    #     ("max_y", np.uint16),
    #     ("count", np.uint32)])

    # Or, instead of rewriting the struct's definition as a numpy dtype,
    # you can use this generic approach:
    # 1- make a temp object
    # 2- get its pointer
    # 3- convert it to a memoryview
    # 4- convert that to a numpy array
    # 5- then take that numpy array's dtype
    cdef ColorData _color
    cdef np.dtype color_dtype = np.asarray(<ColorData[:1]>(&_color)).dtype

    # cdef ColorData[:, :, :] out  # this alternatively works
    cdef np.ndarray[ColorData, ndim=3] out
    out = np.zeros(shape=(256, 256, 256), dtype=color_dtype)

    for y in range(y_lim):
        for x in range(x_lim):
            r = img[y, x, 0]
            g = img[y, x, 1]
            b = img[y, x, 2]
            if r >= 10 or g >= 10 or b >= 10:
                if out[r, g, b].count == 0:
                    # Set each field directly; this avoids building a Python
                    # object for every newly seen color.
                    out[r, g, b].min_x = x
                    out[r, g, b].max_x = x
                    out[r, g, b].min_y = y
                    out[r, g, b].max_y = y
                    out[r, g, b].count = 1
                else:
                    if out[r, g, b].min_x >= x:
                        out[r, g, b].min_x = x
                    if out[r, g, b].max_x <= x:
                        out[r, g, b].max_x = x
                    if out[r, g, b].min_y >= y:
                        out[r, g, b].min_y = y
                    if out[r, g, b].max_y <= y:
                        out[r, g, b].max_y = y
                    out[r, g, b].count += 1
    return out

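To "type" a numpy structured array, I had to include a struct definition corresponding to the array's dtype. I was also careful in my loop to avoid creating tuples to index into the out array. By comparison, this code takes about 0.02 seconds for a 1920x1080 image on my laptop. I hope this helps demonstrate how to take full advantage of Cython's compiled nature!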
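The returned table has 2**24 entries, and for a typical image most of them are empty. Below is a minimal NumPy sketch for pulling out only the colors that occurred, together with their boxes. The helper `nonempty_boxes` and the small stand-in table are hypothetical. The stand-in uses the same field names as `ColorData` so the sketch runs without compiling the Cython module.

```python
import numpy as np

color_dtype = np.dtype([
    ("min_x", np.uint16), ("max_x", np.uint16),
    ("min_y", np.uint16), ("max_y", np.uint16),
    ("count", np.uint32),
])


def nonempty_boxes(table):
    """Return (colors, records) for entries of an (n, n, n) table with count > 0.

    colors is a (k, 3) array of (r, g, b) indices; records is a length-k
    structured array holding min_x, max_x, min_y, max_y and count.
    """
    idx = np.nonzero(table["count"])          # tuple of three index arrays
    colors = np.stack(idx, axis=1)
    return colors, table[idx]


# Small stand-in for the 256x256x256 output of get_histogram.
table = np.zeros((4, 4, 4), dtype=color_dtype)
table[1, 2, 3] = (5, 9, 0, 4, 7)              # a tuple fills one whole record
colors, records = nonempty_boxes(table)
print(colors.tolist())                                        # [[1, 2, 3]]
print(records["min_x"].tolist(), records["count"].tolist())   # [5] [7]
```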

Answer 3

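Running cython with --annotate highlights the parts that interact heavily with Python, which will give you a better idea of where to look. A few things jump out immediately: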

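1) Just cleanup, but img should be typed directly in the function signature; the assignment to img_array is unnecessary.

2) np.ndarray is not a specific type; you also need the underlying dtype. I like the memoryview syntax, so your function signature could be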

def _cget_bboxes_(np.uint8_t[:, :, :] img):

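3) Anything that can be typed should be.

4) Tuples and dicts are slow compared with arrays and C-typed scalars. It might (or might not) be better to refactor color_dict into a set of arrays.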
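Taking point 4 all the way, the Python loop can be dropped entirely. Below is a minimal vectorized NumPy sketch. It assumes an (H, W, 3) uint8 image and the same "any channel >= 10" threshold used in Answer 2.

1. Pack each pixel into one integer key.
2. Group equal keys with `np.unique`.
3. Reduce the coordinates per group with `np.minimum.at` and `np.maximum.at`.

This swaps in a pure-NumPy technique rather than fixing the Cython code, and the function name and return layout are hypothetical.

```python
import numpy as np


def color_bboxes(img, threshold=10):
    """Per-color bounding boxes for an (H, W, 3) uint8 image.

    Returns a dict of 1-D arrays, one entry per distinct color that passes
    the threshold: r, g, b, min_x, max_x, min_y, max_y, count.
    """
    img = np.asarray(img, dtype=np.uint8)
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]                      # pixel coordinates

    # Keep pixels where ANY channel reaches the threshold.
    mask = (img >= threshold).any(axis=2)
    pix = img[mask].astype(np.uint32)                # shape (n, 3)
    xs, ys = xs[mask], ys[mask]

    # Pack r, g, b into a single 24-bit key and group identical colors.
    keys = (pix[:, 0] << 16) | (pix[:, 1] << 8) | pix[:, 2]
    colors, inverse, counts = np.unique(keys, return_inverse=True,
                                        return_counts=True)
    k = colors.size

    # One array per field instead of a dict of dicts.
    min_x = np.full(k, w, dtype=np.int64)
    max_x = np.full(k, -1, dtype=np.int64)
    min_y = np.full(k, h, dtype=np.int64)
    max_y = np.full(k, -1, dtype=np.int64)
    np.minimum.at(min_x, inverse, xs)
    np.maximum.at(max_x, inverse, xs)
    np.minimum.at(min_y, inverse, ys)
    np.maximum.at(max_y, inverse, ys)

    return {
        "r": colors >> 16, "g": (colors >> 8) & 0xFF, "b": colors & 0xFF,
        "min_x": min_x, "max_x": max_x, "min_y": min_y, "max_y": max_y,
        "count": counts,
    }


img = np.zeros((4, 5, 3), dtype=np.uint8)
img[1, 2] = (200, 0, 0)
img[3, 4] = (200, 0, 0)
img[0, 0] = (0, 50, 0)
img[2, 1] = (5, 5, 5)          # below the threshold, ignored
boxes = color_bboxes(img)
```

Because `np.unique` sorts the keys, colors come back in ascending packed order. Each field is a plain contiguous array that can be typed directly if this is later moved into Cython.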

How to make this image-processing function faster? (Already tried Cython)
