Question

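I am using the following algorithm to perform nearest-neighbor resizing. Is there any way to optimize it for speed? The input and output buffers are in ARGB format, although the image is known to always be opaque. Thank you.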

void resizeNearestNeighbor(const uint8_t* input, uint8_t* output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
    const int x_ratio = (int)((sourceWidth << 16) / targetWidth);
    const int y_ratio = (int)((sourceHeight << 16) / targetHeight) ;
    const int colors = 4;

    for (int y = 0; y < targetHeight; y++)
    {
        int y2_xsource = ((y * y_ratio) >> 16) * sourceWidth;
        int i_xdest = y * targetWidth;

        for (int x = 0; x < targetWidth; x++)
        {
            int x2 = ((x * x_ratio) >> 16) ;
            int y2_x2_colors = (y2_xsource + x2) * colors;
            int i_x_colors = (i_xdest + x) * colors;

            output[i_x_colors]     = input[y2_x2_colors];
            output[i_x_colors + 1] = input[y2_x2_colors + 1];
            output[i_x_colors + 2] = input[y2_x2_colors + 2];
            output[i_x_colors + 3] = input[y2_x2_colors + 3];
        }
    }
}

Answer 1

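Assuming there is no aliasing between the buffers, the restrict keyword will help a lot.

Another improvement is to declare additional pointers, pointerToOutput and pointerToInput, as uint32_t*, so that the four 8-bit copy assignments can be merged into a single 32-bit one, assuming the pointers are 32-bit aligned.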
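A minimal sketch of both suggestions, assuming C99 `restrict`, buffers that do not overlap, and 4-byte-aligned pixel data. The function name `resizeNearestNeighbor32` is hypothetical and not from the original post; it takes `uint32_t` pointers directly so the caller decides how to obtain them from the byte buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the question's function: same fixed-point mapping,
   but each pixel moves as one 32-bit word through restrict-qualified pointers.
   Assumes the two buffers do not overlap and hold 4-byte-aligned pixels. */
void resizeNearestNeighbor32(const uint32_t* restrict input,
                             uint32_t* restrict output,
                             int sourceWidth, int sourceHeight,
                             int targetWidth, int targetHeight)
{
    assert(targetWidth > 0 && targetHeight > 0);

    /* 16.16 fixed-point scale factors, as in the question. */
    const int x_ratio = (sourceWidth << 16) / targetWidth;
    const int y_ratio = (sourceHeight << 16) / targetHeight;

    for (int y = 0; y < targetHeight; y++)
    {
        const uint32_t* sourceRow = input + ((y * y_ratio) >> 16) * sourceWidth;
        uint32_t* targetRow = output + y * targetWidth;

        for (int x = 0; x < targetWidth; x++)
        {
            /* One 32-bit load and store replaces four byte copies. */
            targetRow[x] = sourceRow[(x * x_ratio) >> 16];
        }
    }
}
```

Whether this beats the byte-wise version depends on the compiler, which may already merge the four byte copies on its own.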

Answer 2

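There is little you can do to speed this up, since you have already arranged the loops in the right order and cleverly used fixed-point arithmetic. As others have suggested, try moving all 32 bits in one go (in case the compiler has not spotted that already).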

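In the case of significant enlargement, there is one possibility: you can determine how many times each source pixel needs to be replicated (you will need to work out the properties of the relation Xd = Wd.Xs / Ws in integers), and perform a single pixel read for k writes. This also works for the y direction, where you can memcpy identical rows instead of recomputing them. You can precompute and tabulate the X and Y mappings using run-length coding.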
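The enlargement idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the name `resizeNearestNeighborRuns` is made up, pixels are handled as 32-bit words, and the run lengths are discovered on the fly with a fixed-point accumulator rather than read from a precomputed table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch for the enlargement case: each source pixel is read once and written
   for as long as the fixed-point mapping stays on it; destination rows that
   map to the same source row are duplicated with memcpy. */
void resizeNearestNeighborRuns(const uint32_t* input, uint32_t* output,
                               int sourceWidth, int sourceHeight,
                               int targetWidth, int targetHeight)
{
    assert(targetWidth > 0 && targetHeight > 0);

    const int x_ratio = (sourceWidth << 16) / targetWidth;
    const int y_ratio = (sourceHeight << 16) / targetHeight;
    int previousSourceY = -1;

    for (int y = 0; y < targetHeight; y++)
    {
        const int sourceY = (y * y_ratio) >> 16;
        uint32_t* targetRow = output + (size_t)y * targetWidth;

        if (sourceY == previousSourceY)
        {
            /* Same source row as the row just written: copy it wholesale. */
            memcpy(targetRow, targetRow - targetWidth,
                   (size_t)targetWidth * sizeof(uint32_t));
            continue;
        }
        previousSourceY = sourceY;

        const uint32_t* sourceRow = input + (size_t)sourceY * sourceWidth;
        int x = 0;
        int fixedX = 0; /* equals x * x_ratio in 16.16 fixed point */
        while (x < targetWidth)
        {
            const int sourceX = fixedX >> 16;
            const uint32_t pixel = sourceRow[sourceX]; /* single read */
            do
            {
                targetRow[x++] = pixel; /* k writes of the same pixel */
                fixedX += x_ratio;
            } while (x < targetWidth && (fixedX >> 16) == sourceX);
        }
    }
}
```

The function still produces correct output when shrinking, but then every run has length one and nothing is gained.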

But there is one barrier you will not get past: you need to fill the destination image.

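If you are desperately looking for a speedup, there remains the option of using vector operations (SSE or AVX) to process several pixels at a time. Shuffle instructions are available that can control the replication (or decimation) of pixels. But because of the complicated replication pattern combined with the fixed structure of the vector registers, you will probably need to integrate a complex decision table.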
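For the simplest possible replication pattern, an exact 2x horizontal enlargement, no decision table is needed. The sketch below is only an illustration of that special case: `doubleRowWidth` and the `HAVE_SSE2_SKETCH` macro are hypothetical names, the SSE2 path is compiled only where the compiler reports SSE2 support, and a scalar loop handles the tail and other targets. Arbitrary scale factors would need the more involved shuffle tables described above.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2_SKETCH 1
#endif

/* Doubles one row of 32-bit pixels horizontally: output receives
   2 * sourceWidth pixels. Covers only the exact 2x case. */
void doubleRowWidth(const uint32_t* input, uint32_t* output, int sourceWidth)
{
    assert(sourceWidth >= 0);
    int x = 0;
#ifdef HAVE_SSE2_SKETCH
    for (; x + 4 <= sourceWidth; x += 4)
    {
        /* Load pixels p0 p1 p2 p3 (unaligned load, no alignment assumed). */
        const __m128i pixels = _mm_loadu_si128((const __m128i*)(input + x));
        /* Interleaving a register with itself duplicates each 32-bit lane. */
        const __m128i low = _mm_unpacklo_epi32(pixels, pixels);  /* p0 p0 p1 p1 */
        const __m128i high = _mm_unpackhi_epi32(pixels, pixels); /* p2 p2 p3 p3 */
        _mm_storeu_si128((__m128i*)(output + 2 * x), low);
        _mm_storeu_si128((__m128i*)(output + 2 * x + 4), high);
    }
#endif
    /* Scalar tail, and the whole row when SSE2 is unavailable. */
    for (; x < sourceWidth; x++)
    {
        output[2 * x] = input[x];
        output[2 * x + 1] = input[x];
    }
}
```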

Answer 3

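The algorithm is fine, but you can take advantage of massive parallelism by submitting the image to the GPU. If you use OpenGL, simply creating a context of the new size and drawing a correctly sized quad gives you nearest-neighbor sampling for free. OpenGL also gives you access to other resampling techniques just by changing the properties of the texture you read from (this amounts to a single GL command, which could easily be a parameter of your resize function).

Later in development, you can simply swap in a different shader for other blending techniques, which also lets you keep using that wonderful GPU of yours for image processing.

Also, since you are not using any fancy geometry, writing the program becomes almost trivial. It will be a little more involved than your algorithm, but depending on the image size it could run orders of magnitude faster.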

Answer 4

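I hope I didn't break anything. This combines some of the suggestions posted so far and is about 30% faster. I'm surprised that is all we got. I did not actually check the destination image to see whether it is correct.

Changes:
- Removed the multiplications from the inner loop (10% improvement)
- uint32_t instead of uint8_t (10% improvement)
- __restrict keyword (1% improvement)

This was on an i7 x64 machine running Windows, compiled with MSVC 2013. You will have to change the __restrict keyword for other compilers.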

void resizeNearestNeighbor2_32(const uint8_t* __restrict input, uint8_t* __restrict output, int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
{
    // Treat each 4-byte ARGB pixel as one 32-bit word (assumes 4-byte-aligned buffers).
    const uint32_t* input32 = (const uint32_t*)input;
    uint32_t* output32 = (uint32_t*)output;

    // 16.16 fixed-point scale factors.
    const int x_ratio = (int)((sourceWidth << 16) / targetWidth);
    const int y_ratio = (int)((sourceHeight << 16) / targetHeight);

    for (int y = 0; y < targetHeight; y++)
    {
        int y2_xsource = ((y * y_ratio) >> 16) * sourceWidth;
        int i_xdest = y * targetWidth;

        // Accumulate the source x position instead of multiplying per pixel.
        int source_x_offset = 0;
        const uint32_t* inputLine = input32 + y2_xsource;
        for (int x = 0; x < targetWidth; x++)
        {
            int sourceOffset = source_x_offset >> 16;

            // Copy one whole pixel through the 32-bit pointer, then advance both offsets.
            output32[i_xdest] = inputLine[sourceOffset];

            i_xdest += 1;
            source_x_offset += x_ratio;
        }
    }
}
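Since the answer says the output was never checked, here is a rough sketch of how a candidate could be compared against the byte-wise algorithm from the question. The names `ResizeFunction`, `resizeReference` and `resizeMatchesReference` are hypothetical, and the synthetic test pattern is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*ResizeFunction)(const uint8_t* input, uint8_t* output,
                               int sourceWidth, int sourceHeight,
                               int targetWidth, int targetHeight);

/* Reference: the byte-wise algorithm from the question. */
static void resizeReference(const uint8_t* input, uint8_t* output,
                            int sourceWidth, int sourceHeight,
                            int targetWidth, int targetHeight)
{
    const int x_ratio = (sourceWidth << 16) / targetWidth;
    const int y_ratio = (sourceHeight << 16) / targetHeight;

    for (int y = 0; y < targetHeight; y++)
    {
        const int sourceRow = ((y * y_ratio) >> 16) * sourceWidth;
        for (int x = 0; x < targetWidth; x++)
        {
            const int from = (sourceRow + ((x * x_ratio) >> 16)) * 4;
            const int to = (y * targetWidth + x) * 4;
            memcpy(output + to, input + from, 4); /* copy one ARGB pixel */
        }
    }
}

/* Returns 1 if candidate reproduces the reference output on a synthetic
   image, 0 if it differs or if memory could not be allocated. */
int resizeMatchesReference(ResizeFunction candidate,
                           int sourceWidth, int sourceHeight,
                           int targetWidth, int targetHeight)
{
    assert(sourceWidth > 0 && sourceHeight > 0);
    assert(targetWidth > 0 && targetHeight > 0);

    const size_t sourceBytes = (size_t)sourceWidth * sourceHeight * 4;
    const size_t targetBytes = (size_t)targetWidth * targetHeight * 4;
    uint8_t* source = malloc(sourceBytes);
    uint8_t* expected = calloc(targetBytes, 1);
    uint8_t* actual = calloc(targetBytes, 1);
    int matches = 0;

    if (source && expected && actual)
    {
        /* Arbitrary byte pattern so that neighbouring pixels differ. */
        for (size_t i = 0; i < sourceBytes; i++)
            source[i] = (uint8_t)(i * 31u + 7u);

        resizeReference(source, expected, sourceWidth, sourceHeight,
                        targetWidth, targetHeight);
        candidate(source, actual, sourceWidth, sourceHeight,
                  targetWidth, targetHeight);
        matches = memcmp(expected, actual, targetBytes) == 0;
    }

    free(source);
    free(expected);
    free(actual);
    return matches;
}
```

A call such as `resizeMatchesReference(resizeNearestNeighbor2_32, 640, 480, 1024, 768)` would then report whether the optimized version matches; the sizes here are only examples.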

Optimizing a nearest-neighbor resize algorithm for speed
