Question

So here's some background. I'm working on a game called ShiftOS, which takes place in an operating system that starts out as a run-of-the-mill 1980s OS without many features.

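I'm trying to add a mechanic where the user has to start out at a binary (two-color) color depth and can only display black and white on the screen. Then they have to upgrade the color depth from 1-bit to 2-bit to 4-bit, all the way up to 24-bit. It's a really neat mechanic, but in practice it seems extremely difficult.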
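For grayscale, the bit-depth ladder described above can be modeled as snapping each 8-bit value to one of 2^bits evenly spaced levels. The C sketch below assumes evenly spaced gray levels (a real 1980s palette need not be) and uses a hypothetical helper name, `quantize_gray`; for color it would be applied to each channel separately.

```c
#include <stdint.h>

/* Map an 8-bit gray value (0..255) to the nearest of the 2^bits evenly
 * spaced gray levels available at that bit depth, returned on the 0..255
 * scale again. Assumes 1 <= bits <= 8; out-of-range input is clamped. */
static uint8_t quantize_gray(int value, int bits)
{
    if (value < 0) value = 0;
    if (value > 255) value = 255;
    int max_level = (1 << bits) - 1;              /* 1 for 1-bit, 3 for 2-bit, 15 for 4-bit */
    int level = (value * max_level + 127) / 255;  /* round to the nearest level */
    return (uint8_t)((level * 255) / max_level);  /* scale back to 0..255 */
}
```

At 1 bit this reduces to a plain black/white threshold at 128; at 8 bits it returns the input unchanged.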

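Of course, old systems from this era could still make images look decent, but they were limited by the palettes their engineers gave them, so they had to dither images: arrange the pixels so the image appears to use more colors when in reality it can only use 2.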
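The trick of trading spatial resolution for apparent color can be shown in one dimension: push each pixel's rounding error onto the next pixel, and a flat 25% gray row comes out as roughly one white pixel in four. This is a minimal C sketch (the name `diffuse_row` is illustrative, not from the question); Floyd-Steinberg is the 2-D version of the same idea, spreading the error over four neighbours instead of one.

```c
#include <stddef.h>

/* Minimal 1-D error diffusion: each pixel is snapped to black (0) or
 * white (255) and its entire quantization error is added to the next
 * pixel. Writes the result to out and returns how many pixels are white. */
static int diffuse_row(const int *gray, unsigned char *out, size_t n)
{
    int carry = 0;   /* error carried over from the previous pixel */
    int whites = 0;
    for (size_t i = 0; i < n; i++) {
        int v = gray[i] + carry;
        int q = (v >= 128) ? 255 : 0;
        out[i] = (unsigned char)q;
        carry = v - q;               /* what this pixel failed to represent */
        whites += (q == 255);
    }
    return whites;
}
```

For sixteen pixels of value 64, this produces four white pixels, whose average (4 * 255 / 16, about 64) matches the input.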

So I looked up some good dithering algorithms, started learning the Floyd-Steinberg algorithm, and quickly ported it to C# and System.Drawing.

Here's the code I use.

var bmp = new Bitmap(source.Width, source.Height);
var sourceBmp = (Bitmap)source;
int error = 0;
for (int y = 0; y < bmp.Height; y++)
{
    for (int x = 0; x < bmp.Width; x++)
    {
        Color c = sourceBmp.GetPixel(x, y);
        int gray = ((c.R + c.G + c.B) / 3);
        if (gray >= 127)
        {
            error = gray - 255;
            bmp.SetPixel(x, y, Color.White);
        }
        else
        {
            error = gray;
            bmp.SetPixel(x, y, Color.Black);
        }
        /*
         * Pixel error diffusion map: Floyd-Steinberg. Thanks to Wikipedia.
         * 
         *  pixel[x + 1][y    ] := pixel[x + 1][y    ] + quant_error * 7 / 16
         *  pixel[x - 1][y + 1] := pixel[x - 1][y + 1] + quant_error * 3 / 16
         *  pixel[x    ][y + 1] := pixel[x    ][y + 1] + quant_error * 5 / 16
         *  pixel[x + 1][y + 1] := pixel[x + 1][y + 1] + quant_error * 1 / 16
         */

        if(x - 1 >= 0 && y + 1 != bmp.Height)
        {
            var bottomLeftColor = sourceBmp.GetPixel(x - 1, y + 1);
            int bottomLeftGray = ((bottomLeftColor.R + bottomLeftColor.G + bottomLeftColor.B) / 3) + ((error * 3) / 16);
            if (bottomLeftGray < 0)
                bottomLeftGray = 0;
            if (bottomLeftGray > 255)
                bottomLeftGray = 255;
            sourceBmp.SetPixel(x - 1, y + 1, Color.FromArgb(bottomLeftGray, bottomLeftGray, bottomLeftGray));
        }
        if (x + 1 != sourceBmp.Width)
        {
            var rightColor = sourceBmp.GetPixel(x + 1, y);
            int rightGray = ((rightColor.R + rightColor.G + rightColor.B) / 3) + ((error * 7) / 16);
            if (rightGray < 0)
                rightGray = 0;
            if (rightGray > 255)
                rightGray = 255;
            sourceBmp.SetPixel(x + 1, y, Color.FromArgb(rightGray, rightGray, rightGray));
        }
        if (x + 1 != sourceBmp.Width && y + 1 != sourceBmp.Height)
        {
            var bottomRightColor = sourceBmp.GetPixel(x + 1, y + 1);
            int bottomRightGray = ((bottomRightColor.R + bottomRightColor.G + bottomRightColor.B) / 3) + ((error) / 16);
            if (bottomRightGray < 0)
                bottomRightGray = 0;
            if (bottomRightGray > 255)
                bottomRightGray = 255;
            sourceBmp.SetPixel(x + 1, y + 1, Color.FromArgb(bottomRightGray, bottomRightGray, bottomRightGray));
        }
        if (y + 1 != sourceBmp.Height)
        {
            var bottomColor = sourceBmp.GetPixel(x, y + 1);
            int bottomGray = ((bottomColor.R + bottomColor.G + bottomColor.B) / 3) + ((error * 5) / 16);
            if (bottomGray < 0)
                bottomGray = 0;
            if (bottomGray > 255)
                bottomGray = 255;
            sourceBmp.SetPixel(x, y + 1, Color.FromArgb(bottomGray, bottomGray, bottomGray));
        }
    }
}

Note that source is an Image that is passed into the function as a parameter.

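This code works well, but here's the problem: the dithering runs on a separate thread to minimize slowdown/lag in the game, and while it is in progress, the OS's regular 24-bit colors/images are displayed. That would be fine if the dithering didn't take so long.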

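But I've noticed that the algorithm in this code is extremely slow, and depending on the size of the image, the dithering process can take more than a minute!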

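I've applied every optimization I can think of, such as running things on a thread separate from the game thread and invoking an Action passed to the function when the thread finishes, but that only shaves off a little time, if any.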

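So I'm wondering whether there are any further optimizations that would make it run faster, ideally a few seconds in total if possible. I should also mention that I get noticeable system lag while the dither operation is running; the mouse sometimes even stutters and jumps. Not cool for the 60 FPS PC master race crowd.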

Answer 1

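The first thing that comes to mind is to treat the Bitmap as an array. That isn't possible out of the box, because there's no interface for it, but you can get there with a bit of a hack. A quick search led me to this answer. You have to mark the method as unsafe, get the pixel values with LockBits, and access them with pointer arithmetic (see the original answer for the full code):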

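var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
    bmp.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite,
    bmp.PixelFormat);
int bpp = Image.GetPixelFormatSize(bmp.PixelFormat) / 8; // bytes per pixel
var pt = (byte*)bmpData.Scan0;
// inside the for loops over y and x:
var row = pt + (y * bmpData.Stride);
var pixel = row + x * bpp; // for 24/32-bpp formats: pixel[0] = B, pixel[1] = G, pixel[2] = R
// after the loops, release the lock so the bitmap can be used again:
bmp.UnlockBits(bmpData);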

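pixel points to the bytes that encode that pixel's color. As you've already seen, GetPixel and SetPixel are slow, because each call effectively does its own LockBits to make the operation safe. Working on the array removes the cost of the reads; however, "SetPixel" can still be a bottleneck if you need to update the bitmap as soon as possible. If you can write everything back at the end instead, do that.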
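As a hedged sketch of the array-based approach once the pixels are reachable through a pointer, here is the question's Floyd-Steinberg pass rewritten in C against a raw buffer addressed the same way as above (row start = Scan0 + y * stride, pixel = row + x * bpp). The function name and the separate `int` error buffer are my own assumptions; keeping the errors in their own buffer (scaled by 16) also avoids the question's clamping of intermediate values, which throws away part of the error.

```c
#include <stdlib.h>

/* Floyd-Steinberg dither over a 24/32-bpp BGR(A) buffer, the kind of memory
 * LockBits exposes through Scan0. Pixel (x, y) lives at y * stride + x * bpp.
 * Output is written in place as black/white; padding bytes are not touched.
 * Pending error is kept in a separate int buffer, scaled by 16.
 * Returns 0 on success, -1 if the error buffer cannot be allocated. */
static int fs_dither_bgr(unsigned char *scan0, int width, int height,
                         int stride, int bpp)
{
    int *acc = calloc((size_t)width * height, sizeof *acc);
    if (!acc) return -1;
    for (int y = 0; y < height; y++) {
        unsigned char *row = scan0 + (size_t)y * stride;
        for (int x = 0; x < width; x++) {
            unsigned char *px = row + (size_t)x * bpp;       /* B, G, R[, A] */
            int i = y * width + x;
            int v = (px[0] + px[1] + px[2]) / 3 + acc[i] / 16; /* gray + diffused error */
            int q = (v >= 128) ? 255 : 0;
            int e = v - q;                                    /* quantization error */
            px[0] = px[1] = px[2] = (unsigned char)q;
            if (x + 1 < width) acc[i + 1] += 7 * e;           /* right */
            if (y + 1 < height) {
                if (x > 0) acc[i + width - 1] += 3 * e;       /* bottom-left */
                acc[i + width] += 5 * e;                      /* bottom */
                if (x + 1 < width) acc[i + width + 1] += e;   /* bottom-right */
            }
        }
    }
    free(acc);
    return 0;
}
```

In C#, the same loop body would go inside an `unsafe` method working on `bmpData.Scan0` and `bmpData.Stride`, followed by `UnlockBits`; `bpp` would be 3 for `Format24bppRgb` and 4 for `Format32bppArgb`.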

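The second idea is to create a queue of Tasks that update your array step by step. As far as I can see, you update the image starting from one corner, so you could set up a parallel version of the updates. Perhaps you could also keep an immutable, versioned array of the current state, so that at the end you only need to merge the new versions into bmp.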
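Floyd-Steinberg is sequential in scanline order, but one known way to parallelize it (my substitution; the answer above does not name a specific scheme) is wavefront scheduling. Pixel (x, y) depends only on its left, upper-left, upper and upper-right neighbours, so if pixels are processed by step t = x + 2y, everything on the same step is independent. The C sketch below uses a "pull" form: each pixel gathers error from already finished neighbours and writes only its own slot, so pixels on the same step never write to shared memory. The inner loop is where a task queue or thread pool would split the work; here it runs serially so the ordering can be checked against plain scanline order. All names are hypothetical.

```c
#include <stdlib.h>

/* Dither one pixel by gathering error (scaled by 16 via the weights) from
 * its four already processed neighbours; writes only img[i] and err[i]. */
static void dither_pixel(unsigned char *img, int *err, int w, int x, int y)
{
    int i = y * w + x;
    int sum = 0;
    if (x > 0) sum += 7 * err[i - 1];                   /* left */
    if (y > 0) {
        if (x + 1 < w) sum += 3 * err[i - w + 1];       /* upper-right */
        sum += 5 * err[i - w];                          /* above */
        if (x > 0) sum += 1 * err[i - w - 1];           /* upper-left */
    }
    int v = img[i] + sum / 16;
    int q = (v >= 128) ? 255 : 0;
    img[i] = (unsigned char)q;
    err[i] = v - q;                                     /* this pixel's own error */
}

/* Wavefront order: all pixels with x + 2y == t are mutually independent. */
static int fs_dither_wavefront(unsigned char *img, int w, int h)
{
    int *err = calloc((size_t)w * h, sizeof *err);
    if (!err) return -1;
    int last_step = (w - 1) + 2 * (h - 1);
    for (int t = 0; t <= last_step; t++) {
        for (int y = 0; y < h; y++) {                   /* parallelizable loop */
            int x = t - 2 * y;
            if (x >= 0 && x < w)
                dither_pixel(img, err, w, x, y);
        }
    }
    free(err);
    return 0;
}

/* Reference: ordinary scanline order, for comparison. */
static int fs_dither_scanline(unsigned char *img, int w, int h)
{
    int *err = calloc((size_t)w * h, sizeof *err);
    if (!err) return -1;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dither_pixel(img, err, w, x, y);
    free(err);
    return 0;
}
```

Whether this beats a single-threaded array loop depends on image size and threading overhead: each step contains at most one pixel per row, so the batches are small and per-step synchronisation may eat much of the gain.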

Answer 2

@VMAtm's answer is probably the most important one.

            if (bottomRightGray < 0)
                bottomRightGray = 0;
            if (bottomRightGray > 255)
                bottomRightGray = 255;

could be refactored into

bottomRightGray = Clamp(bottomRightGray, 0, 255);

which might improve performance if it is implemented with some ASM magic.
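A plain-C version of the `Clamp` helper suggested above might look like this (the name `clamp_int` is illustrative). Because it is written with conditional expressions, optimizing compilers can often emit branch-free conditional moves for it, which may be all the "ASM magic" needed; that is a typical outcome, not a guarantee. Recent .NET (Core) versions also provide `Math.Clamp`.

```c
/* Clamp v into [lo, hi]. Conditional expressions like these are commonly
 * compiled to branch-free code by optimizing compilers. */
static inline int clamp_int(int v, int lo, int hi)
{
    v = (v < lo) ? lo : v;
    return (v > hi) ? hi : v;
}
```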

((error * X) / 16)

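could be precomputed once for each of the four values of X, since the error can only take a bounded set of values (with the question's threshold it stays within -128..126), giving a table of 256*4 values. This may or may not improve speed.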
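A sketch of that lookup table in C. It assumes the error range of the question's code (-128..126, covered here by a 256-entry table per weight indexed by error + 128); the table and function names are hypothetical. Integer division truncates toward zero in both C and C#, so the table reproduces the question's `((error * X) / 16)` exactly.

```c
/* Precomputed (error * X) / 16 for the four Floyd-Steinberg weights X. */
#define ERR_MIN   (-128)
#define ERR_COUNT 256                          /* covers errors -128..127 */

static const int fs_weights[4] = { 7, 3, 5, 1 };
static int fs_table[4][ERR_COUNT];

/* Fill the table once, e.g. at program start. */
static void build_fs_table(void)
{
    for (int k = 0; k < 4; k++)
        for (int j = 0; j < ERR_COUNT; j++) {
            int error = ERR_MIN + j;
            fs_table[k][j] = (error * fs_weights[k]) / 16;  /* truncates toward zero */
        }
}

/* Share of `error` that goes to neighbour k (0 = right, 1 = bottom-left,
 * 2 = bottom, 3 = bottom-right). error must lie in -128..127. */
static inline int fs_share(int k, int error)
{
    return fs_table[k][error - ERR_MIN];
}
```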

Answer 3

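You should get a buffer of the pixels instead of using the Get/Set functions.
You should avoid computations inside the loop; reduce them by precomputing the possible cases.
When you're done, put the pixels back into the image.
You could use ordered dithering, since it needs fewer computations and is much faster than Floyd-Steinberg. It can produce good quality and was used back when displays had few colors.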
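Ordered dithering compares each pixel with a position-dependent threshold instead of diffusing error, so nothing is carried between pixels. A minimal C sketch using the standard 4x4 Bayer matrix on a row-major grayscale buffer (the function name is assumed):

```c
/* Standard 4x4 Bayer index matrix (values 0..15). */
static const unsigned char bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Black/white ordered dither of an 8-bit gray image (width * height bytes,
 * row-major), in place. Each pixel is compared with a threshold picked by
 * its position inside the repeating 4x4 tile. */
static void ordered_dither_gray(unsigned char *img, int width, int height)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            unsigned char *p = img + y * width + x;
            int threshold = bayer4[y & 3][x & 3] * 16 + 8;   /* 8, 24, ..., 248 */
            *p = (*p >= threshold) ? 255 : 0;
        }
}
```

On a flat 50% gray (128) input, each 4x4 tile gets 8 white pixels. Because every pixel is independent, this loop can be split across threads with no ordering constraints, at the cost of a visibly regular pattern.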

The following Floyd-Steinberg code is written in C, but you can easily rewrite it in C#.

#include <stdlib.h>
#include <string.h>

typedef unsigned char BYTE;   //  Windows headers define BYTE; declare it when building elsewhere

//  Floyd-Steinberg weights in 8.8 fixed point: 7/16 * 256 = 112, and so on.
#define f7_16   112
#define f5_16    80
#define f3_16    48
#define f1_16    16

//  Black-white Floyd-Steinberg dither.
//  pixels: 24-bit BGR rows packed without padding (stride == width * 3), modified in place.
void    makeDitherFS( BYTE* pixels, int width, int height )
{
    const int   size    = width * height;

    int*    error   = (int*)malloc( size * sizeof(int) );
    if( !error )
        return;

    //  Clear the errors buffer.
    memset( error, 0, size * sizeof(int) );

    //~~~~~~~~

    int i   = 0;

    for( int y = 0; y < height; y++ )
    {
        BYTE*   prow   = pixels + ( y * width * 3 );

        for( int x = 0; x < width; x++,i++ )
        {
            const int   blue    = prow[x * 3 + 0];
            const int   green   = prow[x * 3 + 1];
            const int   red     = prow[x * 3 + 2];

            //  Get the pixel gray value (>> on a negative int is an arithmetic shift on mainstream compilers).
            int newVal  = (red+green+blue)/3 + (error[i] >> 8); //  PixelGray + error correction

            int newc    = (newVal < 128 ? 0 : 255);
            prow[x * 3 + 0] = newc; //  blue
            prow[x * 3 + 1] = newc; //  green
            prow[x * 3 + 2] = newc; //  red

            //  Correction - the new error
            const int   cerror  = newVal - newc;

            int idx = i+1;              //  right neighbour
            if( x+1 < width )
                error[idx] += (cerror * f7_16);

            idx += width - 2;           //  bottom-left neighbour
            if( x > 0 && y+1 < height )
                error[idx] += (cerror * f3_16);

            idx++;                      //  bottom neighbour
            if( y+1 < height )
                error[idx] += (cerror * f5_16);

            idx++;                      //  bottom-right neighbour
            if( x+1 < width && y+1 < height )
                error[idx] += (cerror * f1_16);
        }
    }

    free( error );
}

For more dithering algorithms, see here.

My dithering algorithm is extremely slow
