Question

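I am processing 10-megapixel images taken by a video camera.

The goal is to record the grayscale value of every pixel in a matrix (a two-dimensional array).

I first used GetPixel, but it took 25 seconds. Now I use LockBits, but it still takes 10 seconds, or 3 seconds if I don't save the results to a text file.

My tutor said they don't need the results saved, but 3 seconds is still too slow. Am I doing something wrong in my program, or is there something faster than LockBits for my application?

Here is my code: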

public void ExtractMatrix()
{
    Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");

    int[,] GRAY = new int[3840, 2748]; //Matrix with "grayscales" in INTeger values

    unsafe
    {
        //create an empty bitmap the same size as original
        Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);

        //lock the original bitmap in memory
        BitmapData originalData = bmpPicture.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

        //lock the new bitmap in memory
        BitmapData newData = bmp.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

        //set the number of bytes per pixel
        // here is set to 3 because I use an Image with 24bpp
        int pixelSize = 3;

        for (int y = 0; y < bmpPicture.Height; y++)
        {
            //get the data from the original image
            byte* oRow = (byte*)originalData.Scan0 + (y * originalData.Stride);

            //get the data from the new image
            byte* nRow = (byte*)newData.Scan0 + (y * newData.Stride);

            for (int x = 0; x < bmpPicture.Width; x++)
            {
                //create the grayscale version
                byte grayScale =
                   (byte)((oRow[x * pixelSize] * .114) + //B
                   (oRow[x * pixelSize + 1] * .587) +  //G
                   (oRow[x * pixelSize + 2] * .299)); //R

                //set the new image's pixel to the grayscale version
                //   nRow[x * pixelSize] = grayScale; //B
                //   nRow[x * pixelSize + 1] = grayScale; //G
                //   nRow[x * pixelSize + 2] = grayScale; //R

                GRAY[x, y] = (int)grayScale;
            }
        }

        //unlock the bitmaps
        bmpPicture.UnlockBits(originalData);
        bmp.UnlockBits(newData);
    }
}

Answer 1

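Here are some more optimizations that may help:

Use jagged arrays ([][]); in .NET, accessing them is faster than accessing multidimensional arrays;
Cache properties that are used inside a loop. Although a linked answer states that the JIT will optimize this, we don't know what is happening internally;
Multiplication is (generally) slower than addition;
As others have said, float is faster than double, although that mainly applies to older processors (roughly 10+ years old). The only real upside here is that you use them as constants, so they consume less memory (which adds up over this many iterations);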

Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");

// jagged instead of multidimensional 
int[][] GRAY = new int[3840][]; //Matrix with "grayscales" in INTeger values
for (int i = 0, icnt = GRAY.Length; i < icnt; i++)
    GRAY[i] = new int[2748];

unsafe
{
    //create an empty bitmap the same size as original
    Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);

    //lock the original bitmap in memory
    BitmapData originalData = bmpPicture.LockBits(
       new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
       ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

    //lock the new bitmap in memory
    BitmapData newData = bmp.LockBits(
       new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
       ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

    //set the number of bytes per pixel
    // here is set to 3 because I use an Image with 24bpp
    const int pixelSize = 3; // const because it doesn't change
    // store Scan0 once for reuse; we don't know whether BitmapData caches it internally or recalculates it on every access
    // (Scan0 is an IntPtr, so it is cast to byte* rather than stored in an int)
    byte* oRow = (byte*)originalData.Scan0;
    byte* nRow = (byte*)newData.Scan0;
    // row strides in bytes; a stride may include padding, so it is not always width * pixelSize
    int originalStride = originalData.Stride;
    int newStride = newData.Stride;
    // store certain properties, because accessing a variable is normally faster than a property (and we don't really know if the property recalculates anything internally)
    int bmpwidth = bmpPicture.Width;
    int bmpheight = bmpPicture.Height;

    for (int y = 0; y < bmpheight; y++)
    {
        // oRow and nRow point at the first byte of row y in the original and new image
        int pixelPosition = 0;
        for (int x = 0; x < bmpwidth; x++)
        {
            //create the grayscale version
            byte grayScale =
               (byte)((oRow[pixelPosition] * .114f) + //B
               (oRow[pixelPosition + 1] * .587f) +  //G
               (oRow[pixelPosition + 2] * .299f)); //R

            //set the new image's pixel to the grayscale version
            //   nRow[pixelPosition] = grayScale; //B
            //   nRow[pixelPosition + 1] = grayScale; //G
            //   nRow[pixelPosition + 2] = grayScale; //R

            GRAY[x][y] = (int)grayScale;

            pixelPosition += pixelSize;
        }

        // advance both row pointers by one stride (an addition instead of y * Stride)
        oRow += originalStride;
        nRow += newStride;
    }

    //unlock the bitmaps
    bmpPicture.UnlockBits(originalData);
    bmp.UnlockBits(newData);
}

Answer 2

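Your code is converting from a row-major representation to a column-major representation. In the bitmap, pixel (x, y) is followed by (x + 1, y) in memory; but in your GRAY array, pixel (x, y) is followed by (x, y + 1).

This makes the writes inefficient, because every write touches a different cache line; if the image is big enough, you end up thrashing the CPU cache. It is especially bad if your image dimensions are a power of two (see "Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?").

Store your array in row-major order if possible, to avoid the inefficient memory access (replace GRAY[x,y] with GRAY[y,x]).

If you really need column-major order, look at more cache-friendly matrix transposition algorithms (e.g. "A Cache Efficient Matrix Transpose Program?").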
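As a minimal sketch of the row-major suggestion, the loop below fills a matrix indexed as [y, x], so consecutive writes land next to each other in memory. The names RowMajorGray and ToGray are hypothetical, and a plain byte[] holding BGR data with a given stride stands in for the locked bitmap memory; it is an illustration under those assumptions, not the asker's actual code.

```csharp
using System;

static class RowMajorGray
{
    // Hypothetical helper: converts a 24bpp BGR buffer whose rows are
    // 'stride' bytes apart into a gray matrix indexed as [y, x].
    public static int[,] ToGray(byte[] bgr, int width, int height, int stride)
    {
        // Rows first: the last index (x) is the one that is contiguous in memory.
        var gray = new int[height, width];

        for (int y = 0; y < height; y++)
        {
            int pos = y * stride; // first byte of row y (stride may include padding)
            for (int x = 0; x < width; x++)
            {
                gray[y, x] = (int)(bgr[pos] * .114f        // B
                                 + bgr[pos + 1] * .587f    // G
                                 + bgr[pos + 2] * .299f);  // R
                pos += 3; // 3 bytes per pixel at 24bpp
            }
        }
        return gray;
    }
}
```

Reading a value back then uses gray[y, x] rather than gray[x, y]; whether that is acceptable depends on how the rest of the program indexes the matrix.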

Answer 3

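Your code may not be optimal, but a quick skim suggests that even this version should run in a fraction of a second. That points to some other problem.

Are you:

Compiling in Release mode? Debug mode turns off various optimizations.
Running with a debugger attached? If you run from Visual Studio with F5 (using the default C# keyboard shortcuts), the debugger is attached. This can slow your program down dramatically, particularly if you have any breakpoints or IntelliTrace enabled.
Running on some limited device? It sounds like you are running on a PC, but if you are not, device-specific limitations might be relevant.
I/O limited? Although you talk about a video camera, your code suggests you are dealing with the file system. Any file system interaction can be a bottleneck, particularly once network disks, virus scanners, physical platters and fragmentation come into play. A 10 MP image is about 30 MB (if it is uncompressed RGB without an alpha channel), and reading or writing that could take 3 seconds, depending on the details of the file system.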
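One way to check these points is to time the file read separately from the rest of the work. The sketch below is only an illustration: TimingSketch, TimeMs, RawSize and Report are hypothetical names, and the path passed to Report is whatever file you are loading. Stopwatch, Debugger.IsAttached and File.ReadAllBytes are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class TimingSketch
{
    // Runs 'work' once and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Size in bytes of an uncompressed image, ignoring row padding and headers.
    public static long RawSize(int width, int height, int bytesPerPixel)
    {
        return (long)width * height * bytesPerPixel;
    }

    // Prints whether a debugger is attached and how long reading the file takes.
    public static void Report(string path)
    {
        Console.WriteLine("Debugger attached: " + Debugger.IsAttached);

        byte[] data = null;
        long readMs = TimeMs(() => data = File.ReadAllBytes(path));
        Console.WriteLine("Read " + data.Length + " bytes in " + readMs + " ms");
    }
}
```

For the 3840 x 2748 image in the question, RawSize(3840, 2748, 3) gives 31,656,960 bytes, which is where the figure of roughly 30 MB comes from. If the read alone accounts for most of the 3 seconds, the pixel loop is not the bottleneck.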

Answer 4

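I'm not sure why the second part of the inner for loop is commented out, but if you don't need it, you are doing some unnecessary casting. Removing it might improve your performance.

Also, as leppie suggested, you could use single-precision floats: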

        for (int x = 0; x < bmpPicture.Width; x++)
        {
            //create the grayscale version
           GRAY[x, y] =
               (int)((oRow[x * pixelSize] * .114f) + //B
               (oRow[x * pixelSize + 1] * .587f) +  //G
               (oRow[x * pixelSize + 2] * .299f)); //R

        }

Answer 5

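You can try to avoid the multiplications by setting up a pointer at the start of the row and incrementing it as you go, changing your code to this: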

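byte* p = oRow; // points at the first byte (B) of the current row

for (int x = 0; x < bmpPicture.Width; x++)
{
    // each *p++ reads one channel and then advances to the next byte;
    // C# evaluates operands left to right, so the order is B, G, R
    GRAY[x, y] =
        (int)((*p++ * .114) + //B
        (*p++ * .587) +  //G
        (*p++ * .299)); //R
}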

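That will speed up your code, but I'm not sure it will be significantly faster.

Note: this only speeds up the code when iterating over an array of a value type, and it will not work if oRow changes to some other type.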

Answer 6

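Here is an alternative transformation that uses only integer arithmetic. It gives slightly different results (because the factors are rounded), but nothing you would notice with the naked eye (untested):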

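byte grayScale = (byte)((
      (oRow[pixelPosition] * 29) +       // B: .114 * 256, rounded
      (oRow[pixelPosition + 1] * 150) +  // G: .587 * 256, rounded
      (oRow[pixelPosition + 2] * 77))    // R: .299 * 256, rounded (the weights sum to 256)
      >> 8);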

The scale factors are approximately the old ones multiplied by 256, and the shift at the end divides by 256.
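A small self-contained sketch of the integer-only idea, assuming the weights 29, 150 and 77 (that is, .114, .587 and .299 multiplied by 256 and rounded, chosen here so that they sum to exactly 256 and a white pixel stays at 255). IntegerGray and Gray are hypothetical names used only for illustration.

```csharp
using System;

static class IntegerGray
{
    // Integer-only luminance: weights are the float factors scaled by 256,
    // and the right shift by 8 divides the sum by 256 again.
    public static byte Gray(byte b, byte g, byte r)
    {
        // The largest possible sum is 255 * 256 = 65280, so after the shift
        // the result always fits in a byte.
        return (byte)((b * 29 + g * 150 + r * 77) >> 8);
    }
}
```

Because the weights sum to 256, Gray(255, 255, 255) is exactly 255 and Gray(0, 0, 0) is 0. Individual results can differ slightly from the floating-point version because of the rounded factors and the truncating shift.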

Answer 7

A huge optimization can be achieved by using a 1D array instead of a 2D array.

None of the other suggestions will give you a large speedup...
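A minimal sketch of the 1D layout, assuming the gray values are stored row by row so that pixel (x, y) lives at index y * width + x. FlatGray, ToGray and At are hypothetical names, and a byte[] of BGR data with a given stride stands in for the locked bitmap memory.

```csharp
using System;

static class FlatGray
{
    // Hypothetical helper: converts a 24bpp BGR buffer into a flat,
    // row-major array of gray values with width * height elements.
    public static int[] ToGray(byte[] bgr, int width, int height, int stride)
    {
        var gray = new int[width * height];
        int i = 0; // running output index, increases by one per pixel

        for (int y = 0; y < height; y++)
        {
            int pos = y * stride; // first byte of row y in the source buffer
            for (int x = 0; x < width; x++)
            {
                gray[i++] = (int)(bgr[pos] * .114f        // B
                                + bgr[pos + 1] * .587f    // G
                                + bgr[pos + 2] * .299f);  // R
                pos += 3;
            }
        }
        return gray;
    }

    // Reads pixel (x, y) back from the flat array.
    public static int At(int[] gray, int width, int x, int y)
    {
        return gray[y * width + x];
    }
}
```

The inner loop writes to consecutive elements of a single array, which avoids the index arithmetic of a multidimensional array. How much this helps in practice would need to be measured on the actual images.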

LockBits appears to be too slow for my needs - alternatives?
