Question

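This is my first question on Stack Overflow; until now I have always found an answer to my problems right away. Thanks a lot! I normally work in PLC programming, my knowledge of the PC world is fairly limited, and this is my first time using C#.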

I am trying to cross-correlate two pixel areas taken from two bitmaps, following this paper: http://users.ox.ac.uk/~atdgroup/publications/Rankov,%20V.,%20Proceedings%20of%20Spie,%20Vol.%205701,2005.pdf

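[EDIT] The goal is to find the exact location of the match so that the two images can be stitched together. I also removed some commented-out code to make the listing easier to read (I will open a separate question for the moving-average part). [/EDIT]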

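My questions are about the correct implementation of the moving average and about general performance tuning, and I hope you can help me.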

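The bitmaps overlap by a fixed, known amount in all directions (10%), so I can keep the search area (called the composite area in the source below) fairly small, but apparently not small enough. I also assume both bitmaps have the same size and pixel format. Even so, the performance of my algorithm does not meet my requirements. I have a feeling (mainly because I lack "deep" knowledge and experience) that there is a lot of room for improvement.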

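The main performance "eaters" I found are the following (see the source below):

Calculating the pixel value in a separate method (introduced mainly for readability, quickly discarded)
The four nested for loops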

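Here are the release-build timings for two 950px * 950px, 24-bit RGB bitmaps (Core Duo 2.4GHz, 4GB). The search area (composite image area) is 70px * 800px and the sample area is 8px * 400px.

Separate averaging function: 5519ms
Inlined averaging function: 5350ms (only?)
[EDIT] With the changes suggested by Yaur: 700ms! [/EDIT]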

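In general, smaller sample and search areas (4x40 and 30x100) give very fast times, ranging from a few milliseconds to a few thousand milliseconds. Unfortunately, to find a match reliably I have to use large areas. Before going into subsampling and the like, I would like to be sure that my current algorithm is not completely off track.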

Can you think of any tweaks, tricks, or general improvements? I would be glad of any hint.

[EDIT] The relevant method (greatly improved):

private unsafe void CrossCorrelate(ref float CCCoefficient, ref Point SampleMatchLocation)
{
    float res = 0;
    float tmpRes = 0;

    // get bit data of sample area
    BitmapData bmdSample = m_bmpSampleRaw.LockBits(m_rectSampleArea, ImageLockMode.ReadOnly, m_bmpSampleRaw.PixelFormat);
    byte* pSample = (byte*)(void*)bmdSample.Scan0;

    // calculate sample average and coefficient 1 (stays same for all iterations)
    int SampleAvg = GetAverage(bmdSample, 0, bmdSample.Width, 0, bmdSample.Height);
    float CN1     = GetCN1(bmdSample, SampleAvg);

    int CompAvg         = 0;
    BitmapData bmdComp  = null;
    Rectangle compRect;

    int SearchHeightLimit   = m_rectSearchArea.Height - m_rectSampleArea.Height;
    int SearchWidthLimit    = m_rectSearchArea.Width - m_rectSampleArea.Width;
    int SearchLocX          = m_rectSearchArea.X;
    int SearchLocY          = m_rectSearchArea.Y;
    int SampleHeight        = m_rectSampleArea.Height;
    int SampleWidth         = m_rectSampleArea.Width;

    int a = 0; // deviation of a composite pixel from its average; squared as a * a instead of Math.Pow

    // iterate through search area, 
    // in case of equal sizes make sure it iterates at least once
    if (SearchHeightLimit == 0) SearchHeightLimit++;
    if (SearchWidthLimit == 0) SearchWidthLimit++;

    for (int i = 0; i < SearchHeightLimit; i++)
    {
        for (int j = 0; j < SearchWidthLimit; j++)
        {
            int CN0Sum = 0;
            int CN2Sum = 0;

            // create composite pixel data at current search location
            compRect    = new Rectangle(SearchLocX + j, SearchLocY + i, SampleWidth, SampleHeight);
            bmdComp     = m_bmpCompositeRaw.LockBits(compRect, ImageLockMode.ReadOnly, m_bmpCompositeRaw.PixelFormat);
            byte* pComp = (byte*)(void*)bmdComp.Scan0;

            // get average pixel value of the composite area at the current search location
            CompAvg     = GetAverage(bmdComp, 0, bmdComp.Width, 0, bmdComp.Height); 

            for (int y = 0; y < SampleHeight; y++)
            {
                for (int x = 0; x < SampleWidth; x++)
                {
                    int Sidx = (y * bmdSample.Stride) + x * m_iPixelSize;

                    // deviation of the composite pixel from its window average, computed once and reused
                    a = pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg;
                    CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * a;
                    CN2Sum += (a * a);
                }
            }

            // release pixeldata of current search area (commented out when using moving average)
            m_bmpCompositeRaw.UnlockBits(bmdComp);

            float CN2 = (float)Math.Sqrt(CN2Sum);
            float CN0 = (float)CN0Sum;
            tmpRes = CN0 / (CN1 * CN2);

            if (tmpRes > res) { res = tmpRes; SampleMatchLocation.X = m_rectSearchArea.X + j; SampleMatchLocation.Y = m_rectSearchArea.Y + i; }

            // exit early if perfect match found
            if (res == 1)
            {
                m_bmpSampleRaw.UnlockBits(bmdSample);
                CCCoefficient = res;
                return;
            }
        }
    }

    m_bmpSampleRaw.UnlockBits(bmdSample);
    CCCoefficient = res;
}

[/EDIT] The relevant method (original):

float res = 0;
float tmpRes = 0;

// get bit data of sample area
BitmapData bmdSample = m_bmpSampleRaw.LockBits(m_rectSampleArea, ImageLockMode.ReadOnly, m_bmpSampleRaw.PixelFormat);

// calculate sample average and coefficient 1 (stays same for all iterations)
int SampleAvg  = GetAverage(bmdSample, 0, bmdSample.Width, 0, bmdSample.Height);
float CN1      = GetCN1(bmdSample, SampleAvg);

int CompAvg = 0;
BitmapData bmdComp = null;
Rectangle compRect;

unsafe
{
// iterate through search area (I know it skips if areas have same size)
for (int i = 0; i < (m_rectSearchArea.Height - m_rectSampleArea.Height); i++)
{
    for (int j = 0; j < (m_rectSearchArea.Width - m_rectSampleArea.Width); j++)
    {
        int CN0Sum = 0;
        int CN2Sum = 0;

        // create composite pixel data at current search location
        compRect    = new Rectangle(m_rectSearchArea.X + j, m_rectSearchArea.Y + i,   m_rectSampleArea.Width, m_rectSampleArea.Height);
        bmdComp     = m_bmpCompositeRaw.LockBits(compRect, ImageLockMode.ReadOnly, m_bmpCompositeRaw.PixelFormat);

        CompAvg     = GetAverage(bmdComp, 0, bmdComp.Width, 0, bmdComp.Height);

        // the actual correlation loops
        byte* pSample = (byte*)(void*)bmdSample.Scan0;
        byte* pComp   = (byte*)(void*)bmdComp.Scan0;
        for (int y = 0; y < bmdSample.Height; y++)
        {
            for (int x = 0; x < bmdSample.Width; x++)
            {
                int Sidx = (y * bmdSample.Stride) + x * m_iPixelSize; // same stride assumed
                //CN0Sum += (GetPixelValue(pSample, Sidx) - SampleAvg) * (GetPixelValue(pComp, Sidx) - CompAvg);
                CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * (pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg);
                //CN2Sum += (long)Math.Pow((GetPixelValue(pComp, Sidx) - CompAvg), 2);
                CN2Sum += (int)Math.Pow((pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg), 2);
            }
        }

        // release pixeldata of current search area
        m_bmpCompositeRaw.UnlockBits(bmdComp);
        tmpRes = (float)CN0Sum / (CN1 * (float)Math.Sqrt(CN2Sum));

        if (tmpRes > res) { res = tmpRes; SampleMatchLocation.X = m_rectSearchArea.X + j; SampleMatchLocation.Y = m_rectSearchArea.Y + i; }

        // exit early if perfect match found
        if (res == 1)
        {
            m_bmpSampleRaw.UnlockBits(bmdSample);
            CCCoefficient = res;
            return;
        }
    }
}
} // unsafe
m_bmpSampleRaw.UnlockBits(bmdSample);
CCCoefficient = res;

The method that calculates the average of a given area:

private int GetAverage(BitmapData bmpData, int X1, int X2, int Y1, int Y2)
{
    int total = 0;
    if (X2 == 0 || X2 == X1) X2++;
    if (Y2 == 0 || Y2 == Y1) Y2++;
    unsafe
    {
        byte* p = (byte*)(void*)bmpData.Scan0;
        for (int y = Y1; y < Y2; y++)
        {
            for (int x = X1; x <X2; x++)
            {
                int idx = (y * bmpData.Stride) + x * m_iPixelSize;
                //total += GetPixelValue(p, idx);
                total += p[idx] + p[idx + 1] + p[idx + 2];
            }
        }
    }
    return total / ((X2 - X1) * (Y2 - Y1));
}
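
Since the question asks about a moving average, here is a minimal sketch of one common alternative to calling GetAverage for every search position: a summed-area table (integral image), which yields the sum of any window from four lookups. It assumes the per-pixel values (the sum of the three color bytes) have already been copied into an `int[,]`; the class and method names are hypothetical and not part of the original code.

```csharp
using System;

public static class WindowAverage
{
    // Builds a summed-area table with one extra row and column of zeros:
    // sat[y + 1, x + 1] holds the sum of values[0..y, 0..x].
    public static long[,] BuildSummedAreaTable(int[,] values)
    {
        int h = values.GetLength(0);
        int w = values.GetLength(1);
        var sat = new long[h + 1, w + 1];
        for (int y = 0; y < h; y++)
        {
            long rowSum = 0; // running sum of the current row
            for (int x = 0; x < w; x++)
            {
                rowSum += values[y, x];
                sat[y + 1, x + 1] = sat[y, x + 1] + rowSum;
            }
        }
        return sat;
    }

    // Sum of the window whose top-left corner is (x, y): four table lookups.
    public static long WindowSum(long[,] sat, int x, int y, int width, int height)
    {
        return sat[y + height, x + width] - sat[y, x + width]
             - sat[y + height, x] + sat[y, x];
    }

    // Integer average of the window, truncating like GetAverage does.
    public static int Average(long[,] sat, int x, int y, int width, int height)
    {
        return (int)(WindowSum(sat, x, y, width, height) / ((long)width * height));
    }
}
```

The table is built once for the search area; after that, the average for each candidate position costs a constant amount of work instead of a full pass over the window.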

The small function that computes a pixel's value (the sum of its color channels), quickly discarded:

private unsafe Int32 GetPixelValue(byte* pPixel, int idx)
{
    // add up all color values and return
    return pPixel[idx] + pPixel[idx + 1] + pPixel[idx + 2];
}

The function that calculates the part of the equation that never changes:

private float GetCN1(BitmapData bmpData, long avg)
{
    double Sum = 0;
    unsafe
    {
        byte* p = (byte*)(void*)bmpData.Scan0;
        for (int y = 0; y < bmpData.Height; y++)
        {
            for (int x = 0; x < bmpData.Width; x++)
            {
                int idx = (y * bmpData.Stride) + x * m_iPixelSize;
                Sum += Math.Pow(p[idx] + p[idx + 1] + p[idx + 2] - avg, 2);
            }
        }
    }
    return (float)Math.Sqrt(Sum);
}
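
For reference, the quantity the listings above evaluate at each search position can be written as follows. This is read off the code rather than quoted from the paper; the symbols $S$ and $C$ (the per-pixel channel sums of the sample and composite windows) and $\bar S$, $\bar C$ (their averages, SampleAvg and CompAvg) are notation introduced here.

```latex
r = \frac{\mathrm{CN0}}{\mathrm{CN1}\cdot\mathrm{CN2}}
  = \frac{\sum_{x,y} \bigl(S(x,y) - \bar S\bigr)\bigl(C(x,y) - \bar C\bigr)}
         {\sqrt{\sum_{x,y} \bigl(S(x,y) - \bar S\bigr)^2}\;
          \sqrt{\sum_{x,y} \bigl(C(x,y) - \bar C\bigr)^2}}
```

Only the first factor of the denominator (CN1) is independent of the search position, which is why GetCN1 is called once outside the loops.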

Answer 1

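On performance and the "four nested for loops":

Computed directly from its definition, correlation has a computational complexity equal to the product of all image and pattern dimensions, O(W * H * PW * PH). There is a faster method based on the FFT (fast Fourier transform), with complexity O(N^2 * log(N)), where N is the largest dimension.

Steps:

Zero-pad (to equalize the sizes);

FFT of the image and of the pattern;

Per-component complex multiplication of the image FT with the complex conjugate of the pattern FT;

Inverse FFT of the complex product;

Normalization.

Addition: it is usually useful to remove the mean from all values in the matrices: find the average and subtract it from every value to get a "bipolar" signal. Otherwise the maximum of the correlation matrix may sit at a peak of the original matrix rather than at the position of the fragment being searched for.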
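
The steps above can be sketched in one dimension to keep the example short; a 2D version would apply the same transform along rows and then along columns. This is a minimal illustration under stated assumptions, not a tuned implementation: the names `FftCorrelation`, `Fft`, and `CrossCorrelate` are hypothetical, the FFT is a plain radix-2 routine written for the example, and the final normalization step is left out, so the returned values are raw correlation scores.

```csharp
using System;
using System.Linq;
using System.Numerics;

public static class FftCorrelation
{
    // In-place iterative radix-2 FFT; data.Length must be a power of two.
    public static void Fft(Complex[] data, bool inverse)
    {
        int n = data.Length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) { var t = data[i]; data[i] = data[j]; data[j] = t; }
        }
        // Butterfly passes of increasing length.
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = 2 * Math.PI / len * (inverse ? 1 : -1);
            var wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            for (int i = 0; i < n; i += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex u = data[i + k];
                    Complex v = data[i + k + len / 2] * w;
                    data[i + k] = u + v;
                    data[i + k + len / 2] = u - v;
                    w *= wLen;
                }
            }
        }
        if (inverse)
            for (int i = 0; i < n; i++) data[i] /= n;
    }

    // Returns the unnormalized correlation score for every lag at which
    // the pattern fits entirely inside the signal.
    public static double[] CrossCorrelate(double[] signal, double[] pattern)
    {
        if (pattern.Length > signal.Length)
            throw new ArgumentException("pattern must not be longer than signal");

        // Zero-pad both inputs to a common power-of-two length.
        int n = 1;
        while (n < signal.Length + pattern.Length) n <<= 1;

        // Subtract the means to obtain "bipolar" signals.
        double meanS = signal.Average();
        double meanP = pattern.Average();
        var s = new Complex[n];
        var p = new Complex[n];
        for (int i = 0; i < signal.Length; i++) s[i] = signal[i] - meanS;
        for (int i = 0; i < pattern.Length; i++) p[i] = pattern[i] - meanP;

        Fft(s, false);
        Fft(p, false);
        // Multiply the signal FT by the complex conjugate of the pattern FT.
        for (int i = 0; i < n; i++) s[i] *= Complex.Conjugate(p[i]);
        Fft(s, true);

        int lags = signal.Length - pattern.Length + 1;
        var scores = new double[lags];
        for (int k = 0; k < lags; k++) scores[k] = s[k].Real;
        return scores;
    }
}
```

The index of the largest score is the candidate match offset. Note that the mean is subtracted once for the whole signal here, not per window as in the question's code, so the scores are not directly comparable to the coefficient computed there.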

Answer 2

In this section:

for (int y = 0; y < bmdSample.Height; y++)
{
   for (int x = 0; x < bmdSample.Width; x++)
   {
     int Sidx = (y * bmdSample.Stride) + x * m_iPixelSize; // same stride assumed
     CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * (pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg);
     CN2Sum += (int)Math.Pow((pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg), 2);
   }
}

you are using two loops where one would do. Sidx simply walks from 0 to bmdSample.Height * bmdSample.Stride in steps of m_iPixelSize, so you can use a single loop and skip calculating Sidx. Provided the rows are contiguous (Stride equals Width * m_iPixelSize, with no padding and no pixels outside the locked rectangle), this should be functionally the same:

for (int Sidx = 0; Sidx < bmdSample.Height * bmdSample.Stride; Sidx += m_iPixelSize)
{
   CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * (pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg);
   CN2Sum += (int)Math.Pow((pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg), 2);
}

You can apply a similar "trick" in GetAverage and GetCN1.
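
When rows carry padding, or the locked rectangle is narrower than the bitmap, a flat index also touches bytes that do not belong to the window. A hedged sketch of a middle ground is to keep the row loop but advance the index incrementally inside each row, so no multiplication happens per pixel. It uses a managed `byte[]` in place of the unsafe pointer so it can run on its own; `RowWalk` and `SumChannels` are hypothetical names.

```csharp
using System;

public static class RowWalk
{
    // Sums the three color bytes of every pixel in a width x height window
    // stored in a buffer whose rows are 'stride' bytes apart.
    public static long SumChannels(byte[] pixels, int stride, int width, int height, int pixelSize)
    {
        long total = 0;
        for (int y = 0; y < height; y++)
        {
            int idx = y * stride;                  // first byte of this row
            int rowEnd = idx + width * pixelSize;  // one past the last pixel of the row
            for (; idx < rowEnd; idx += pixelSize)
                total += pixels[idx] + pixels[idx + 1] + pixels[idx + 2];
        }
        return total;
    }
}
```

The same pattern carries over to the correlation loop: compute the row start once per row and step by the pixel size within the row.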

Answer 3

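Like many image-related algorithms, this looks easy to parallelize.

If that is true, running it on the GPU should give you a big speedup... if you are willing to go that far.

You cannot run .NET code directly on the GPU, but there are libraries that will translate your code into something the GPU can run. Otherwise you will need to learn a shader language.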
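
Short of the GPU, the same independence between search positions can be exploited on the CPU. The following is a rough sketch, not something the answer itself proposes: it spreads the rows of the search area across cores with `Parallel.For`. The `score` delegate is a hypothetical stand-in for the per-position correlation, and it is assumed to read only from pixel data that was copied into plain arrays beforehand, since bitmap locking should not be done concurrently.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Evaluates score(x, y) for every offset in the search area and returns
    // the best one. Each row is handled by one parallel iteration.
    public static (int X, int Y, float Score) FindBest(
        int searchWidth, int searchHeight, Func<int, int, float> score)
    {
        // One slot per row, so parallel iterations never write to the same element.
        var rowBest = new (int X, float Score)[searchHeight];

        Parallel.For(0, searchHeight, y =>
        {
            int bestX = 0;
            float best = float.NegativeInfinity;
            for (int x = 0; x < searchWidth; x++)
            {
                float s = score(x, y);
                if (s > best) { best = s; bestX = x; }
            }
            rowBest[y] = (bestX, best);
        });

        // Sequential reduction over the per-row winners.
        int bestY = 0;
        for (int y = 1; y < searchHeight; y++)
            if (rowBest[y].Score > rowBest[bestY].Score) bestY = y;

        return (rowBest[bestY].X, bestY, rowBest[bestY].Score);
    }
}
```

Because each row writes only its own slot, no locking is needed; the early exit on a perfect match from the original code is dropped here for simplicity.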

Answer 4

You want to avoid redundant math and redundant function calls, so:

for (int i = 0; i < (m_rectSearchArea.Height - m_rectSampleArea.Height); i++)

should be more like:

int height = m_rectSearchArea.Height - m_rectSampleArea.Height;
for (int i = 0; i < height; i++)

Edit:

You could also try replacing:

int Sidx = (y * bmdSample.Stride) + x * m_iPixelSize; // same stride assumed
//CN0Sum += (GetPixelValue(pSample, Sidx) - SampleAvg) * (GetPixelValue(pComp, Sidx) - CompAvg);
CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * (pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg);
//CN2Sum += (long)Math.Pow((GetPixelValue(pComp, Sidx) - CompAvg), 2);
CN2Sum += (int)Math.Pow((pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg), 2);

with:

var a = pComp[Sidx] + pComp[Sidx + 1] + pComp[Sidx + 2] - CompAvg; // this may already be happening
CN0Sum += (pSample[Sidx] + pSample[Sidx + 1] + pSample[Sidx + 2] - SampleAvg) * a;
CN2Sum += (int)(a * a); // replacing a function call with a multiply will get you a little speed

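One thing to keep in mind is that the JITer is not that good with this kind of code; you might get bigger gains by moving part of your project to C and P/Invoking it from your C# app.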

C# bitmap cross-correlation algorithm performance question
