Question

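I'm writing a real-time video imaging application and need to speed up this method. It currently takes about 10 ms to execute, and I'd like to get it down to 2-3 ms.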

I've tried Array.Copy and Buffer.BlockCopy. Both take about 30 ms, three times longer than copying the bytes manually.

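One idea is to somehow read the 4 bytes as a single integer and write them out as an integer, reducing the four copy lines to one. However, I don't know how to do that.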
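The integer-copy idea can also be done without unsafe code. This is a minimal sketch, assuming .NET 5 or later (C# top-level statements, `Span<T>` and `MemoryMarshal.Cast`); `ApplyLookupTableAsUInt32` is a hypothetical name, and this version has not been benchmarked against the pointer-based code in the answer.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: same result as the question's byte-by-byte loop, but each
// pixel is copied as one 32-bit value through uint views of the byte arrays.
static byte[] ApplyLookupTableAsUInt32(byte[] lookupTable, ushort[] inputBuffer)
{
    byte[] outputBuffer = new byte[inputBuffer.Length * 4];

    // Reinterpret the byte arrays as uint spans (no copying): one uint = one BGRA entry.
    ReadOnlySpan<uint> lut = MemoryMarshal.Cast<byte, uint>(lookupTable.AsSpan());
    Span<uint> output = MemoryMarshal.Cast<byte, uint>(outputBuffer.AsSpan());

    for (int i = 0; i < inputBuffer.Length; i++)
    {
        int value = inputBuffer[i];
        // Values with no complete 4-byte entry in the table leave the output pixel as zero,
        // matching the original (curPixelValue + 3) < lookupTableLength check.
        if (value < lut.Length)
        {
            output[i] = lut[value];
        }
    }

    return outputBuffer;
}

// Two table entries: entry 0 = (10, 20, 30, 255), entry 1 = (40, 50, 60, 255).
byte[] demo = ApplyLookupTableAsUInt32(
    new byte[] { 10, 20, 30, 255, 40, 50, 60, 255 },
    new ushort[] { 1, 0, 9 });
Console.WriteLine(string.Join(",", demo)); // 40,50,60,255,10,20,30,255,0,0,0,0
```

Because each 4-byte entry is read and written back as the same `uint`, the byte order (B, G, R, A) is preserved regardless of the machine's endianness.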

Another idea is to do it with pointers and unsafe code, but I don't know how to do that either.

Any help is greatly appreciated. Thanks!

Edit: The array sizes are inputBuffer[327680], lookupTable[16384], and outputBuffer[1310720].
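The quoted sizes are consistent with each other: 327,680 pixels × 4 bytes gives the 1,310,720-byte output, and 16,384 table bytes / 4 gives 4,096 entries, so only pixel values 0..4095 have an entry (which suggests 12-bit pixel data, even though a ushort can hold up to 65,535). The sketch spells out that arithmetic and builds a same-sized grayscale BGRA table for testing; `BuildGrayscaleLookupTable` and its `v >> 4` mapping are illustrative assumptions, not the asker's real table (.NET 5+ top-level statements).

```csharp
using System;

// Sizes quoted in the question's edit.
const int PixelCount = 327680;       // inputBuffer.Length, one ushort per pixel
const int LookupTableBytes = 16384;  // lookupTable.Length
const int BytesPerEntry = 4;         // blue, green, red, alpha

int outputBytes = PixelCount * BytesPerEntry;         // 1,310,720 = outputBuffer.Length
int lookupEntries = LookupTableBytes / BytesPerEntry; // 4,096 entries: valid pixel values are 0..4095
Console.WriteLine($"outputBuffer: {outputBytes} bytes, lookup entries: {lookupEntries}");

byte[] sampleTable = BuildGrayscaleLookupTable(lookupEntries);
Console.WriteLine($"sample table: {sampleTable.Length} bytes");

// Hypothetical test table: maps each pixel value v to an opaque gray BGRA pixel.
// The v >> 4 scaling assumes 12-bit input (0..4095 -> 0..255).
static byte[] BuildGrayscaleLookupTable(int entries)
{
    byte[] table = new byte[entries * 4];
    for (int v = 0; v < entries; v++)
    {
        byte gray = (byte)Math.Min(v >> 4, 255);
        table[v * 4 + 0] = gray; // blue
        table[v * 4 + 1] = gray; // green
        table[v * 4 + 2] = gray; // red
        table[v * 4 + 3] = 255;  // alpha (opaque)
    }
    return table;
}
```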

public byte[] ApplyLookupTableToBuffer(byte[] lookupTable, ushort[] inputBuffer)
{
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();

    // Precalculate and initialize the variables
    int lookupTableLength = lookupTable.Length;
    int bufferLength = inputBuffer.Length;
    byte[] outputBuffer = new byte[bufferLength * 4];
    int outIndex = 0;
    int curPixelValue = 0;

    // For each pixel in the input buffer...
    for (int curPixel = 0; curPixel < bufferLength; curPixel++)
    {
        outIndex = curPixel * 4;                    // Calculate the corresponding index in the output buffer
        curPixelValue = inputBuffer[curPixel] * 4;  // Retrieve the pixel value and multiply by 4 since the lookup table has 4 values (blue/green/red/alpha) for each pixel value

        // If the multiplied pixel value falls within the lookup table...
        if ((curPixelValue + 3) < lookupTableLength)
        {
            // Copy the lookup table value associated with the value of the current input buffer location to the output buffer
            outputBuffer[outIndex + 0] = lookupTable[curPixelValue + 0];
            outputBuffer[outIndex + 1] = lookupTable[curPixelValue + 1];
            outputBuffer[outIndex + 2] = lookupTable[curPixelValue + 2];
            outputBuffer[outIndex + 3] = lookupTable[curPixelValue + 3];

            //System.Buffer.BlockCopy(lookupTable, curPixelValue, outputBuffer, outIndex, 4);   // Takes 2-10x longer than just copying the values manually
            //Array.Copy(lookupTable, curPixelValue, outputBuffer, outIndex, 4);                // Takes 2-10x longer than just copying the values manually
        }
    }

    System.Diagnostics.Debug.WriteLine("ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
    return outputBuffer;
}

Edit: I've updated the method, keeping the same variable names, so others can see how the code translates to HABJAN's solution below.

    public byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
    {
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();

        // Precalculate and initialize the variables
        int lookupTableLength = lookupTable.Length;
        int bufferLength = inputBuffer.Length;
        byte[] outputBuffer = new byte[bufferLength * 4];
        //int outIndex = 0;
        int curPixelValue = 0;

        unsafe
        {
            fixed (byte* pointerToOutputBuffer = &outputBuffer[0])
            fixed (byte* pointerToLookupTable = &lookupTable[0])
            {
                // Cast to integer pointers since groups of 4 bytes get copied at once
                uint* lookupTablePointer = (uint*)pointerToLookupTable;
                uint* outputBufferPointer = (uint*)pointerToOutputBuffer;

                // For each pixel in the input buffer...
                for (int curPixel = 0; curPixel < bufferLength; curPixel++)
                {
                    // No need to multiply by 4 on the following 2 lines since the pointers are for integers, not bytes
                    // outIndex = curPixel;  // This line is commented since we can use curPixel instead of outIndex
                    curPixelValue = inputBuffer[curPixel];  // Retrieve the pixel value 

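                    // lookupTablePointer indexes whole 4-byte entries, so compare against the entry count, not the byte count
                    if (curPixelValue < lookupTableLength / 4)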
                    {
                        outputBufferPointer[curPixel] = lookupTablePointer[curPixelValue];
                    }
                }
            }
        }

        System.Diagnostics.Debug.WriteLine("2 ApplyLookupTableToBuffer(ms): " + sw.Elapsed.TotalMilliseconds.ToString("N2"));
        return outputBuffer;
    }

Answer 1

I did some testing and got the best speed by making the code unsafe and calling the RtlMoveMemory API. I found Buffer.BlockCopy and Array.Copy to be much slower than calling RtlMoveMemory directly.

So in the end you get something like this:

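// Inside the pixel loop, in an unsafe context (compile with /unsafe)
fixed (byte* ptrOutput = &outputBuffer[0])
fixed (byte* ptrInput = &lookupTable[0])
{
    // Copy one 4-byte lookup table entry to the current output pixel
    MoveMemory(ptrOutput + outIndex, ptrInput + curPixelValue, 4);
}

// Requires: using System.Runtime.InteropServices;
[DllImport("Kernel32.dll", EntryPoint = "RtlMoveMemory", SetLastError = false)]
private static unsafe extern void MoveMemory(void* dest, void* src, int size);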
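RtlMoveMemory is a Kernel32 export, so this P/Invoke ties the code to Windows. For larger contiguous copies, a managed, cross-platform alternative is `Span<T>.CopyTo`, which is bounds-checked and, like RtlMoveMemory, handles overlapping ranges. This is a sketch assuming .NET 5 or later (top-level statements); `CopyBlock` is a hypothetical helper name. For 4-byte copies the per-call overhead still dominates, which is the point the answer's edit makes.

```csharp
using System;

// Hypothetical helper: copies count bytes from source[sourceIndex..] to destination[destinationIndex..].
static void CopyBlock(byte[] source, int sourceIndex, byte[] destination, int destinationIndex, int count)
{
    // AsSpan(start, length) throws ArgumentOutOfRangeException for an invalid range,
    // instead of silently reading or writing outside the arrays.
    source.AsSpan(sourceIndex, count).CopyTo(destination.AsSpan(destinationIndex, count));
}

byte[] frame = { 1, 2, 3, 4, 5, 6, 7, 8 };
byte[] copy = new byte[frame.Length];
CopyBlock(frame, 0, copy, 0, frame.Length);  // whole-buffer copy: the case where a bulk copy pays off
Console.WriteLine(string.Join(",", copy));    // 1,2,3,4,5,6,7,8
```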

Edit

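OK, now that I understand your logic, some more testing let me speed up your method by almost 50%. Since you always copy a small block of data (exactly 4 bytes), you are right that RtlMoveMemory won't help here; it is better to copy the data as an integer. Here is the final solution I came up with: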

public static byte[] ApplyLookupTableToBufferV2(byte[] lookupTable, ushort[] inputBuffer)
{
    int lookupTableLength = lookupTable.Length;
    int bufferLength = inputBuffer.Length;
    byte[] outputBuffer = new byte[bufferLength * 4];
    int outIndex = 0, curPixelValue = 0;

    // lkp indexes whole 4-byte entries, so the table holds lookupTableLength / 4 of them
    int lookupEntryCount = lookupTableLength / 4;

    unsafe
    {
        fixed (byte* ptrOutput = &outputBuffer[0])
        fixed (byte* ptrLookup = &lookupTable[0])
        {
            // View both buffers as 4-byte values: one uint = one BGRA pixel
            uint* lkp = (uint*)ptrLookup;
            uint* opt = (uint*)ptrOutput;

            for (int index = 0; index < bufferLength; index++)
            {
                outIndex = index;
                curPixelValue = inputBuffer[index];
                if (curPixelValue < lookupEntryCount)
                {
                    opt[outIndex] = lkp[curPixelValue];
                }
            }
        }
    }

    return outputBuffer;
}

I renamed your original method to ApplyLookupTableToBufferV1.

Here are my test results:

// lt = lookup table, ib = input buffer (the test data)
int tc1 = Environment.TickCount;
for (int i = 0; i < 200; i++)
{
    byte[] a = ApplyLookupTableToBufferV1(lt, ib);
}
tc1 = Environment.TickCount - tc1;
Console.WriteLine("V1: " + tc1.ToString() + "ms");

Result - V1: 998 ms

int tc2 = Environment.TickCount;
for (int i = 0; i < 200; i++)
{
    byte[] a = ApplyLookupTableToBufferV2(lt, ib);
}
tc2 = Environment.TickCount - tc2;
Console.WriteLine("V2: " + tc2.ToString() + "ms");

Result - V2: 473 ms
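Per call, those totals work out to about 998 / 200 ≈ 5.0 ms for V1 and 473 / 200 ≈ 2.4 ms for V2. Environment.TickCount only advances with the system timer (typically every 10 to 16 ms), so the 200-iteration loop is what makes it usable here. A sketch of a Stopwatch-based harness with a warm-up call, so JIT compilation is not timed; `MeasureAverageMilliseconds` is a hypothetical helper, the `Array.Fill` workload is only a stand-in for the V1/V2 methods, and it assumes .NET 5 or later (top-level statements).

```csharp
using System;
using System.Diagnostics;

// Returns the average wall-clock time per call in milliseconds.
static double MeasureAverageMilliseconds(Action work, int iterations)
{
    work(); // warm-up call: JIT-compiles the code path before timing starts

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        work();
    }
    sw.Stop();

    return sw.Elapsed.TotalMilliseconds / iterations;
}

// Stand-in workload touching a buffer of outputBuffer's size (1,310,720 bytes);
// replace it with e.g. () => ApplyLookupTableToBufferV2(lt, ib).
byte[] buffer = new byte[1310720];
double average = MeasureAverageMilliseconds(() => Array.Fill(buffer, (byte)1), 200);
Console.WriteLine($"Average: {average:N3} ms per call");
```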

How do I optimize copying chunks of an array in C#?