Question

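I'm trying to convert an RGBA buffer to ARGB. Is there any way to improve the algorithm below, or any other faster way to perform this kind of operation? Keep in mind that the alpha value is not important once it is in the ARGB buffer, and it should always end up as 0xFF.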

int y, x, pixel;

for (y = 0; y < height; y++)
{
    for (x = 0; x < width; x++)
    {
        pixel = rgbaBuffer[y * width + x];
        argbBuffer[(height - y - 1) * width + x] = (pixel & 0xff00ff00) | ((pixel << 16) & 0x00ff0000) | ((pixel >> 16) & 0xff);
    }
}

Answer 1

I'll focus only on the swap function:

typedef unsigned int Color32;

inline Color32 Color32Reverse(Color32 x)
{

    return
    // Source is in format: 0xAARRGGBB
        ((x & 0xFF000000) >> 24) | //______AA
        ((x & 0x00FF0000) >>  8) | //____RR__
        ((x & 0x0000FF00) <<  8) | //__GG____
        ((x & 0x000000FF) << 24);  //BB______
    // Return value is in format:  0xBBGGRRAA
}
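
To show how a function like this would be applied to a whole buffer, here is a minimal sketch. It assumes 32-bit pixels and uses uint32_t instead of unsigned int so the width is guaranteed; the names color32_reverse and reverse_pixels are hypothetical. Note that, as the comments above say, this is a full byte reversal (0xAARRGGBB becomes 0xBBGGRRAA), not a rotation of the alpha byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Same byte reversal as Color32Reverse above, on a fixed-width type. */
static inline uint32_t color32_reverse(uint32_t x)
{
    return ((x & 0xFF000000u) >> 24) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x000000FFu) << 24);
}

/* Hypothetical helper: reverses the byte order of every pixel in place. */
static void reverse_pixels(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = color32_reverse(pixels[i]);
}
```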

Answer 2

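Assuming the code is not buggy (just inefficient), I'm guessing that all you want to do is swap every second (even-numbered) byte (and, of course, flip the buffer), isn't it?

So you can achieve some optimizations by:

Avoiding the shift and masking operations
Optimizing the loop, e.g. saving on the index calculations

I would rewrite the code as follows: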

int y, x;

for (y = 0; y < height; y++)
{
    unsigned char *pRGBA= (unsigned char *)(rgbaBuffer+y*width);
    unsigned char *pARGB= (unsigned char *)(argbBuffer+(height-y-1)*width);
    for (x = 4*(width-1); x>=0; x-=4)
    {
        pARGB[x  ]   = pRGBA[x+2];
        pARGB[x+1]   = pRGBA[x+1];
        pARGB[x+2]   = pRGBA[x  ];
        pARGB[x+3]   = 0xFF;
    }
}

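Note that the more complex index calculations are performed only in the outer loop. There are four accesses to both rgbaBuffer and argbBuffer for each pixel, but I think this is more than offset by avoiding the bitwise operations and the index calculations. An alternative would be (as in your code) to fetch/store one pixel (int) at a time and do the processing locally (which saves on memory accesses), but unless you have some efficient way to swap the two bytes and set the alpha locally (e.g. some inline assembly, so that you can be sure everything is performed at register level), it won't really help.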
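
For comparison, the word-at-a-time alternative mentioned above might look like the sketch below. It is only an illustration: the function name convert_flip_words is hypothetical, and it assumes a little-endian machine, where it produces the same bytes as the byte-wise loop (bytes 0 and 2 swapped, byte 3 forced to 0xFF, rows flipped). Whether it is actually faster depends on the compiler and CPU, so measure before choosing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one 32-bit load and one 32-bit store per pixel.
   Assumes little-endian byte order and non-overlapping buffers. */
static void convert_flip_words(const uint32_t *rgbaBuffer, uint32_t *argbBuffer,
                               int width, int height)
{
    for (int y = 0; y < height; y++) {
        /* Row pointers are computed once per row, as in the byte-wise version. */
        const uint32_t *src = rgbaBuffer + (size_t)y * width;
        uint32_t *dst = argbBuffer + (size_t)(height - y - 1) * width;

        for (int x = 0; x < width; x++) {
            uint32_t p = src[x];
            dst[x] = 0xFF000000u               /* byte 3: alpha forced to 0xFF */
                   | ((p & 0x000000FFu) << 16) /* byte 0 moves to byte 2 */
                   | (p & 0x0000FF00u)         /* byte 1 stays in place */
                   | ((p >> 16) & 0x000000FFu);/* byte 2 moves to byte 0 */
        }
    }
}
```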

Answer 3

The code you provided is very strange, since it shuffles the color components not as rgba->argb but as rgba->rabg.

I've made a correct and optimized version of this routine.

int pixel;
int size = width * height;

for (unsigned int * rgba_ptr = rgbaBuffer, * argb_ptr = argbBuffer + size - 1; argb_ptr >= argbBuffer; rgba_ptr++, argb_ptr--)
{
    // *argb_ptr = *rgba_ptr >> 8 | 0xff000000;  // - this version doesn't change endianness
    *argb_ptr = __builtin_bswap32(*rgba_ptr) >> 8 | 0xff000000;  // This does
}

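The first thing I did was simplify your shuffling expression. It is obvious that XRGB is just RGBA >> 8. I also removed the array index calculation on each iteration and used pointers as the loop variables. This version is about 2 times faster than the original on my machine.

You can also use SSE for the shuffling if this code is intended for x86 CPUs.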
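
A rough sketch of what the SSE route could look like is below. It uses only SSE2 intrinsics (shifts, AND, OR), processes four pixels per iteration, and computes the same per-pixel value as __builtin_bswap32(p) >> 8 | 0xff000000. The function names are hypothetical, it does not reverse the buffer the way the loop above does, and it falls back to scalar code when SSE2 is not available. With SSSE3 a single byte shuffle could replace the shift/mask sequence, but that is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: same result as __builtin_bswap32(p) >> 8 | 0xff000000. */
static uint32_t convert_pixel(uint32_t p)
{
    return 0xFF000000u | ((p & 0x000000FFu) << 16)
         | (p & 0x0000FF00u) | ((p >> 16) & 0x000000FFu);
}

/* Hypothetical helper: converts count pixels from src to dst, same order. */
static void convert_pixels(const uint32_t *src, uint32_t *dst, size_t count)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i low_byte = _mm_set1_epi32(0x000000FF);
    const __m128i mid_byte = _mm_set1_epi32(0x0000FF00);
    const __m128i alpha    = _mm_set1_epi32((int)0xFF000000u);

    for (; i + 4 <= count; i += 4) {
        __m128i p  = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i hi = _mm_slli_epi32(_mm_and_si128(p, low_byte), 16); /* byte 0 -> byte 2 */
        __m128i md = _mm_and_si128(p, mid_byte);                     /* byte 1 unchanged */
        __m128i lo = _mm_and_si128(_mm_srli_epi32(p, 16), low_byte); /* byte 2 -> byte 0 */
        __m128i out = _mm_or_si128(_mm_or_si128(alpha, hi), _mm_or_si128(md, lo));
        _mm_storeu_si128((__m128i *)(dst + i), out);
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    for (; i < count; i++)
        dst[i] = convert_pixel(src[i]);
}
```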

Answer 4

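I'm very late to this one, but I had exactly the same problem when generating video on the fly. By reusing the buffer, I could get away with setting only the R, G, B values for every frame and setting the A only once.

See the code below: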

byte[] _workingBuffer = null;
byte[] GetProcessedPixelData(SKBitmap bitmap)
{
    ReadOnlySpan<byte> sourceSpan = bitmap.GetPixelSpan();

    if (_workingBuffer == null || _workingBuffer.Length != bitmap.ByteCount)
    {
        // Alloc buffer
        _workingBuffer = new byte[sourceSpan.Length];

        // Set all the alpha
        for (int i = 0; i < sourceSpan.Length; i += 4) _workingBuffer[i] = byte.MaxValue;
    }

    Stopwatch w = Stopwatch.StartNew();
    for (int i = 0; i < sourceSpan.Length; i += 4)
    {
        // A
        // Dont set alpha here. The alpha is already set in the buffer
        //_workingBuffer[i] = byte.MaxValue;
        //_workingBuffer[i] = sourceSpan[i + 3];

        // R
        _workingBuffer[i + 1] = sourceSpan[i];

        // G
        _workingBuffer[i + 2] = sourceSpan[i + 1];

        // B
        _workingBuffer[i + 3] = sourceSpan[i + 2];
    }
    Debug.Print("Copied " + sourceSpan.Length + " in " + w.Elapsed.TotalMilliseconds);

    return _workingBuffer;
}

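This got me to around 15 ms on an iPhone for a (1920 * 1080 * 4) buffer, which is about 8 MB.

That still wasn't enough for me. My final solution was to do an offset memory copy (Buffer.BlockCopy in C#) instead, since the alpha does not matter.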

    byte[] _workingBuffer = null;
    byte[] GetProcessedPixelData(SKBitmap bitmap)
    {
        ReadOnlySpan<byte> sourceSpan = bitmap.GetPixelSpan();
        byte[] sourceArray = sourceSpan.ToArray();

        if (_workingBuffer == null || _workingBuffer.Length != bitmap.ByteCount)
        {
            // Alloc buffer
            _workingBuffer = new byte[sourceSpan.Length];

            // Set first byte. This is the alpha component of the first pixel
            _workingBuffer[0] = byte.MaxValue;
        }

        // Converts RGBA to ARGB in ~2 ms instead of ~15 ms
        // 
        // Copies the whole buffer with a offset of 1
        //                                      R   G   B   A   R   G   B   A   R   G   B   A
        // Originally the source buffer has:    R1, G1, B1, A1, R2, G2, B2, A2, R3, G3, B3, A3
        //                                   A  R   G   B   A   R   G   B   A   R   G   B   A
        // After the copy it looks like:     0, R1, G1, B1, A1, R2, G2, B2, A2, R3, G3, B3, A3
        // So essentially we get the wrong alpha for every pixel. But all alphas should be 255 anyways.
        // The first byte is set in the alloc
        Buffer.BlockCopy(sourceArray, 0, _workingBuffer, 1, sourceSpan.Length - 1);

        // Below is an inefficient method of converting RGBA to ARGB. Takes ~15 ms on iPhone 12 Pro Max for a 8mb buffer (1920 * 1080 * 4 bytes)
        /*
        for (int i = 0; i < sourceSpan.Length; i += 4)
        {
            // A
            // Dont set alpha here. The alpha is already set in the buffer
            //_workingBuffer[i] = byte.MaxValue;
            //_workingBuffer[i] = sourceSpan[i + 3];

            byte sR = sourceSpan[i];
            byte sG = sourceSpan[i + 1];
            byte sB = sourceSpan[i + 2];

            if (sR == 0 && sG == byte.MaxValue && sB == 0)
                continue;

            // R
            _workingBuffer[i + 1] = sR;

            // G
            _workingBuffer[i + 2] = sG;

            // B
            _workingBuffer[i + 3] = sB;
        }
        */

        return _workingBuffer;
    }

The comments in the code explain how it works. On the same iPhone it takes about 2 ms, which is good enough for my use case.
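
The same offset-copy idea can be sketched in C for readers following the rest of this thread. This is only an illustration under stated assumptions: the function name is hypothetical, the buffers must not overlap, and every alpha byte in the source is assumed to be 0xFF already (each output pixel picks up the alpha byte of the previous source pixel).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: shifts the whole byte stream right by one so that
   R,G,B,A,R,G,B,A... becomes A,R,G,B,A,R,G,B...
   Assumes non-overlapping buffers and source alpha bytes that are all 0xFF. */
static void rgba_to_argb_offset_copy(const uint8_t *src, uint8_t *dst,
                                     size_t byte_count)
{
    if (byte_count == 0)
        return;

    dst[0] = 0xFF;                        /* alpha of the first pixel */
    memcpy(dst + 1, src, byte_count - 1); /* the last source alpha byte is dropped */
}
```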

Answer 5

Using assembly, the following should work on Intel.

This example swaps red and blue.

void* b = pixels;
UINT len = textureWidth*textureHeight;

__asm                                                       
{
    mov ecx, len                // Set loop counter to pixels memory block size
    mov ebx, b                  // Set ebx to pixels pointer
    label:                      
        mov al,[ebx+0]          // Load Red to al
        mov ah,[ebx+2]          // Load Blue to ah
        mov [ebx+0],ah          // Swap Red
        mov [ebx+2],al          // Swap Blue
        add ebx,4               // Move by 4 bytes to next pixel
        dec ecx                 // Decrease loop counter
        jnz label               // If not zero jump to label
}
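
For anyone who cannot use inline assembly (it is compiler- and architecture-specific), a plain C sketch of the same in-place red/blue swap follows. The function name is hypothetical, and it assumes 4 bytes per pixel with red at offset 0 and blue at offset 2, as in the assembly above. Unlike the assembly loop, it also does nothing when the pixel count is zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swaps bytes 0 and 2 of every 4-byte pixel in place. */
static void swap_red_blue(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t *p = pixels + 4 * i;
        uint8_t red = p[0];
        p[0] = p[2];   /* blue goes where red was */
        p[2] = red;    /* red goes where blue was */
    }
}
```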

Fast conversion of RGBA to ARGB
