Question

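Below is the code I need for a visualization. I am looking for a way to make it run faster. One option is to use bit manipulation (https://stackoverflow.com/a/55945544/4791668) to sum the arrays. I would like to know whether the approach described in that link can be adapted to compute the average at the same time.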

    var random = new Random();
    byte[] bytes = new byte[20_000_000]; 
    byte[] bytes2 = new byte[20_000_000];

    for (int i = 0; i < bytes.Length; i++)
    {
        bytes[i] = (byte)random.Next(256); // upper bound is exclusive, so 256 covers 0..255
    }

    for (int i = 0; i < bytes.Length; i++)
    {
        bytes2[i] = (byte)random.Next(256); // upper bound is exclusive, so 256 covers 0..255
    }

    //how to optimize the part below
    for (int i = 0; i < bytes.Length; i++)
    {
        bytes[i] = (byte)((bytes[i] + bytes2[i]) / 2);
    }

//////////// Solution that needs improvement. It does not do the averaging part.

    var random = new Random();
    byte[] bytes = new byte[20_000_000]; 
    byte[] bytes2 = new byte[20_000_000];

    int Len = bytes.Length >> 3; // >>3 is the same as / 8

    ulong MASK =    0x8080808080808080;
    ulong MASKINV = 0x7f7f7f7f7f7f7f7f;

    //Sanity check
    if((bytes.Length & 7) != 0) throw new Exception("bytes.Length is not a multiple of 8");
    if((bytes2.Length & 7) != 0) throw new Exception("bytes2.Length is not a multiple of 8");

    unsafe
    {
       //Add 8 bytes at a time, taking into account overflow between bytes
       fixed (byte* pbBytes = &bytes[0])
       fixed (byte* pbBytes2 = &bytes2[0])
       {
          ulong* pBytes = (ulong*)pbBytes;
          ulong* pBytes2 = (ulong*)pbBytes2;
          for (int i = 0; i < Len; i++)
          {
            pBytes[i] = ((pBytes2[i] & MASKINV) + (pBytes[i] & MASKINV)) ^ ((pBytes[i] ^ pBytes2[i]) & MASK);
          } 
       }        
    }

Answer 1

Using bit manipulation, you can compute the averages of the bytes in parallel:

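    // ans2 and Len are not declared in the original snippet; these declarations
    // assume the same arrays and the same Len as in the question.
    byte[] ans2 = new byte[bytes.Length];
    int Len = bytes.Length >> 3; // number of 8-byte blocks

    ulong NOLOW = 0xfefefefefefefefe; // clears the low bit of every byte
    unsafe {
        //Average 8 bytes at a time without letting bits cross byte boundaries
        fixed (byte* pbBytes = &bytes[0])
        fixed (byte* pbBytes2 = &bytes2[0])
        fixed (byte* pbAns2 = &ans2[0]) {
            ulong* pBytes = (ulong*)pbBytes;
            ulong* pBytes2 = (ulong*)pbBytes2;
            ulong* pAns2 = (ulong*)pbAns2;
            for (int i = 0; i < Len; i++) {
                pAns2[i] = (pBytes2[i] & pBytes[i]) + (((pBytes[i] ^ pBytes2[i]) & NOLOW) >> 1);
            }
        }
    }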

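I modified the code to store the result in a separate ans byte array, because I needed the source arrays to compare the two methods. Obviously you can store back into the original bytes[] if you prefer.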

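This is based on the identity x+y == (x&y)+(x|y) == (x&y)*2 + (x^y) == ((x&y)<<1) + (x^y), which means you can compute (x+y)/2 == (x&y) + ((x^y) >> 1). The result is rounded down, the same as the integer division in the original loop. Since we know we are processing 8 bytes at a time, we can mask off the low-order bit of each byte first, so that when all 8 bytes are shifted together a 0 bit is shifted into the high-order bit of each byte instead of the low bit of the neighboring byte.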
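To make the identity easier to trust, here is a minimal sketch that checks the packed formula against the plain (a + b) / 2 for every pair of byte values. The class name PackedAverage and its method names are made up for this illustration and do not come from the answer; the check replicates each byte value into all 8 lanes of a ulong, which is an assumption of the test setup, not a requirement of the technique.

```csharp
using System;

// Hypothetical helper, named only for this illustration.
static class PackedAverage
{
    // Clears the low bit of every byte lane (same value as NOLOW in the answer).
    const ulong NoLow = 0xfefefefefefefefeUL;

    // Copies one byte value into all 8 lanes when multiplied by a value 0..255.
    const ulong Replicate = 0x0101010101010101UL;

    // Averages the 8 byte lanes of x and y independently, rounding down.
    public static ulong Average(ulong x, ulong y)
    {
        return (x & y) + (((x ^ y) & NoLow) >> 1);
    }

    // Compares the packed result with the scalar (a + b) / 2 for all 65536 byte pairs.
    public static bool VerifyAllPairs()
    {
        for (int a = 0; a < 256; a++)
        {
            for (int b = 0; b < 256; b++)
            {
                ulong x = Replicate * (ulong)a;
                ulong y = Replicate * (ulong)b;
                ulong expected = Replicate * (ulong)((a + b) / 2);
                if (Average(x, y) != expected) return false;
            }
        }
        return true;
    }
}
```

Calling PackedAverage.VerifyAllPairs() should return true. Removing the NoLow mask from Average makes the check fail, because the low bit of each lane then leaks into the high bit of the lane below it.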

On my PC this runs 2 to 3 times faster than the (byte) sum loop (trending toward 2x for longer arrays).
