Question

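I'm writing a function to process an incoming 32-bit buffer that represents changed data when compared with a corresponding stored 32-bit buffer. The position of a changed bit represents a number that needs to be processed (i.e. a value of 8 means bit 3), along with whether the change is 0->1 or 1->0. Here is the current implementation; please help me improve it! Note that this is not the actual code; it has been simplified to be context-independent.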

uint32_t temp = oldBuffer ^ newBuffer;
uint32_t number = 0;
while (temp != 0)
{
    if (temp & 0x1)
    {
        uint32_t bitValue = 0;
        if ((newBuffer & (1u << number)) != 0) bitValue = 1;
        processNumber(number, bitValue);
    }
    number++;
    temp = temp >> 1;
}
oldBuffer = newBuffer;

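Now, it works, but I don't like that it has to check every bit by testing the lowest bit and shifting the whole thing. If there were a guarantee that only one bit was set, this wouldn't be too hard to figure out, but that's not the case.

Edit: To Neil, I guess I'm hoping to find a way to get the positions of the bits after the XOR in constant time, rather than shifting through the buffer and checking the bits one by one.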

Answer 1

uint32_t temp=oldBuffer^newBuffer, ntemp=newBuffer;
for(int b=0;temp;++b,temp>>=1,ntemp>>=1)
    if(temp&1) processNumber(b,ntemp&1);

Answer 2

You might get some performance gain by using one or more of the 'bit twiddling' hacks at

http://graphics.stanford.edu/~seander/bithacks.html

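Specifically, the 'find the integer log base 2' algorithms (position of the highest set bit). This lets you determine which bits are set more directly than looping over every bit.

If you must process the bits from lowest to highest, you can use a slight modification of Kernighan's bit-counting method: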

/* note: untested code */
while (temp) {
    uint32_t bit = temp & ~(temp & (temp - 1)); /* isolate lowest bit */
    temp &= ~bit;                               /* clear it for the next pass */
    uint32_t bit_number = /* use find log base 2 hack */;
    /* etc... */
}

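This should make the while loop iterate exactly as many times as there are set bits. The original loop iterates a number of times determined by the bit position of the highest set bit.

However, I'd be surprised if this made any measurable difference unless this is super-critical code.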
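To make the loop above concrete, here is a minimal sketch that fills in the "find log base 2" step with the De Bruijn multiply-and-lookup trick from the bithacks page. The names `lowestSetBitIndex` and `forEachChangedBit` are made up for illustration, and the callback stands in for the question's `processNumber`; treat it as one possible way to do it, not a measured improvement.

```cpp
#include <cassert>
#include <cstdint>

// Index of the lowest set bit of v, using the De Bruijn
// multiply-and-lookup trick. Precondition: v != 0.
inline int lowestSetBitIndex(uint32_t v)
{
    static const int kDeBruijnPosition[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    assert(v != 0);
    const uint32_t lowest = v & (~v + 1u);   // isolate the lowest set bit
    return kDeBruijnPosition[(lowest * 0x077CB531u) >> 27];
}

// Calls onChange(bitNumber, newBitValue) once per differing bit,
// from the lowest changed bit to the highest.
// onChange is a hypothetical stand-in for processNumber.
template <typename Callback>
void forEachChangedBit(uint32_t oldBuffer, uint32_t newBuffer, Callback onChange)
{
    uint32_t changed = oldBuffer ^ newBuffer;
    while (changed != 0) {
        const int bitNumber = lowestSetBitIndex(changed);
        onChange(bitNumber, (newBuffer >> bitNumber) & 1u);
        changed &= changed - 1u;             // clear the bit just handled
    }
}
```

A call would look like `forEachChangedBit(oldBuffer, newBuffer, processNumber);` followed by `oldBuffer = newBuffer;`. The loop body runs once per changed bit, so whether it beats the plain shift loop depends on how sparse the changes are.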

Answer 3

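How about using the standard library? No shifts, ANDs, ORs, etc. to test whether a bit is set. Testing a bit in a bitset is guaranteed to be constant time. It is also cleaner to write and easier to understand.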

// requires #include <bitset>
const std::bitset<32> oldbits( oldBuffer );
const std::bitset<32> newbits( newBuffer );

for( size_t index = 0; index != oldbits.size(); ++index ) {
   if( oldbits[ index ] != newbits[ index ] ) {
       processNumber( index, newbits[ index ] );
   }
}

Note: you also don't need the XOR here, because you have direct access to the bits. However, you might still use it to save some work.
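As a rough illustration of how the XOR could still help with `std::bitset`, the sketch below XORs the two bitsets once and returns early when nothing changed. The function name `forEachChangedBitBitset` and the callback parameter (standing in for `processNumber`) are assumptions made for this example only.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Calls onChange(index, newBitValue) for every bit that differs.
// onChange is a hypothetical stand-in for processNumber.
template <typename Callback>
void forEachChangedBitBitset(uint32_t oldBuffer, uint32_t newBuffer, Callback onChange)
{
    const std::bitset<32> newBits(newBuffer);
    const std::bitset<32> changed = std::bitset<32>(oldBuffer) ^ newBits;

    if (changed.none()) return;   // nothing differs, skip the loop entirely

    for (std::size_t index = 0; index != changed.size(); ++index) {
        if (changed[index]) {
            onChange(index, newBits[index]);
        }
    }
}
```

The loop still visits all 32 positions when at least one bit changed, so the saving is limited to the case where the buffers are identical.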

Answer 4

 uint32_t temp = oldBuffer ^ newBuffer;
 uint32_t number = 0;
 uint32_t bitmask=1;
 while (temp != 0)
 {
     if (temp & 0x1)
     {
         processNumber(number, ((newBuffer & bitmask) != 0));
     }
     number++;
     temp = temp >> 1;
     bitmask <<=1;
 }
 oldBuffer = newBuffer;

Two super small changes...

Your code is already quite efficient.

Answer 5

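It depends on what you expect the distribution of (oldBuffer ^ newBuffer) to be. If it is completely random across the full 32-bit range, then on average you have 16 set bits to loop over.

One possible solution is to build a table like this: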

int lookup[256][8] = {
  { -1, -1, -1, -1, -1, -1, -1, -1 }, // 0 has no bits set
  {  0, -1, -1, -1, -1, -1, -1, -1 }, // 1 has only the 0th bit set
  {  1, -1, -1, -1, -1, -1, -1, -1 }, // 2 has only the 1st bit set
  {  0,  1, -1, -1, -1, -1, -1, -1 }, // 3 has the 0th, 1st bit set
  {  2, -1, -1, -1, -1, -1, -1, -1 }, // 4 has only the 2nd bit set
  ...
  {  0,  1,  2,  3,  4,  5,  6,  7 }, // 255 has all bits set
};

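With this, you have to loop 4 times (once per byte) and then once for each set bit (4 on average per byte). Hey, that's 16.

But if the number of set bits is low (on average well under half of the 32 bits), the cost of the table lookup goes down.

Unfortunately, the table lookup adds a multiply and an add each time it is used, so it's not necessarily better. You'd have to test it. In other words, it finds the set bits in constant time, but the constant may be larger than the loop's. It depends on how many set bits you expect.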
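Here is a minimal sketch of the table idea, assuming the table is generated once at startup instead of being typed out by hand. `ByteBitTable` and `forEachChangedBitTable` are hypothetical names, and the callback stands in for `processNumber`.

```cpp
#include <cstdint>

// For each byte value, lists the positions of its set bits in
// ascending order, padded with -1 (same layout as the table above).
struct ByteBitTable {
    int positions[256][8];
    ByteBitTable()
    {
        for (int value = 0; value < 256; ++value) {
            int slot = 0;
            for (int bit = 0; bit < 8; ++bit)
                if (value & (1 << bit)) positions[value][slot++] = bit;
            while (slot < 8) positions[value][slot++] = -1;
        }
    }
};

// Calls onChange(bitNumber, newBitValue) for every differing bit,
// one byte of the XOR at a time, lowest byte first.
template <typename Callback>
void forEachChangedBitTable(uint32_t oldBuffer, uint32_t newBuffer, Callback onChange)
{
    static const ByteBitTable table;   // built on first call
    const uint32_t changed = oldBuffer ^ newBuffer;
    for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
        const int* row = table.positions[(changed >> (8 * byteIndex)) & 0xFFu];
        for (int slot = 0; slot < 8 && row[slot] >= 0; ++slot) {
            const int bitNumber = 8 * byteIndex + row[slot];
            onChange(bitNumber, (newBuffer >> bitNumber) & 1u);
        }
    }
}
```

The table takes 256 * 8 ints of memory, so cache behaviour is part of the trade-off mentioned above; measuring against the simple shift loop is the only way to know which is faster for a given workload.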

Answer 6

You could do it recursively, like a B-Tree :)

go(oldBuffer ^ newBuffer, 16, 0, newBuffer);
...

void go(uint32_t temp, int half, int pos, uint32_t bitValue)
{
    if (half > 1) {
      uint32_t mask = (1 << half) - 1;
      if (temp & mask)    go(temp & mask, half/2, pos, bitValue & mask);
      temp >>= half;
      if (temp & mask)    go(temp & mask, half/2, pos + half, (bitValue >> half) & mask);
    } else {
      if (temp & 1) processNumber(pos, bitValue&1);
      if (temp & 2) processNumber(pos+1, bitValue/2&1);
    }
}

Help me improve this C++ bit buffer processing code
