Question

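The problem is: given an integer val1, find the position of its highest set bit (most significant bit). Then, given a second integer val2, find a contiguous region of unset bits to the left of the position obtained from the first integer. width specifies the minimum number of unset bits that must be found contiguously (i.e. width zeros with no ones among them).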

Here is the C code for my solution:

#include <limits.h>  /* for CHAR_BIT - number of bits in a char */
#include <stdbool.h> /* for true/false */

typedef unsigned int t;
unsigned const t_bits = sizeof(t) * CHAR_BIT;

_Bool test_fit_within_left_of_msb(  unsigned width,
                                    t val1, /* integer to find MSB of */
                                    t val2, /* integer to find width zero bits in */
                                    unsigned* offset_result)
{
    unsigned offbit = 0; /* 0 starts at high bit */
    unsigned msb = 0;
    t mask;
    t b;

    while(val1 >>= 1) /* find MSB! */
        ++msb;

    while(offbit + width < t_bits - msb)
    {
        /* mask width bits starting at offbit */
        mask = (((t)1 << width) - 1) << (t_bits - width - offbit);
        b = val2 & mask;

        if (!b) /* result! no bits set, we can use this */
        {
            *offset_result = offbit;
            return true;
        }

        if (offbit++) /* this conditional bothers me! */
            b <<= offbit - 1;

        while(b <<= 1)
            offbit++; /* increment offbit past all bits set */
    }
    return false; /* no region of width zero bits found, bummer. */
}

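Aside from faster ways of finding the MSB of the first integer, the commented test of offbit seems a little extraneous, but it is needed to skip the highest bit of type t if that bit is set. Unconditionally left-shifting b by offbit - 1 bits results in an infinite loop, and the mask never gets past the 1 in the high bit of val2 (if the high bit is zero, there is no problem).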

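I have also implemented similar algorithms that work to the right of the MSB of the first number, and they do not need this seemingly extra conditional.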

How can I get rid of this extra conditional, and is there an even more optimal solution?

Edit: some background, not strictly required. The offset result is a count of bits from the high bit, not from the low bit as might be expected. This will be part of a wider algorithm that scans a 2D array for a 2D region of zero bits. Here, for testing, the algorithm has been simplified. val1 represents the first integer found in a row of the 2D array that does not have all bits set. From there, the 2D version would scan down, which is what val2 represents.

Here is some output showing success and failure:

t_bits:32
t_high: 10000000000000000000000000000000 ( 2147483648 )
---------

-----------------------------------
*** fit within left of msb test ***
-----------------------------------
  val1:  00000000000000000000000010000000 ( 128 )
  val2:  01000001000100000000100100001001 ( 1091569929 )

msb:   7
offbit:0 + width: 8 = 8
  mask:  11111111000000000000000000000000 ( 4278190080 )
  b:     01000001000000000000000000000000 ( 1090519040 )
offbit:8 + width: 8 = 16
  mask:  00000000111111110000000000000000 ( 16711680 )
  b:     00000000000100000000000000000000 ( 1048576 )
offbit:12 + width: 8 = 20
  mask:  00000000000011111111000000000000 ( 1044480 )
  b:     00000000000000000000000000000000 ( 0 )
offbit:12
iters:10

***** found room for width:8 at offset: 12 *****

-----------------------------------
*** fit within left of msb test ***
-----------------------------------
  val1:  00000000000000000000000001000000 ( 64 )
  val2:  00010000000000001000010001000001 ( 268469313 )

msb:   6
offbit:0 + width: 13 = 13
  mask:  11111111111110000000000000000000 ( 4294443008 )
  b:     00010000000000000000000000000000 ( 268435456 )
offbit:4 + width: 13 = 17
  mask:  00001111111111111000000000000000 ( 268402688 )
  b:     00000000000000001000000000000000 ( 32768 )

***** mask: 00001111111111111000000000000000 ( 268402688 )
offbit:17
iters:15

***** no room found for width:13 *****

(iters is the number of iterations of the inner while loop; b is the result of val2 & mask.)
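To pin down exactly what the output above is counting, here is a minimal brute-force reference sketch. It is not the question's code: it simply checks every window bit by bit. It assumes 32-bit values, val1 != 0 and 1 <= width <= 31, and the name reference_fit is hypothetical. Like the question's function, it returns the offset counted from the high bit, so it should agree with both runs above (12, and no room for width 13).

```c
#include <stdint.h>

/* Brute-force reference (hypothetical test oracle).
   Returns the offset, counted from the high bit, of the first window of
   `width` zero bits in val2 that lies entirely left of val1's MSB,
   or -1 if there is none.
   Assumes: val1 != 0 and 1 <= width <= 31. */
static int reference_fit(unsigned width, uint32_t val1, uint32_t val2)
{
    int msb = 31;
    while (!(val1 & (UINT32_C(1) << msb)))  /* index of val1's highest set bit */
        --msb;

    /* the window's lowest bit, 32 - off - width, must be above msb */
    for (int off = 0; 32 - off - (int)width > msb; ++off) {
        int top = 31 - off;                 /* highest bit index of this window */
        int clear = 1;
        for (int k = 0; k < (int)width; ++k) {
            if (val2 & (UINT32_C(1) << (top - k))) {
                clear = 0;
                break;
            }
        }
        if (clear)
            return off;
    }
    return -1;
}
```

It costs up to 32 * width bit tests, so it is only meant as an oracle for checking the faster versions.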

Answer 1

This page, http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious, has several ways to compute the integer log base 2 of an unsigned integer (that is, the position of the highest set bit).

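I think that is part of what you want. I suspect that if I really knew what you wanted, I could suggest a better way to compute it, or something else that serves the same purpose.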
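Here is a sketch of one technique in the spirit of that page's O(lg N) method, for a 32-bit value. It replaces the question's one-shift-per-bit loop with five tests that halve the search range each time. The function name msb_index_u32 and the -1 return for zero are assumptions of this sketch.

```c
#include <stdint.h>

/* Index of the highest set bit (floor(log2(v))) of a 32-bit value,
   found by halving the search range: 5 tests instead of up to 31 shifts.
   Returns -1 for v == 0, which has no set bit (an assumption of this sketch). */
static int msb_index_u32(uint32_t v)
{
    int r = 0;
    if (v == 0)
        return -1;
    if (v & 0xFFFF0000u) { v >>= 16; r += 16; }
    if (v & 0x0000FF00u) { v >>= 8;  r += 8;  }
    if (v & 0x000000F0u) { v >>= 4;  r += 4;  }
    if (v & 0x0000000Cu) { v >>= 2;  r += 2;  }
    if (v & 0x00000002u) {           r += 1;  }
    return r;
}
```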

Answer 2

count_leading_zero_bits is often a single instruction, and the compiler will usually provide an intrinsic for it. Otherwise, implement it with a loop.

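count_trailing_zero_bits can be computed from count_leading_zero_bits(x & -x) (for a 32-bit x, it is 31 - count_leading_zero_bits(x & -x)), or, if the former is a loop, with a DeBruijn lookup.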
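Here is a minimal sketch of the loop fallback and of the x & -x identity, assuming 32-bit values. The names follow the answer's placeholders count_leading_zero_bits and count_trailing_zero_bits. The bodies are hypothetical portable stand-ins, not compiler intrinsics.

```c
#include <stdint.h>

/* Loop fallback for count_leading_zero_bits: walk a probe bit down from
   bit 31 until it hits a set bit. Returns 32 for x == 0. */
static int count_leading_zero_bits(uint32_t x)
{
    int n = 0;
    uint32_t probe = UINT32_C(1) << 31;
    while (probe && !(x & probe)) {
        ++n;
        probe >>= 1;
    }
    return n;
}

/* x & -x keeps only the lowest set bit of x; for that single bit,
   leading zeros = 31 - its index, so ctz = 31 - clz(x & -x).
   Assumes x != 0 (x == 0 would give 31 - 32 = -1). */
static int count_trailing_zero_bits(uint32_t x)
{
    uint32_t lowest = x & (uint32_t)(0u - x);
    return 31 - count_leading_zero_bits(lowest);
}
```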

For simplicity, I assume 32-bit values.

int offset_of_zero_bits_over_msb_of_other_value( unsigned width , unsigned value , unsigned other ) {
  int count = 0;
  int offset = -1;
  unsigned last = 1;
  int lz = count_leading_zero_bits( other ); // other must be nonzero
  // set all bits at and below the msb of other (avoid shifting by 32 when lz == 0)
  other |= ( lz == 0 ) ? ~0u : ( ( 1u << ( 32 - lz ) ) - 1 );
  if ( value & ~other ) {
    value |= other; // set all bits at and below msb of other
    value = ~value; // invert so zeros are ones
    while ( value && count < width ) {
      count += 1; // the widest run of zeros
      last = value; // for counting trailing zeros
      value &= value >> 1; // clear leading ones from groups
    }
    offset = count_trailing_zero_bits( last );
  } else {
    count = lz; // nothing set above the msb: all lz top bits are free
    offset = 32 - lz;
  }
  return ( count < width ) ? -1 : offset;
}

The idea behind the code:

  val1:  00000000000000000000000010000000 ( 128 )
  val2:  01000001000100000000100100001001 ( 1091569929 )
  lz1:   24
  lz2:   1
  val2:  01000001000100000000100111111111 // |= ((1<<(32-lz1))-1);
  val2:  10111110111011111111011000000000 // = ~val2
  val2:  00011110011001111111001000000000 // &= val2>>1 , count = 1
  val2:  00001110001000111111000000000000 // &= val2>>1 , count = 2
  val2:  00000110000000011111000000000000 // &= val2>>1 , count = 3
  val2:  00000010000000001111000000000000 // &= val2>>1 , count = 4
  val2:  00000000000000000111000000000000 // &= val2>>1 , count = 5
  val2:  00000000000000000011000000000000 // &= val2>>1 , count = 6
  val2:  00000000000000000001000000000000 // &= val2>>1 , count = 7
  val2:  00000000000000000000000000000000 // &= val2>>1 , count = 8

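So at each step, every run of zeros (now ones) shrinks by one bit from its left (high) end. When value reaches zero, the number of steps taken is the width of the widest run. At any point, counting the trailing zeros of last gives the offset of the lowest run of at least count zeros.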

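If count reaches width at any point, you can stop iterating, so the maximum number of iterations is width rather than the word length. You can make this O(log width), because you can double the shift amount on each iteration as long as the total does not exceed width.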
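The doubling idea can be sketched as follows, assuming 32-bit values and 1 <= width <= 32. lowest_zero_run is a hypothetical name. The loop keeps the invariant that bit i of x is set exactly when a run of `have` zero bits starts at bit i. Each step grows `have` by at most `have`, so no gap can slip in between the two halves being combined. Like Answer 2's code, it returns the low-bit position of the run. It does not apply the MSB restriction itself: OR the bits at and below val1's MSB into value first (0x411009FF below is the question's first val2 with that done).

```c
#include <stdint.h>

/* Lowest bit position p such that bits p .. p+width-1 of `value` are all
   zero, or -1 if no such run exists. Assumes 32-bit values and
   1 <= width <= 32; lowest_zero_run is a hypothetical name. */
static int lowest_zero_run(uint32_t value, unsigned width)
{
    uint32_t x = ~value;  /* zeros become ones */
    unsigned have = 1;    /* invariant: bit i of x set <=> bits i .. i+have-1 of value are zero */
    int pos = 0;

    while (have < width && x) {
        /* double the run length, but never step past width */
        unsigned s = (have < width - have) ? have : width - have;
        x &= x >> s;      /* keep i only if a run of `have` zeros also starts at i+s (s <= have, so no gap) */
        have += s;
    }
    if (!x)
        return -1;
    while (!(x & 1u)) {   /* count trailing zeros: lowest qualifying start */
        x >>= 1;
        ++pos;
    }
    return pos;
}
```

For width 8 the loop runs 3 times (shifts of 1, 2 and 4) instead of 8.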

Here is a DeBruijn lookup for counting the trailing zero bits of a 32-bit value.

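#include <stdint.h>

static const int MultiplyDeBruijnBitPosition[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
/* trailing zero bits of a nonzero 32-bit value v */
static int count_trailing_zero_bits( uint32_t v ) {
  return MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
}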

I notice that in both of your examples val1 has only one bit set. If that is always the case, you can find the MSB with the DeBruijn trick.
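Here is a sketch of that shortcut, assuming val1 is always a nonzero power of two. For such a value the lowest set bit is also the highest, so v & -v is v itself and the trailing-zero lookup gives the MSB directly. msb_of_single_bit is a hypothetical name, and for any other input it returns a meaningless result.

```c
#include <stdint.h>

/* Table and multiplier from the trailing-zero DeBruijn lookup above. */
static const int MultiplyDeBruijnBitPosition[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* When v has exactly one bit set, v & -v == v, so the lookup can be
   applied to v directly. Assumes v is a nonzero power of two. */
static int msb_of_single_bit(uint32_t v)
{
    return MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x077CB531U) >> 27];
}
```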

Answer 3

Here is my new, improved algorithm:

int test_fit_within_left_of_msb(  unsigned width,
                                  unsigned val1,
                                  unsigned val2 )
{
    int offset = 32;
    int msb = 0;
    unsigned mask;
    unsigned b;

    msb = 32 - __builtin_clz(val1); /* GCC builtin to count Leading Zeros */

    while(offset - (int)width >= msb) /* window must lie entirely above the MSB */
    {
        mask = (((unsigned)1 << width) - 1) << (offset - width);
        b = val2 & mask;

        if (!b)
            return 32 - offset;

        offset = __builtin_ctz(b); /* GCC builtin to Count Trailing Zeros */
    }

    return -1;
}

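This code has many improvements over my initial implementation. Chiefly, the inner while loop is removed by simply counting trailing zero bits. Second, I made the algorithm use an offset that holds the natural bit-position value. This removes some of the addition and subtraction operations my original used, except in the successful return statement (you could quibble about subtracting offset from 32 there).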

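The point of the code is the algorithm; I realize there are portability issues and assumptions about types and sizes. Looking back up the page at the output, where width 8 is found at position 12 in 10 iterations, the new algorithm does the same in 2 iterations of the loop.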

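For convenience I used GCC builtins. __builtin_ctz can be replaced with the MultiplyDeBruijnBitPosition code provided by drawonward (from: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup), and __builtin_clz can be replaced with one of the integer log base 2 routines from the same page.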
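Here is a sketch of that substitution, assuming 32-bit values, val1 != 0 and 1 <= width <= 31. ctz32 uses the DeBruijn lookup quoted in Answer 2. bit_length32 uses the obvious shift loop from the same page in place of 32 - __builtin_clz(val1). All three function names are hypothetical. The loop test uses >=, which accepts a run that starts immediately above val1's MSB, as the question's own bound does.

```c
#include <stdint.h>

static const int MultiplyDeBruijnBitPosition[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Stand-in for __builtin_ctz (DeBruijn lookup). Assumes v != 0. */
static int ctz32(uint32_t v)
{
    return MultiplyDeBruijnBitPosition[(uint32_t)((v & (0u - v)) * 0x077CB531U) >> 27];
}

/* Stand-in for 32 - __builtin_clz(v): number of bits needed to hold v. */
static int bit_length32(uint32_t v)
{
    int n = 0;
    while (v) {
        v >>= 1;
        ++n;
    }
    return n;
}

/* Answer 3's loop with the builtins swapped out. Returns the offset of the
   zero run counted from the high bit, or -1. Assumes val1 != 0 and
   1 <= width <= 31. */
static int fit_within_left_of_msb_portable(unsigned width, uint32_t val1, uint32_t val2)
{
    int offset = 32;               /* one past the top bit of the current window */
    int msb = bit_length32(val1);  /* lowest bit index that is left of val1's MSB */

    while (offset - (int)width >= msb) {
        uint32_t mask = ((UINT32_C(1) << width) - 1) << (offset - (int)width);
        uint32_t b = val2 & mask;
        if (!b)
            return 32 - offset;
        offset = ctz32(b);         /* next window ends just below the lowest blocking bit */
    }
    return -1;
}
```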

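~~There is a problem here: the data (sparsely set bits) I used to test this makes the algorithm look better than it might with integers whose bits are more densely set.~~ (Not true: counting trailing zeros avoids this bad case.)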

Answer 4

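After implementing my previous answer, but working to the right of the MSB, I saw that apart from very small differences the left and right versions were exactly the same. This led to the realization that there is no real requirement for the algorithm to start from the MSB of some previous value.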

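So, although this answer does not meet the specification in the question, it is the correct answer, because the specification was incorrect.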

#include <stdint.h>

/* returns the bit position just above a window of width contiguous
   zero bits within a 32-bit integer (the window occupies bits
   offset - width .. offset - 1). it returns -1 on failure.
   uses the GCC builtin __builtin_ctz.
*/

int binary_width_fit( unsigned width, uint32_t val )
{
    int offset = 32;
    uint32_t mask;
    uint32_t b;

    while(offset >= width)
    {
        mask = (((uint32_t)1 << width) - 1) << (offset - width);
        b = val & mask;
        if (!b)
            return offset;
        offset = __builtin_ctz(b); /* GCC builtin to Count Trailing Zeros */
    }
    return -1;
}
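If the MSB restriction is still wanted, one way to recover it with binary_width_fit is to borrow the masking step from Answer 2: set every bit of val2 at or below val1's MSB before searching. The sketch below assumes GCC or Clang (for __builtin_ctz), val1 != 0 and 1 <= width <= 31. fit_left_of_msb is a hypothetical wrapper. It keeps binary_width_fit's return convention, so 32 minus the result is the question's offset from the high bit.

```c
#include <stdint.h>

/* Copied from Answer 4, with a cast added in the loop test
   (requires GCC or Clang for __builtin_ctz). */
static int binary_width_fit(unsigned width, uint32_t val)
{
    int offset = 32;
    uint32_t mask;
    uint32_t b;

    while (offset >= (int)width) {
        mask = (((uint32_t)1 << width) - 1) << (offset - width);
        b = val & mask;
        if (!b)
            return offset;
        offset = __builtin_ctz(b);
    }
    return -1;
}

/* Hypothetical wrapper: mark every bit at or below val1's MSB as used,
   so only windows left of the MSB can succeed. Assumes val1 != 0 and
   1 <= width <= 31. */
static int fit_left_of_msb(unsigned width, uint32_t val1, uint32_t val2)
{
    uint32_t below = val1;
    /* smear the MSB downward: afterwards every bit at or below it is set */
    below |= below >> 1;
    below |= below >> 2;
    below |= below >> 4;
    below |= below >> 8;
    below |= below >> 16;
    return binary_width_fit(width, val2 | below);
}
```

For the question's first example this gives 20, i.e. offset 32 - 20 = 12 from the high bit.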

Answer 5

One (fast) method is to use precomputed LOOKUP TABLES (LUTs) for each 8-bit byte:

PosOfFirst1, PosOfLast1, PosOfFirst0, PosOfLast0 - all 256-byte arrays

Precompute the tables using the following (sorry for the poor, Pascal-ish pseudocode):

PosOfLast1:

FOR EACH ByteVal (0..255):

if byteVal>127 return 8
elseif byteVal>63 return 7
...
elseif byteVal>0 return 1
else return 0

PosOfFirst1:

c:=0;
while c<8 do
begin
bv := byteVal and 1;
if bv=1 then return c
else byteVal := byteVal shr 1;
inc (c);
end;

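I use simple assembler routines for these algorithms. The PosOfFirst0 and PosOfLast0 LUTs can also be precomputed from these two tables, as can TRAILING & LEADING 0 (or 1) counts. It is also useful to precompute 'minus 1' versions of these tables....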

Then you can use (for 8-bit bytes):

var InputByte: Byte;
FirstBit := PosOfFirst1[InputByte] // v.fast

For larger sizes (16, 24, 32+ bits), use procs and loops that check each constituent 8-bit byte. This may need memory accesses to the LUTs, but the approach is still faster:

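a) It is easy to use without procedure calls.
b) Scanning a 32-bit number takes 1 shift & compare with 0 per byte, plus 1 lookup (if a nonzero byte is found), instead of n (0..32) shifts and compares...
c) If programmed well, it stops after finding the first/last 1.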
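Here is a C sketch of the table approach, assuming 32-bit values. The tables are filled by a plain loop that reproduces the pseudocode's results. PosOfLast1 is the 1-based position of the highest set bit, or 0 for a zero byte. PosOfFirst1 is the 0-based index of the lowest set bit; it is 8 for a zero byte, which is an assumption, since the pseudocode leaves that case open. msb_index_lut and the lazy initialisation are hypothetical choices, not PrivateSi's routines.

```c
#include <stdint.h>

static uint8_t PosOfLast1[256];   /* 1-based position of the highest set bit; 0 for 0 */
static uint8_t PosOfFirst1[256];  /* 0-based index of the lowest set bit; 8 for 0 (assumption) */
static int luts_ready = 0;

static void init_luts(void)
{
    for (int v = 0; v < 256; ++v) {
        int last = 0, first = 8;
        for (int c = 0; c < 8; ++c) {
            if (v & (1 << c)) {
                last = c + 1;        /* keeps the highest set bit seen, 1-based */
                if (first == 8)
                    first = c;       /* records the lowest set bit, 0-based */
            }
        }
        PosOfLast1[v] = (uint8_t)last;
        PosOfFirst1[v] = (uint8_t)first;
    }
    luts_ready = 1;
}

/* Highest set bit of a 32-bit value: one shift-and-test per byte from the
   top, then a single table lookup in the first nonzero byte.
   Returns -1 for v == 0. */
static int msb_index_lut(uint32_t v)
{
    if (!luts_ready)
        init_luts();
    for (int shift = 24; shift >= 0; shift -= 8) {
        uint32_t byte = (v >> shift) & 0xFFu;
        if (byte)
            return shift + PosOfLast1[byte] - 1;
    }
    return -1;
}
```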

The LUT principle also applies to 'population count' and other bit-manipulation routines...

Cheers, PrivateSi

Faster and better?!

Finding N contiguous zero bits in an integer to the left of the MSB position of another integer
