Question

我在接受采访时被问到以下问题。

int countSetBits(void *ptr, int start, int end);

梗概： 假设ptr指向一大块内存。将此存储器视为连续的位序列，start和end是位位置。假设start和end 具有适当的值，ptr指向初始化的内存块。

问题： 写一个C代码来计算从start到end [包括]的位数，并返回计数。

只是为了让它更清晰

 ptr---->+-------------------------------+
         | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
         +-------------------------------+
         | 8 | 9 |                   |15 |
         +-------------------------------+
         |                               |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |               | S |           |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |    | E |                      |
         +-------------------------------+
              ...
              ...

我的解决方案：

int countSetBits(void *ptr, int start, int end )
{
    int count = 0, idx; 

    char *ch; 

    for (idx = start; idx <= end; idx++) 
    {     ch = ptr + (idx/8); 

          if((128 >> (idx%8)) & (*ch)) 
          {
                   count++; 
          }
    }

    return count; 
}

在采访期间，我提供了一个非常冗长且效率低下的代码。我后来处理了它并提出了上述解决方案。

我非常确定SO社区可以提供更优雅的解决方案。我很想知道他们的反应。

PS：以上代码未编译。它更像是一个伪代码，可能包含错误。

Answer 1

我认为最快速有效的方法是使用256个条目的表，其中每个元素代表索引中的位数。索引是内存位置的下一个字节。

类似的东西：

int bit_table[256] = {0, 1, 1, 2, 1, ...};
char* p = ptr + start;
int count = 0;
for (p; p != ptr + end; p++)
    count += bit_table[*(unsigned char*)p];

Answer 2

边界条件，他们不尊重......

这里的每个人似乎都专注于查找表来计算位数。这没关系，但我认为在回答面试问题时更重要的是确保你处理边界条件。

查找表只是一个优化。 获得正确的答案比获得快速更重要。如果这是我的采访，直接进入查找表，甚至没有提到有一些棘手的细节处理前几个和最后几个不在全字节边界的位将比提出一个计数的解决方案更糟糕每一点都很沉闷，但边界条件合适。

所以我认为Bhaskar在他的问题中的解决方案可能优于这里提到的大多数答案 - 它似乎处理边界条件。

这是一个使用查找表并试图仍然处理边界的解决方案（它只是经过轻微测试，因此我不会声称它是100％正确的）。它也比我想的更丑，但现在已经很晚了：

typedef unsigned char uint8_t;

static
size_t bits_in_byte( uint8_t val)
{
    static int const half_byte[] = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };

    int result1 = half_byte[val & 0x0f];
    int result2 = half_byte[(val >> 4) & 0x0f];

    return result1 + result2;
}


int countSetBits( void* ptr, int start, int end) 
{
    uint8_t*    first;
    uint8_t*    last;
    int         bits_first;
    int         bits_last;
    uint8_t     mask_first;
    uint8_t     mask_last;

    size_t count = 0;

    // get bits from the first byte
    first = ((uint8_t*) ptr) + (start / 8);
    bits_first = 8 - start % 8;
    mask_first = (1 << bits_first) - 1;
    mask_first = mask_first << (8 - bits_first);


    // get bits from last byte
    last = ((uint8_t*) ptr) + (end / 8);
    bits_last = 1 + (end % 8);
    mask_last = (1 << bits_last) - 1;

    if (first == last) {
        // we only have a range of bits in  the first byte
        count = bits_in_byte( (*first) & mask_first & mask_last);        
    }
    else {
        // handle the bits from the first and last bytes specially
        count += bits_in_byte((*first) & mask_first);
        count += bits_in_byte((*last) & mask_last);

        // now we've collected the odds and ends from the start and end of the bit range
        // handle the full bytes in the interior of the range

        for (first = first+1; first != last; ++first) {
            count += bits_in_byte(*first);
        }
    }

    return count;
}

请注意，作为访谈的一部分必须解决的细节是，字节中的位是否从最低有效位（lsb）或最高有效位（msb）开始编制索引。换句话说，如果将起始索引指定为0，那么值为0x01的字节或值为0x80的字节是否会在该索引中设置该位？类似于判断索引是否将字节中的位顺序视为big-endian或little-endian。

对此没有'正确'的答案 - 面试官必须指明行为应该是什么。我还要注意，我的示例解决方案以与OP的示例代码相反的方式处理这个问题（我按照我如何解释图表，索引读作'位数'）。 OP的解决方案将位顺序视为big-endian，我的函数将它们视为little-endian。因此即使两者都处理星形和星形的部分字节。范围结束时，他们会给出不同的答案。哪个是正确答案取决于问题的实际规格是什么。

Answer 3

您可能会发现this page很有趣，它包含几种替代解决方案。

Answer 4

@dimitri的版本可能是最快的。但是在采访中很难为所有128个8位字符构建位数表。您可以获得一个非常快的版本，其中包含16个十六进制数字0x0,0x1，...，0xF的表，您可以轻松构建：

int countBits(void *ptr, int start, int end) {
    // start, end are byte indexes
    int hexCounts[16] =   {0, 1, 1, 2,   1, 2, 2, 3,
                           1, 2, 3, 3,   2, 3, 3, 4}; 
    unsigned char * pstart = (unsigned char *) ptr + start;
    unsigned char * pend = (unsigned char *) ptr + end;
    int count = 0;
    for (unsigned char * p = pstart; p <= pend; ++p) {
        unsigned char b = *p;
        count += hexCounts[b & 0x0F] + hexCounts[(b >> 4) & 0x0F];
    }
    return count;
}

编辑：如果start和end是位索引，则在调用上述函数之前，将首先计算第一个和最后一个字节中的位：

int countBits2(void *ptr, int start, int end) {
    // start, end are bit indexes
    if (start > end) return 0;
    int count = 0;
    unsigned char* pstart = (unsigned char *) ptr + start/8; // first byte
    unsigned char* pend = (unsigned char *) ptr + end/8;     // last byte
    int istart = start % 8;                                  // index in first byte
    int iend = end % 8;                                      // index in last byte 
    unsigned char b = *pstart;                               // byte
    if (pstart == pend) {                                    // count in 1 byte only
        b = b << istart;
        for (int i = istart; i <= iend; ++i) {               // between istart, iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    else {                                                   // count in 2 bytes
        for (int i = istart; i < 8; ++i) {                   // from istart to 7
            if (b & 1) ++count; 
            b = b >> 1;
        }
        b = *pend;
        for (int i = 0; i <= iend; ++i) {                    // from 0 to iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    return count + countBits(ptr, start/8 + 1, end/8 - 1);
}

Answer 5

有许多方法可以解决问题。 This是一个很好的帖子，可以比较最常见选项的效果。

Answer 6

最近的一项优秀研究，比较了几种最先进的技术，用于计算一系列内存中 'set'（1-value）位 的数量（ > aka Hamming Weight， bitset cardinality ，横向总和，人口数量或popcnt，等。）可以在Wojciech，Kurz和Lemire（2017）中找到，Faster population counts using AVX2 instructions ¹

以下是该论文中“Harley-Seal”算法的完整，经过测试和完全工作的 C＃改编，作者发现这是使用通用的最快方法按位运算（即不需要特殊硬件）。

<强> 1。托管阵列入口点
（可选）提供对托管阵列ulong[]的块优化位计数的访问。

/// <summary> Returns the total number of 1-valued bits in the array </summary>
[DebuggerStepThrough]
public static int OnesCount(ulong[] rg) => OnesCount(rg, 0, rg.Length);

/// <summary> Finds the total number of '1' bits in an array or its subset </summary>
/// <param name="rg"> Array of ulong values to scan </param>
/// <param name="index"> Starting index in the array </param>
/// <param name="count"> Number of ulong values to examine, starting at 'i' </param>
public static int OnesCount(ulong[] rg, int index, int count)
{
    if ((index | count) < 0 || index > rg.Length - count)
        throw new ArgumentException();

    fixed (ulong* p = &rg[index])
        return OnesCount(p, count);
}

<强> 2。标量API
由块优化计数器用于聚合来自进位保存加法器的结果，并且还用于完成块大小的任何余数，不能被16 x 8字节的优化块大小整除/ ulong = 128个字节。也适用于一般用途。

/// <summary> Finds the Hamming Weight or ones-count of a ulong value </summary>
/// <returns> The number of 1-bits that are set in 'x' </returns>
public static int OnesCount(ulong x)
{
    x -= (x >> 1) & 0x5555555555555555;
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    return (int)((((x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F) * 0x0101010101010101) >> 56);
}

第3。 “Harley-Seal”块优化的1位计数器
一次处理128字节的块，即每块16 ulong个值。使用进位保存加法器（如下所示）在相邻的ulong之间添加单个位，并将总计向上聚合为2的幂。

/// <summary> Count the number of 'set' (1-valued) bits in a range of memory. </summary>
/// <param name="p"> Pointer to an array of 64-bit ulong values to scan </param>
/// <param name="c"> Size of the memory block as a count of 64-bit ulongs </param>
/// <returns> The total number of 1-bits </returns>
public static int OnesCount(ulong* p, int c)
{
    ulong z, y, x, w;
    int c = 0;

    for (w = x = y = z = 0UL; cq >= 16; cq -= 16)
        c += OnesCount(CSA(ref w,
                            CSA(ref x,
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)),
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++))),
                            CSA(ref x,
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)),
                                CSA(ref y,
                                    CSA(ref z, *p++, *p++),
                                    CSA(ref z, *p++, *p++)))));

    c <<= 4;
    c += (OnesCount(w) << 3) + (OnesCount(x) << 2) + (OnesCount(y) << 1) + OnesCount(z);

    while (--cq >= 0)
        c += OnesCount(*p++);

    return c;
}

<强> 4。保存加法器（CSA）

/// <summary> carry-save adder </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static ulong CSA(ref ulong a, ulong b, ulong c)
{
    ulong v = a & b | (a ^ b) & c;
    a ^= b ^ c;
    return v;
}

<小时/> 的备注

因为此处所示的方法通过一次进行128字节的块来计算1位的总数，所以只有在内存块大小较大时它才会变得最佳。例如，可能至少是16个qword（16 - ulong）块大小的一些（小）倍数。对于较小内存范围内的1位计数，此代码将正常工作，但是大大低于更天真的方法。有关详细信息，请参阅该文件。

从论文中，该图总结了Carry-Save Adder的工作原理：

<小时/> <强>参考

[1.]Muła，Wojciech，Nathan Kurz和Daniel Lemire。 “使用AVX2指令计算更快的人口数量。”计算机杂志61，没有。 1（2017）：111-120。

Answer 7

免责声明：未尝试编译以下代码。

/*
 * Table counting the number of set bits in a byte.
 * The byte is the index to the table.
 */
uint8_t  table[256] = {...};

/***************************************************************************
 *
 * countBits - count the number of set bits in a range
 *
 * The most significant bit in the byte is considered to be bit 0.
 *
 * RETURNS: 0 on success, -1 on failure
 */
int countBits (
    uint8_t *  buffer,
    int        startBit,  /* starting bit */
    int        endBit,    /* End-bit (inlcusive) */
    unsigned * pTotal     /* Output: number of consecutively set bits */
    ) {
    int      numBits;     /* number of bits left to check */
    int      mask;        /* mask to apply to byte from <buffer> */
    int      bits;        /* # of bits to end of byte */
    unsigned count = 0;   /* total number of bits set */
    uint8_t  value;       /* value read from the buffer */

    /* Return -1 if parameters fail sanity check (skipped) */

    numBits   = (endBit - startBit) + 1;

    index  = startBit >> 3;
    bits   = 8 - (startBit & 7);
    mask   = (1 << bits) - 1;

    value = buffer[index] & mask;  /* mask-out any bits preceding <startBit> */
    numBits -= bits;

    while (numBits > 0) {          /* Note: if <startBit> and <endBit> are in */
        count += table[value];     /* same byte, this loop gets skipped. */
        index++;
        value = buffer[index];
        numBits -= 8;
    }

    if (numBits < 0) {             /* mask-out any bits following <endBit> */
        bits   = 8 - (endBit & 7);
        mask   = 0xff << bits;
        value &= mask;
    }

    count += table[value];

    *pTotal = count;
    return 0;
}

编辑：功能标题已更新。

Answer 8

根据您应用的行业，查找表可能不是可接受的优化方法，而平台/编译器特定的优化是。知道大多数编译器和CPU指令集都有弹出计数指令，我会这样做。这是一种简单性与性能权衡，但因为现在我仍在迭代一系列字符。

另请注意，与大多数答案相反，我假设开始和结束都是字节偏移，因为它没有在问题中指明它们不是，并且在大多数情况下它是默认值。

int countSetBits(void *ptr, int start, int end )
{
    assert(start < end);

    unsigned char *s = ((unsigned char*)ptr + start);
    unsigned char *e = ((unsigned char*)ptr + end);

    int r = 0;

    while(s != e)
    {
        // __builtin_clz is not defined for 0 input.
        if(*s) r += 32 - __builtin_clz(*s);
        s++;
    }

    return r;
}

在连续的内存块中进行位计数

8 个答案: