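I am trying to find out whether there is some sneaky series of bit operations that lets me count how many bits in a uint32 are 1 (or rather, the count modulo 2).
The "obvious" way is something like this: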
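uint32 count_1_bits_mod_2(uint32 word) {
    uint32 i, sum_mod_2 = 0;        /* must start at 0 */
    for (i = 0; i < 32; i++) {
        sum_mod_2 ^= word;          /* bit 0 accumulates the parity */
        word >>= 1;
    }
    return sum_mod_2 & 1;           /* only bit 0 is meaningful */
}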
Is there some "sneaky" way to get the correct sum_mod_2 without using a loop?
Answer 0 (score: 6)
The fastest way to count bits is by using "magic numbers":
unsigned int v = 0xCF31; // some number
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
unsigned int c = (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24; // count
This prints 9 (link to ideone).
For a 32-bit number this takes 12 operations, the same number as a lookup-based approach, but you do not need a lookup table.
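To connect this back to the question, here is a minimal sketch that wraps the same masks in a function and reduces the count modulo 2. The function names popcount32 and count_1_bits_mod_2 are illustrative choices, and uint32_t from <stdint.h> stands in for the question's uint32 type.

```c
#include <stdint.h>

/* Population count of a 32-bit word using the "magic number" masks. */
static uint32_t popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* sums of 2-bit groups */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* sums of 4-bit groups */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* sums of 8-bit groups */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes   */
}

/* The value the question asks for: the bit count modulo 2. */
static uint32_t count_1_bits_mod_2(uint32_t word)
{
    return popcount32(word) & 1u;
}
```

For the sample value 0xCF31 the count is 9, so the result modulo 2 is 1.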
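Answer 1 (score: 5)
The "best" way probably depends on the CPU architecture your code runs on. For example, Intel/AMD CPUs starting with Nehalem/Barcelona support the "popcnt" instruction, which counts the 1 bits in an integer register, so it takes just two instructions (popcnt and a bitwise AND with 1) to compute the value you seek.
If you are using a reasonably recent version of GCC (or another compiler with similar support), you can use its __builtin_popcount() function to compute the population count, with the "-mpopcnt" or "-msse4.2" compile flag to make it use the popcnt instruction. See this link for details. E.g.: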
uint32_t parity = __builtin_popcount(x) & 1;
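As a slightly fuller sketch, assuming GCC or Clang: besides __builtin_popcount(), both compilers also provide __builtin_parity(), which returns the number of 1 bits modulo 2 directly. The wrapper names below are only illustrative.

```c
#include <stdint.h>

/* Assumes GCC or Clang; other compilers may not provide these builtins. */

/* Parity as population count AND 1, as in the one-liner above. */
static uint32_t parity_via_popcount(uint32_t x)
{
    return (uint32_t)__builtin_popcount(x) & 1u;
}

/* Parity via the dedicated builtin (number of 1 bits modulo 2). */
static uint32_t parity_via_builtin(uint32_t x)
{
    return (uint32_t)__builtin_parity(x);
}
```

Without -mpopcnt (or an equivalent target flag) the compiler typically falls back to a software routine, so the result is the same but the speed may differ.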
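Answer 2 (score: 5)
The fastest option is the CPU popcnt instruction, with SSSE3 code second. The fastest portable approach is the bitslice method, followed by a lookup table: http://www.dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html
As with everything, you should benchmark against your own workload, and then optimize only if it is too slow.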
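A minimal sketch of what such a benchmark could look like, using only clock() from <time.h>. This is not the harness from popcnt.cpp; the names parity_fn, parity_fold and time_parity are hypothetical, and a real measurement would use a much larger buffer and several repetitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Any candidate implementation with this shape can be timed. */
typedef uint32_t (*parity_fn)(uint32_t);

/* One candidate: XOR-fold the word down to a single parity bit. */
static uint32_t parity_fold(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Runs fn over data[0..n-1] and returns the elapsed CPU time in seconds.
 * The XOR of all results is stored in *acc_out so the compiler cannot
 * discard the work. */
static double time_parity(parity_fn fn, const uint32_t *data, size_t n,
                          uint32_t *acc_out)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        acc ^= fn(data[i]);
    clock_t end = clock();
    *acc_out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```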
For an AMD Phenom II X2 550, with gcc 4.7.1 (using g++ -O3 popcnt.cpp -o popcnt -mpopcnt -msse2):
Bitslice(7)          1462142 us; cnt = 32500610
Bitslice(24)         1221985 us; cnt = 32500610
Lauradoux            2347749 us; cnt = 32500610
SSE2 8-bit            790898 us; cnt = 32500610
SSE2 16-bit           825568 us; cnt = 32500610
SSE2 32-bit           864665 us; cnt = 32500610
16-bit LUT           1236739 us; cnt = 32500610
8-bit LUT            1951629 us; cnt = 32500610
gcc popcount          803173 us; cnt = 32500610
gcc popcountll       7678479 us; cnt = 32500610
FreeBSD version 1    2802681 us; cnt = 32500610
FreeBSD version 2    2167031 us; cnt = 32500610
Wikipedia #2         4927947 us; cnt = 32500610
Wikipedia #3         4212143 us; cnt = 32500610
HAKMEM 169/X11       3559245 us; cnt = 32500610
naive               16182699 us; cnt = 32500610
Wegner/Kernigan     12115119 us; cnt = 32500610
Anderson            61045764 us; cnt = 32500610
8x shift and add     6712049 us; cnt = 32500610
32x shift and add    6662200 us; cnt = 32500610
For an Intel Core2 Duo E8400, with gcc 4.7.1 (g++ -O3 popcnt.cpp -o popcnt -mssse3; this CPU does not support -mpopcnt):
Bitslice(7)          1353007 us; cnt = 32500610
Bitslice(24)          953044 us; cnt = 32500610
Lauradoux             534697 us; cnt = 32500610
SSE2 8-bit            458277 us; cnt = 32500610
SSE2 16-bit           555278 us; cnt = 32500610
SSE2 32-bit           634897 us; cnt = 32500610
SSSE3                 414542 us; cnt = 32500610
16-bit LUT           1208412 us; cnt = 32500610
8-bit LUT            1400175 us; cnt = 32500610
gcc popcount         5428396 us; cnt = 32500610
gcc popcountll       2743358 us; cnt = 32500610
FreeBSD version 1    3025944 us; cnt = 32500610
FreeBSD version 2    2313264 us; cnt = 32500610
Wikipedia #2         1570519 us; cnt = 32500610
Wikipedia #3         1051828 us; cnt = 32500610
HAKMEM 169/X11       3982779 us; cnt = 32500610
naive               20951420 us; cnt = 32500610
Wegner/Kernigan     13665630 us; cnt = 32500610
Anderson             6771549 us; cnt = 32500610
8x shift and add    14917323 us; cnt = 32500610
32x shift and add   14494482 us; cnt = 32500610
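The bitslice method is a parallel scheme that counts several (7 or 24) machine words at once, so it is of only marginal use as a general-purpose function. The code below follows http://dalkescientific.com/writings/diary/popcnt.cpp: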
static inline int popcount_fbsd2(unsigned *buf, int n)
{
    int cnt = 0;
    /* while loop instead of do/while so that n == 0 reads nothing */
    while (n-- > 0) {
        unsigned v = *buf++;
        v -= ((v >> 1) & 0x55555555);
        v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
        v = (v + (v >> 4)) & 0x0F0F0F0F;
        v = (v * 0x01010101) >> 24;
        cnt += v;
    }
    return cnt;
}
static inline int merging2(const unsigned *data)
{
    unsigned count1, count2, half1, half2;
    count1 = data[0];
    count2 = data[1];
    half1 = data[2] & 0x55555555;
    half2 = (data[2] >> 1) & 0x55555555;
    count1 = count1 - ((count1 >> 1) & 0x55555555);
    count2 = count2 - ((count2 >> 1) & 0x55555555);
    count1 += half1;
    count2 += half2;
    count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
    count2 = (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
    count1 += count2;
    count1 = (count1 & 0x0F0F0F0F) + ((count1 >> 4) & 0x0F0F0F0F);
    count1 = count1 + (count1 >> 8);
    count1 = count1 + (count1 >> 16);
    return count1 & 0x000000FF;
}
static inline int merging3(const unsigned *data)
{
    unsigned count1, count2, half1, half2, acc = 0;
    int i;
    for (i = 0; i < 24; i += 3)
    {
        count1 = data[i];
        count2 = data[i+1];
        //w = data[i+2];
        half1 = data[i+2] & 0x55555555;
        half2 = (data[i+2] >> 1) & 0x55555555;
        count1 = count1 - ((count1 >> 1) & 0x55555555);
        count2 = count2 - ((count2 >> 1) & 0x55555555);
        count1 += half1;
        count2 += half2;
        count1 = (count1 & 0x33333333) + ((count1 >> 2) & 0x33333333);
        count1 += (count2 & 0x33333333) + ((count2 >> 2) & 0x33333333);
        acc += (count1 & 0x0F0F0F0F) + ((count1 >> 4) & 0x0F0F0F0F);
    }
    acc = (acc & 0x00FF00FF) + ((acc >> 8) & 0x00FF00FF);
    acc = acc + (acc >> 16);
    return acc & 0x0000FFFF;
}
/* count 24 words at a time, then 3 at a time, then 1 at a time */
static inline int popcount_24words(unsigned *buf, int n) {
    int cnt = 0, i;
    for (i = 0; i < n - n % 24; i += 24) {
        cnt += merging3(buf + i);
    }
    for (; i < n - n % 3; i += 3) {
        cnt += merging2(buf + i);
    }
    cnt += popcount_fbsd2(buf + i, n - i);
    return cnt;
}
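Answer 3 (score: 2)
count = 0;
while (word != 0) {
    word = word & (word-1);
    count++;
}
The statement
word = word & (word-1);
clears the lowest 1 bit in the word. Eventually you run out of 1 bits, so the loop runs once per set bit.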
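A minimal sketch of the same loop as complete functions, with a variant that tracks only the parity the question asks for. The function names are illustrative, and uint32_t is assumed in place of the question's uint32.

```c
#include <stdint.h>

/* Counts set bits by repeatedly clearing the lowest one. */
static unsigned count_bits_clear_lowest(uint32_t word)
{
    unsigned count = 0;
    while (word != 0) {
        word &= word - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Same loop, but only the count modulo 2 is tracked. */
static unsigned parity_clear_lowest(uint32_t word)
{
    unsigned parity = 0;
    while (word != 0) {
        word &= word - 1;   /* clear the lowest set bit */
        parity ^= 1u;       /* flip once per set bit    */
    }
    return parity;
}
```

For example, 0x58 (binary 1011000) becomes 0x50, then 0x40, then 0, so the loop body runs three times.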
Answer 4 (score: 1)
I think this is easy to understand and quite efficient.
uint32_t x = 0xCF31; // some number (arbitrary example value)
x ^= (x >> 1); // parity of every bit pair now in bits 0, 2, 4, ...
x ^= (x >> 2); // parity of every 4 bits now in bits 0, 4, 8, ...
x ^= (x >> 4); // ...etc
x ^= (x >> 8);
x ^= (x >> 16); // parity of all 32 bits now in bit 0
uint32_t parity = x & 1;
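The same steps as a self-contained function, as a sketch; the name parity_xor_fold is an illustrative choice. Because XOR is associative and commutative, the five folds can be applied in any order.

```c
#include <stdint.h>

/* Folds the word onto itself with XOR until bit 0 holds the parity
 * of all 32 bits. No loop and no lookup table are needed. */
static uint32_t parity_xor_fold(uint32_t x)
{
    x ^= x >> 1;    /* parity of each bit pair in bits 0, 2, 4, ... */
    x ^= x >> 2;    /* parity of each nibble in bits 0, 4, 8, ...   */
    x ^= x >> 4;    /* parity of each byte in bits 0, 8, 16, 24     */
    x ^= x >> 8;    /* parity of each 16-bit half in bits 0 and 16  */
    x ^= x >> 16;   /* parity of the whole word in bit 0            */
    return x & 1u;
}
```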
Answer 5 (score: 0)
Precompute all the results and do a simple array lookup. For a simple "count is even or odd" boolean result, you can build a bit array.
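One way this could look, as a sketch under an added assumption: a bit array covering every 32-bit input would need 512 MB, so the table here covers a single byte (256 bits) and the word is first XOR-folded down to one byte. The names parity_bits, init_parity_bits and parity_lookup are hypothetical.

```c
#include <stdint.h>

/* 256-bit array: bit b is set when byte value b has an odd number of 1s. */
static uint32_t parity_bits[8];

/* Fills the bit array; call once before parity_lookup(). */
static void init_parity_bits(void)
{
    for (unsigned b = 0; b < 256; b++) {
        unsigned p = 0;
        for (unsigned t = b; t != 0; t >>= 1)
            p ^= t & 1u;                      /* parity of byte value b */
        if (p)
            parity_bits[b >> 5] |= (uint32_t)1 << (b & 31);
    }
}

/* Folds the word to one byte, then reads that byte's parity bit. */
static uint32_t parity_lookup(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    unsigned b = x & 0xFFu;
    return (parity_bits[b >> 5] >> (b & 31)) & 1u;
}
```

A plain 256-entry byte table would avoid the shift and mask on lookup at the cost of 256 bytes instead of 32.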
Answer 6 (score: 0)
cnt = 0;
while (word != 0) {
    word = word & (word-1);
    cnt++;
}
Each iteration removes one 1 bit. For details, see http://pgrtutorials.blogspot.in/p/bit-manipulation.html