Most efficient way to clear everything except the leftmost 1 in a bit string of arbitrary length

Time: 2018-01-12 06:37:55

Tags: c bit-manipulation mask bit

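I recently realized that I don't know how to do this. Clearing all bits of a bit string x except the rightmost 1 bit can be done with (x & ~(x-1)). Is there a similar expression for the leftmost 1 bit?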
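For reference, the rightmost-bit expression from the question can be sketched for a single machine word as follows. The function name `lowestSetBit` is made up for illustration; only the expression itself comes from the question.

```c
/* Isolate the rightmost (least significant) 1 bit of a single word.
 * lowestSetBit is a hypothetical helper name, not part of the question. */
unsigned lowestSetBit(unsigned x)
{
  /* x - 1 clears the lowest 1 bit and sets all 0 bits below it,
   * so ~(x - 1) agrees with x only at that lowest 1 bit. */
  return x & ~(x - 1);
}
```

For example, `lowestSetBit(0x68)` yields `0x08`, and an input of 0 yields 0.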

1 answer:

Answer 0: (score: 2)

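IMHO, it cannot be done without a loop.

At least in C, bitwise operations are supported only for integral types (i.e. all flavors of int and char), but not for arrays.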

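Concerning char, I'm not sure, as I recently saw a heated discussion about whether char may be counted among them. (Usually in C, values of smaller types are implicitly converted to int (or unsigned int) for arithmetic or bit operations.)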
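The implicit conversion mentioned in parentheses can be observed directly. The following is only a small sketch: it assumes 8-bit bytes and an int that is wider than unsigned char, and the helper names are made up for illustration.

```c
#include <stddef.h>

typedef unsigned char Byte;

/* The operand of ~ is promoted to int before the operation,
 * so the result has the size of int, not of Byte. */
size_t sizeOfComplement(void)
{
  Byte b = 0x0f;
  return sizeof (~b);
}

/* Casting the promoted result back to Byte discards the extra high bits. */
Byte complementByte(Byte b)
{
  return (Byte)~b;
}
```

Under these assumptions `sizeOfComplement()` equals `sizeof (int)`, and `complementByte(0x0f)` is `0xf0`. This is why the byte-oriented code in this answer casts results back to its byte type.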

Integers of arbitrary length are not supported directly. They have to be represented as arrays of an integral type.

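This means the algorithm actually needs three phases:

  1. Skip all array elements before the first non-zero element.

  2. Process the first non-zero element with what is known as "floor power of 2".

  3. Set all array elements after the first non-zero element to 0.

If phase 1 or 2 reaches the end of the array, the remaining phases are of course skipped.

What "before" and "after" actually mean depends on the endianness used to store the arbitrary-length number in the array (i.e. whether it starts with the least or the most significant bits).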

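Actually, I believe this has to be done the same way in every language higher-level than C, either explicitly or "under the hood". (The exception would be a CPU that supports this with a specific op code, which IMHO is hard to believe.)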
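To illustrate the earlier remark on element order: the answer's code treats the last array element as the most significant one. If the array instead starts with the most significant byte, the same three phases simply run in the opposite direction. The sketch below assumes that layout; `leftMostBitMsbFirst` is a hypothetical name, and the single-byte helper repeats the byte-level step used in the answer's code so that the block stands alone.

```c
#include <stddef.h>

typedef unsigned char Byte;

/* Keep only the leftmost 1 bit of an 8-bit value (smear, then clear). */
static unsigned floorPow2Byte(unsigned v)
{
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  return v & ~(v >> 1);
}

/* Hypothetical variant: bits[0] holds the MOST significant byte. */
void leftMostBitMsbFirst(size_t size, Byte bits[])
{
  size_t i = 0;
  /* phase 1: skip zero bytes at the most significant end */
  while (i < size && !bits[i]) ++i;
  if (i == size) return; /* everything was 0 */
  /* phase 2: reduce the first non-zero byte to its leftmost 1 bit */
  bits[i] = (Byte)floorPow2Byte(bits[i]);
  /* phase 3: clear all less significant bytes */
  while (++i < size) bits[i] = 0;
}
```

With this layout, the bytes `{ 0x00, 0x0b, 0xad, 0xbe }` become `{ 0x00, 0x08, 0x00, 0x00 }`.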

So, this is what I got in C:

    #include <stdio.h>
    
    typedef unsigned char Byte;
    
    unsigned floorPow2Byte(unsigned v)
    {
      /* set all bits below left most bit */
      v |= v >> 1;
      v |= v >> 2;
      v |= v >> 4;
      /* clear all bits below left most bit */
      v &= ~(v >> 1);
      /* done */
      return v;
    }
    
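    void leftMostBit(size_t size, Byte bits[])
    {
      /* bits[size - 1] is treated as the most significant byte */
      size_t i = size;
      /* phase 1: skip zero bytes from the most significant end */
      while (i--) if (bits[i]) break;
      if (i > size) return; /* wrap around -> everything was 0 */
      /* phase 2: keep only the leftmost 1 bit of the first non-zero byte */
      bits[i] = (Byte)floorPow2Byte(bits[i]);
      /* phase 3: clear all less significant bytes */
      while (i) bits[--i] = 0;
    }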
    
    void printBits(size_t size, Byte bits[])
    {
      static const char *tbl[] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
      };
      while (size--) {
        printf("%s%s", tbl[bits[size] >> 4], tbl[bits[size] & 0xf]);
      }
    }
    
    #define SIZE(ARRAY) (sizeof ARRAY / sizeof *ARRAY)
    
    int main(void)
    {
      /* samples */
      Byte bits1[] = { 0x00, 0xef, 0xbe, 0xad, 0x0b, 0x00 };
      Byte bits2[] = { 0xff, 0xff, 0xff };
      Byte bits3[] = { 0x00, 0x00, 0x00 };
      Byte bits4[] = { 0x00, 0x00, 0x01 };
      Byte bits5[] = { 0x00, 0x00, 0x80 };
      Byte bits6[] = { 0x00, 0x01, 0x80 };
      Byte bits7[] = { 0x00, 0x80, 0x80 };
      Byte bits8[] = { 0x80, 0x80, 0x80 };
      /* check it out */
    #define DO(DATA) \
      printf("Input : "); printBits(SIZE(DATA), DATA); printf("\n"); \
      leftMostBit(SIZE(DATA), DATA); \
      printf("Output: "); printBits(SIZE(DATA), DATA); printf("\n")
    
      DO(bits1);
      DO(bits2);
      DO(bits3);
      DO(bits4);
      DO(bits5);
      DO(bits6);
      DO(bits7);
      DO(bits8);
    
    #undef DO
      /* done */
      return 0;
    }
    

Compiled and tested on ideone.

Output:

    Input : 000000000000101110101101101111101110111100000000
    Output: 000000000000100000000000000000000000000000000000
    Input : 111111111111111111111111
    Output: 100000000000000000000000
    Input : 000000000000000000000000
    Output: 000000000000000000000000
    Input : 000000010000000000000000
    Output: 000000010000000000000000
    Input : 100000000000000000000000
    Output: 100000000000000000000000
    Input : 100000000000000100000000
    Output: 100000000000000000000000
    Input : 100000001000000000000000
    Output: 100000000000000000000000
    Input : 100000001000000010000000
    Output: 100000000000000000000000
    

The most notable part is probably the function floorPow2Byte(). It was strongly inspired by "Round up to the next highest power of 2", but I had to modify it somewhat.
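One way to gain confidence in floorPow2Byte() is to compare it with a naive loop over all 256 byte values. The following is a minimal sketch of such a check; `floorPow2ByteNaive` and `floorPow2ByteAgrees` are hypothetical helpers added for illustration, and floorPow2Byte() is repeated from the answer so that the block stands alone.

```c
/* Bit-smearing version, as in the answer.
 * Trace for 0x0b = 0000 1011: the three OR steps give 0000 1111,
 * then v & ~(v >> 1) = 0000 1111 & ~0000 0111 = 0000 1000. */
unsigned floorPow2Byte(unsigned v)
{
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v &= ~(v >> 1);
  return v;
}

/* Naive reference: test the 8 bits from the top down. */
unsigned floorPow2ByteNaive(unsigned v)
{
  unsigned bit;
  for (bit = 0x80; bit; bit >>= 1) {
    if (v & bit) return bit;
  }
  return 0;
}

/* Returns 1 if both versions agree for every byte value, else 0. */
int floorPow2ByteAgrees(void)
{
  unsigned v;
  for (v = 0; v < 256; ++v) {
    if (floorPow2Byte(v) != floorPow2ByteNaive(v)) return 0;
  }
  return 1;
}
```

The round-up routine brackets the OR steps with a decrement and an increment; here, the smeared value is instead masked with the complement of itself shifted right by one, which leaves only the top bit.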

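The idea is simple, but debugging it took quite some time. (TGIF)

Update

With a wider unsigned type (size_t in the code below) instead of Byte, the sample is obviously more efficient. However, this does not change the overall algorithm. Updated source: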

    #include <assert.h>
    #include <stdio.h>
    
    /* Assuming that size_t has "machine word width",
     * this might be the unsigned integer type which is processed most
     * efficiently.
     */
    typedef size_t Word;
    
    Word floorPow2(Word v)
    {
      assert(sizeof v <= 8);
      v |= v >> 1;
      v |= v >> 2;
      v |= v >> 4;
      if (sizeof v > 1) {
        v |= v >> 8;
        if (sizeof v > 2) {
          v |= v >> 16;
          if (sizeof v > 4) {
            v |= v >> 32;
          }
        }
      }
      v &= ~(v >> 1);
      return v;
    }
    
    void leftMostBit(size_t size, Word bits[])
    {
      size_t i = size;
      while (i--) if (bits[i]) break;
      if (i > size) return; /* wrap around -> everything was 0 */
      bits[i] = floorPow2(bits[i]);
      while (i) bits[--i] = 0;
    }
    
    void printBits(size_t size, Word bits[])
    {
      static const char *tbl[] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
      };
      while (size--) {
        if (sizeof *bits > 1) {
          if (sizeof *bits > 2) {
            if (sizeof *bits > 4) {
              printf("%s%s", tbl[bits[size] >> 60 & 0xf], tbl[bits[size] >> 56 & 0xf]);
              printf("%s%s", tbl[bits[size] >> 52 & 0xf], tbl[bits[size] >> 48 & 0xf]);
              printf("%s%s", tbl[bits[size] >> 44 & 0xf], tbl[bits[size] >> 40 & 0xf]);
              printf("%s%s", tbl[bits[size] >> 36 & 0xf], tbl[bits[size] >> 32 & 0xf]);
            }
            printf("%s%s", tbl[bits[size] >> 28 & 0xf], tbl[bits[size] >> 24 & 0xf]);
            printf("%s%s", tbl[bits[size] >> 20 & 0xf], tbl[bits[size] >> 16 & 0xf]);
          }
          printf("%s%s", tbl[bits[size] >> 12 & 0xf], tbl[bits[size] >> 8 & 0xf]);
        }
        printf("%s%s", tbl[bits[size] >> 4 & 0xf], tbl[bits[size] & 0xf]);
      }
    }
    
    #define SIZE(ARRAY) (sizeof ARRAY / sizeof *ARRAY)
    
    int main(void)
    {
      /* samples */
      Word bits1[] = { 0x00, 0xef, 0xbe, 0xad, 0x0b, 0x00 };
      Word bits2[] = { 0xff, 0xff, 0xff };
      Word bits3[] = { 0x00, 0x00, 0x00 };
      Word bits4[] = { 0x00, 0x00, 0x01 };
      Word bits5[] = { 0x00, 0x00, 0x80 };
      Word bits6[] = { 0x00, 0x01, 0x80 };
      Word bits7[] = { 0x00, 0x80, 0x80 };
      /* check it out */
    #define DO(DATA) \
      printf("Input : "); printBits(SIZE(DATA), DATA); printf("\n"); \
      leftMostBit(SIZE(DATA), DATA); \
      printf("Output: "); printBits(SIZE(DATA), DATA); printf("\n")
    
      DO(bits1);
      DO(bits2);
      DO(bits3);
      DO(bits4);
      DO(bits5);
      DO(bits6);
      DO(bits7);
    
    #undef DO
      /* done */
      return 0;
    }
    

The many ifs may look awkward, but a good C compiler should recognize that the conditions can be evaluated at compile time and optimize them away.
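As an alternative to the nested ifs, the smearing step could also be written as a loop over doubling shift distances. This is only a sketch, not the code that was tested above; `floorPow2Loop` is a hypothetical name, and `Word` is assumed to be size_t as in the updated source.

```c
#include <limits.h>
#include <stddef.h>

typedef size_t Word;

/* Hypothetical loop-based variant of floorPow2(). */
Word floorPow2Loop(Word v)
{
  size_t shift;
  /* Smear the leftmost 1 bit into all lower positions.
   * The shift distance doubles each round and always stays below
   * the bit width of Word, so no shift is ever out of range. */
  for (shift = 1; shift < sizeof v * CHAR_BIT; shift <<= 1) {
    v |= v >> shift;
  }
  /* Clear everything below the leftmost 1 bit. */
  return v & ~(v >> 1);
}
```

The loop bound is also a compile-time constant, so a compiler is free to unroll it. Unlike the nested version, no shift by 32 appears in the source when Word is only 32 bits wide.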

Compiled and tested with VS2013 on Windows 10 (64 bit):

    Input : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010110000000000000000000000000000000000000000000000000000000010101101000000000000000000000000000000000000000000000000000000001011111000000000000000000000000000000000000000000000000000000000111011110000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000001111111100000000000000000000000000000000000000000000000000000000111111110000000000000000000000000000000000000000000000000000000011111111
    Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    Input : 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000
    Output: 000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000