Question

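Because of memory constraints, I have to store some pairs of values in an array with 6 bits/pair (3 bits/value). The problem comes when I want to access this array by the index of the pair. The array looks like this: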

|--byte 0 | --byte 1 | --byte 2  
|00000011 | 11112222 | 22333333   ...  and so on, the pattern repeats.  
|------|-------|--------|------|  
 pair 0  pair 1  pair 2  pair 3 

 => 4 pairs / 3 bytes

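As you can see, sometimes (for pairs whose index modulo 4 is 1 or 2) 2 bytes are needed to extract the values. I wrote a function that, given an index, returns the first value of the pair (3 bits) and the second one (also 3 bits).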

void GetPair(char *array, int index, int &value1, int &value2) {
    int groupIndex = index >> 2; // Divide by 4 to get the index of the group of 3 bytes (with 4 pairs)
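    // We use 16 bits starting with the first byte of the group when index % 4 is 0 or 1,
    // and 16 bits starting with the second byte when index % 4 is 2 or 3.
    // Each group is 3 bytes long, so the group index is scaled by 3 to get a byte offset.
    // Note: the masks below assume the first byte read is the high-order one (big-endian).
    short int value = *(short int *)(array + groupIndex * 3 + ((index & 0x02) >> 1));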

    switch(index & 0x03) { // index % 4
        case 0: { 
            // extract first 3 bits
            value1 = (value & 0xE000) >> 13;
            // extract the next 3 bits
            value2 = (value & 0x1C00) >> 10;
            break;
        }
        case 1: {
            value1 = (value & 0x380) >> 7;
            value2 = (value & 0x70) >> 4;
            break;
        }
        case 2: {
            value1 = (value & 0xE00) >> 9;
            value2 = (value & 0x1C0) >> 6;
            break;
        }
        case 3: {
            value1 = (value & 0x38) >> 3;
            value2 = value & 0x7;
            break;
        }
    }
}

Now my question is: is there a faster way to extract these values?

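I ran a test: when using 2 bytes/pair (1 byte/value), it takes about 6 seconds to access all the pairs (53 in total) 100 million times. When using the compact array, it takes about 22 seconds :( (probably because it has to compute all those masks and bit shifts). I tried to explain this as clearly as I could; forgive me if I didn't.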

Answer 1

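This is a classic case of trading speed for memory efficiency. I assume you're working in an environment where memory is scarce and you need to pack a great many items into this array; otherwise this probably isn't worth your time.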

You can eliminate the switch statement by using lookup tables to find the correct shift and mask values.

/* Tables indexed by index % 4; the entries mirror the masks and shifts of the switch in the question. */
static const unsigned short shift1[4] = { 13, 7, 9, 3 };
static const unsigned short shift2[4] = { 10, 4, 6, 0 };
static const unsigned short mask1[4] = { 0xe000, 0x0380, 0x0e00, 0x0038 };
static const unsigned short mask2[4] = { 0x1c00, 0x0070, 0x01c0, 0x0007 };

int sub = index % 4; /* you're not saving any time by using bitwise AND but you are making your code less readable */
value1 = (value & mask1[sub]) >> shift1[sub];
value2 = (value & mask2[sub]) >> shift2[sub];

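The idea is that you eliminate any branching. However, each path is so short that it may not matter. In my testing (gcc on PowerPC) there was hardly any difference. However, the memory bandwidth on this machine is slow enough that both versions are faster than plain array access with 1 byte per value.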
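For reference, here is a minimal, self-contained sketch of the lookup-table idea above. It is a variant rather than the exact code from this answer: it uses a single shift table plus one 6-bit mask, and it assembles the 16-bit window byte by byte so the result does not depend on endianness or alignment. The names `GetPairTable` and `kPairShift` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Shift that brings a pair's 6 bits down to the low end of the 16-bit
// window, indexed by index % 4 (derived from the layout in the question).
static const unsigned kPairShift[4] = { 10, 4, 6, 0 };

// Hypothetical helper with the same contract as GetPair in the question.
void GetPairTable(const unsigned char *array, int index, int &value1, int &value2) {
    assert(index >= 0);
    const int sub = index % 4;  // position of the pair inside its 3-byte group
    // The window starts at byte 0 of the group for sub 0 and 1, at byte 1 for sub 2 and 3.
    const std::size_t offset = static_cast<std::size_t>(index / 4) * 3 + (sub >> 1);
    // First byte goes into the high-order half, regardless of host endianness.
    const unsigned window = (static_cast<unsigned>(array[offset]) << 8) | array[offset + 1];
    const unsigned pair = (window >> kPairShift[sub]) & 0x3F;
    value1 = static_cast<int>(pair >> 3);
    value2 = static_cast<int>(pair & 7);
}
```

Calling `GetPairTable(data, i, a, b)` for each pair index `i` fills `a` and `b` with the two 3-bit values. Whether this beats the switch version has to be measured on the target machine.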

Answer 2

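How about this? It eliminates the memory accesses for the mask and shift values. (Of course, the (non-portable) assumption is that char is 8 bits and short is 16 bits. It also assumes that index * 6 does not overflow an int.)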

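void GetPair(char *array, int index, int &value1, int &value2)
{
   // index * 6 is the bit position of the pair; % 8 gives its offset inside the first byte read.
   unsigned shift = 10 - index * 6 % 8;
   // Like the question's code, this cast assumes the first byte read is the high-order one (big-endian).
   // For index % 4 == 3 the 16-bit read also covers the byte after the group, so the array needs one spare byte at the end.
   unsigned short data = (*(unsigned short *)(array + index * 6 / 8) >> shift) & 0x3f;
   value2 = data & 7;
   value1 = data >> 3;
}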

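However, there may be a penalty for reading shorts that do not sit on 16-bit boundaries. There used to be such issues back when I was still keeping track of these things. If that is the case, it might be better to read a 32-bit value starting on a 16-bit boundary and adjust the shifts and masks accordingly.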
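A rough sketch of that 32-bit variant follows. Several things here are assumptions and not part of the answer: the name `GetPairWide`, the choice to assemble the window byte by byte (which keeps it independent of endianness; a compiler may or may not turn it into a single load), and the requirement that the buffer has up to 3 readable padding bytes after the last pair. "16-bit boundary" is relative to the start of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads a 32-bit window that starts on a 16-bit
// boundary (relative to the start of the array) and extracts one pair.
// Assumes up to 3 readable padding bytes after the last pair.
void GetPairWide(const unsigned char *array, int index, int &value1, int &value2) {
    assert(index >= 0);
    const std::size_t bit = static_cast<std::size_t>(index) * 6;  // first bit of the pair
    const unsigned char *p = array + (bit / 16) * 2;              // round down to an even byte offset
    const std::uint32_t window = (static_cast<std::uint32_t>(p[0]) << 24) |
                                 (static_cast<std::uint32_t>(p[1]) << 16) |
                                 (static_cast<std::uint32_t>(p[2]) << 8) |
                                 static_cast<std::uint32_t>(p[3]);
    // The pair starts (bit % 16) bits into the window; 32 - 6 = 26 puts it at the low end.
    const unsigned shift = 26 - static_cast<unsigned>(bit % 16);
    const std::uint32_t pair = (window >> shift) & 0x3F;
    value1 = static_cast<int>(pair >> 3);
    value2 = static_cast<int>(pair & 7);
}
```

Because the pair starts at most 14 bits into the window, its 6 bits always fit inside the 32 bits read, so one shift and one mask are enough for every index.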

Answer 3

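Modern architectures don't even address individual bytes any more; they address 4-byte words and extract the part you asked for. So on stock hardware you may see an improvement by using 4 bytes per pair and extracting the pieces yourself. 4 bytes per entry might also be faster, but the cost of loading the second word is probably greater than the cost of masking and shifting. Or maybe not; modern processors are weird. Profile and see!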
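As an illustration of the "4 bytes per pair" option, here is a minimal sketch. The names `PackPair` and `GetPairWord` and the bit layout (first value in bits 3..5, second value in bits 0..2 of each word) are assumptions chosen for the example. Note that this layout uses 4 bytes per pair instead of 0.75, so it only makes sense if the memory budget allows it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: one pair per 32-bit word,
// value1 in bits 3..5 and value2 in bits 0..2.
inline std::uint32_t PackPair(int value1, int value2) {
    assert(value1 >= 0 && value1 < 8 && value2 >= 0 && value2 < 8);
    return (static_cast<std::uint32_t>(value1) << 3) | static_cast<std::uint32_t>(value2);
}

// Extracts both values with one word load, one shift and two masks.
inline void GetPairWord(const std::uint32_t *words, int index, int &value1, int &value2) {
    assert(index >= 0);
    const std::uint32_t w = words[index];
    value1 = static_cast<int>((w >> 3) & 7);
    value2 = static_cast<int>(w & 7);
}
```

Timing this against the packed version on the real data set is the only way to tell which one wins on a given machine.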
