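My data structure contains 8 bit vectors, each 64 bits long. However, the bytes of these vectors are interleaved in the data structure rather than stored one after another: each successive byte of a given bit vector lies 8 bytes after the previous one. Is there an efficient way (such as parallel loads and stores) to move data between these interleaved arrays and 64-bit words on current x86-64 CPUs? C code with inline asm is fine, but a solution using gcc intrinsics would be better.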
Answer 0: (score: 0)
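I am not aware of any x64 CPU instruction that handles interleaved data directly. However, since these CPUs are very fast at shifts and indexed memory access, I would use the following approach, which performs 8 inline shift/copy operations and leaves the rest to the optimizer: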
/* Scatter the 8 bytes of value into bytes[], one every 8 bytes, starting at offset. */
void Write (unsigned char* bytes,
            unsigned long long value,
            int offset)
{
    bytes [offset     ] = (unsigned char) (value      );   /* least significant byte */
    bytes [offset +  8] = (unsigned char) (value >>  8);
    bytes [offset + 16] = (unsigned char) (value >> 16);
    bytes [offset + 24] = (unsigned char) (value >> 24);
    bytes [offset + 32] = (unsigned char) (value >> 32);
    bytes [offset + 40] = (unsigned char) (value >> 40);
    bytes [offset + 48] = (unsigned char) (value >> 48);
    bytes [offset + 56] = (unsigned char) (value >> 56);   /* most significant byte */
}
/* Gather 8 bytes, one every 8 bytes starting at offset, into a 64-bit value. */
void Read (const unsigned char* bytes,
           unsigned long long* value,
           int offset)
{
    /* Widen each byte to 64 bits before shifting so no bits are lost. */
    *value = ((unsigned long long) bytes [offset     ]      ) |
             ((unsigned long long) bytes [offset +  8] <<  8) |
             ((unsigned long long) bytes [offset + 16] << 16) |
             ((unsigned long long) bytes [offset + 24] << 24) |
             ((unsigned long long) bytes [offset + 32] << 32) |
             ((unsigned long long) bytes [offset + 40] << 40) |
             ((unsigned long long) bytes [offset + 48] << 48) |
             ((unsigned long long) bytes [offset + 56] << 56);
}
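This code stores the 64-bit value in little-endian order: the least significant byte goes to the lowest address. For big-endian, simply read/write the bytes in the reverse order.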
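As a rough illustration of the big-endian variant, here is a minimal sketch written with loops instead of unrolled statements. The names `write_interleaved_be`, `read_interleaved_be`, `demo_roundtrip`, `STRIDE` and `NBYTES` are hypothetical and not part of the code above. The stride of 8 is assumed from the layout described in the question, and `offset` is assumed to select one of the 8 vectors (0 to 7) inside a 64-byte block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8 vectors of 8 bytes each, interleaved with a stride of 8. */
enum { STRIDE = 8, NBYTES = 8 };

/* Store value big-endian: the most significant byte lands at bytes[offset]. */
void write_interleaved_be(unsigned char *bytes, uint64_t value, int offset)
{
    for (int i = 0; i < NBYTES; i++)
        bytes[offset + STRIDE * i] = (unsigned char)(value >> (56 - 8 * i));
}

/* Rebuild the value, treating bytes[offset] as the most significant byte. */
uint64_t read_interleaved_be(const unsigned char *bytes, int offset)
{
    uint64_t value = 0;
    for (int i = 0; i < NBYTES; i++)
        value = (value << 8) | bytes[offset + STRIDE * i];
    return value;
}

/* Round trip through vector number 3 of a zeroed 64-byte block. */
void demo_roundtrip(void)
{
    unsigned char block[STRIDE * NBYTES] = {0};

    write_interleaved_be(block, 0x0102030405060708ULL, 3);
    assert(block[3] == 0x01);            /* most significant byte first */
    assert(block[3 + 7 * STRIDE] == 0x08); /* least significant byte last */
    assert(read_interleaved_be(block, 3) == 0x0102030405060708ULL);
}
```

With a fixed trip count of 8, an optimizing compiler will typically unroll these loops, so the result should be comparable to the hand-unrolled version, though that is worth checking in the generated assembly.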