Is there any SSE2 instruction that loads a 128-bit int vector register from an int buffer in reverse order?
Answer 0 (score: 10)
There is no single SSE2 load that does this, but reversing the 32-bit int elements after a normal load is very easy:
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); // PSHUFD - mask = 00 01 10 11 = 0x1b
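For reference, here is a minimal, self-contained sketch of the 32-bit case. The function name `reverse4_epi32` is made up for illustration, and it assumes the unaligned load/store variants (`_mm_loadu_si128` / `_mm_storeu_si128`) are acceptable, so the buffers do not need to be 16-byte aligned:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the original answer):
   writes src[3], src[2], src[1], src[0] to dst[0..3].
   Unaligned load/store is an assumption made for convenience. */
void reverse4_epi32(const int32_t *src, int32_t *dst)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* MOVDQU */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   /* PSHUFD, mask 0x1b */
    _mm_storeu_si128((__m128i *)dst, v);                 /* MOVDQU */
}
```

If the buffer is known to be aligned, swapping in `_mm_load_si128` as in the snippet above should work the same way.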
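You can do the same for 16-bit short elements, but it takes more instructions: first reverse the four 32-bit elements, then swap the two 16-bit halves inside each of them: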
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)); // PSHUFD - mask = 00 01 10 11 = 0x1b
v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFLW - mask = 10 11 00 01 = 0xb1
v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFHW - mask = 10 11 00 01 = 0xb1
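The three shuffles can be wrapped into a small function to check the result. This is only a sketch: the name `reverse8_epi16` is hypothetical, and unaligned load/store is assumed so that any `int16_t` array can be passed in:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the original answer):
   writes the eight 16-bit elements of src to dst in reverse order. */
void reverse8_epi16(const int16_t *src, int16_t *dst)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);     /* MOVDQU */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));     /* reverse the 4 dwords */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));   /* swap word pairs, low half */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));   /* swap word pairs, high half */
    _mm_storeu_si128((__m128i *)dst, v);                   /* MOVDQU */
}
```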
Note that if SSSE3 is available, you can use _mm_shuffle_epi8 (PSHUFB) to do this with fewer instructions:
// initialise vector mask for use with PSHUFB
// NB: do this once, outside any processing loop
const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
...
__m128i v = _mm_load_si128((const __m128i *)buff); // MOVDQA - buff must be 16-byte aligned
v = _mm_shuffle_epi8(v, vm); // PSHUFB
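A complete version of the PSHUFB approach might look like the sketch below. The function name `reverse8_epi16_ssse3` is hypothetical. The `target("ssse3")` attribute is a GCC/Clang-specific assumption that lets the function build without passing `-mssse3` for the whole file; with other compilers, drop the attribute and enable SSSE3 in the build settings instead. The caller is still responsible for making sure the CPU actually supports SSSE3.

```c
#include <tmmintrin.h>  /* SSSE3 intrinsics (pulls in the SSE2 header as well) */
#include <stdint.h>

/* Hypothetical helper (not from the original answer):
   reverses eight 16-bit elements with a single byte shuffle.
   The target attribute is GCC/Clang-specific (assumption). */
__attribute__((target("ssse3")))
void reverse8_epi16_ssse3(const int16_t *src, int16_t *dst)
{
    /* Byte i of the result is taken from byte vm[i] of the input,
       so word 0 of the result is bytes 14..15 (word 7) of the input, etc. */
    const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9,
                                     6, 7, 4, 5, 2, 3, 0, 1);
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* MOVDQU */
    v = _mm_shuffle_epi8(v, vm);                         /* PSHUFB */
    _mm_storeu_si128((__m128i *)dst, v);                 /* MOVDQU */
}
```

In a real processing loop the mask would be created once outside the loop, as the comment in the answer says; here it lives inside the function only to keep the example short.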
Answer 1 (score: -2)
EDIT: (the following is for single-precision floats, just in case)
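The closest (and also handy) thing is the _mm_loadr_ps intrinsic. Note that the address must be 16-byte aligned. This intrinsic translates to more than one instruction, though (MOVAPS + a shuffle).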
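To make that concrete, here is a small sketch with two hypothetical helpers. `load_reversed_ps` uses `_mm_loadr_ps` and therefore needs a 16-byte-aligned source. `load_reversed_ps_unaligned` spells out what I would expect to be an equivalent load-plus-shuffle sequence, using an unaligned load so it accepts any pointer. Both names are made up for illustration.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: src MUST be 16-byte aligned (requirement of _mm_loadr_ps). */
void load_reversed_ps(const float *src, float *dst)
{
    __m128 v = _mm_loadr_ps(src);   /* aligned load plus shuffle */
    _mm_storeu_ps(dst, v);          /* MOVUPS */
}

/* Hypothetical helper: same result, written out by hand,
   with no alignment requirement on src. */
void load_reversed_ps_unaligned(const float *src, float *dst)
{
    __m128 v = _mm_loadu_ps(src);                        /* MOVUPS */
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));   /* SHUFPS, mask 0x1b */
    _mm_storeu_ps(dst, v);                               /* MOVUPS */
}
```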