Question

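What is the fastest way to move the high or low 64 bits of one integer SSE register into another? With SSE 4.1 it can be done with a single pblendw instruction (_mm_blend_epi16). But what about older SSE versions? Shift and unpack? AND and OR? movsd, despite the bypass delay?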

Closely related question: Best way to shuffle 64-bit portions of two __m128i's

Answer 1

To move the low 64 bits from src to dst, preserving the high 64 bits of dst:

movsd dst, src

To move the high 64 bits from src to dst, preserving the low 64 bits of dst:

shufps dst, src, E4h

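Bypass delays generally only add latency; they do not consume scheduling, execution, or retirement resources. So they usually matter only when comparing otherwise-equivalent sequences (i.e. if there is a single-instruction equivalent that stays in the integer domain, you would prefer it for integer work).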
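For anyone working with intrinsics rather than assembly, the two instructions above can be sketched roughly as follows. The function names (merge_lo64, merge_hi64, demo_merge) are made up for illustration, and the casts between __m128i and the floating-point vector types are only there to satisfy the type system. A compiler is free to emit a different instruction than movsd or shufps for these, so treat this as a sketch of the data movement, not a guarantee of the generated code.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */
#include <stdint.h>

/* Like "movsd dst, src": low 64 bits come from src, high 64 bits of dst are kept.
   merge_lo64 is a hypothetical name, not a library function. */
static inline __m128i merge_lo64(__m128i dst, __m128i src)
{
    return _mm_castpd_si128(
        _mm_move_sd(_mm_castsi128_pd(dst), _mm_castsi128_pd(src)));
}

/* Like "shufps dst, src, E4h": low 64 bits of dst are kept, high 64 bits come from src.
   merge_hi64 is a hypothetical name, not a library function. */
static inline __m128i merge_hi64(__m128i dst, __m128i src)
{
    return _mm_castps_si128(
        _mm_shuffle_ps(_mm_castsi128_ps(dst), _mm_castsi128_ps(src), 0xE4));
}

/* Small self-check with arbitrary lane values. */
static inline void demo_merge(void)
{
    uint64_t out[2];                               /* out[0] = low lane, out[1] = high lane */
    __m128i dst = _mm_set_epi64x(0x1111, 0x2222);  /* high = 0x1111, low = 0x2222 */
    __m128i src = _mm_set_epi64x(0xAAAA, 0xBBBB);  /* high = 0xAAAA, low = 0xBBBB */

    _mm_storeu_si128((__m128i *)out, merge_lo64(dst, src));
    assert(out[0] == 0xBBBB && out[1] == 0x1111);

    _mm_storeu_si128((__m128i *)out, merge_hi64(dst, src));
    assert(out[0] == 0x2222 && out[1] == 0xAAAA);
}
```

Both helpers route integer data through floating-point shuffles, which is exactly the case where the bypass delay mentioned above can apply.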

Answer 2

Agner Fog's Optimizing Assembly guide has a good set of tables covering the various data-movement instructions (section 13.3).

To merge data from two registers into one, your options include:

MOVLHPS     # SSE.    Low qword unchanged, high qword from low of source
MOVHLPS     # SSE.    Low qword from high of source, high qword unchanged
MOVSD       # SSE2.   Low qword from source (register only), high qword unchanged
# memory-source-only insns:
MOVLPS/D    # SSE1/2. Low qword from memory, high qword unchanged
MOVHPS/D    # SSE1/2. High qword from memory, low qword unchanged
SHUFPD      # SSE2.   Low qword from any position of destination, high qword from any position of source
PUNPCKLQDQ  # SSE2.   Low qword unchanged, high qword from low of source
PUNPCKHQDQ  # SSE2.   Low qword from high of destination, high qword from high of source
MOVQ        # SSE2.   Low qword from source, high qword set to zero
PBLENDW     # SSE4.1
PINSRQ      # SSE4.1  (only takes the low64 of src)

Descriptions copied/pasted from Agner Fog's table; he holds the copyright.

So … looks like the best choice for inserting the high64 from another register. The other options require it to be in the low64 of src (for shufpd or punpcklqdq).
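A few rows of the table can be sketched with intrinsics as below. The helper names are hypothetical, chosen only to describe where each qword ends up, and the immediate 2 for shufpd is my own choice to select "low qword from low of destination, high qword from high of source". As with any intrinsic, the compiler may pick a different instruction than the one named in the comment.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */
#include <stdint.h>

/* SHUFPD with imm8 = 2: low qword = dst[0], high qword = src[1]. */
static inline __m128i hi_from_src_hi(__m128i dst, __m128i src)
{
    return _mm_castpd_si128(
        _mm_shuffle_pd(_mm_castsi128_pd(dst), _mm_castsi128_pd(src), 2));
}

/* PUNPCKLQDQ: low qword = dst[0], high qword = src[0]. Integer-domain instruction. */
static inline __m128i hi_from_src_lo(__m128i dst, __m128i src)
{
    return _mm_unpacklo_epi64(dst, src);
}

/* MOVHLPS: low qword = src[1], high qword = dst[1]. */
static inline __m128i lo_from_src_hi(__m128i dst, __m128i src)
{
    return _mm_castps_si128(
        _mm_movehl_ps(_mm_castsi128_ps(dst), _mm_castsi128_ps(src)));
}

/* Small self-check with arbitrary lane values. */
static inline void demo_table_rows(void)
{
    uint64_t out[2];                               /* out[0] = low lane, out[1] = high lane */
    __m128i dst = _mm_set_epi64x(0x1111, 0x2222);  /* high = 0x1111, low = 0x2222 */
    __m128i src = _mm_set_epi64x(0xAAAA, 0xBBBB);  /* high = 0xAAAA, low = 0xBBBB */

    _mm_storeu_si128((__m128i *)out, hi_from_src_hi(dst, src));
    assert(out[0] == 0x2222 && out[1] == 0xAAAA);

    _mm_storeu_si128((__m128i *)out, hi_from_src_lo(dst, src));
    assert(out[0] == 0x2222 && out[1] == 0xBBBB);

    _mm_storeu_si128((__m128i *)out, lo_from_src_hi(dst, src));
    assert(out[0] == 0xAAAA && out[1] == 0x1111);
}
```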

Answer 3

I don't know about fastest, but these are perhaps the simplest:

_mm_unpacklo_epi64(_mm_setzero_si128(), x)

[0, x0]

_mm_unpackhi_epi64(_mm_setzero_si128(), x)

[0, x1]

_mm_move_epi64(x)

[x0, 0]

_mm_unpackhi_epi64(x, _mm_setzero_si128())

[x1, 0]
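The four results above can be checked with a small sketch like the following. It assumes the bracket notation lists the low 64-bit lane first, which matches how these intrinsics behave. The helper names lanes and demo_zeroing_moves are made up for this example, and the lane values are arbitrary.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: copy both 64-bit lanes out, low lane first. */
static inline void lanes(__m128i v, uint64_t out[2])
{
    _mm_storeu_si128((__m128i *)out, v);
}

static inline void demo_zeroing_moves(void)
{
    const __m128i x = _mm_set_epi64x(0x1111, 0x2222);  /* x1 = 0x1111, x0 = 0x2222 */
    const __m128i zero = _mm_setzero_si128();
    uint64_t r[2];

    lanes(_mm_unpacklo_epi64(zero, x), r);  /* [0, x0] */
    assert(r[0] == 0 && r[1] == 0x2222);

    lanes(_mm_unpackhi_epi64(zero, x), r);  /* [0, x1] */
    assert(r[0] == 0 && r[1] == 0x1111);

    lanes(_mm_move_epi64(x), r);            /* [x0, 0] */
    assert(r[0] == 0x2222 && r[1] == 0);

    lanes(_mm_unpackhi_epi64(x, zero), r);  /* [x1, 0] */
    assert(r[0] == 0x1111 && r[1] == 0);
}
```

Note that all four variants zero the other half of the result, so they move a qword but do not preserve anything from a second register.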

Fastest way to move higher or lower 64 bits in integer SSE register
