Question

How can I replicate the x64 MOVQ (move quadword) instruction in x86 assembly?

For example, given:

movq xmm5, [esi+2h]
movq [edi+0f1h], xmm5

Would this work?

 push eax
 push edx
 mov eax, [esi+2h]
 mov edx, [esi+6h]   ; +4 byte offset
 mov [edi+0f1h], eax
 mov [edi+0f5h], edx ; +4 byte offset
 pop edx
 pop eax

Answer 1

Try:

fild  qword ptr [esi+2h]
fistp qword ptr [edi+0f1h]

Answer 2

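SSE2 movq xmm, xmm/m64 works fine in 32-bit code (on CPUs that support it). The code you showed already uses 32-bit addressing modes, so it works unchanged in 32-bit mode. There is another form of movq that is only available in 64-bit mode, movq xmm, r64/m64. Its register-source form lets you do movq xmm0, rax; the memory-source form of the same opcode is a 64-bit load, just like the ordinary movq load.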

But anyway, 32-bit SSE2:

movq    xmm5, [esi+2h]
movq    [edi+0f1h], xmm5
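
For anyone doing this from C rather than hand-written asm, the same load/store pair can be sketched with the SSE2 intrinsics _mm_loadl_epi64 and _mm_storel_epi64, which correspond to the movq load and movq store forms. This is only an illustration: copy_qword is a hypothetical helper name, and the memcpy fallback is an assumption added so the sketch still builds when SSE2 is not enabled at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define HAVE_SSE2_SKETCH 1
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: copy one 64-bit chunk from src to dst.
   Neither pointer has to be aligned. */
static void copy_qword(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
#ifdef HAVE_SSE2_SKETCH
    /* Load 8 bytes into the low half of an XMM register,
       zeroing the upper half (movq xmm, m64). */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);
    /* Store the low 8 bytes of the register (movq m64, xmm). */
    _mm_storel_epi64((__m128i *)dst, v);
#else
    /* Fallback when SSE2 is not enabled: plain 8-byte copy. */
    memcpy(dst, src, 8);
#endif
}
```

In a 32-bit build with gcc or clang you would need -msse2 (or a -march that implies it) for the intrinsic path to be used.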

If you can only assume SSE1 and not SSE2, you can use movlps:

;; xorps  xmm5,xmm5     ; optional to break a dependency on old value
movlps   xmm5, [esi+2h]       ; merges into xmm5: false dependency
movlps   [edi+0f1h], xmm5

Depending on what you're doing, it might be worth using MMX if you have MMX but not SSE1:

movq    mm0, [esi+2h]
movq    [edi+0f1h], mm0

; emms required later, after a loop.

If you really want a single-instruction 64-bit load/store on an aligned address so it's atomic (on P5 and later), then fild / fistp is a good choice. (gcc uses this for std::atomic<int64_t> with -m32 -mno-sse.)

fild / fistp never munges your data unless you (or MSVC++'s CRT) have the x87 precision-control bits set to less than a 64-bit mantissa.

fild    qword ptr [esi+2h]
fistp   qword ptr [edi+0f1h]
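
To see why the precision setting matters, here is a rough C model of the round trip. It does not execute fild / fistp itself; it only converts an int64_t to a floating-point type and back, on the assumption that double (53-bit significand) stands in for reduced x87 precision and long double stands in for full precision where it has at least a 64-bit significand. Both function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Round trip through a 53-bit significand. volatile forces the value
   to really be rounded to double instead of kept in a wider register. */
static int64_t roundtrip_double(int64_t x)
{
    volatile double d = (double)x;
    return (int64_t)d;
}

/* Round trip through long double. This is lossless for every int64_t
   only when LDBL_MANT_DIG >= 64 (true for the x87 80-bit format,
   not true where long double is the same as double). */
static int64_t roundtrip_long_double(int64_t x)
{
    volatile long double d = (long double)x;
    return (int64_t)d;
}
```

A value such as 2^53 + 1 does not survive roundtrip_double, because it needs 54 significant bits, while it does survive roundtrip_long_double on targets with a 64-bit significand.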

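For copying scattered 64-bit chunks, fild / fistp might even have better throughput than using 32-bit integer loads/stores, at least on modern CPUs. For contiguous copies of maybe 32 or 64 bytes or larger, use rep movsd. (Normally the threshold where rep movsd is worth it is much higher, but here we're talking about multi-uop load/store instructions, with no SIMD vectors available and only 32-bit integer or 64-bit fild / fistp.)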

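With plain integer instructions, just pick a register you can clobber. (Or in MSVC inline asm, let the compiler worry about saving it.) If registers are tight, use only one (if your src and dst are known not to overlap):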

 mov   eax, [esi+2h]
 mov   [edi+0f1h], eax
 mov   eax, [esi+2h + 4]     ; write the +4 separately in the addressing mode as documentation
 mov   [edi+0f1h + 4], eax

If you can spare 2 registers, it's probably best to do both loads and then both stores.
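
The two-register ordering can be sketched in C as follows. copy_qword_2x32 is a hypothetical helper, and the two uint32_t locals play the role of the two scratch registers; the fixed-size memcpy calls are just a portable way to express unaligned 32-bit loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 8 bytes as two 32-bit halves,
   doing both loads before either store. */
static void copy_qword_2x32(void *dst, const void *src)
{
    uint32_t lo, hi;   /* stand-ins for the two scratch registers */
    assert(dst != NULL && src != NULL);

    memcpy(&lo, src, 4);                             /* load low half   */
    memcpy(&hi, (const unsigned char *)src + 4, 4);  /* load high half  */
    memcpy(dst, &lo, 4);                             /* store low half  */
    memcpy((unsigned char *)dst + 4, &hi, 4);        /* store high half */
}
```

Because both halves are read before anything is written, this version also gives the expected result when src and dst overlap, which the one-register load/store/load/store sequence does not guarantee.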
