Question

I was wondering: is there a way to increment a value in an xmm register, or can you only move a value into one?

What I mean is, you can do this:

inc eax

or like this:

inc [ebp+7F00F000]

Is there a way to do the same thing with xmm?

I tried something like this, but... it doesn't work:

  inc [rbx+08]
  movss xmm1,[rbx+08]

I even tried something really stupid, but that didn't work either:

push edx
pextrw edx,xmm2,0
add edx,1
mov [rbx+08],edx
movss xmm1,[rbx+08]
pop edx

Answer 1

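There's no equivalent of inc for xmm regs, and there's no immediate-operand form of paddw (so there's no equivalent of add eax, 1 either).

paddw (and other element sizes) is only available with an xmm/m128 source operand. So if you want to increment one element of a vector, you need to load a constant from memory or generate it on the fly.

e.g. the cheapest way to increment all elements of xmm0 is: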

; outside the loop
pcmpeqw    xmm1,xmm1     ; xmm1 = all-ones = -1

; inside the loop
psubw      xmm0, xmm1    ; xmm0 -= -1   (in each element).  i.e. xmm0++

Or

paddw      xmm0, [ones]  ; where ones is a static constant.
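
For readers working from C rather than assembly, the same two options can be sketched with SSE2 intrinsics. This is only an illustration, assuming an x86 compiler with SSE2 enabled; the helper names (inc_all_epi16, inc_all_epi16_const, inc_all_i16x8) are made up here, and the compiler is free to pick different instructions than the ones named in the comments.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to each of the eight 16-bit lanes of v
   by subtracting an all-ones (-1) vector, as in the pcmpeqw/psubw pair. */
static __m128i inc_all_epi16(__m128i v)
{
    __m128i zero = _mm_setzero_si128();
    /* Comparing a register with itself sets every bit, so each lane is -1. */
    __m128i minus_one = _mm_cmpeq_epi16(zero, zero);
    /* v - (-1) == v + 1 in every lane, with 16-bit wraparound. */
    return _mm_sub_epi16(v, minus_one);
}

/* Hypothetical helper: the "constant" variant, adding a vector of ones.
   Whether the constant ends up in memory is the compiler's decision. */
static __m128i inc_all_epi16_const(__m128i v)
{
    return _mm_add_epi16(v, _mm_set1_epi16(1));
}

/* Convenience wrapper for scalar code: increments eight int16_t in place. */
static void inc_all_i16x8(int16_t a[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)a, inc_all_epi16(v));
}
```

Both helpers wrap around on overflow, just like psubw and paddw do.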

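Loading the constant from memory is probably only a good idea if it takes more than a couple of instructions to construct it, or if register pressure is a problem.

If you want to construct a constant that increments only the low 32-bit element, for example, you could use a byte shift to zero the other elements: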

; hoisted out of the loop
pcmpeqw    xmm1,xmm1     ; xmm1 = all-ones = -1
psrldq     xmm1, 12      ; xmm1 = [ 0 0 0 -1 ]


; in the loop
psubd      xmm0, xmm1
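
As a rough C counterpart of the byte-shift trick, here is a sketch with SSE2 intrinsics. It assumes an x86 compiler with SSE2 enabled; inc_low_epi32 and inc_low_i32x4 are hypothetical names chosen for this example, and the instruction names in the comments are what the intrinsics correspond to, not a guarantee of what the compiler emits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add 1 to the low 32-bit lane only. */
static __m128i inc_low_epi32(__m128i v)
{
    __m128i zero = _mm_setzero_si128();
    /* All bits set: every 32-bit lane is -1. */
    __m128i all_ones = _mm_cmpeq_epi32(zero, zero);
    /* Byte-shift right by 12 (psrldq): lanes become [ 0 0 0 -1 ]. */
    __m128i low_minus_one = _mm_srli_si128(all_ones, 12);
    /* Subtracting -1 from lane 0 and 0 from the others (psubd). */
    return _mm_sub_epi32(v, low_minus_one);
}

/* Convenience wrapper for scalar code: works on four int32_t in place. */
static void inc_low_i32x4(int32_t a[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)a, inc_low_epi32(v));
}
```

In a real loop the constant would be built once outside the loop, as in the assembly above.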

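If your attempt was only supposed to increment the low 16-bit element of xmm2, then yes, it was a silly attempt. IDK what you're doing with storing to [rbx+8] and then loading into xmm1 (which zeroes the high 96 bits).

Here's how to write the xmm -> gp -> xmm round trip in a less dumb way. (It's still terrible compared to paddw with a vector constant.)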

; don't push/pop.  Instead, pick a register you can clobber without saving/restoring
movd    edx, xmm2       ; this is the cheapest way to get the low 16.  It doesn't matter that we also get element 1 as garbage in the high half of edx
inc     edx             ; we only care about dx, but this is still the most efficient instruction
pinsrw  xmm2, edx, 0    ; normally you'd just use movd again, but we actually want to merge with the old contents.
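
The same round trip can be sketched in C with SSE2 intrinsics, purely for illustration. It assumes an x86 compiler with SSE2 enabled; inc_low16_via_gp and inc_low16_i16x8 are hypothetical names, and the scalar arithmetic is done on an unsigned value so that the increment cannot overflow a signed int.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: increment only the low 16-bit lane by going
   through a general-purpose register (movd, add, pinsrw). */
static __m128i inc_low16_via_gp(__m128i v)
{
    /* movd: grab the low 32 bits; the upper half holds element 1. */
    unsigned low = (unsigned)_mm_cvtsi128_si32(v);
    low += 1u;
    /* pinsrw: only the low 16 bits are merged into lane 0, so a carry
       out of lane 0 never reaches lane 1. */
    return _mm_insert_epi16(v, (int)(low & 0xFFFFu), 0);
}

/* Convenience wrapper for scalar code: works on eight int16_t in place. */
static void inc_low16_i16x8(int16_t a[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)a, inc_low16_via_gp(v));
}
```

The vector-constant versions remain the better choice; this only mirrors the register round trip discussed above.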

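If you want to work with elements other than 16 bits, you'd either use SSE4.1 pinsrb/d/q, or you'd use { {1}} and shuffles.

See Agner Fog's Optimize Assembly guide for more good tips on how to use SSE vectors. Also see the other links in the x86 tag wiki.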

Answer 2

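In short, no, not in the way you're thinking.

Under the original SSE, the XMM registers held only floating-point data. There is no increment operation in floating point.

SSE2 added integer operations on the same XMM registers (it did not add new registers), but there is still no increment. These operations are really intended for high-speed arithmetic, including things like dot products, precise products with rounding, etc.

An increment operation is something you'd expect to be applied to general-purpose registers or accumulators.

You may find this set of slides somewhat informative as a general overview of the functionality.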
