Question

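I need inline assembly code like this:

I have a pair (so it is balanced) of push/pop operations inside the assembly
I also have a variable in memory (so not in a register) as input

Like this: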

__asm__ __volatile__ ("push %%eax\n\t"
        // ... some operations that use ECX as a temporary
        "mov %0, %%ecx\n\t"
        // ... some other operation
        "pop %%eax"
: : "m"(foo));
// foo is my local variable, that is to say, on stack

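When I disassemble the compiled code, the compiler gives a memory address like 0xc(%esp), which is relative to esp; hence this fragment will not work correctly, since I have a push operation before the mov. So how can I tell the compiler that I don't want foo addressed relative to esp, but as something like -8(%ebp), relative to ebp?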

P.S. You may suggest that I put eax in the clobbers, but this is just sample code. I'd rather not go into the full reasons why I don't accept that solution.

Answer 1

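When you have any memory inputs/outputs, you should generally avoid modifying ESP inside inline asm, so that you don't have to disable optimizations or otherwise force the compiler to set up a stack frame with EBP. One major advantage is that you (or the compiler) can then use EBP as an extra free register; that is potentially a significant speedup if you are already having to spill/reload things. If you are writing inline asm, presumably this is a hotspot, so it is worth spending the extra code size on ESP-relative addressing modes.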

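In x86-64 code, there is an added obstacle to using push/pop safely, because you can't tell the compiler you want to clobber the red zone below RSP. (You can compile with -mno-red-zone, but there is no way to disable it from the C source.) You can run into problems where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red zone, though, so this only applies to x86-64 System V (or to non-x86 ISAs with a red zone).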

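You only need to disable -fomit-frame-pointer for that function (that is, use -fno-omit-frame-pointer) if you want to do asm-only things such as using push as a stack data structure, so that the number of pushes varies. Or maybe if you are optimizing for code size.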

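You can always write a whole non-inline function in asm and put it in a separate file; then you have full control. But only do that if your function contains a whole loop; don't make the compiler call a short non-looping function inside a C inner loop.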

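It seems you are using push/pop inside inline asm because you don't have enough registers and need to save/reload something. You don't need push/pop for save/restore. Instead, use dummy output operands with "=m" constraints to get the compiler to allocate stack space for you, and use mov to/from those slots. (Of course you are not limited to mov; using a memory source operand for an ALU instruction can be a win if you only need the value once or twice.)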

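This may be slightly worse for code size, but it is usually not worse for performance (and can be better). If that is not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.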

int foo(char *p, int a, int b) {
    int t1,t2;  // dummy output spill slots
    int r1,r2;  // dummy output tmp registers
    int res;

    asm ("# operands: %0  %1  %2  %3  %4  %5  %6  %7  %8\n\t"
         "imull  $123, %[b], %[res]\n\t"
         "mov   %[res], %[spill1]\n\t"
         "mov   %[a], %%ecx\n\t"
         "mov   %[b], %[tmp1]\n\t"  // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
         "mov   %[spill1], %[res]\n\t"
    : [res] "=&r" (res),
      [tmp1] "=&r" (r1), [tmp2] "=&r" (r2),  // early-clobber
      [spill1] "=m" (t1), [spill2] "=&rm" (t2)  // allow spilling to a register if there are spare regs
      , [p] "+&r" (p)
      , "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
    : [a] "rmi" (a), [b] "rm" (b)  // a can be an immediate, but b can't
    : "ecx"
    );

    return res;

    // p unused in the rest of the function
    // so it's really just an input to the asm,
    // which the asm is allowed to destroy
}

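This compiles to the following asm with gcc7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm comment showing what the compiler picked for all the template operands: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, so the compiler saved/restored %edi outside the asm and gave it to us for that dummy operand).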

foo(char*, int, int):
    pushl   %ebp
    pushl   %edi
    pushl   %esi
    pushl   %ebx
    subl    $16, %esp
    movl    36(%esp), %edx
    movl    %edx, %ebp
#APP
# 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
        # operands: %eax  %ebx  %esi  12(%esp)  %edi  %ebp  (%edx)  40(%esp)  44(%esp)
    imull  $123, 44(%esp), %eax
    mov   %eax, 12(%esp)
    mov   40(%esp), %ecx
    mov   44(%esp), %ebx
    mov   12(%esp), %eax

# 0 "" 2
#NO_APP
    addl    $16, %esp
    popl    %ebx
    popl    %esi
    popl    %edi
    popl    %ebp
    ret

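Hmm, the dummy memory operand that tells the compiler which memory we modify seems to have resulted in a register being dedicated to it, I guess because the p operand is early-clobber, so it can't use the same register. I guess you could risk leaving off the early-clobber if you are confident that none of the other inputs will use the same register as p (i.e. that they don't have the same value).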
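
As a smaller illustration of the dummy "=m" output idea described in this answer, here is a minimal sketch. The function name scale_and_add, its arithmetic, and the plain-C fallback for non-x86 targets are assumptions made for the example, not part of the original answer. It parks an intermediate value in a compiler-allocated slot instead of using push/pop, so the stack pointer is never modified inside the asm.

```c
/* Hypothetical helper: computes 3*a + b, keeping a copy of `a`
   in a compiler-chosen memory slot rather than pushing it. */
int scale_and_add(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int slot;   /* dummy output: the compiler decides where it lives */
    int res;

    __asm__ ("movl  %[a], %[res]\n\t"
             "movl  %[res], %[slot]\n\t"  /* "save" without push */
             "addl  %[res], %[res]\n\t"   /* res = 2*a */
             "addl  %[slot], %[res]\n\t"  /* res = 3*a, memory source operand */
             "addl  %[b], %[res]"         /* res = 3*a + b */
             : [res] "=&r" (res),         /* early-clobber: written before inputs are dead */
               [slot] "=m" (slot)         /* spill slot allocated by the compiler */
             : [a] "r" (a), [b] "r" (b)
             : "cc");                     /* add modifies the flags */
    return res;
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    return a * 3 + b;
#endif
}
```

Calling scale_and_add(2, 5) should yield 11, since 3*2 + 5 = 11. Because the slot is an ordinary operand, the compiler is free to address it relative to ESP or EBP, and either choice stays valid.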

Answer 2

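The direct use of the stack pointer to reference local variables is probably caused by a compiler optimization. I think you could solve this problem in a couple of ways: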

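Disable the frame pointer optimization (-fno-omit-frame-pointer in GCC);
Insert esp in the clobbers, so that the compiler knows its value is being modified (check your compiler for compatibility).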

Answer 3

Instead of putting a mov into ecx in the assembly code, have the compiler put the operand directly in ecx (with GCC, via the "c" register constraint).
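
A minimal sketch of what this could look like, under some assumptions: the function name increment_via_ecx, the "+c" in/out form (used here only so the result can be observed), and the plain-C fallback for non-i386 targets are all illustrative choices, not taken from the answer. The push/pop variant is restricted to 32-bit x86, where no ABI has a red zone.

```c
/* Hypothetical example: returns foo + 1. */
int increment_via_ecx(int foo)
{
#if defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__ (
        "pushl %%eax\n\t"         /* balanced push/pop, as in the question */
        "movl  %%ecx, %%eax\n\t"  /* operand is already in ECX: no ESP-relative address */
        "addl  $1, %%eax\n\t"
        "movl  %%eax, %%ecx\n\t"  /* hand the result back through ECX */
        "popl  %%eax"             /* EAX restored, so it is not a clobber */
        : "+c" (foo)              /* "c" pins the operand to ECX; "+" makes it in/out */
        :
        : "cc");                  /* add modifies the flags */
    return foo;
#else
    /* Assumed fallback so the sketch still builds on other targets. */
    return foo + 1;
#endif
}
```

Since the operand never appears as a memory reference in the template, the temporary change to ESP between the push and the pop cannot invalidate it.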

GCC inline assembly with stack operations
