arm - ARM GCC inline assembler optimization problem

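The GCC inline assembler is very sensitive to correct specification.

In particular, you have to be extremely precise about specifying the correct constraints, to make sure the compiler does not decide to "optimize" your assembler code. There are a few things to look out for. Take an example.

The following two: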

    int myasmfunc(int arg)    /* definitely buggy ... */
    {
        register int myval asm("r2") = arg;

        asm ("add r1, r0, #22\n" ::: "r1");
        asm ("adds r0, r1, r0\n" ::: "r0", "cc");
        asm ("subeq r2, #123\n" ::: "r2");
        asm ("subne r2, #213\n" ::: "r2");
        return myval;
    }

and

    int myasmfunc(int arg)
    {
        int myval = arg, plus = arg;

        asm ("add %0, #22\n\t" : "+r"(plus));
        asm ("adds %1, %2\n\t"
             "subeq %0, #123\n\t"
             "subne %0, #213\n\t" : "+r"(myval), "+r"(plus) : "r"(arg) : "cc");
        return myval;
    }

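might look similar at first sight, and you would naively assume they do the same thing; but they are very far from that!

The first version of this code has multiple problems.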

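For one, if you specify the instructions as separate asm() statements, the compiler is free to insert arbitrary code in between. In particular, this means the sub instructions, which depend on the condition codes set by adds even though they do not modify them themselves, may fall foul of whatever the compiler chose to insert.
Second, again because the instructions are split across separate asm() statements, there is no guarantee that the code generator will choose the same register for myval each time, notwithstanding the asm("r2") spec in the variable declaration.
Third, the assumption made by the first asm() that r0 contains the function's argument is wrong; by the time the compiler reaches the assembly block, it may have chosen to move this argument anywhere else. Worse still, since the statements are split, no guarantee is made about what happens between two asm() statements. Even if you specify __asm__ __volatile__(...);, the compiler treats two such blocks as independent entities.

Fourth, you are not telling the compiler that you are clobbering/assigning myval. It might have chosen to move it elsewhere temporarily because you are clobbering "r2" and, on return, decided to restore it from ... (???).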
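The third and fourth points can be illustrated with a much smaller case. The following is only a sketch: `add22` is a hypothetical helper that does not appear in the code above, and the inline assembly assumes GCC targeting ARM in A32 (non-Thumb) mode. On any other target the sketch falls back to plain C so that it still builds. The value enters through an `"r"` input operand and leaves through an `"=r"` output operand, so nothing in the assembly text assumes which register holds the argument, and the compiler knows which variable is being written.

```c
/* Hypothetical helper, not part of the original code: returns x + 22. */
int add22(int x)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int out;
    /* %0 and %1 are whatever registers the compiler picks; the asm text
       never names r0 or r1, and "=r"(out) declares what is written. */
    __asm__ ("add %0, %1, #22" : "=r"(out) : "r"(x));
    return out;
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    return (int)((unsigned)x + 22u);
#endif
}
```

Calling `add22(0)` should give 22 on either path; the point is only that the compiler, not the assembly text, decides where `x` and `out` live.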

Just for the fun of it, here is the output of the first function for the following four cases:

default - gcc -c tst.c

optimized - gcc -O8 -c tst.c

using some unusual options - gcc -c -finstrument-functions tst.c

that plus optimization - gcc -c -O8 -finstrument-functions tst.c

    Disassembly of section .text:

    00000000 :
       0:   e52db004    push    {fp}            ; (str fp, [sp, #-4]!)
       4:   e28db000    add     fp, sp, #0      ; 0x0
       8:   e24dd00c    sub     sp, sp, #12     ; 0xc
       c:   e50b0008    str     r0, [fp, #-8]
      10:   e51b2008    ldr     r2, [fp, #-8]
      14:   e2811016    add     r1, r1, #22     ; 0x16
      18:   e0910000    adds    r0, r1, r0
      1c:   0242207b    subeq   r2, r2, #123    ; 0x7b
      20:   124220d5    subne   r2, r2, #213    ; 0xd5
      24:   e1a03002    mov     r3, r2
      28:   e1a00003    mov     r0, r3
      2c:   e28bd000    add     sp, fp, #0      ; 0x0
      30:   e8bd0800    pop     {fp}
      34:   e12fff1e    bx      lr

    Disassembly of section .text:

    00000000 :
       0:   e1a03000    mov     r3, r0
       4:   e2811016    add     r1, r1, #22     ; 0x16
       8:   e0910000    adds    r0, r1, r0
       c:   0242207b    subeq   r2, r2, #123    ; 0x7b
      10:   124220d5    subne   r2, r2, #213    ; 0xd5
      14:   e1a00003    mov     r0, r3
      18:   e12fff1e    bx      lr

    Disassembly of section .text:

    00000000 :
       0:   e92d4830    push    {r4, r5, fp, lr}
       4:   e28db00c    add     fp, sp, #12     ; 0xc
       8:   e24dd008    sub     sp, sp, #8      ; 0x8
       c:   e1a0500e    mov     r5, lr
      10:   e50b0010    str     r0, [fp, #-16]
      14:   e59f0038    ldr     r0, [pc, #56]   ; 54
      18:   e1a01005    mov     r1, r5
      1c:   ebfffffe    bl      0
      20:   e51b2010    ldr     r2, [fp, #-16]
      24:   e2811016    add     r1, r1, #22     ; 0x16
      28:   e0910000    adds    r0, r1, r0
      2c:   0242207b    subeq   r2, r2, #123    ; 0x7b
      30:   124220d5    subne   r2, r2, #213    ; 0xd5
      34:   e1a04002    mov     r4, r2
      38:   e59f0014    ldr     r0, [pc, #20]   ; 54
      3c:   e1a01005    mov     r1, r5
      40:   ebfffffe    bl      0
      44:   e1a03004    mov     r3, r4
      48:   e1a00003    mov     r0, r3
      4c:   e24bd00c    sub     sp, fp, #12     ; 0xc
      50:   e8bd8830    pop     {r4, r5, fp, pc}
      54:   00000000    .word   0x00000000

    Disassembly of section .text:

    00000000 :
       0:   e92d4070    push    {r4, r5, r6, lr}
       4:   e1a0100e    mov     r1, lr
       8:   e1a05000    mov     r5, r0
       c:   e59f0028    ldr     r0, [pc, #40]   ; 3c
      10:   e1a0400e    mov     r4, lr
      14:   ebfffffe    bl      0
      18:   e2811016    add     r1, r1, #22     ; 0x16
      1c:   e0910000    adds    r0, r1, r0
      20:   0242207b    subeq   r2, r2, #123    ; 0x7b
      24:   124220d5    subne   r2, r2, #213    ; 0xd5
      28:   e59f000c    ldr     r0, [pc, #12]   ; 3c
      2c:   e1a01004    mov     r1, r4
      30:   ebfffffe    bl      0
      34:   e1a00005    mov     r0, r5
      38:   e8bd8070    pop     {r4, r5, r6, pc}
      3c:   00000000    .word   0x00000000

As you can see, none of these does what you would hope to see; the second version of the code, though, on gcc -c -O8 ..., ends up as:

    Disassembly of section .text:

    00000000 :
       0:   e1a03000    mov     r3, r0
       4:   e2833016    add     r3, r3, #22     ; 0x16
       8:   e0933000    adds    r3, r3, r0
       c:   0240007b    subeq   r0, r0, #123    ; 0x7b
      10:   124000d5    subne   r0, r0, #213    ; 0xd5
      14:   e12fff1e    bx      lr

And that is, rather closely, what you specified in your assembly and what you expect.
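To make "what you expect" concrete, here is a plain-C sketch of what the second version computes, read off its instructions. The name `myasmfunc_ref` is hypothetical and not part of the original code. The arithmetic is done on `uint32_t` so that it wraps the way the 32-bit registers do, and the final conversion back to `int` relies on GCC's wrapping behaviour for out-of-range values. In the listing just above, r3 plays the role of `plus` and r0 holds both `arg` and `myval`.

```c
#include <stdint.h>

/* Hypothetical C reference for the second myasmfunc (a sketch only). */
int myasmfunc_ref(int arg)
{
    uint32_t plus = (uint32_t)arg + 22u;   /* add   plus, #22           */
    uint32_t val  = (uint32_t)arg;         /* myval starts out as arg   */

    plus += (uint32_t)arg;                 /* adds  plus, arg (sets Z)  */
    if (plus == 0)
        val -= 123u;                       /* subeq myval, #123         */
    else
        val -= 213u;                       /* subne myval, #213         */
    return (int)val;
}
```

The zero flag is only set when `2 * arg + 22` wraps to zero, for example at `arg == -11`, so almost every input takes the `subne` path.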

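Moral: be explicit and exact with your constraints and your operand assignments, and keep interdependent lines of assembly within the same asm() block (make it a multi-line statement).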
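Taken one step further, all four instructions can live in a single statement. This is a sketch under assumptions rather than the original author's code: `myasmfunc_single` and `tmp` are hypothetical names, the assembly assumes GCC for ARM in A32 (non-Thumb) mode, and a C fallback is used elsewhere so the example builds on any host. The scratch value is an output marked with the early-clobber modifier `&`, because it is written before `arg` is read for the last time and therefore must not share a register with it.

```c
/* Hypothetical single-statement variant of myasmfunc (a sketch only). */
int myasmfunc_single(int arg)
{
    int myval = arg;
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int tmp;
    __asm__ ("add   %1, %2, #22\n\t"    /* tmp = arg + 22             */
             "adds  %1, %1, %2\n\t"     /* tmp += arg, updates flags  */
             "subeq %0, %0, #123\n\t"   /* Z set:   myval -= 123      */
             "subne %0, %0, #213"       /* Z clear: myval -= 213      */
             : "+r"(myval), "=&r"(tmp)  /* in/out, early-clobber temp */
             : "r"(arg)
             : "cc");                   /* condition flags are changed */
#else
    /* Portable fallback with the same wrapping arithmetic. */
    unsigned plus = (unsigned)arg + 22u + (unsigned)arg;
    myval = (int)((unsigned)myval - (plus == 0u ? 123u : 213u));
#endif
    return myval;
}
```

Because the flag-setting `adds` and the conditional `subeq`/`subne` sit in one statement, the compiler cannot schedule anything between them, and the `"cc"` clobber tells it the flags do not survive the block.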
