Question

I am trying to look at the IR for a very simple loop:

for (int i = 0; i < 15; i++){
  a[b[i]]++;
}

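When I compile with -O0 and dig into the .ll file, I can see the instructions written out step by step inside the define i32 @main() function. However, when I compile with -O2 and look at the .ll file, the define i32 @main() function contains only ret i32 0. In addition, some call instructions that appear in the -O0 .ll file have been changed to tail call in the -O2 .ll file.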

Can anyone give a reasonably detailed explanation of how LLVM performs -O2 compilation? Thanks.


Answer 1

We can use Compiler Explorer at godbolt.org to look at your example. We'll use the following test code:

int test() {
  int a[15] = {0};
  int b[15] = {0};

  for (int i = 0; i < 15; i++){
    a[b[i]]++;
  }

  return 0;
}

Godbolt shows x86 assembly rather than LLVM IR, but I have trimmed and annotated it to show what is happening. Here is -O0 -m32:

test():                              
        # set up stack
.LBB0_1:                                
        cmp     dword ptr [ebp - 128], 15           # i < 15?
        jge     .LBB0_4                             # no? then jump out of loop
        mov     eax, dword ptr [ebp - 128]          # load i
        mov     eax, dword ptr [ebp + 4*eax - 124]  # load b[i]
        mov     ecx, dword ptr [ebp + 4*eax - 64]   # load a[b[i]]
        add     ecx, 1                              # increment it
        mov     dword ptr [ebp + 4*eax - 64], ecx   # store it back
        mov     eax, dword ptr [ebp - 128]
        add     eax, 1                              # increment i
        mov     dword ptr [ebp - 128], eax
        jmp     .LBB0_1                             # repeat
.LBB0_4:
        # tear down stack
        ret

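This looks the way we would expect: the loop is clearly visible, and it performs every step we wrote. If we compile with -O1 -m32 -march=i386, the loop is still there, but it is simpler: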

test():                               # @test()
        # set up stack
.LBB0_1:                               
        mov     ecx, dword ptr [esp + 4*eax]    # load b[i]
        inc     dword ptr [esp + 4*ecx + 60]    # increment a[b[i]]
        inc     eax                             # increment i
        cmp     eax, 15                         # compare == 15
        jne     .LBB0_1                         # no? then loop
        # tear down stack
        ret
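For readers more comfortable with C than assembly, the loop shape in the -O1 listing above can be approximated by hand. This is only a sketch: the helper names loop_top_tested and loop_bottom_tested are hypothetical, the arrays are passed in as parameters so the functions can be called, and the second function is a manual rewrite, not actual compiler output.

```c
/* Hypothetical helper: the loop as written in the source,
   with the exit test at the top of each iteration. */
void loop_top_tested(int *a, const int *b) {
    for (int i = 0; i < 15; i++) {
        a[b[i]]++;
    }
}

/* Hand-written approximation of the -O1 shape: the body runs first
   and the exit test sits at the bottom. This is valid here because
   the trip count (15) is known to be at least one. */
void loop_bottom_tested(int *a, const int *b) {
    int i = 0;
    do {
        a[b[i]]++;      /* one read-modify-write, like "inc dword ptr [...]" */
        i++;            /* like "inc eax" */
    } while (i != 15);  /* like "cmp eax, 15" followed by "jne" */
}
```

Both functions leave the arrays in the same state. The bottom-tested form needs only one branch per iteration, which is one reason compilers tend to prefer it.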

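Clang now uses the inc instruction (useful), notices that it can keep the loop counter i in the eax register (neat), and moves the condition check to the bottom of the loop (probably better). We can still recognize our original code, though. Now let's try -O2 -m32 -march=i386: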

test():                               
        xor     eax, eax    # set the return value to 0
        ret

Is that it? Yes.

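Clang detects that the a array can never be used outside the function. That means the increments can never affect any other part of the program, so nobody will miss them when they are gone.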

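Removing the increment leaves an empty for loop with no side effects, which can be removed as well. Removing the loop, in turn, leaves a function that is empty for all intents and purposes.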

This empty function is probably what you are seeing in the LLVM IR (ret i32 0).
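The reasoning above can be restated in C. The sketch below puts the original function next to a hand-written equivalent of what the optimizer is allowed to reduce it to. The names test_original and test_as_if are hypothetical and chosen only for illustration. The second function is written by hand and is not compiler output.

```c
/* The function from the example: a and b are locals, and nothing
   derived from them is returned or stored where a caller could see it. */
int test_original(void) {
    int a[15] = {0};
    int b[15] = {0};

    for (int i = 0; i < 15; i++) {
        a[b[i]]++;      /* result is never observed */
    }

    return 0;
}

/* Hand-written equivalent: no caller can tell this apart from
   test_original, so the compiler may emit just this. */
int test_as_if(void) {
    return 0;
}
```

From the caller's point of view the two functions are indistinguishable: both return 0 and neither has any other visible effect.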

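This is not a very scientific description, and the steps clang actually takes may differ, but I hope the example clears things up. If you like, you can read up on the as-if rule. I also recommend playing around on https://godbolt.org/: for example, see what happens when you move a and b outside the function.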
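One way to set up that experiment is sketched below, assuming the same loop with the two arrays simply hoisted to file scope. Because a is now a global, other code could read it after test() returns, so the compiler has to preserve the effect of the increments. Whether it keeps a recognizable loop or transforms it some other way is up to the compiler and the flags used.

```c
/* Variant of the example for experimentation: a and b are globals.
   Objects at file scope are zero-initialized. */
int a[15];
int b[15];

int test(void) {
    for (int i = 0; i < 15; i++) {
        a[b[i]]++;      /* stores to a global are visible to other code */
    }

    return 0;
}
```

With b left at all zeros, every iteration increments a[0], so a[0] holds 15 after one call to test().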

How does LLVM perform -O2 optimization
