Question

import std.range : cycle;
void foo() pure @safe {
    cycle([1, 2]);
}

Today I came across a program written in D. I'm trying to understand its assembly code, starting with a simple function.

From the asm output on the D compiler explorer:

pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
 push   rbp
 mov    rbp,rsp
 sub    rsp,0x40
 mov    QWORD PTR [rbp-0x20],rdi
 mov    QWORD PTR [rbp-0x10],rsi
 mov    QWORD PTR [rbp-0x8],rdx
 ... rest of the function

I've read it many times, but I can't understand why std.range.cycle() gets 3 arguments (RDI, RSI and RDX), or where my range ([1, 2]) is. Isn't it a C-like struct?

Or am I missing something?

Answer 1

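It looks like you're using the x86-64 System V ABI, based on RDI and RSI being used for arg passing, since the Windows 64-bit ABI uses different regs. See the x86 tag wiki for links to ABI docs, or see the current revision here.

Small objects (like structs) passed by value go in multiple integer registers. Large objects (more than 128 bits) returned by value use a hidden pointer to space allocated by the caller, instead of being packed into RDX:RAX. That is what happens in your function.

Based on the asm and the docs, I think a Cycle object has three values: start, end, and index. I don't know D at all, but that would make sense. Since they're all 64-bit, the object is too large to fit in RDX:RAX, so it's returned via the hidden pointer.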

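The arg-passing registers on entry to cycle():

RDI: the "hidden" pointer to the return value (a struct of three 64-bit integers)
RSI: the first member of the Range arg (I'll call it range_start)
RDX: the second member of the Range arg (I'll call it range_end)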
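
To make the sizes concrete, here is a rough C model of the two objects involved. The type and field names (`slice_int`, `cycle_int`, `length`, `ptr`, `index`) and the function `make_cycle` are hypothetical, chosen only for illustration. The length-then-pointer order is an assumption taken from D's ABI documentation for dynamic arrays; it would also fit the `div` by the first member in the optimized asm (an `index % length`) better than a start/end pair. Treat this as a sketch for x86-64, not as Phobos's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a D int[] slice: two 64-bit members, 16 bytes.
   Small enough to be passed by value in two integer registers. */
typedef struct {
    size_t length;  /* assumption: length comes first */
    int   *ptr;
} slice_int;

/* Hypothetical model of Cycle!(int[]): the slice plus an index, 24 bytes.
   That is more than 128 bits, so it cannot be returned in RDX:RAX. */
typedef struct {
    slice_int original;
    size_t    index;
} cycle_int;

/* These sizes hold on x86-64 (64-bit size_t and pointers). */
static_assert(sizeof(slice_int) == 16, "slice should be two 64-bit words");
static_assert(sizeof(cycle_int) == 24, "cycle should be three 64-bit words");

/* By-value argument, by-value result: the compiler adds the hidden
   return pointer behind the scenes. */
cycle_int make_cycle(slice_int r) {
    assert(r.length != 0);                 /* the modulo needs a non-empty range */
    cycle_int c = { r, (size_t)0 % r.length };
    return c;
}
```

Compiling a function shaped like `make_cycle` with gcc or clang for x86-64 Linux should show the same three incoming registers as in the question: one for the hidden return pointer and two for the slice.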

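I enabled optimization to get more readable asm without so much noise, but unfortunately it looks like this D compiler is far less sophisticated than clang or gcc. With -O -release -inline (as recommended by this page), it still does a lot of store/reload to the stack.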

pure nothrow @nogc @safe std.range.Cycle!(int[]).Cycle std.range.cycle!(int[]).cycle(int[]):
 sub    rsp,0x28
 mov    QWORD PTR [rsp+0x20],rdi        # hidden first arg (return-value pointer).
 mov    QWORD PTR [rsp+0x8],0x0         # totally useless: overwritten without read

 mov    QWORD PTR [rsp+0x10],0x0        # totally useless: same.

 mov    QWORD PTR [rsp+0x8],rsi         # first "real" arg
 mov    QWORD PTR [rsp+0x10],rdx        # second "real" arg
 xor    eax,eax
 xor    edx,edx                         # zero rax:rdx.  Perhaps from the index=0 default when you only use one arg?
 div    QWORD PTR [rsp+0x8]             # divide 0 by first arg of the range.
 mov    QWORD PTR [rsp+0x18],rdx        # remainder of (index / range_start), I guess.
 lea    rsi,[rsp+0x8]                   # RSI=pointer to where range_start, range_end, and index/range_start were stored on the stack.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]  # copy to the dst buffer.  A smart compiler would have stored there in the first place, instead of to local scratch and then copying.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]  # movs is not very efficient, this is horrible code.
 movs   QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
 mov    rax,QWORD PTR [rsp+0x20]        # mov rax, rdi  before those MOVS instructions would have been much more efficient.
 add    rsp,0x28
 ret

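The ABI requires functions that return large objects to return the hidden pointer in RAX, so the caller doesn't have to keep its own copy of the pointer to the return buffer. That's why the function sets RAX at all.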

A good compiler would have done this:

std.range.Cycle...:
   mov    [rdi], rsi           # cycle_start
   mov    [rdi+0x8], rdx       # cycle_end
   mov    QWORD PTR [rdi+0x10], 0   # index (a store of an immediate needs an explicit operand size)
   mov    rax, rdi
   ret

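Or it would simply have inlined the call to cycle() away entirely, since it's trivial. Actually, I think it does get inlined into foo(), but a stand-alone definition of cycle() is still emitted.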
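
For comparison, the hidden-pointer convention can be written out by hand in C. This is only a sketch: the names (`cycle_int`, `make_cycle_lowered`, `ret`) are hypothetical, the member order is the same assumption as in the asm comments, and storing a constant 0 for the index relies on `0 % length` being 0 for any non-empty range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 24-byte result object: three 64-bit members. */
typedef struct {
    size_t length;
    int   *ptr;
    size_t index;
} cycle_int;

/* Hand-lowered form of "return a 24-byte struct by value".
   ret    plays the role of RDI (the hidden pointer),
   length plays the role of RSI,
   ptr    plays the role of RDX,
   and the return value is what ends up in RAX. */
cycle_int *make_cycle_lowered(cycle_int *ret, size_t length, int *ptr) {
    assert(ret != NULL);
    ret->length = length;  /* mov [rdi], rsi */
    ret->ptr    = ptr;     /* mov [rdi+0x8], rdx */
    ret->index  = 0;       /* mov QWORD PTR [rdi+0x10], 0 */
    return ret;            /* mov rax, rdi */
}
```

The caller owns the 24-byte buffer and gets the same pointer back, which is all the three stores plus `mov rax, rdi` accomplish.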

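We can't tell which two functions foo() calls, because the compiler explorer seems to disassemble the .o (not the linked binary) without resolving symbols. So the call offset is 00 00 00 00, a placeholder for the linker. But it's probably calling a memory-allocation function, because it makes the call with esi = 2 and edi = 0. (Using mov edi, 0 in optimizing release mode! Yuck!) The call target shows up as the next instruction because that's where call's rel32 displacement counts from.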
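
The arithmetic behind that last point can be sketched in C. The helper name `call_rel32_target` and the example address are hypothetical; the only facts relied on are that `call rel32` is opcode E8 followed by a 32-bit little-endian displacement, measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the target of a 5-byte `call rel32` instruction.
   insn_addr: address of the call instruction itself.
   insn:      the five instruction bytes (E8 + 4 displacement bytes). */
uint64_t call_rel32_target(uint64_t insn_addr, const uint8_t insn[5]) {
    assert(insn[0] == 0xE8);  /* only the rel32 form of call is handled */

    /* Assemble the little-endian displacement byte by byte. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);
    int32_t disp = (int32_t)raw;  /* reinterpret as signed */

    /* The displacement is relative to the address of the NEXT instruction. */
    return insn_addr + 5 + (uint64_t)(int64_t)disp;
}
```

With the placeholder bytes E8 00 00 00 00 at a hypothetical address 0x10, this yields 0x15, the address of the following instruction, which is why an unlinked call appears to target the instruction right after it.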

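Hopefully LDC or GDC do a better job, since they're based on modern optimizing backends (LLVM and gcc), but the compiler-explorer site you linked doesn't have those compilers installed. If there's another site based on Matt Godbolt's compiler explorer code, but with other D compilers, that would be cool.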

Dlang - Understanding std.cycle() in assembly
