Question

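Hi, I disassembled some programs I wrote (on Linux) to get a better understanding of how they work, and I noticed that the main function always begins with: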

lea    ecx,[esp+0x4] ; I assume this gets the address of main's first argument... why?
and    esp,0xfffffff0 ; ??? is the compiler trying to align the stack pointer to 16 bytes ???
push   DWORD PTR [ecx-0x4] ; I understand this pushes the return address... why?
push   ebp
mov    ebp,esp
push   ecx  ; why is ecx pushed too??

So my question is: why is all this work done? I only understand the purpose of:

push   ebp                
mov    ebp,esp

The rest seems useless to me...

Answer 1

I've had a go at it:

;# As you have already noticed, the compiler wants to align the stack
;# pointer on a 16 byte boundary before it pushes anything. That's
;# because certain instructions' memory access needs to be aligned
;# that way.
;# So, to save the original value of esp (+4) first, it
;# executes the first instruction:
lea    ecx,[esp+0x4]

;# Now alignment can happen. Without the previous instruction the
;# next one would have made the original esp unrecoverable:
and    esp,0xfffffff0

;# Next it pushes the return address and creates a stack frame. I
;# assume it now wants to make the stack look like a normal
;# subroutine call:
push   DWORD PTR [ecx-0x4]
push   ebp
mov    ebp,esp

;# Remember that ecx is still the only value that can restore the
;# original esp. Since ecx may be clobbered by any subroutine call,
;# it has to be saved somewhere:
push   ecx
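To make the walkthrough above concrete, here is a small C model that replays the six instructions on a fake 32-bit stack. The addresses, the regs struct and the helper names (load32, store32, push32, main_prologue, fake_entry_state) are made up for illustration; this is a sketch of the register and memory effects under those assumptions, not anything gcc itself contains.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack: 64 words at made-up addresses 0x1000..0x10ff. */
#define STACK_BASE  0x1000u
#define STACK_WORDS 64u

static uint32_t mem[STACK_WORDS];

static uint32_t load32(uint32_t addr) {
    assert(addr % 4 == 0 && addr >= STACK_BASE && addr < STACK_BASE + 4 * STACK_WORDS);
    return mem[(addr - STACK_BASE) / 4];
}

static void store32(uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0 && addr >= STACK_BASE && addr < STACK_BASE + 4 * STACK_WORDS);
    mem[(addr - STACK_BASE) / 4] = value;
}

/* Only the registers the prologue touches. */
struct regs { uint32_t esp, ebp, ecx; };

/* "push value": decrement esp by 4, then store at the new esp. */
static void push32(struct regs *r, uint32_t value) {
    r->esp -= 4;
    store32(r->esp, value);
}

/* Replays the six prologue instructions from the question. */
static void main_prologue(struct regs *r) {
    r->ecx = r->esp + 4;            /* lea  ecx,[esp+0x4]       */
    r->esp &= 0xfffffff0u;          /* and  esp,0xfffffff0      */
    push32(r, load32(r->ecx - 4));  /* push DWORD PTR [ecx-0x4] */
    push32(r, r->ebp);              /* push ebp                 */
    r->ebp = r->esp;                /* mov  ebp,esp             */
    push32(r, r->ecx);              /* push ecx                 */
}

/* State right after "call main": return address at [esp], argc at
   [esp+4], argv at [esp+8]. All values passed in are hypothetical. */
static struct regs fake_entry_state(uint32_t esp, uint32_t ret,
                                    uint32_t argc, uint32_t argv,
                                    uint32_t caller_ebp) {
    store32(esp, ret);
    store32(esp + 4, argc);
    store32(esp + 8, argv);
    struct regs r = { esp, caller_ebp, 0 };
    return r;
}
```

Starting from, say, an entry esp of 0x10ec, main_prologue leaves a copy of the return address at [ebp+4] and the caller's ebp at [ebp], just like an ordinary frame, while [ebp-4] holds the saved ecx, the only pointer back to argc at [ecx] and argv at [ecx+4].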

Answer 2

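This is done to keep the stack aligned to a 16-byte boundary. Some instructions require certain data types to be aligned on a 16-byte boundary. To meet this requirement, GCC makes sure the stack is initially 16-byte aligned and allocates stack space in multiples of 16 bytes. This can be controlled with the option -mpreferred-stack-boundary=num. If you use -mpreferred-stack-boundary=2 (for 2² = 4-byte alignment), this alignment code is not generated, because the stack is always at least 4-byte aligned. However, you may run into trouble if your program uses any data types that need stronger alignment.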
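The relationship between num and the mask in "and esp,0xfffffff0" can be checked with a few lines of C. The function names are hypothetical; only the 2^num rule and the multiple-of-16 frame sizes come from the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment in bytes implied by -mpreferred-stack-boundary=num (2^num). */
static uint32_t boundary_bytes(unsigned num) {
    assert(num < 32);
    return UINT32_C(1) << num;
}

/* Mask that an "and esp, MASK" instruction would use for that boundary. */
static uint32_t align_mask(unsigned num) {
    return ~(boundary_bytes(num) - 1);
}

/* Round a stack pointer down to the boundary. The stack grows downward,
   so rounding down only ever reserves extra space. */
static uint32_t align_down(uint32_t sp, unsigned num) {
    return sp & align_mask(num);
}

/* Round a frame size up to a multiple of the boundary. */
static uint32_t round_up_frame(uint32_t size, unsigned num) {
    return (size + boundary_bytes(num) - 1) & align_mask(num);
}
```

For num=4 the mask is 0xfffffff0, exactly the constant in the question; for num=2 it is 0xfffffffc, which leaves an already 4-byte-aligned esp unchanged, so there is nothing for gcc to emit. Rounding a 20-byte frame up under num=4 gives 32 bytes.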

According to the gcc manual:

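On Pentium and PentiumPro, double and long double values should be aligned to an 8-byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 may not work properly if it is not 16-byte aligned.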

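To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary most likely misaligns the stack. It is recommended that libraries that use callbacks always use the default setting.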

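This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.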
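The manual's warning about mixing boundaries can be illustrated numerically. This sketch assumes that a callee built for a 2^num-byte boundary expects esp to have been aligned at its call site, so that esp+4 is aligned on entry (as far as I know, that is gcc's assumption when -mincoming-stack-boundary is not given). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* esp seen by the callee on entry: "call" pushed a 4-byte return address. */
static uint32_t esp_at_entry(uint32_t esp_before_call) {
    return esp_before_call - 4;
}

/* Does the entry esp satisfy a callee built for a 2^num-byte boundary,
   i.e. was esp aligned at the call site? */
static int callee_assumption_holds(uint32_t entry_esp, unsigned num) {
    assert(num >= 2 && num < 32);
    uint32_t boundary = UINT32_C(1) << num;
    return (entry_esp + 4) % boundary == 0;
}

/* Of the 4-byte-aligned esp values a num=2 caller may use in
   [lo, lo + 64), count how many keep a num=4 (16-byte) callee happy. */
static unsigned count_ok_call_sites(uint32_t lo) {
    assert(lo % 4 == 0);
    unsigned ok = 0;
    for (uint32_t esp = lo; esp < lo + 64; esp += 4)
        if (callee_assumption_holds(esp_at_entry(esp), 4))
            ok++;
    return ok;
}
```

A caller compiled with -mpreferred-stack-boundary=2 only guarantees 4-byte alignment, so of the sixteen 4-byte-aligned esp values in a 64-byte window only four satisfy a 16-byte callee, which matches the manual's "most likely misaligns the stack".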

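The lea loads the original stack pointer (its value before the call to main pushed the return address) into ecx, because the stack pointer is about to be modified. This serves two purposes:

accessing the arguments of main, since they sit at fixed offsets from the original stack pointer
restoring the stack pointer to its original value when returning from main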
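The reason ecx is needed for both purposes is that the distance from the realigned esp to argc depends on where the caller left esp. The sketch below makes that visible. The function names are hypothetical, and the epilogue step "lea esp,[ecx-0x4]" is an assumption about what gcc typically emits at the end of such a main (check your own disassembly), not something stated in this answer.

```c
#include <assert.h>
#include <stdint.h>

/* ecx after "lea ecx,[esp+0x4]": the address of argc. */
static uint32_t ecx_after_lea(uint32_t entry_esp) {
    return entry_esp + 4;
}

/* Distance from the realigned esp ("and esp,0xfffffff0") up to argc.
   It changes with the caller's esp, so main cannot reach argc through
   a fixed esp- or ebp-relative offset. */
static uint32_t argc_offset_from_aligned_esp(uint32_t entry_esp) {
    assert(entry_esp % 4 == 0);  /* assume the caller kept 4-byte alignment */
    uint32_t aligned = entry_esp & 0xfffffff0u;
    return ecx_after_lea(entry_esp) - aligned;
}

/* Assumed epilogue step "lea esp,[ecx-0x4]" (with ecx reloaded from the
   saved copy): puts esp back on the original return address for "ret". */
static uint32_t esp_before_ret(uint32_t saved_ecx) {
    return saved_ecx - 4;
}
```

An entry esp of 0x10e0 puts argc 4 bytes above the aligned esp, while 0x10ec puts it 16 bytes above, so no fixed offset works; relative to ecx, argc is always at [ecx] and argv at [ecx+4].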

Answer 3

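Alignment would improve performance even if every instruction worked perfectly, with no speed penalty, on arbitrarily aligned operands. Imagine a loop that references a 16-byte quantity which happens to straddle two cache lines. To bring that one small value into the cache, two entire cache lines must now be filled, evicting whatever was there, and what if you need those evicted lines in the same loop? Cache is far faster than RAM, so cache performance is always critical.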
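A quick way to check whether an access straddles two cache lines is shown below. The 64-byte line size is an assumption (common on current x86 parts, but not universal), and splits_cache_line is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size in bytes */

/* True if an access of `size` bytes starting at `addr` touches two lines. */
static bool splits_cache_line(uint32_t addr, uint32_t size) {
    assert(size > 0 && size <= CACHE_LINE);
    return (addr % CACHE_LINE) + size > CACHE_LINE;
}
```

A 16-byte value at 0x1038 covers bytes 56..71 relative to its line start and therefore touches two lines, while the same value at the 16-byte-aligned 0x1030 fits in one. Because 64 is a multiple of 16, a 16-byte-aligned 16-byte value can never split a line.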

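Moving misaligned operands into registers also usually carries a speed penalty. Since the stack is being realigned, the old stack pointer naturally has to be saved, so that the arguments in the caller's frame can still be reached and main can return.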

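ecx is a scratch (caller-saved) register, so any function main calls may overwrite it; that is why its value has to be saved on the stack. Also, depending on the optimization level, some frame-linking operations that don't look necessary for running the program may be important for building a chain of frames that a debugger or stack trace can follow.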
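The chain of frames is a linked list threaded through the saved ebp values: [ebp] holds the caller's ebp and [ebp+4] the return address. That is presumably also why the prologue copies the return address to just above the new saved ebp: a walker reading [ebp+4] in main finds it even though esp was realigned. The toy walker below uses made-up addresses and hypothetical helper names, and assumes ebp == 0 marks the end of the chain (startup code conventionally zeroes ebp for the outermost frame, but not every program does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stack: 32 words at made-up addresses 0x2000..0x207f. */
#define BASE  0x2000u
#define WORDS 32u

static uint32_t mem[WORDS];

static uint32_t load32(uint32_t addr) {
    assert(addr % 4 == 0 && addr >= BASE && addr < BASE + 4 * WORDS);
    return mem[(addr - BASE) / 4];
}

static void store32(uint32_t addr, uint32_t value) {
    assert(addr % 4 == 0 && addr >= BASE && addr < BASE + 4 * WORDS);
    mem[(addr - BASE) / 4] = value;
}

/* Follows saved-ebp links: [ebp] = caller's ebp, [ebp+4] = return address.
   Stops at ebp == 0 (assumed end marker) or after `max` frames. Writes the
   return addresses to out[] and returns how many were found. */
static size_t walk_frames(uint32_t ebp, uint32_t *out, size_t max) {
    size_t n = 0;
    while (ebp != 0 && n < max) {
        out[n++] = load32(ebp + 4);
        ebp = load32(ebp);
    }
    return n;
}
```

With two frames linked as 0x2010 -> 0x2040 -> 0, walk_frames(0x2010, out, 4) reports both return addresses, innermost first.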

Trying to understand gcc's complicated stack alignment at the top of main that copies the return address
