Question

In decompiled code generated by IDA, I see expressions like the following:

malloc(20 * c | -(20 * (unsigned __int64)(unsigned int)c >> 32 != 0))
malloc(6  * n | -(3  * (unsigned __int64)(unsigned int)(2 * n) >> 32 != 0))

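Can someone explain the purpose of these calculations?
c and n are int (signed integer) values.

Update
The original C++ code was compiled with MSVC for a 32-bit platform.
Here is the assembly for the second line of the decompiled C code above (the malloc(6 * ...) call):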

mov     ecx, [ebp+pThis]
mov     [ecx+4], eax
mov     eax, [ebp+pThis]
mov     eax, [eax]
shl     eax, 1
xor     ecx, ecx
mov     edx, 3
mul     edx
seto    cl
neg     ecx
or      ecx, eax
mov     esi, esp
push    ecx             ; Size
call    dword ptr ds:__imp__malloc

Answer 1

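Since this code was decompiled from assembly, we can only guess at what it was originally meant to do.

Let's first format it so that the operator precedence is clear: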

malloc(20 * c | -(20 * (unsigned __int64)(unsigned int)c >> 32 != 0))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       //this is first evaluated, promoting c to 
                       //64 bit unsigned int without doing sign
                       //extension, regardless the type of c

malloc(20 * c | -(20 * (uint64_t)c >> 32 != 0))
                  ^^^^^^^^^^^^^^^^
                  //then, multiply by 20, with uint64 result

malloc(20 * c | -(20 * (uint64_t)c >> 32 != 0))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  //if 20c is greater than 2^32-1, then result is true,
                  //negating it yields -1, i.e. a mask of 0xffffffff;
                  //bitwise operator | then forces the argument to 0xffffffff
                  //(2^32-1, the maximum of size_t, input type to malloc)
                  //regardless of what 20c actually is

                  //if 20c is at most 2^32-1, then result is false,
                  //the mask is 0, and bitwise operator | leaves the
                  //argument to malloc as 20c, untouched

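What are 20 and 6?

They probably come from the common idiom malloc(sizeof(Something)*count). In these two calls to malloc, sizeof(Something) and sizeof(SomethingElse) were most likely evaluated at compile time to 20 and 6.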

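So what does this code actually do?

My guess is that it tries to prevent sizeof(Something)*count from overflowing, which would let malloc succeed with a too-small size and lead to a buffer overflow once the memory is used.

The product is evaluated as a 64-bit unsigned integer and tested against 2^32-1. When the size exceeds 2^32-1, the argument to malloc is set to a very large value that is guaranteed to fail (no 32-bit system can allocate 2^32-1 bytes of memory).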
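The same check can be written out by hand. The following is only a sketch of the idea, not what the compiler literally emits; the name saturated_alloc_size is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original program): a 32-bit saturating
// multiply that mirrors `size * count | -(wide_product >> 32 != 0)`.
constexpr std::uint32_t saturated_alloc_size(std::uint32_t count,
                                             std::uint32_t elem_size) {
    // Widen one operand first so the full 64-bit product is available.
    std::uint64_t product = static_cast<std::uint64_t>(elem_size) * count;
    // Any bit set in the upper half means the product does not fit in 32 bits.
    bool overflow = (product >> 32) != 0;
    // All ones on overflow, all zeros otherwise.
    std::uint32_t mask = overflow ? 0xFFFFFFFFu : 0u;
    // OR-ing with the mask either leaves the low 32 bits alone or forces
    // the result to 0xFFFFFFFF.
    return static_cast<std::uint32_t>(product) | mask;
}
```

With an element size of 20, a count of 214748364 still fits (4294967280 bytes), while 214748365 saturates to 0xFFFFFFFF.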

Answer 2

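I'm guessing that the original source code used the C++ new operator to allocate an array and was compiled with Visual C++. As user3528438's answer indicates, this code is meant to prevent overflows. Specifically, it is a 32-bit unsigned saturating multiply. If the result of the multiplication would be greater than 4,294,967,295 (the maximum value of a 32-bit unsigned number), the result is clamped, or "saturated", to that maximum.

Since Visual Studio 2005, Microsoft's C++ compiler has generated code to protect against overflows. For example, I can generate assembly code that could be decompiled into your examples by compiling the following with Visual C++: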

#include <stdlib.h>

void *
operator new[](size_t n) {
        return malloc(n);
}

struct S {
        char a[20];
};

struct T {
        char a[6];
};

void
foo(int n, S **s, T **t) {
        *s = new S[n];
        *t = new T[n * 2];
}

Compiling it with the Visual Studio 2015 compiler generates the following assembly code:

    mov esi, DWORD PTR _n$[esp]
    xor ecx, ecx
    mov eax, esi
    mov edx, 20                 ; 00000014H
    mul edx
    seto    cl
    neg ecx
    or  ecx, eax
    push    ecx
    call    _malloc
    mov ecx, DWORD PTR _s$[esp+4]
; Line 19
    mov edx, 6
    mov DWORD PTR [ecx], eax
    xor ecx, ecx
    lea eax, DWORD PTR [esi+esi]
    mul edx
    seto    cl
    neg ecx
    or  ecx, eax
    push    ecx
    call    _malloc

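Most of the decompiled expression is really just dealing with a single assembly instruction. The instruction seto cl sets CL to 1 if the preceding MUL instruction overflowed, and to 0 otherwise. Similarly, the expression 20 * (unsigned __int64)(unsigned int)c >> 32 != 0 evaluates to 1 if the result of 20 * c overflows, and to 0 otherwise.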
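To make the correspondence concrete, here is a rough C++ model of the xor / mul / seto / neg / or sequence. It is a sketch only: MulResult, mul32 and emulate_size_computation are illustrative names, and the model ignores every flag other than overflow.

```cpp
#include <cstdint>

// Result of a 32-bit unsigned MUL: the product lands in EDX:EAX, and the
// overflow flag is set when the upper half (EDX) is nonzero.
struct MulResult {
    std::uint32_t eax;
    std::uint32_t edx;
    bool overflow;
};

constexpr MulResult mul32(std::uint32_t eax, std::uint32_t operand) {
    std::uint64_t wide = static_cast<std::uint64_t>(eax) * operand;
    std::uint32_t lo = static_cast<std::uint32_t>(wide);
    std::uint32_t hi = static_cast<std::uint32_t>(wide >> 32);
    return MulResult{lo, hi, hi != 0};
}

constexpr std::uint32_t emulate_size_computation(std::uint32_t n,
                                                 std::uint32_t elem_size) {
    std::uint32_t ecx = 0;                // xor ecx, ecx
    MulResult r = mul32(n, elem_size);    // mov edx, elem_size / mul edx
    ecx = r.overflow ? 1u : 0u;           // seto cl
    ecx = 0u - ecx;                       // neg ecx  (1 becomes 0xFFFFFFFF)
    ecx |= r.eax;                         // or ecx, eax
    return ecx;                           // push ecx (the malloc argument)
}
```

The neg step is what turns the 0-or-1 flag into the 0-or-all-ones mask that the decompiler renders as -(... != 0).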

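If this overflow protection were absent and the result of 20 * c did overflow, the call to malloc would probably succeed but allocate far less memory than the program expected. The program would then probably write past the end of the memory actually allocated and trash other memory. That amounts to a buffer overrun, one that could potentially be exploited by attackers.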
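A concrete number shows how small the wrapped size can get. This sketch assumes a 32-bit size_t, modelled here with std::uint32_t; unchecked_size is an illustrative name.

```cpp
#include <cstdint>

// Unchecked size computation: unsigned 32-bit arithmetic silently wraps
// modulo 2^32, which is what a 32-bit build would do without the guard.
constexpr std::uint32_t unchecked_size(std::uint32_t count,
                                       std::uint32_t elem_size) {
    return count * elem_size;
}

// 214748365 * 20 = 4294967300 = 2^32 + 4, so only 4 bytes would be requested
// for an array the program believes holds 214748365 twenty-byte elements.
constexpr std::uint32_t wrapped = unchecked_size(214748365u, 20u);
```

A malloc(4) call will almost certainly succeed, which is exactly why the unguarded version is dangerous.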

Answer 3

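~~It rounds up to the nearest block size.~~

My apologies. What it is actually doing is computing a multiple of c while also checking for negative values (overflow):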

#include <iostream>
#include <cstdint>


size_t foo(char c)
{
    return 20 * c | -(20 * (std::uint64_t)(unsigned int)c >> 32 != 0);
}

int main()
{
    using namespace std;
    for (char i = -4 ; i < 4 ; ++i)
    {
        cout << "input is: " << int(i) << ", result is " << foo(i) << endl;
    }
    return 0;
}

Result:

input is: -4, result is 18446744073709551615
input is: -3, result is 18446744073709551615
input is: -2, result is 18446744073709551615
input is: -1, result is 18446744073709551615
input is: 0, result is 0
input is: 1, result is 20
input is: 2, result is 40
input is: 3, result is 60

To me, the number 18446744073709551615 doesn't mean much at a glance. Only after seeing it in hex did I go "a-ha". - Jongware

Adding << hex:

input is: -1, result is ffffffffffffffff

Answer 4

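Can someone explain the purpose of these calculations?

It is important to understand that compilation changes the semantics of code. Much of the unspecified behavior of the original code becomes specified by the compilation process.

IDA has no idea whether the things the generated assembly code just happens to do are important or not. To be safe, it tries to replicate the behavior of the assembly code perfectly, even in cases that cannot possibly occur given how the code is used.

Here, IDA is probably replicating the overflow characteristics that the type conversions just happen to have on this platform. It cannot simply reproduce the original C code, because the original C code likely had unspecified behavior for some values of c or n, probably the negative ones.

For example, say I wrote this C code: int f(unsigned j) { return j; }. My compiler will likely turn it into very simple assembly code that gives whatever behavior my platform happens to give for values of j that do not fit in an int (the ones that come out negative).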

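But if you decompile the generated assembly, you cannot decompile it to int f(unsigned j) { return j; }, because that would not behave the same as my assembly code on platforms with different overflow behavior. On other platforms it could compile to code that returns different values than my assembly code does for those out-of-range values of j.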
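As an illustration of the difference, compare the one-liner with a version that spells the conversion out the way a cautious decompiler might. This is a sketch that assumes a 32-bit two's complement int; f_explicit is a made-up name, and the out-of-range result of f is implementation-defined before C++20.

```cpp
#include <climits>

// The function from the text. For j > INT_MAX the result depends on the
// platform in older language standards.
constexpr int f(unsigned j) { return j; }

// Explicit version that reproduces two's complement wrap-around using only
// in-range arithmetic, so it gives the same answer on any platform.
constexpr int f_explicit(unsigned j) {
    return j <= static_cast<unsigned>(INT_MAX)
               ? static_cast<int>(j)
               // Map UINT_MAX to -1, UINT_MAX - 1 to -2, and so on,
               // without ever overflowing a signed int.
               : -static_cast<int>(UINT_MAX - j) - 1;
}
```

The second form is uglier, but it pins down the behavior instead of inheriting it from the platform, which is the kind of trade-off behind the odd-looking decompiler output.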

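So it is often not possible (in fact, it would be incorrect) to decompile code back into the original C source. The output will often contain these kinds of "portably replicate this platform's behavior" oddities.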
