Question

Compiling this program with -O2 (or -O3) and running it produces interesting results on my machine.

#include <iostream>

using namespace std;

int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);

    // Print values and pointers
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    cout << endl << "Changing nothing" << endl << endl;

    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    return 0;
}

g++ -O2 code.cc; ./a.out

*p: 123
*x: 456
p:  0x112f010
x:  0x112f010

Changing nothing

*p: 456
*x: 456
p:  0x112f010
x:  0x112f010

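What I'm doing is writing to a deallocated int on the heap pointed to by p, and then allocating a long whose address is x. My compiler consistently puts the long at the same address as p -> x == p. Now when I dereference p and print it, it retains the value 123, even though it has been overwritten by the long 456. *x is then printed as 456. Even stranger, printing the same values later, without changing anything, yields the expected results. I thought this was an optimization technique that only initializes *x when it is used after the value *p has been printed, which would explain it. However, an objdump says otherwise. Here is a truncated and commented objdump -d a.out: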

00000000004008a0 <main>:
  4008a0:   41 54                   push   %r12
  4008a2:   55                      push   %rbp

Most likely the int allocation, where 0x4 is the size (4 bytes)
  4008a3:   bf 04 00 00 00          mov    $0x4,%edi
  4008a8:   53                      push   %rbx
  4008a9:   e8 e2 ff ff ff          callq  400890 <_Znwm@plt>

I have no idea what is going on here, but the pointer p is in 2 registers. Let's call the other one q.
q = p;
  4008ae:   48 89 c3                mov    %rax,%rbx

  4008b1:   48 89 c7                mov    %rax,%rdi

*p = 5;
  4008b4:   c7 00 05 00 00 00       movl   $0x5,(%rax)

delete p;
  4008ba:   e8 51 ff ff ff          callq  400810 <_ZdlPv@plt>

*q = 123;
  4008bf:   c7 03 7b 00 00 00       movl   $0x7b,(%rbx)

The long allocation and some other stuff (?). (8 bytes)
  4008c5:   bf 08 00 00 00          mov    $0x8,%edi
  4008ca:   e8 c1 ff ff ff          callq  400890 <_Znwm@plt>
  4008cf:   44 8b 23                mov    (%rbx),%r12d
  4008d2:   be e4 0b 40 00          mov    $0x400be4,%esi
  4008d7:   bf c0 12 60 00          mov    $0x6012c0,%edi

Initialization of the long before the printing
*p = 456;
  4008dc:   48 c7 00 c8 01 00 00    movq   $0x1c8,(%rax)

  4008e3:   48 89 c5                mov    %rax,%rbp

The printing
  4008e6:   e8 85 ff ff ff          callq  400870 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
........

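Now, even though *p has been overwritten by the initialization of the long at 4008dc, it still prints as 123.

I hope I'm making some sense here. Thanks for your help.

To make myself clear: I'm trying to figure out what happens behind the scenes, what the compiler does, and why the generated code doesn't correspond to the output. I know this is undefined behavior and that anything can happen. But that means the compiler can generate any code, not that the CPU will make up instructions. Any ideas are welcome.

PS: Don't worry, I'm not planning to use this anywhere ;)

EDIT: On my friend's machine (OS X), it produces the expected results even with optimization.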

Answer 1

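You stopped looking at your disassembly output too soon (or at least you didn't post the next few lines, which are the ones relevant to your question). They probably look something like: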

movl    %r12d, %esi
movq    %rax, %rdi
call    _ZNSolsEi
movq    %rax, %rdi
call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_

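rbx and r12 are registers that must be preserved across function calls in the x64 ABI used by GCC on Linux. Right after the long is allocated, you see the following instruction: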

mov    (%rbx),%r12d

Earlier uses of rbx in the instruction stream include:

mov    %rax,%rbx       ; store the `p` pointer in `rbx`

...

movl   $0x7b,(%rbx)    ; store 123 where `p` pointed (even though it has been freed before)

... 

mov    (%rbx),%r12d    ; read that value - 123 - back and into `r12`

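Then, in the snippet I posted above (the part of the disassembly that didn't make it into your question, which corresponds to part of the cout << "*p: " << *p << endl statement), you'll see: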

movl    %r12d, %esi    ; put 123 into `esi`, which is used to pass an argument to a function call

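So 123 gets printed: the value was loaded into r12 before the long's 456 was stored, and the print uses that saved copy instead of reading memory again.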
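To make that ordering concrete without relying on undefined behavior, here is a minimal sketch that replays the same sequence of stores and loads on an ordinary byte buffer. The names Observed and replay_instruction_order are made up for illustration, and the buffer only stands in for the reused heap block; this is not what GCC emits, just the same order of operations.

```cpp
#include <cstring>

// Hypothetical helper type: the two values the program ends up observing.
struct Observed {
    int cached;    // value held in a "register" (like r12d) before the long store
    int reloaded;  // value read back from memory after the long store
};

// Hypothetical helper: replays the load/store order from the disassembly on a
// plain buffer, so every access here is well defined.
Observed replay_instruction_order() {
    // Stands in for the heap block that both p and x ended up pointing to.
    alignas(long) unsigned char block[sizeof(long)] = {};

    const int int_value = 123;
    std::memcpy(block, &int_value, sizeof int_value);    // movl $0x7b,(%rbx)

    Observed result{};
    std::memcpy(&result.cached, block, sizeof(int));     // mov (%rbx),%r12d

    const long long_value = 456;
    std::memcpy(block, &long_value, sizeof long_value);  // movq $0x1c8,(%rax)

    // A later, fresh load of *p sees whatever the long store left behind.
    std::memcpy(&result.reloaded, block, sizeof(int));
    return result;
}
```

Here cached is always 123, which matches the first "*p:" line. What reloaded holds depends on the platform's byte order and the size of long; on a little-endian x86-64 machine I'd expect it to be 456, which matches the second "*p:" line.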

Answer 2

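As you mentioned, this is probably due to an optimization performed by the compiler. If you compile with -O0, it prints 456 for both values. Since p is deleted and x is allocated immediately afterwards, x will point to the same address that p pointed to (this may not always be the case, but it most likely is in your test). So *p and *x should dereference to the same value. If you change the order of the print statements, 456 will always be printed for both values. I have swapped the first two cout statements in the code, as shown below: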

#include <iostream>

using namespace std;

int main()
{
    // Pointer to an int in the heap with a value of 5
    int *p = new int(5);
    // Deallocate the memory, but keep a dangling pointer
    delete p;
    // Write 123 to deallocated space
    *p = 123;
    // Allocate a long int in the heap
    long *x = new long(456);

    // Print values and pointers
    cout << "*x: " << *x << endl;
    cout << "*p: " << *p << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    cout << endl << "Changing nothing" << endl << endl;

    // Print again without changing anything
    cout << "*p: " << *p << endl;
    cout << "*x: " << *x << endl;
    cout << "p:  " << p << endl;
    cout << "x:  " << x << endl;

    return 0;
}

Answer 3

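Even with the assembly output from the compiler, you won't find the answer in your own source code or in what the compiler did with it.

What is undefined here is the C runtime memory allocator, which is already-compiled binary code that gets linked with your test application. When new is called, the runtime library decides where the pointer goes. There is no guarantee that new / delete / new means the second new gives you the same address; that is entirely implementation dependent.

If you really want to know, you need to build from full source, including the source for new, and then read how it is implemented and/or step through it in a debugger to see what's going on.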
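If you just want to see what your own allocator does on a given run, a small sketch like the following can record both addresses without ever touching a dangling pointer. AddressPair and allocate_twice are hypothetical names for illustration; the first address is converted to an integer while the pointer is still valid.

```cpp
#include <cstdint>

// Hypothetical helper type: the two addresses, stored as plain integers.
struct AddressPair {
    std::uintptr_t first;   // address returned by the first new
    std::uintptr_t second;  // address returned by the second new
};

// Hypothetical helper: performs new / delete / new and reports both addresses.
AddressPair allocate_twice() {
    int *p = new int(5);
    // Capture the address as an integer before the pointer becomes invalid.
    const auto first = reinterpret_cast<std::uintptr_t>(p);
    delete p;

    long *x = new long(456);
    const auto second = reinterpret_cast<std::uintptr_t>(x);
    delete x;

    return {first, second};
}
```

Comparing first == second tells you whether the block was reused on that particular run. Either outcome is allowed, and it can differ between allocators, platforms, and optimization levels, which fits the different result reported on OS X.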

C++ pointers: weird undefined behavior

3 answers: