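I am hitting a kernel panic on linux-3.0, and it looks as if the RSP register was wrongly decremented by 8 bytes, so I cannot tell whether this is a CPU bug or a kernel bug. I went through the disassembly of do_page_fault and did not find any instruction that subtracts 8 from rsp. I hope you can give me some ideas. Thanks!
BTW: the problem is hard to reproduce and shows up on only one x86 machine.
(1) On AMD64, r12-r15, rbx and rbp are callee-saved registers, so do_page_fault saves them on the stack when it is called. The stack looks like this: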
00007f48c91c1000(r11)
0000000000000000(rbx)
00007ffc0f907bb0(rbp)
00007f48c9558000(r12)
00007f48c91c9708(r13)
00007f48ca168500(r14)
00007f48ca168500(r15)(caller save)
ffffffff81461fc5(page_fault+0x25/0x30)* (return address)
00007f48ca168500(r15) (callee save)
00007f48ca168500(r14)
00007f48c91c9708(r13)
00007f48c9558000(r12)
00007ffc0f907bb0(rbp)
00007f48c91e8598(rbx)
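(2) But when do_page_fault finishes, it returns to the wrong address. It should pop "page_fault+0x25/0x30" into RIP, but it seems to pop "00007f48ca168500 (r15)" into RIP instead, which causes the Oops below. It looks as if RSP was wrongly decremented by 8 bytes somewhere inside do_page_fault: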
<6>[29205.617769] ovs-vsctl[33927]: segfault at 7f48c9558000 ip 00007f48c9f62285 sp 00007ffc0f907ad0 error 6 in ld-2.11.3.so[7f48c9f57000+1f000]
<1>[29205.617808] BUG: unable to handle kernel paging request at 00007f48ca168500
<1>[29205.621539] IP: [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] PGD 3f76860067 PUD 32cf7f7067 PMD 2afce53067 PTE 800000375422e067
<1>[29205.621539] Thread overran stack, or stack corrupted
<0>[29205.621539] Oops: 0011 [#1] SMP
<4>[29205.621539] Inexact backtrace:
<4>[29205.621539]
<4>[29205.621539] CPU 43
<4>[29205.621539] Supported: No, Unsupported modules are loaded
<4>[29205.621539]
<4>[29205.621539] Pid: 33927, comm: ovs-vsctl Tainted: GF NX 3.0.93-0.8-default #1 xxxxx
<4>[29205.621539] RIP: 0010:[<00007f48ca168500>] [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] RSP: 0000:ffff882b370adf50 EFLAGS: 00010286
<4>[29205.621539] RAX: 0000000000000000 RBX: 00007f48c91c0000 RCX: ffff883fb99c03c0
<4>[29205.621539] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
<4>[29205.621539] RBP: 0000000000000000 R08: 0000000000000020 R09: 0000000000000000
<4>[29205.621539] R10: 0000000000000006 R11: 000000000000004a R12: 00007ffc0f907bb0
<4>[29205.621539] R13: 00007f48c9558000 R14: 00007f48c91c9708 R15: 00007f48ca168500
<4>[29205.621539] FS: 00007f48ca163c00(0000) GS:ffff88407f3e0000(0000) knlGS:0000000000000000
<4>[29205.621539] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[29205.621539] CR2: 00007f48ca168500 CR3: 0000002b3ce96000 CR4: 00000000001427e0
<4>[29205.621539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[29205.621539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[29205.621539] Process ovs-vsctl (pid: 33927, threadinfo ffff882b370ac000, task ffff883fb99c03c0)
<0>[29205.621539] Stack:
<4>[29205.621539] ffffffff81461fc5 00007f48ca168500(r15) 00007f48ca168500(r14) 00007f48c91c9708(r13)
<4>[29205.621539] 00007f48c9558000(r12) 00007ffc0f907bb0(rbp) 00007f48c91e8598(rbx) 00007f48c91c1000(r11)
<4>[29205.621539] 00007f48c955ed60(r10) 0000000000000001(r9) 00007f48c91c5cb8(r8) 0000000000000007
<0>[29205.621539] Call Trace:
<0>[29205.621539] Inexact backtrace:
<0>[29205.621539]
<4>[29205.621539] [<ffffffff81461fc5>] ? page_fault+0x25/0x30
<0>[29205.621539] Code: Bad RIP value.
<1>[29205.621539] RIP [<00007f48ca168500>] 0x7f48ca1684ff
<4>[29205.621539] RSP <ffff882b370adf50>
<0>[29205.621539] CR2: 00007f48ca168500
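One way to sanity-check the "RSP is off by 8" theory is to replay the function epilogue on the stack listed under (1) with RSP one slot too low, and compare the result with the registers in the oops. This is only a sketch: it assumes that do_page_fault restores rbx, rbp, r12-r15 with consecutive pops in ascending address order and then executes ret, which is a guess because the disassembly of do_page_fault is not shown here. `FRAME`, `OOPS` and `pop_with_skew` are names made up for illustration.

```python
# Stack slots in ascending address order, as listed under (1):
# the callee-saved registers, then the return address.
FRAME = [
    ("rbx", 0x0000000000000000),
    ("rbp", 0x00007ffc0f907bb0),
    ("r12", 0x00007f48c9558000),
    ("r13", 0x00007f48c91c9708),
    ("r14", 0x00007f48ca168500),
    ("r15", 0x00007f48ca168500),
    ("rip", 0xffffffff81461fc5),  # page_fault+0x25
]

# Register values reported in the oops (rbx is left out: the slot
# below the saved rbx is not part of the listing).
OOPS = {
    "rbp": 0x0000000000000000,
    "r12": 0x00007ffc0f907bb0,
    "r13": 0x00007f48c9558000,
    "r14": 0x00007f48c91c9708,
    "r15": 0x00007f48ca168500,
    "rip": 0x00007f48ca168500,
}


def pop_with_skew(frame, skew_slots):
    """Replay the assumed epilogue (pop rbx ... pop r15, ret) with RSP
    starting skew_slots 8-byte slots below the correct position.

    Returns a dict register -> value, or None where the slot that
    would be read lies outside the known frame.
    """
    values = [value for _, value in frame]
    result = {}
    for index, (name, _) in enumerate(frame):
        source = index - skew_slots
        result[name] = values[source] if 0 <= source < len(values) else None
    return result


if __name__ == "__main__":
    skewed = pop_with_skew(FRAME, 1)
    for name, seen in OOPS.items():
        print(f"{name}: replayed {skewed[name]:#018x}  oops {seen:#018x}")
```

With a skew of one slot, every register from rbp through r15 and RIP comes out equal to the oops values: each register received the value saved for its lower neighbour. That is consistent with RSP being 8 bytes too low when the epilogue ran, but it does not say where the 8 bytes were lost.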
page_fault call sequence:
ffffffff81461fa0 <page_fault>:
ffffffff81461fa0: ff 15 ca aa 5b 00 callq *0x5baaca(%rip) # ffffffff81a1ca70 <pv_irq_ops+0x30>
ffffffff81461fa6: 48 83 ec 78 sub $0x78,%rsp
ffffffff81461faa: e8 b1 01 00 00 callq ffffffff81462160 <error_entry>
ffffffff81461faf: 48 89 e7 mov %rsp,%rdi
ffffffff81461fb2: 48 8b 74 24 78 mov 0x78(%rsp),%rsi
ffffffff81461fb7: 48 c7 44 24 78 ff ff movq $0xffffffffffffffff,0x78(%rsp)
ffffffff81461fbe: ff ff
ffffffff81461fc0: e8 6b 32 00 00 callq ffffffff81465230 <do_page_fault>
ffffffff81461fc5: e9 46 02 00 00 jmpq ffffffff81462210 <error_exit> # ffffffff81461fc5 (page_fault+0x25/0x30)
ffffffff81461fca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
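The addresses in the disassembly and the oops can be cross-checked with a little arithmetic. The sketch below recomputes the return address of the call to do_page_fault and the stack slot where that return address should live. It rests on two assumptions that are not stated in this post: the kernel stack is 8 KiB (`THREAD_SIZE`), and a fault coming from user mode starts at the very top of that stack. The six hardware-pushed words are ss, rsp, rflags, cs, rip and the error code. All constant and function names are made up for illustration.

```python
WORD = 8

# Copied from the disassembly above.
PAGE_FAULT = 0xffffffff81461fa0
CALL_SITE = 0xffffffff81461fc0   # e8 6b 32 00 00  callq do_page_fault
CALL_REL32 = 0x0000326b          # little-endian displacement 6b 32 00 00
CALL_LEN = 5                     # opcode e8 + 32-bit displacement
FRAME_RESERVE = 0x78             # sub $0x78,%rsp

# Copied from the oops.
THREAD_INFO = 0xffff882b370ac000
RSP_AT_OOPS = 0xffff882b370adf50

# Assumptions (see the text above).
THREAD_SIZE = 8192
HW_FRAME_WORDS = 6               # ss, rsp, rflags, cs, rip, error code


def return_address():
    """Address pushed by `callq do_page_fault`."""
    return CALL_SITE + CALL_LEN


def call_target():
    """Target of the relative call: next instruction + displacement."""
    return return_address() + CALL_REL32


def expected_return_slot():
    """Stack address where the return address should be stored."""
    stack_top = THREAD_INFO + THREAD_SIZE
    regs_base = stack_top - HW_FRAME_WORDS * WORD - FRAME_RESERVE
    return regs_base - WORD      # the call pushes one more word


if __name__ == "__main__":
    print(f"return address : {return_address():#x} "
          f"(page_fault+{return_address() - PAGE_FAULT:#x})")
    print(f"call target    : {call_target():#x}")
    print(f"expected slot  : {expected_return_slot():#x}")
    print(f"RSP at oops    : {RSP_AT_OOPS:#x}")
    print(f"slot read by ret: {RSP_AT_OOPS - WORD:#x}")
```

Under these assumptions the expected slot equals the RSP reported in the oops. Since ret adds 8 to RSP after loading RIP, the bad ret must have read the word 8 bytes below the real return address, which matches the first stack word in the dump still being page_fault+0x25.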