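I have implemented a mechanism for migrating TCP sockets between different servers on Linux. The migration works perfectly, except that when the importing server needs to close the migrated socket, the whole machine freezes. I was able to dump the machine's log over a serial connection, as described in the article Debugging Linux Kernel. From the log, I can see a NULL pointer dereference in the tcp_time_wait path (the faulting instruction is in __inet_twsk_hashdance, called from tcp_time_wait). So my question is: how do I find out which data member or macro in tcp_time_wait is causing the problem?
Should I recompile the kernel with some printk calls to pin down the exact source of the error?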
Linux kernel dump:
[ 225.717049] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 225.720544] IP: [<ffffffff8162056c>] __inet_twsk_hashdance+0x8c/0x160
[ 225.720544] PGD 7bbf9067 PUD 7b7fc067 PMD 0
[ 225.720544] Oops: 0000 [#1] SMP
[ 225.720544] Modules linked in: sockmi(OF) nf_conntrack(F) vesafb(F) vboxsf(OF) snd_int)
[ 225.720544] CPU 0
[ 225.720544] Pid: 0, comm: swapper/0 Tainted: GF W O 3.8.0-29-generic #42~precisx
[ 225.720544] RIP: 0010:[<ffffffff8162056c>] [<ffffffff8162056c>] __inet_twsk_hashdance0
[ 225.720544] RSP: 0018:ffff88007fc039d0 EFLAGS: 00010282
[ 225.720544] RAX: 0000000000000000 RBX: ffff88005cbd5000 RCX: ffffffff81e304c0
[ 225.720544] RDX: 0000000000001c37 RSI: 0000000000000082 RDI: 0000000000000009
[ 225.720544] RBP: ffff88007fc03a00 R08: 000000000000000a R09: 0000000000000000
[ 225.720544] R10: 000000000000021a R11: 0000000000000219 R12: ffff88007ba4b800
[ 225.720544] R13: ffffc9000035a7d0 R14: ffffc90000322500 R15: ffff88007c1e3d40
[ 225.720544] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:000000000000000
[ 225.720544] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 225.720544] CR2: 0000000000000020 CR3: 000000007965f000 CR4: 00000000000006f0
[ 225.720544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 225.720544] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 225.720544] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c15)
[ 225.720544] Stack:
[ 225.720544] ffff88007ba4b800 ffff88007ba4b800 ffff88005cbd5000 0000000000000000
[ 225.720544] 000000000000001f 0000000000000000 ffff88007fc03a50 ffffffff8163de3c
[ 225.720544] ffff8800000000d9 000000065cb8b300 ffff88003652e262 ffff88007ba4b800
[ 225.720544] Call Trace:
[ 225.720544] <IRQ>
[ 225.720544] [<ffffffff8163de3c>] tcp_time_wait+0x1bc/0x290
[ 225.720544] [<ffffffff8162d4fe>] tcp_fin+0x10e/0x1c0
[ 225.720544] [<ffffffff8162e078>] tcp_data_queue+0x3e8/0x580
[ 225.720544] [<ffffffff81631bb5>] tcp_rcv_state_process+0x2b5/0x6b0
[ 225.720544] [<ffffffff8163abc7>] tcp_v4_do_rcv+0xc7/0x220
[ 225.720544] [<ffffffff8163c829>] tcp_v4_rcv+0x569/0x830
[ 225.720544] [<ffffffff81616366>] ip_local_deliver_finish+0xe6/0x280
[ 225.720544] [<ffffffff8161668a>] ip_local_deliver+0x4a/0x90
[ 225.720544] [<ffffffff81616039>] ip_rcv_finish+0x119/0x360
[ 225.720544] [<ffffffff816168ed>] ip_rcv+0x21d/0x300
[ 225.720544] [<ffffffff815e355a>] __netif_receive_skb+0x5fa/0x760
[ 225.720544] [<ffffffff8163cbae>] ? tcp4_gro_receive+0x9e/0x110
[ 225.720544] [<ffffffff815e36e3>] netif_receive_skb+0x23/0x90
[ 225.720544] [<ffffffff815e3e28>] napi_gro_receive+0xe8/0x140
[ 225.906423] [<ffffffffa0002f98>] e1000_clean_rx_irq+0x2b8/0x520 [e1000]
[ 225.906423] [<ffffffff8144df56>] ? credit_entropy_bits.part.7+0x176/0x1d0
[ 225.906423] [<ffffffffa0004801>] e1000_clean+0x51/0xc0 [e1000]
[ 225.906423] [<ffffffff815e4cd4>] net_rx_action+0x134/0x260
[ 225.906423] [<ffffffff81045136>] ? native_safe_halt+0x6/0x10
[ 225.906423] [<ffffffff81062620>] __do_softirq+0xc0/0x240
[ 225.906423] [<ffffffff8103cca2>] ? ack_apic_level+0x72/0x130
[ 225.906423] [<ffffffff816fdd5c>] call_softirq+0x1c/0x30
[ 225.906423] [<ffffffff81016775>] do_softirq+0x65/0xa0
[ 225.906423] [<ffffffff810628fe>] irq_exit+0x8e/0xb0
[ 225.906423] [<ffffffff816fe5f3>] do_IRQ+0x63/0xe0
[ 225.906423] [<ffffffff816f406d>] common_interrupt+0x6d/0x6d
[ 225.906423] <EOI>
[ 225.906423] [<ffffffff81084008>] ? hrtimer_start+0x18/0x20
[ 225.906423] [<ffffffff81045136>] ? native_safe_halt+0x6/0x10
[ 225.906423] [<ffffffff8101cc33>] default_idle+0x53/0x1f0
[ 225.906423] [<ffffffff8101dad9>] cpu_idle+0xd9/0x120
[ 225.906423] [<ffffffff816c6db2>] rest_init+0x72/0x80
[ 225.906423] [<ffffffff81d05c4f>] start_kernel+0x3d1/0x3de
[ 225.906423] [<ffffffff81d057ff>] ? pass_bootoption.constprop.2+0xd3/0xd3
[ 225.906423] [<ffffffff81d05397>] x86_64_start_reservations+0x131/0x135
[ 225.906423] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
[ 225.906423] [<ffffffff81d05468>] x86_64_start_kernel+0xcd/0xdc
[ 225.906423] Code: 49 c1 e5 04 4c 03 6a 18 4c 89 ef e8 2f 35 0d 00 49 8b 84 24 60 03 00
[ 225.906423] RIP [<ffffffff8162056c>] __inet_twsk_hashdance+0x8c/0x160
[ 225.906423] RSP <ffff88007fc039d0>
[ 225.906423] CR2: 0000000000000020
[ 226.111238] ---[ end trace 18ecfe1006daa681 ]---
[ 226.111817] Kernel panic - not syncing: Fatal exception in interrupt
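
Before adding printk calls, it may be possible to narrow this down from the dump alone. The following is a minimal sketch in Python, not a confirmed diagnosis: it pulls the fault address, the faulting symbol and offset, and the reliable call-trace frames out of the oops text, then builds a gdb `list *(symbol+offset)` command. It assumes a vmlinux with debug info that matches the running 3.8.0-29-generic kernel is available to load into gdb. The names `parse_oops`, `gdb_hint`, and `looks_like_null_member_access` are hypothetical helpers written for this post, and the sample text is a few lines copied from the dump above.

```python
import re

# A few lines copied from the oops above; in practice, read the whole serial log.
OOPS = """\
[ 225.717049] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 225.720544] IP: [<ffffffff8162056c>] __inet_twsk_hashdance+0x8c/0x160
[ 225.720544] RAX: 0000000000000000 RBX: ffff88005cbd5000 RCX: ffffffff81e304c0
[ 225.720544] [<ffffffff8163de3c>] tcp_time_wait+0x1bc/0x290
[ 225.720544] [<ffffffff8162d4fe>] tcp_fin+0x10e/0x1c0
[ 225.720544] [<ffffffff8163cbae>] ? tcp4_gro_receive+0x9e/0x110
"""

# Matches "[<address>] [? ]symbol+0xOFFSET/0xSIZE".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(\?\s+)?([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_LINE_RE = re.compile(r"\bR?IP\b")


def parse_oops(text):
    """Return (fault_address, faulting_frame, reliable_call_trace_frames)."""
    fault = FAULT_RE.search(text)
    fault_addr = int(fault.group(1), 16) if fault else None
    ip = None
    trace = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if not m:
            continue
        frame = {
            "addr": int(m.group(1), 16),
            "symbol": m.group(3),
            "offset": int(m.group(4), 16),
            "size": int(m.group(5), 16),
        }
        if IP_LINE_RE.search(line):
            # "IP:" / "RIP" lines name the instruction that faulted.
            if ip is None:
                ip = frame
        elif not m.group(2):
            # Frames prefixed with "?" are marked unreliable by the kernel; skip them.
            trace.append(frame)
    return fault_addr, ip, trace


def gdb_hint(frame):
    """Build a gdb command that maps symbol+offset to a source line."""
    return "list *({}+{:#x})".format(frame["symbol"], frame["offset"])


def looks_like_null_member_access(fault_addr, page_size=4096):
    """A fault address inside the first page usually means NULL base + field offset."""
    return fault_addr is not None and fault_addr < page_size


if __name__ == "__main__":
    fault_addr, ip, trace = parse_oops(OOPS)
    print("fault address:", hex(fault_addr))
    print("gdb command:  ", gdb_hint(ip))
    print("called from:  ", trace[0]["symbol"])
    if looks_like_null_member_access(fault_addr):
        print("likely a field at offset", hex(fault_addr), "read through a NULL pointer")
```

Running the printed command inside `gdb vmlinux` should point at the source line for the faulting instruction, provided the debug info matches the kernel build. In this dump CR2 is 0x20 and RAX is 0, which is consistent with reading a field at offset 0x20 through a NULL pointer, but that is only a reading of the registers; which structure is involved still has to be checked against the source line gdb reports.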