CPU stalls in SMP

Time: 2013-07-25 05:03:18

Tags: linux embedded-linux powerpc smp

We are facing an issue and need some help.

Brief description:

We have enabled SMP in the Linux 2.6.39.4 kernel and cross-compiled it for the PPC-476. After booting, the kernel brings up both processors (the hardware has 2 cores). The problem is that when we run the modprobe command repeatedly, one of the CPUs goes into a stalled state. While one CPU was stalled, we dumped the stacks of all active CPUs using SysRq. The dump showed both processors executing the same process, modprobe, with the same PID.

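Questions:
1. Can two processors execute the same process, with the same PID, at the same time?
2. Can executing the same process on both processors at once cause a race condition that puts a CPU into the stalled state?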

Log ===============================================

SysRq : Show backtrace of all active CPUs
CPU0:
NIP: 701786c4 LR: 701752f0 CTR: 00000004
REGS: 9fb4fdc0 TRAP: 0501   Not tainted  (2.6.39.4)
MSR: 00029000 <EE,ME,CE>  CR: 44002048  XER: 00000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
GPR00: 08101820 9fb4fe70 8f868ae0 22222222 0002374d 00000002 00849ffc 00000000
GPR08: a2bb3a8c 00000810 a2c2aac4 00000148 44002088
NIP [701786c4] __sw_hweight32+0x50/0x58
LR [701752f0] __bitmap_weight+0x54/0xc0
Call Trace:
[9fb4fe70] [20000000] 0x20000000 (unreliable)
[9fb4fe90] [7006b1cc] sys_init_module+0x11a8/0x1ca4
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x10050e38
    LR = 0x100a6708
Instruction dump:
7c634838 7c004838 7c001a14 5409e13e 7c090214 3d200f0f 61290f0f 7c004838
5409c23e 7c090214 5409843e 7c090214 <5403063e> 4e800020 5460f87e 70005555
CPU1:
NIP: 700a9678 LR: 700a964c CTR: 7012b0f8
REGS: 9fb4fe30 TRAP: 0501   Not tainted  (2.6.39.4)
MSR: 00029000 <EE,ME,CE>  CR: 80008022  XER: 20000000
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
GPR00: 00000000 9fb4fee0 8f868ae0 9efb9f20 9efb9f20 9fb4fee8 1004d6d0 0002d000
GPR08: 9efb9d68 9f8eee00 9efb9f20 00000000 20002022
NIP [700a9678] do_munmap+0x114/0x314
LR [700a964c] do_munmap+0xe8/0x314
Call Trace:
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8
    LR = 0x10022088
Instruction dump:
4bffe461 7c641b79 41820010 80040004 7f9d0040 419d01d4 83210008 2e190000
41920200 83d9000c 801f0074 2f800000 <419e00f8> 2f9e0000 419e00f0 801e0004
Call Trace:
[9ffaff00] [70008654] show_stack+0x6c/0x1a4 (unreliable)
[9ffaff40] [70192d30] showacpu+0x84/0xcc
[9ffaff60] [700679bc] generic_smp_call_function_single_interrupt+0x100/0x18c
[9ffaff90] [7000ff6c] call_function_single_action+0x10/0x24
[9ffaffa0] [7006f7f4] handle_irq_event_percpu+0xa0/0x21c
[9ffaffe0] [700728d8] handle_percpu_irq+0x88/0xb8
[9ffafff0] [7000e038] call_handle_irq+0x18/0x28
[9fb4fdf0] [700044b0] do_IRQ+0xe8/0x1a0
[9fb4fe20] [7000f81c] ret_from_except+0x0/0x18
--- Exception: 501 at do_munmap+0x114/0x314
    LR = do_munmap+0xe8/0x314
[9fb4fee0] [00000014] 0x14 (unreliable)
[9fb4ff20] [700aa7c8] sys_munmap+0x44/0x74
[9fb4ff40] [7000f1b8] ret_from_syscall+0x0/0x3c
--- Exception: c01 at 0x100510a8

=============================================================

1 answer:

Answer 0 (score: 0)

The answer to both of your questions is no. Two cores cannot run the same task at the same moment.

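From the OOPS/panic trace, the first thing we can tell is that the kernel panicked on CPU 0, which then triggered the SysRq "Show backtrace of all active CPUs". In other words, CPU 1 was running fine when CPU 0 panicked, but CPU 0 triggered an exception on it in order to dump CPU 1's backtrace.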
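To see exactly what the dump reports for each CPU, the TASK lines can be pulled out mechanically. The following is a minimal Python sketch, assuming the register-dump format shown in the log above; the function names `tasks_by_cpu` and `cpus_sharing_task` are hypothetical, and the regular expression is written only for this format.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
TASK_RE = re.compile(
    r"TASK = (?P<task>[0-9a-f]+)\[(?P<pid>\d+)\] '(?P<comm>[^']*)' "
    r"THREAD: (?P<thread>[0-9a-f]+) CPU: (?P<cpu>\d+)"
)

# The two TASK lines copied from the log in the question.
SAMPLE = """\
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 0
TASK = 8f868ae0[827] 'modprobe' THREAD: 9fb48000 CPU: 1
"""


def tasks_by_cpu(log_text):
    """Map each CPU number to (task pointer, PID, command, thread pointer)."""
    result = {}
    for m in TASK_RE.finditer(log_text):
        result[int(m.group("cpu"))] = (
            m.group("task"),
            int(m.group("pid")),
            m.group("comm"),
            m.group("thread"),
        )
    return result


def cpus_sharing_task(log_text):
    """Return {(task pointer, PID): [cpus]} for tasks reported on several CPUs."""
    groups = defaultdict(list)
    for cpu, (task, pid, _comm, _thread) in tasks_by_cpu(log_text).items():
        groups[(task, pid)].append(cpu)
    return {key: sorted(cpus) for key, cpus in groups.items() if len(cpus) > 1}


if __name__ == "__main__":
    print(cpus_sharing_task(SAMPLE))
```

For the log in the question, both CPU entries carry the same task pointer (8f868ae0), PID (827), and thread pointer (9fb48000). The script only reports what the dump prints; it does not by itself show how those values came to be printed for both CPUs.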

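Now let's analyze why CPU 0 panicked. According to the backtrace, it happened during "modprobe", so go through the kernel modules on your system to find the one that triggers the panic.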
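One way to narrow this down is to load and unload each module in a loop, one module at a time, until one of them fails. The following is a rough Python sketch under several assumptions: it must run as root on the target, the module names passed in are placeholders you supply, and `stress_module` and `first_failing_module` are hypothetical helper names. It relies only on the standard `modprobe NAME` and `modprobe -r NAME` commands. If the stall hangs the whole system, a userspace timeout may never fire, so treat this as a starting point only.

```python
import subprocess


def stress_module(name, iterations=100, timeout=30, run=subprocess.run):
    """Load and unload one module repeatedly.

    Returns the index of the first iteration that failed or timed out,
    or None if every iteration completed.
    """
    for i in range(iterations):
        for cmd in (["modprobe", name], ["modprobe", "-r", name]):
            try:
                run(cmd, check=True, timeout=timeout)
            except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
                return i
    return None


def first_failing_module(names, **kwargs):
    """Return (module name, failing iteration) for the first module that fails."""
    for name in names:
        failed_at = stress_module(name, **kwargs)
        if failed_at is not None:
            return name, failed_at
    return None
```

The `run` parameter exists so the loop can be exercised with a stand-in runner before it is pointed at real modules. For example, `first_failing_module(["mod_a", "mod_b"], iterations=50)` would test two placeholder module names in turn.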