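I am trying to debug a problem in the kernel:
A khungtask panic was raised because a task was stuck. The process that triggered the panic has four threads. Two of them are trying to acquire the mmap_sem semaphore. The other two are waiting on a userspace mutex.
Here is the stack trace of the first one: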
crash> set 14789
PID: 14789
COMMAND: "dumper:du"
TASK: ffff8801a434c140 [THREAD_INFO: ffff8801ecd7c000]
CPU: 0
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 14789 TASK: ffff8801a434c140 CPU: 0 COMMAND: "dumper:du"
#0 [ffff8801ecd7dda8] __schedule at ffffffff8143fa09
#1 [ffff8801ecd7de50] schedule at ffffffff8143fb95
#2 [ffff8801ecd7de60] rwsem_down_failed_common at ffffffff81440fdf
#3 [ffff8801ecd7dec0] rwsem_down_write_failed at ffffffff81441024
#4 [ffff8801ecd7ded0] call_rwsem_down_write_failed at ffffffff812167c3
#5 [ffff8801ecd7df20] sys_mprotect at ffffffff810f968e
#6 [ffff8801ecd7df80] system_call_fastpath at ffffffff81447f42
RIP: 00000031dfce5377 RSP: 00007fff3c2b6408 RFLAGS: 00013202
RAX: 000000000000000a RBX: ffffffff81447f42 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 00007f95310a1000
RBP: 00000031e061c360 R8: 0000000000000019 R9: 0000000000000008
R10: 00000031dfa20000 R11: 0000000000003246 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f95318a19c0 R15: 0000000000800000
ORIG_RAX: 000000000000000a CS: 0033 SS: 002b
Here is the stack trace of the second one:
crash> set 14791
PID: 14791
COMMAND: "dumper:du"
TASK: ffff8801eda161c0 [THREAD_INFO: ffff8801f0fa2000]
CPU: 1
STATE: TASK_RUNNING
crash> bt
PID: 14791 TASK: ffff8801eda161c0 CPU: 1 COMMAND: "dumper:du"
#0 [ffff8801f0fa3d88] __schedule at ffffffff8143fa09
#1 [ffff8801f0fa3e30] schedule at ffffffff8143fb95
#2 [ffff8801f0fa3e40] rwsem_down_failed_common at ffffffff81440fdf
#3 [ffff8801f0fa3ea0] rwsem_down_write_failed at ffffffff81441024
#4 [ffff8801f0fa3eb0] call_rwsem_down_write_failed at ffffffff812167c3
#5 [ffff8801f0fa3f00] sys_mmap_pgoff at ffffffff810f8de4
#6 [ffff8801f0fa3f70] sys_mmap at ffffffff8101280e
#7 [ffff8801f0fa3f80] system_call_fastpath at ffffffff81447f42
RIP: 00000031dfce531a RSP: 00007f95330a30c8 RFLAGS: 00013246
RAX: 0000000000000009 RBX: ffffffff81447f42 RCX: 0000000000000000
RDX: 0000000000000003 RSI: 0000000000001000 RDI: 0000000000000000
RBP: 0000000000001000 R8: 00000000ffffffff R9: 0000000000000000
R10: 0000000000000022 R11: 0000000000003246 R12: ffffffff8101280e
R13: ffff8801f0fa3f78 R14: 0000000000000003 R15: 0000000000040000
ORIG_RAX: 0000000000000009 CS: 0033 SS: 002b
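For reference, the saved user-mode registers in the two dumps above can be decoded into the arguments each thread passed to its system call. This is only a minimal sketch, assuming the standard x86-64 Linux syscall convention (syscall number in ORIG_RAX, arguments in RDI, RSI, RDX, R10, R8, R9). The helper name `decode_syscall` and the two dictionaries are hypothetical and exist only for illustration; the register values are copied from the `bt` output.

```python
# Sketch: map the registers saved at syscall entry to named syscall arguments.
# Assumption: x86-64 Linux ABI, where 9 is mmap and 10 is mprotect.

SYSCALLS = {
    9: ("mmap", ["addr", "length", "prot", "flags", "fd", "offset"]),
    10: ("mprotect", ["addr", "len", "prot"]),
}

# Order in which the x86-64 syscall ABI passes arguments.
ARG_REGS = ["RDI", "RSI", "RDX", "R10", "R8", "R9"]


def decode_syscall(regs):
    """Return (syscall name, {parameter: raw register value})."""
    name, params = SYSCALLS[regs["ORIG_RAX"]]
    # zip stops at the shorter list, so unused argument registers are ignored.
    return name, {p: regs[r] for p, r in zip(params, ARG_REGS)}


# Values copied from the crash output for PID 14789 and PID 14791.
pid_14789 = {"ORIG_RAX": 0xA, "RDI": 0x7F95310A1000, "RSI": 0x1000,
             "RDX": 0x0, "R10": 0x31DFA20000, "R8": 0x19, "R9": 0x8}
pid_14791 = {"ORIG_RAX": 0x9, "RDI": 0x0, "RSI": 0x1000,
             "RDX": 0x3, "R10": 0x22, "R8": 0xFFFFFFFF, "R9": 0x0}

for pid, regs in ((14789, pid_14789), (14791, pid_14791)):
    name, args = decode_syscall(regs)
    shown = ", ".join(f"{k}={v:#x}" for k, v in args.items())
    print(f"PID {pid}: {name}({shown})")
```

Read this way, PID 14789 appears to be calling mprotect on one page with prot 0 (PROT_NONE), and PID 14791 appears to be calling mmap for one page with prot 3 (PROT_READ|PROT_WRITE), flags 0x22 (MAP_PRIVATE|MAP_ANONYMOUS) and fd 0xffffffff (-1 as a 32-bit int). Both paths take mmap_sem for writing, which matches the rwsem_down_write_failed frames.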
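I see that the first one is in TASK_UNINTERRUPTIBLE and the second one is in TASK_RUNNING. Why is that?
Clearly the first one, in TASK_UNINTERRUPTIBLE, is the one that triggered the panic. But why are both of them unable to acquire the semaphore?
I don't understand this.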