“信息:任务被阻止超过120秒。”无法弄清楚它被封锁了什么

时间:2014-04-08 03:08:08

标签: multithreading tcp linux-kernel timeout blocking

我有一个只是TCP数据转发器的程序。数据包来自一个源,并智能缓冲,以便重新分配给多个客户端。有一些线程,还有一些互斥体正在进行中。还有一点记录到磁盘。但主要是它只是TCP套接字数据(尽管dmesg似乎意味着什么,但没有UDP)。此外,没有阻止。它全部选择非阻塞套接字和一些轮询。

此过程已挂断。虽然它响应SNMP请求(有专门的线程)并且将接受客户端连接(那里是专用的侦听器线程),但是没有数据会流动。请注意,该程序在最终挂起之前将运行一个月没有问题,因此使用调试版本进行探测真的很难。我真的希望有人可以帮我解释我的dmesg报告,因为这就是我要继续下去的所有内容。

接收数据包的主线程有这个堆栈帧:

#0  0x00289424 in __kernel_vsyscall ()
#1  0x00a93656 in nanosleep () from /lib/libc.so.6
#2  0x00acc34c in usleep () from /lib/libc.so.6
#3  0x0805a140 in main ()

发件人线程有这个堆栈框架:

#0  0x00289424 in __kernel_vsyscall ()
#1  0x00a93656 in nanosleep () from /lib/libc.so.6
#2  0x00acc34c in usleep () from /lib/libc.so.6
#3  0x08066efc in TSITcpip::TcpipThread(void*) ()
#4  0x08067026 in tcpip_callback(void*) ()
#5  0x00b95a09 in start_thread () from /lib/libpthread.so.0
#6  0x00ad343e in clone () from /lib/libc.so.6

还有6个主题。一个是侦听器线程,其余是SNMP。我不知道如何获得更多的堆栈帧信息来告诉睡眠的持续时间,但是它有限制。在一个地方,最长的睡眠时间限制为5秒。其他硬编码为1秒和5毫秒。没有阻止调用来访问套接字。涉及互斥锁,但是这个系统运行了一个月没有连接客户端,这意味着锁定锁定非常罕见。

以下是dmesg的摘录。之前的所有内容都与启动相关,除了针对不相关进程的段错误之外,没有其他任何内容。这是关于发生了什么的唯一线索,我真的希望有人可以帮助我理解它。 (为了便于阅读,我添加了一些空行。)

INFO: task str2many:1624 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D efa23b80     0  1624   1593 0x00000080
 efd8c570 00200082 f2987140 efa23b80 00200246 c07d9c2b 00000031 0000bc43
 00000000 eff27040 00010ced 00000001 00010ced c0b110c0 c0b110c0 efd8c818
 c0b110c0 c0b0caa4 c0b110c0 efd8c818 efac6000 c07d3aa3 efe6d01c f7032780
Call Trace:
 [<c07d9c2b>] ? udp_recvmsg+0x14b/0x2c0
 [<c07d3aa3>] ? tcp_v4_rcv+0x473/0x770
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c0788b91>] ? __netif_receive_skb+0x401/0x5f0
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c053da0c>] ? core_sys_select+0x16c/0x270
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<f8a4df94>] ? e1000_clean_rx_irq+0x274/0x450 [e1000e]
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c0489395>] ? do_futex+0xe5/0xa40
 [<c078cd79>] ? net_rx_action+0x199/0x280
 [<c0480be0>] ? ktime_get_ts+0xd0/0x100
 [<c0600623>] ? copy_to_user+0x33/0x110
 [<c053c941>] ? poll_select_copy_remaining+0xe1/0x130
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b
 [<c0830000>] ? __mutex_lock_slowpath+0xb0/0x140

INFO: task str2many:1635 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D c2d03ac0     0  1635   1593 0x00000080
 efa75570 00200082 00000200 c2d03ac0 00000002 efac5df4 efac5df0 00000000
 00000000 eff27040 00010ced cc11e7f9 00010ced c0b110c0 c0b110c0 efa75818
 c0b110c0 c0b0caa4 c0b110c0 efa75818 f6272000 00000000 000005ff efa75570
Call Trace:
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c04b78d1>] ? move_native_irq+0x11/0x50
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:1636 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D efa75030     0  1636   1593 0x00000080
 efa75030 00200082 eff27040 efa75030 c082ecb0 00000000 00000000 0000d164
 00000000 eff27040 00010ced 00000001 00010ced c0b110c0 c0b110c0 efa752d8
 c0b110c0 c0b0caa4 c0b110c0 efa752d8 efaca000 c0b110c0 efa752d8 efaba000
Call Trace:
 [<c082ecb0>] ? schedule+0x3c0/0xae0
 [<c07e03ab>] ? inet_sendmsg+0x4b/0xb0
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c04b8874>] ? __rcu_process_callbacks+0x44/0x2d0
 [<c04b8b35>] ? rcu_process_callbacks+0x35/0x40
 [<c045c18e>] ? __do_softirq+0xae/0x1a0
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:1637 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D 00000070     0  1637   1593 0x00000080
 efa9bab0 00200082 00000000 00000070 f63c559c efaa7030 c2d080fc 000170ed
 00000000 eff27040 00010ced cdcefd93 00010ced c0b110c0 c0b110c0 efa9bd58
 c0b110c0 c0b0caa4 c0b110c0 efa9bd58 f70fc000 c2d080c0 00200082 c044d665
Call Trace:
 [<c044d665>] ? try_to_wake_up+0x205/0x3a0
 [<c043b2b7>] ? __wake_up_common+0x47/0x70
 [<c043f8f0>] ? complete+0x40/0x60
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c04b8874>] ? __rcu_process_callbacks+0x44/0x2d0
 [<c04b8b35>] ? rcu_process_callbacks+0x35/0x40
 [<c045c18e>] ? __do_softirq+0xae/0x1a0
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:1638 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D b4dfcbe0     0  1638   1593 0x00000080
 efa9b570 00200082 00000000 b4dfcbe0 ef991b64 c0833efa c0439bf0 0001a113
 00000000 eff27040 00010ced cd0bd748 00010ced c0b110c0 c0b110c0 efa9b818
 c0b110c0 c0b0caa4 c0b110c0 efa9b818 f70fc000 c0b100d8 000000e0 ffffffff
Call Trace:
 [<c0833efa>] ? do_page_fault+0x2a/0x90
 [<c0439bf0>] ? kmap_atomic_prot+0x120/0x150
 [<c05ff1e1>] ? __get_user_4+0x11/0x17
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0439bf0>] ? kmap_atomic_prot+0x120/0x150
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c0433a3a>] ? __do_page_fault+0x1aa/0x420
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:1716 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D eb09ae40     0  1716   1593 0x00000080
 efaa7ab0 00200082 ef53e040 eb09ae40 eb09ae60 000005a8 c077c7fc 000bb500
 00000000 eff27040 00010ced 00000001 00010ced c0b110c0 c0b110c0 efaa7d58
 c0b110c0 c0b0caa4 c0b110c0 efaa7d58 efba2000 00000001 9b0bb0f7 00000000
Call Trace:
 [<c077c7fc>] ? sk_reset_timer+0xc/0x20
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c07cf241>] ? __tcp_push_pending_frames+0x31/0xe0
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c053da0c>] ? core_sys_select+0x16c/0x270
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c0476450>] ? autoremove_wake_function+0x0/0x40
 [<c0480be0>] ? ktime_get_ts+0xd0/0x100
 [<c0600623>] ? copy_to_user+0x33/0x110
 [<c053c941>] ? poll_select_copy_remaining+0xe1/0x130
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:1717 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D c2d03ac0     0  1717   1593 0x00000080
 efb11570 00200082 00000200 c2d03ac0 00000002 efba3df4 efba3df0 00000000
 00000000 eff27040 00010ced 00000001 00010ced c0b110c0 c0b110c0 efb11818
 c0b110c0 c0b0caa4 c0b110c0 efb11818 efac8000 00000000 000009ff efb11570
Call Trace:
 [<c0784701>] ? sk_stream_wait_memory+0x1b1/0x210
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c0476450>] ? autoremove_wake_function+0x0/0x40
 [<c05a292c>] ? security_file_permission+0xc/0x10
 [<c052ab96>] ? rw_verify_area+0x66/0xe0
 [<c045c18e>] ? __do_softirq+0xae/0x1a0
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b
CE: hpet increasing min_delta_ns to 40226 nsec

INFO: task str2many:9401 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D f0238040     0  9401   1593 0x00000080
 f0132ab0 00200082 ef802500 f0238040 00200246 c07d9c2b 00000031 00004b07
 00000000 f619ec80 00031c6c 00000001 00031c6c c0b110c0 c0b110c0 f0132d58
 c0b110c0 c0b0caa4 c0b110c0 f0132d58 f2294000 00000000 efe6d01c f7032780
Call Trace:
 [<c07d9c2b>] ? udp_recvmsg+0x14b/0x2c0
 [<c0830c65>] ? rwsem_down_failed_common+0x75/0x1a0
 [<c0830dda>] ? rwsem_down_read_failed+0x1a/0x24
 [<c0830e17>] ? call_rwsem_down_read_failed+0x7/0xc
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c053da0c>] ? core_sys_select+0x16c/0x270
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c044126b>] ? check_preempt_wakeup+0x16b/0x220
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04870ee>] ? wake_futex+0x2e/0x60
 [<c0489860>] ? do_futex+0x5b0/0xa40
 [<c0480be0>] ? ktime_get_ts+0xd0/0x100
 [<c0600623>] ? copy_to_user+0x33/0x110
 [<c053c941>] ? poll_select_copy_remaining+0xe1/0x130
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b
 [<c0830000>] ? __mutex_lock_slowpath+0xb0/0x140

INFO: task str2many:9407 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D f01a5570     0  9407   1593 0x00000080
 f01a5570 00200082 f619ec80 f01a5570 c082ecb0 00000400 c2c03ac0 00018f12
 00000000 f619ec80 00031c6c 00000000 00031c6c c0b110c0 c0b110c0 f01a5818
 c0b110c0 c0b0caa4 c0b110c0 f01a5818 f0690000 c0b110c0 f01a5818 342d2c7f
Call Trace:
 [<c082ecb0>] ? schedule+0x3c0/0xae0
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c04b78d1>] ? move_native_irq+0x11/0x50
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

INFO: task str2many:9408 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
str2many      D efa5b570     0  9408   1593 0x00000080
 efa5b570 00200082 f619ec80 efa5b570 c082ecb0 00000000 00000000 00018f12
 00000000 f619ec80 00031c6c 00000000 00031c6c c0b110c0 c0b110c0 efa5b818
 c0b110c0 c0b0caa4 c0b110c0 efa5b818 eb01e000 c0b110c0 efa5b818 efb42000
Call Trace:
 [<c082ecb0>] ? schedule+0x3c0/0xae0
 [<c07e03ab>] ? inet_sendmsg+0x4b/0xb0
 [<c048792d>] ? exit_robust_list+0x7d/0x140
 [<c04529ca>] ? mm_release+0xda/0xf0
 [<c0458d9d>] ? exit_mm+0x6d/0x140
 [<c0459136>] ? do_exit+0x126/0x740
 [<c05f906a>] ? plist_del+0x2a/0x60
 [<c046bd60>] ? dequeue_signal+0x30/0x190
 [<c045978c>] ? do_group_exit+0x3c/0xa0
 [<c046c2fe>] ? get_signal_to_deliver+0x1ce/0x460
 [<c0408fa3>] ? do_signal+0x93/0xa50
 [<c04893c5>] ? do_futex+0x115/0xa40
 [<c04b8b35>] ? rcu_process_callbacks+0x35/0x40
 [<c045c18e>] ? __do_softirq+0xae/0x1a0
 [<c04afed0>] ? __audit_syscall_exit+0x220/0x250
 [<c04099d7>] ? do_notify_resume+0x77/0xa0
 [<c083135c>] ? work_notifysig+0x13/0x1b

0 个答案:

没有答案