I am patching Linux kernel 3.10 with LITMUS^RT, a real-time extension focused on multiprocessor real-time scheduling and synchronization.
My goal is to write a scheduler that allows a task to migrate from one CPU to another when it is preempted, and only when certain conditions are met. My current implementation suffers from a deadlock between CPUs, as shown by the following lockdep report:
Setting up rt task parameters for process 1622.
[ INFO: inconsistent lock state ]
3.10.5-litmus2013.1 #105 Not tainted
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
rtspin/1620 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&rq->lock){?.-.-.}, at: [<ffffffff8155f0d5>] __schedule+0x175/0xa70
{IN-HARDIRQ-W} state was registered at:
[<ffffffff8107832a>] __lock_acquire+0x86a/0x1e90
[<ffffffff81079f65>] lock_acquire+0x95/0x140
[<ffffffff81560fc6>] _raw_spin_lock+0x36/0x50
[<ffffffff8105e231>] scheduler_tick+0x61/0x210
[<ffffffff8103f112>] update_process_times+0x62/0x80
[<ffffffff81071677>] tick_periodic+0x27/0x70
[<ffffffff8107174b>] tick_handle_periodic+0x1b/0x70
[<ffffffff810042d0>] timer_interrupt+0x10/0x20
[<ffffffff810849fd>] handle_irq_event_percpu+0x6d/0x260
[<ffffffff81084c33>] handle_irq_event+0x43/0x70
[<ffffffff8108778c>] handle_level_irq+0x6c/0xc0
[<ffffffff81003a89>] handle_irq+0x19/0x30
[<ffffffff81003925>] do_IRQ+0x55/0xd0
[<ffffffff81561cef>] ret_from_intr+0x0/0x13
[<ffffffff8108615a>] __setup_irq+0x20a/0x4e0
[<ffffffff81086473>] setup_irq+0x43/0x90
[<ffffffff8184fb5f>] setup_default_timer_irq+0x12/0x14
[<ffffffff8184fb78>] hpet_time_init+0x17/0x19
[<ffffffff8184fb46>] x86_late_time_init+0xa/0x11
[<ffffffff8184ecd1>] start_kernel+0x270/0x2e0
[<ffffffff8184e5a3>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff8184e66c>] x86_64_start_kernel+0xc7/0xca
irq event stamp: 8886
hardirqs last enabled at (8885): [<ffffffff8108dd6b>] rcu_note_context_switch+0x8b/0x2d0
hardirqs last disabled at (8886): [<ffffffff81561052>] _raw_spin_lock_irq+0x12/0x50
softirqs last enabled at (8880): [<ffffffff81037125>] __do_softirq+0x195/0x2b0
softirqs last disabled at (8857): [<ffffffff8103738d>] irq_exit+0x7d/0x90
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rq->lock);
<Interrupt>
lock(&rq->lock);
*** DEADLOCK ***
1 lock held by rtspin/1620:
#0: (&rq->lock){?.-.-.}, at: [<ffffffff8155f0d5>] __schedule+0x175/0xa70
stack backtrace:
CPU: 1 PID: 1620 Comm: rtspin Not tainted 3.10.5-litmus2013.1 #105
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff81bc4cc0 ffff88001cdf3aa8 ffffffff8155ae1e ffff88001cdf3af8
ffffffff81557f39 0000000000000000 ffff880000000001 ffff880000000001
0000000000000000 ffff88001c5ec280 ffffffff810750d0 0000000000000002
Call Trace:
[<ffffffff8155ae1e>] dump_stack+0x19/0x1b
[<ffffffff81557f39>] print_usage_bug+0x1f7/0x208
[<ffffffff810750d0>] ? print_shortest_lock_dependencies+0x1c0/0x1c0
[<ffffffff81075ead>] mark_lock+0x2ad/0x320
[<ffffffff81075fd0>] mark_held_locks+0xb0/0x120
[<ffffffff8129bf71>] ? pfp_schedule+0x691/0xba0
[<ffffffff810760f2>] trace_hardirqs_on_caller+0xb2/0x210
[<ffffffff8107625d>] trace_hardirqs_on+0xd/0x10
[<ffffffff8129bf71>] pfp_schedule+0x691/0xba0
[<ffffffff81069e70>] pick_next_task_litmus+0x40/0x500
[<ffffffff8155f17a>] __schedule+0x21a/0xa70
[<ffffffff8155f9f4>] schedule+0x24/0x70
[<ffffffff8155d1bc>] schedule_timeout+0x14c/0x200
[<ffffffff8105e3ed>] ? get_parent_ip+0xd/0x50
[<ffffffff8105e589>] ? sub_preempt_count+0x69/0xf0
[<ffffffff8155ffab>] wait_for_completion_interruptible+0xcb/0x140
[<ffffffff81060e60>] ? try_to_wake_up+0x470/0x470
[<ffffffff8129266f>] do_wait_for_ts_release+0xef/0x190
[<ffffffff81292782>] sys_wait_for_ts_release+0x22/0x30
[<ffffffff81562552>] system_call_fastpath+0x16/0x1b
At this point I think there are two possible approaches to solve this problem:
Release the lock on the current CPU before migrating to the target CPU with a kernel function. LITMUS^RT provides a callback in which I can decide which task to execute next:
static struct task_struct *pfp_schedule(struct task_struct *prev)
{
        [...]
        if (is_preempted(prev)) {
                /* release the lock on the current CPU */
                migrate_to_another_cpu();
        }
        [...]
}
What I think I have to do is release the current lock before calling the migrate_to_another_cpu function, but I still have not found any way to do that.
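As background on the cross-CPU deadlock itself: the mainline scheduler avoids AB-BA deadlocks between two runqueue locks by always acquiring the pair in a fixed (address) order; see double_rq_lock() in kernel/sched/core.c. Below is a minimal userspace sketch of that ordering discipline using pthreads. Note that struct rq, migrate_one() and the field names here are illustrative only, not LITMUS^RT or kernel API (and a real pfp_schedule would also have to keep interrupts disabled while holding rq->lock, which is what the lockdep report above complains about):

```c
#include <pthread.h>

/* Illustrative per-CPU runqueue with its own lock, mirroring rq->lock. */
struct rq {
        pthread_mutex_t lock;
        int cpu;
        int nr_running;
};

/*
 * Lock two runqueues in a fixed (address) order, so that two CPUs
 * migrating tasks toward each other can never deadlock AB-BA style.
 * This mirrors double_rq_lock() in kernel/sched/core.c.
 */
void double_rq_lock(struct rq *a, struct rq *b)
{
        if (a == b) {
                pthread_mutex_lock(&a->lock);
        } else if (a < b) {
                pthread_mutex_lock(&a->lock);
                pthread_mutex_lock(&b->lock);
        } else {
                pthread_mutex_lock(&b->lock);
                pthread_mutex_lock(&a->lock);
        }
}

void double_rq_unlock(struct rq *a, struct rq *b)
{
        pthread_mutex_unlock(&a->lock);
        if (a != b)
                pthread_mutex_unlock(&b->lock);
}

/* Move one task from src to dst while holding both runqueue locks. */
void migrate_one(struct rq *src, struct rq *dst)
{
        double_rq_lock(src, dst);
        src->nr_running--;
        dst->nr_running++;
        double_rq_unlock(src, dst);
}
```

Whichever CPU initiates the migration, both sides acquire the same pair of locks in the same order, so neither can end up holding one lock while waiting forever for the other.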
Long story short, does anyone know whether either of these solutions is feasible, and if so, how it could be implemented?
Thanks in advance!
P.S.: Hints or better ideas are more than welcome! ;-)