我正在研究通过网络连接硬盘的驱动程序。有一个错误,如果我在计算机上启用两个或更多硬盘,只有第一个硬盘可以查看和识别分区。结果是,如果我在hda上有1个分区,在hdb上有1个分区,只要我连接hda,就会有一个可以挂载的分区。所以hda1一旦安装就会得到一个blkid xyz123。但是当我继续安装hdb1时,它也会出现相同的blkid,实际上,驱动程序是从hda而不是hdb读取它。
所以我想我找到了司机弄乱的地方。下面是一个调试输出,包括dump_stack,我把它放在第一个看似访问错误设备的地方。
以下是代码部分:
/*basically, this is just the request_queue processor. In the log output that
follows, the second device, (hdb) has just been connected, right after hda
was connected and hda1 was mounted to the system. */
void nblk_request_proc(struct request_queue *q)
{
struct request *req;
ndas_error_t err = NDAS_OK;
dump_stack();
while((req = NBLK_NEXT_REQUEST(q)) != NULL)
{
dbgl_blk(8,"processing queue request from slot %d",SLOT_R(req));
if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) {
printk ("ndas: Queue is suspended\n");
/* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
blk_start_request(req);
#else
blkdev_dequeue_request(req);
#endif
这是一个日志输出。我添加了一些评论,以帮助了解正在发生的事情以及糟糕的电话似乎出现的地方。
/* Just below here you can see "slot" mentioned many times. This is the
identification for the network case in which the hd is connected to the
network. So you will see slot 2 in this log because the first device has
already been connected and mounted. */
kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10
/* So at this point the hard disk is added (gendisk=c6...) and the identifications
all match the network device. The driver is now about to begin scanning the
hard drive for existing partitions. the little 'ed', at the end of the previous
line indicates that revalidate_disk has finished it's job.
Also, I think the request queue is indicated by the output dpcd1 near the very
end of the line.
Now below we have entered the function that is pasted above. In the function
you can see that the slot can be determined by the queue. And the log output
after the stack dump shows it is from slot 1. (The first network drive that was
already mounted.) */
kernel: [231644.155677] ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P 2.6.32-5-686 #1
kernel: [231644.155711] Call Trace:
kernel: [231644.155723] [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
kernel: [231644.155732] [<c11298db>] ? __generic_unplug_device+0x23/0x25
kernel: [231644.155737] [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
kernel: [231644.155743] [<c1123090>] ? blk_unplug+0x2e/0x31
kernel: [231644.155750] [<c10cceec>] ? block_sync_page+0x33/0x34
kernel: [231644.155756] [<c108770c>] ? sync_page+0x35/0x3d
kernel: [231644.155763] [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
kernel: [231644.155768] [<c10876d7>] ? sync_page+0x0/0x3d
kernel: [231644.155773] [<c10876aa>] ? __lock_page+0x76/0x7e
kernel: [231644.155780] [<c1043f1f>] ? wake_bit_function+0x0/0x3c
kernel: [231644.155785] [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
kernel: [231644.155791] [<c10d21b9>] ? blkdev_readpage+0x0/0xc
kernel: [231644.155796] [<c1087bbc>] ? read_cache_page_async+0x14/0x18
kernel: [231644.155801] [<c1087bc9>] ? read_cache_page+0x9/0xf
kernel: [231644.155808] [<c10ed6fc>] ? read_dev_sector+0x26/0x60
kernel: [231644.155813] [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
kernel: [231644.155819] [<c10ee138>] ? rescan_partitions+0x17e/0x378
kernel: [231644.155825] [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
kernel: [231644.155830] [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
kernel: [231644.155836] [<c10ed7e6>] ? register_disk+0xb0/0xfd
kernel: [231644.155843] [<c112e33b>] ? add_disk+0x9a/0xe8
kernel: [231644.155848] [<c112dafd>] ? exact_match+0x0/0x4
kernel: [231644.155853] [<c112deae>] ? exact_lock+0x0/0xd
kernel: [231644.155861] [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
kernel: [231644.155868] [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
kernel: [231644.155874] [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
kernel: [231644.155891] [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
kernel: [231644.155906] [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155919] [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
kernel: [231644.155933] [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155941] [<c1003d47>] ? kernel_thread_helper+0x7/0x10
/* here are the output of the driver debugs. They show that this operation is
being performed on the first devices request queue. */
kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10
我希望这是足够的背景资料。也许在这个时刻一个显而易见的问题是“何时何地分配了request_queues?”
在add_disk函数之前处理一下。添加磁盘,是日志输出的第一行。
slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
nblk_request_proc,
&slot->lock
);
据我所知,这是标准操作。回到我原来的问题。我可以在某处找到请求队列,并确保每个新设备的请求队列增加或唯一,或者Linux内核是否只为每个主要号码使用一个队列?我想了解为什么这个驱动程序在两个不同的块存储上加载相同的队列,并确定在初始注册过程中是否导致重复的blkid。
感谢您为我看这种情况。
答案 0 :(得分:1)
Queue = blk_init_queue(sbd_request, &Device.lock);
答案 1 :(得分:0)
我分享了导致我发布此问题的错误的解决方案。虽然它实际上没有回答如何识别设备请求队列的问题。
在上面的代码中有以下内容:
if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED,
&(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))
那么,“SLOT_R(req)”造成了麻烦。这是定义返回gendisk设备的其他位置。
#define SLOT_R(_request_) SLOT((_request_)->rq_disk)
这会返回磁盘,但不会在以后的各种操作中返回正确的值。因此,当加载额外的块设备时,此函数基本上保持返回1.(我认为它是作为布尔值处理的。)因此,所有请求都堆积在磁盘1的请求队列中。
修复是访问磁盘的private_data中存储的正确磁盘标识值,当它被添加到系统时。
Correct identifier definition:
#define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )
How the correct disk number was stored.
slot->disk->queue = slot->queue;
slot->disk->private_data = (void*) (long) s; <-- 's' is the disk id
slot->queue_flags = 0;
现在从私有数据返回正确的磁盘ID,因此所有请求都到正确的队列。
如前所述,这并未说明如何识别队列。一个没有受过教育的猜测可能是:
x = (int) _request_->rq_disk->queue->id;
参考。 linux中的request_queue函数 http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270&amp; 321
感谢大家的帮助!