Question

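I'm hitting an error I really don't understand when reading or writing files through my PCIe block device driver. The problem seems to be in swiotlb_unmap_sg_attrs(), which appears to be dereferencing a NULL sg pointer. I don't know where that pointer comes from, because the only scatterlist I use myself is allocated as part of the device info structure and persists for as long as the driver exists.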

I have a stack trace for this crash. Its details tend to vary, but it always crashes in swiotlb_unmap_sg_attrs().

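I think I may have a locking problem, because I'm not sure how to handle locking around the IO functions. The lock is already held when the request function is called. I release it before calling the IO functions themselves, because they need an (MSI) IRQ to complete. The IRQ handler updates a "status" value that the IO function is waiting on. When the IO function returns, I re-acquire the lock and go back to processing the request queue.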

The crash happens during blk_fetch_request():

if (!__blk_end_request(req, res, bytes)){
    printk(KERN_ERR "%s next request\n", DRIVER_NAME);

    req = blk_fetch_request(q);
} else {
    printk(KERN_ERR "%s same request\n", DRIVER_NAME);
}

where bytes is set by the request handler to the total length of the IO (the sum of the lengths of all scatter-gather segments).
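To make the bookkeeping concrete, here is a small userspace model (not kernel code) of how bytes relates to the scatter-gather segments and to the return value of __blk_end_request(). The struct seg and struct fake_req types and the total_bytes() and fake_end_request() helpers are hypothetical stand-ins; the only behavior borrowed from the block layer is its convention that __blk_end_request() returns false once the whole request has been completed and true while bytes are still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter-gather segment. */
struct seg {
    size_t length;
};

/* Hypothetical stand-in for a block request: bytes not yet completed. */
struct fake_req {
    size_t remaining;
};

/* Total IO length: the sum of every segment's length. */
size_t total_bytes(const struct seg *segs, size_t nsegs)
{
    size_t total = 0;
    for (size_t i = 0; i < nsegs; i++)
        total += segs[i].length;
    return total;
}

/*
 * Models the return convention of __blk_end_request():
 *   false -> request fully completed (the "next request" branch),
 *   true  -> bytes still pending (the "same request" branch).
 */
bool fake_end_request(struct fake_req *req, size_t bytes)
{
    assert(req != NULL);    /* never complete a request we don't have */
    if (bytes >= req->remaining) {
        req->remaining = 0;
        return false;
    }
    req->remaining -= bytes;
    return true;
}
```

If bytes covers every segment, the model returns false on the first call, which corresponds to fetching the next request; completing fewer bytes keeps the same request in flight.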

Answer 1

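It turns out this was caused by re-entry into the request function. Because I unlocked in the middle to let the IRQ in, the request function could be called again and take the lock while the original request handler was still waiting on IO. The wrong handler would then get the IRQ, and everything went south with a failed IO stack.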

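I fixed this by setting a "busy" flag at the start of the request function, clearing it at the end, and returning immediately at the start of the function if the flag is already set: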

static void mydev_submit_req(struct request_queue *q){
    struct mydevice *dev = q->queuedata;

    // We are already processing a request
    // so reentrant calls can take a hike
    // They'll be back
    if (dev->has_request)
        return;

    // We own the IO now, new requests need to wait
    // Queue lock is held when this function is called
    // so no need for an atomic set
    dev->has_request = 1; 

    // Access request queue here, while queue lock is held

    spin_unlock_irq(q->queue_lock);

    // Perform IO here, with IRQs enabled
    // You can't access the queue or request here, make sure 
    // you got the info you need out before you release the lock

    spin_lock_irq(q->queue_lock);

    // you can end the requests as needed here, with the lock held

    // allow new requests to be processed after we return
    dev->has_request = 0;

    // lock is held when the function returns
}
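The guard can be exercised outside the kernel with a single-threaded model. This is a sketch, not driver code: struct mydev_model, do_io(), request_fn() and run_model() are hypothetical names, the queue lock is only indicated by comments, and the re-entrant call is injected by hand to stand in for the block layer calling the request function again while the lock is dropped. run_model() returns the deepest nesting of the IO section that was observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device state relevant to the guard. */
struct mydev_model {
    bool use_guard;    /* enable the has_request check */
    bool has_request;  /* the busy flag from the answer */
    int  active;       /* handlers currently inside the IO section */
    int  max_active;   /* deepest nesting observed */
    int  reentries;    /* simulated re-entrant calls so far */
};

static void request_fn(struct mydev_model *dev);

/* Stand-in for "perform IO with the queue lock dropped". */
static void do_io(struct mydev_model *dev)
{
    dev->active++;
    if (dev->active > dev->max_active)
        dev->max_active = dev->active;

    /* Inject one re-entrant call while the "lock" is released. */
    if (dev->reentries == 0) {
        dev->reentries++;
        request_fn(dev);
    }

    dev->active--;
}

static void request_fn(struct mydev_model *dev)
{
    /* queue lock would be held on entry */
    if (dev->use_guard && dev->has_request)
        return;                 /* already busy: bail out early */

    dev->has_request = true;
    /* spin_unlock_irq(q->queue_lock) would go here */
    do_io(dev);
    /* spin_lock_irq(q->queue_lock) would go here */
    dev->has_request = false;
}

int run_model(bool use_guard)
{
    struct mydev_model dev = { .use_guard = use_guard };
    request_fn(&dev);
    assert(dev.active == 0);    /* every IO section was exited */
    return dev.max_active;
}
```

Without the guard the model reports two handlers inside the IO section at once, which is the situation in which a completion IRQ can be picked up by the wrong handler; with the guard the re-entrant call returns immediately and only one handler ever owns the IO.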

However, I'm still not sure why I consistently got the stack trace from swiotlb_unmap_sg_attrs().

NULL pointer dereference in swiotlb_unmap_sg_attrs() on disk IO
