Question

I've implemented a char device for my kernel module, along with a read function for it. The read function calls copy_to_user to return data to the caller. I originally implemented the read function in a blocking manner (with wait_event_interruptible), but the problem reproduces even when I implement read in a non-blocking manner. My code is running on a MIPS processor.

The user space program opens the char device and reads into a buffer allocated on the stack.

What I've found is that occasionally copy_to_user will fail to copy any bytes. Moreover, even if I replace copy_to_user with a call to memcpy (only for the purposes of checking... I know this isn't the right thing to do), and print out the destination buffer immediately afterwards, I see that memcpy has failed to copy any bytes.

I'm not really sure how to further debug this - how can I determine why memory is not being copied? Is it possible that the process context is wrong?

EDIT: Here's some pseudo-code outlining what the code currently looks like:

User mode (runs repeatedly):

char buf[BUF_LEN];
FILE *f = fopen(char_device_file, "rb");
fread(buf, 1, BUF_LEN, f);
fclose(f);

Kernel mode:

char_device = 
    create_char_device(char_device_name,
        NULL,
        read_func,
        NULL,
        NULL);

int read_func(char *output_buffer, int output_buffer_length, loff_t *offset)
{
    int rc;
    unsigned long flags; /* saved IRQ state for spin_lock_irqsave */
    if (*offset == 0)
    {
        spin_lock_irqsave(&lock, flags);

        while (get_available_bytes_to_read() == 0)
        {
            spin_unlock_irqrestore(&lock, flags);
            if (wait_event_interruptible(self->wait_queue, get_available_bytes_to_read() != 0))
            {
                // Got a signal; retry the read
                return -ERESTARTSYS;
            }

            spin_lock_irqsave(&lock, flags);
        }

        rc = copy_to_user(output_buffer, internal_buffer, bytes_to_copy);

        spin_unlock_irqrestore(&lock, flags);
    } 
    else rc = 0;

    return rc;
}

Answer 1

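It took quite a bit of debugging, but in the end Tsyvarev's hint (the comment about not calling copy_to_user while holding a spinlock) seems to have been the cause.

Our process has a background thread that occasionally launches a new process (fork + exec). When we disabled this thread, everything worked well. The best theory we have is that the fork made all of our memory pages copy-on-write, so when we tried to copy to them, the kernel had to do some work that cannot be done while a spinlock is held. Hopefully that makes at least some sense (although I would have guessed that this applies only to the child process and that the parent's pages simply remain writable, but who knows...).

We rewrote our code to be lock-free, and the problem disappeared.

Now we just need to verify that our lock-free code really is safe on different architectures. Easy as pie.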
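For readers who want to keep the lock, a common alternative to going lock-free is to snapshot the data into a kernel-side bounce buffer while the spinlock is held and to call copy_to_user only after the lock is released, since copy_to_user may take a page fault and sleep. This is not the rewrite described above, just a minimal sketch of that alternative. It is written as a userspace model so it can be compiled and run anywhere: model_spin_lock, model_spin_unlock, model_copy_to_user and produce are hypothetical stand-ins for spin_lock_irqsave, spin_unlock_irqrestore, copy_to_user and the driver's producer side, and the blocking wait from the question is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 64

/* Stand-ins (assumptions) for the kernel primitives used in the question. */
static int lock_held;
static void model_spin_lock(void)   { assert(!lock_held); lock_held = 1; }
static void model_spin_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Like copy_to_user, returns the number of bytes that could NOT be copied.
 * The assert models the rule that it may sleep, so no spinlock may be held. */
static unsigned long model_copy_to_user(void *to, const void *from,
                                        unsigned long n)
{
    assert(!lock_held);
    memcpy(to, from, n);
    return 0;
}

static char internal_buffer[BUF_LEN];
static size_t available; /* bytes currently queued in internal_buffer */

/* Hypothetical producer side: queue data under the lock. */
size_t produce(const char *data, size_t len)
{
    size_t room, n;

    model_spin_lock();
    room = BUF_LEN - available;
    n = len < room ? len : room;
    memcpy(internal_buffer + available, data, n);
    available += n;
    model_spin_unlock();
    return n;
}

/* Returns the number of bytes read, or -EFAULT if the copy-out failed. */
long read_func(char *output_buffer, size_t output_buffer_length)
{
    char bounce[BUF_LEN];
    size_t n;

    /* Under the lock: touch only kernel memory, never the user buffer. */
    model_spin_lock();
    n = available < output_buffer_length ? available : output_buffer_length;
    memcpy(bounce, internal_buffer, n);
    memmove(internal_buffer, internal_buffer + n, available - n);
    available -= n;
    model_spin_unlock();

    /* Lock released: the copy to the caller may now fault and sleep. */
    if (model_copy_to_user(output_buffer, bounce, n) != 0)
        return -EFAULT;
    return (long)n;
}
```

Note that the sketch returns the byte count on success and -EFAULT on a short copy, because copy_to_user reports the number of bytes it failed to copy rather than the number it copied. In a real driver the bounce buffer would more likely be allocated with kmalloc than placed on the kernel stack.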

