copy_to_user returns an error in a char device read function

Date: 2017-12-18 06:58:42

Tags: c linux-kernel mips embedded-linux

I've implemented a char device for my kernel module, including a read function. The read function calls copy_to_user to return data to the caller. I originally implemented read as a blocking call (with wait_event_interruptible), but the problem reproduces even when I implement it as non-blocking. My code is running on a MIPS processor.

The user space program opens the char device and reads into a buffer allocated on the stack.

What I've found is that occasionally copy_to_user will fail to copy any bytes. Moreover, even if I replace copy_to_user with a call to memcpy (only for the purposes of checking... I know this isn't the right thing to do), and print out the destination buffer immediately afterwards, I see that memcpy has failed to copy any bytes.

I'm not really sure how to further debug this - how can I determine why memory is not being copied? Is it possible that the process context is wrong?

EDIT: Here's some pseudo-code outlining what the code currently looks like:

User mode (runs repeatedly):

char buf[BUF_LEN];
FILE *f = fopen(char_device_file, "rb");
fread(buf, 1, BUF_LEN, f);
fclose(f);
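When debugging, it can help to bypass stdio on the reader side. fread() reports only an item count, so end of file, a short read, and an error all look alike. Below is a minimal sketch, not the poster's code, that calls read(2) directly so the value the driver returned is visible. `read_once` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one read(2) call, retried only if a signal
 * interrupted it before any data arrived (errno == EINTR).
 * Returns the number of bytes read (> 0), 0 at end of file, or -1 with
 * errno set, e.g. to EFAULT if the driver's read handler returned -EFAULT. */
static ssize_t read_once(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL || len == 0);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A caller can then tell the three cases apart: `n > 0` means data arrived, `n == 0` means the driver returned 0 (which the reader treats as end of file), and `n < 0` means `errno` holds the error the driver reported.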

Kernel mode:

char_device = 
    create_char_device(char_device_name,
        NULL,
        read_func,
        NULL,
        NULL);

int read_func(char *output_buffer, int output_buffer_length, loff_t *offset)
{
    unsigned long flags;
    int rc;
    if (*offset == 0)
    {
        spin_lock_irqsave(&lock, flags);

        while (get_available_bytes_to_read() == 0)
        {
            spin_unlock_irqrestore(&lock, flags);
            if (wait_event_interruptible(self->wait_queue, get_available_bytes_to_read() != 0))
            {
                // Got a signal; retry the read
                return -ERESTARTSYS;
            }

            spin_lock_irqsave(&lock, flags);
        }

        // copy_to_user() returns the number of bytes NOT copied (0 on success)
        rc = copy_to_user(output_buffer, internal_buffer, bytes_to_copy);

        spin_unlock_irqrestore(&lock, flags);
    } 
    else rc = 0;

    return rc;
}

1 Answer

Answer (score: 0)

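It took quite a bit of debugging, but in the end Tsyvarev's hint (the comment about not calling copy_to_user while holding a spinlock) seems to have identified the cause.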

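Our process has a background thread that occasionally launches a new process (fork + exec). When we disabled this thread, everything worked fine. Our best theory is that fork makes all of our memory pages copy-on-write, so when we copy into them the kernel has to handle a page fault, which may sleep and therefore cannot be done while holding a spinlock. Hopefully that makes at least some sense. (I had guessed this would only apply to the child process and that the parent's pages would simply stay writable, but fork() write-protects the shared pages in the parent as well, so the parent's next write to them also takes a copy-on-write fault.)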
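The rule behind Tsyvarev's hint is that copy_to_user() may sleep: writing to a user page can take a page fault (here, a copy-on-write fault after fork()), and servicing that fault is not allowed while a spinlock is held. Besides going lock-free, as the poster eventually did, a commonly used alternative (a different technique, not what the poster did) is to copy the data into a private buffer while holding the lock, release the lock, and only then call copy_to_user(). Because copy_to_user() exists only inside the kernel, the sketch below is a user-space model of that pattern, under these assumptions: an atomic_flag busy-wait lock stands in for spin_lock_irqsave()/spin_unlock_irqrestore(), and the hypothetical `copy_out` callbacks stand in for copy_to_user(), returning the number of bytes not copied just as it does. `dev_state`, `dev_push` and `model_read` are illustrative names, and the wait_event_interruptible() loop from the question is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>

#define KBUF_SIZE 256  /* hypothetical size of the driver's internal buffer */

/* Hypothetical shared state standing in for the driver's internal_buffer. */
struct dev_state {
    atomic_flag lock;   /* models the driver's spinlock */
    char data[KBUF_SIZE];
    size_t avail;       /* bytes currently available to read */
};

static void model_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;  /* busy-wait, like a spinlock: the holder must not sleep */
}

static void model_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void dev_init(struct dev_state *dev)
{
    atomic_flag_clear(&dev->lock);
    dev->avail = 0;
}

/* Producer side: append bytes under the lock. Returns the bytes accepted. */
static size_t dev_push(struct dev_state *dev, const char *src, size_t n)
{
    model_spin_lock(&dev->lock);
    if (n > KBUF_SIZE - dev->avail)
        n = KBUF_SIZE - dev->avail;
    memcpy(dev->data + dev->avail, src, n);
    dev->avail += n;
    model_spin_unlock(&dev->lock);
    return n;
}

/* Stand-ins for copy_to_user(): return the number of bytes NOT copied. */
typedef unsigned long (*copy_out_fn)(void *dst, const void *src, unsigned long n);

static unsigned long copy_out_ok(void *dst, const void *src, unsigned long n)
{
    memcpy(dst, src, n);
    return 0;
}

static unsigned long copy_out_fault(void *dst, const void *src, unsigned long n)
{
    (void)dst;
    (void)src;
    return n;  /* simulates a fault that copied nothing */
}

/* Read handler model: snapshot under the lock, copy out after unlocking. */
static ssize_t model_read(struct dev_state *dev, char *user_buf, size_t len,
                          copy_out_fn copy_out)
{
    char snapshot[KBUF_SIZE];  /* private bounce buffer */
    size_t n;

    model_spin_lock(&dev->lock);
    assert(dev->avail <= KBUF_SIZE);
    n = dev->avail < len ? dev->avail : len;
    memcpy(snapshot, dev->data, n);
    memmove(dev->data, dev->data + n, dev->avail - n);  /* consume n bytes */
    dev->avail -= n;
    model_spin_unlock(&dev->lock);

    /* No lock is held here; in the kernel, copy_to_user() may fault and sleep. */
    if (n > 0 && copy_out(user_buf, snapshot, n) != 0)
        return -EFAULT;  /* this sketch drops the consumed bytes on failure */

    return (ssize_t)n;
}
```

In a real driver, kernel stacks are only a few kilobytes, so a large bounce buffer would be allocated with kmalloc() instead of living on the stack, and a driver that must not lose data on -EFAULT would put the bytes back instead of dropping them.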

We rewrote the code to be lock-free, and the problem went away.

Now we just need to verify that our lock-free code is actually safe on different architectures. Easy as pie.
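Whether lock-free code is safe across architectures mostly comes down to memory ordering. The MIPS architecture permits weakly ordered implementations, so without the right barriers a reader can observe an updated index before the data it guards. Below is a minimal user-space sketch of a single-producer, single-consumer ring buffer using C11 acquire/release atomics. It illustrates the ordering requirements under those assumptions and is not the poster's code, which isn't shown; `spsc_ring`, `ring_put` and `ring_get` are hypothetical names. Inside the kernel the same pairing is written with smp_store_release() and smp_load_acquire(). The kernel's kfifo also documents that no extra locking is needed with exactly one reader and one writer, and kfifo_to_user() copies from a kfifo straight to a user buffer, so no spinlock is held across the copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8  /* hypothetical capacity */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

/* head and tail are free-running counters; head - tail is the fill level.
 * Unsigned wraparound is harmless because RING_SIZE is a power of two. */
struct spsc_ring {
    atomic_size_t head;  /* written only by the producer */
    atomic_size_t tail;  /* written only by the consumer */
    unsigned char buf[RING_SIZE];
};

static void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: returns 1 on success, 0 if the ring is full. */
static int ring_put(struct spsc_ring *r, unsigned char c)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* acquire: the consumer must be done with a slot before we reuse it */
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return 0;
    r->buf[head & (RING_SIZE - 1)] = c;
    /* release: the byte is stored before the new head becomes visible */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: returns 1 and stores the byte in *c, or 0 if the ring is empty. */
static int ring_get(struct spsc_ring *r, unsigned char *c)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* acquire: pairs with the producer's release so the byte is visible */
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return 0;
    *c = r->buf[tail & (RING_SIZE - 1)];
    /* release: finish reading the slot before handing it back */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}
```

This scheme is only safe with one producer and one consumer. With more of either, a lock (never held across copy_to_user()) or a different design is needed.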