I've implemented a char device for my kernel module, along with a read function for it. The read function calls copy_to_user to return data to the caller. I originally implemented the read function in a blocking manner (with wait_event_interruptible), but the problem reproduces even when I implement read in a non-blocking manner. My code is running on a MIPS processor.
The user space program opens the char device and reads into a buffer allocated on the stack.
What I've found is that occasionally copy_to_user will fail to copy any bytes. Moreover, even if I replace copy_to_user with a call to memcpy (only for the purposes of checking; I know this isn't the right thing to do) and print out the destination buffer immediately afterwards, I see that memcpy has failed to copy any bytes.
I'm not really sure how to debug this further. How can I determine why memory is not being copied? Is it possible that the process context is wrong?
EDIT: Here's some pseudo-code outlining what the code currently looks like:
User mode (runs repeatedly):
char buf[BUF_LEN];
FILE *f = fopen(char_device_file, "rb");
fread(buf, 1, BUF_LEN, f);
fclose(f);
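When debugging a case like this, it can help to bypass stdio and issue a single read(2) directly, so that the exact byte count and errno returned by the driver are visible (stdio buffering may issue reads of a different size than BUF_LEN). The following is a minimal sketch of that idea; read_once is a hypothetical helper name, not part of the original program.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original code): open `path`, issue
 * exactly one unbuffered read(2) of up to `len` bytes, and close it again.
 * Returns the number of bytes read, or -1 with errno set on failure.
 */
ssize_t read_once(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* errno was set by open() */

    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;    /* close() may overwrite errno */

    close(fd);
    errno = saved_errno;
    return n;
}
```

A caller would check the result on every iteration, for example by calling perror("read") when it is negative, and logging whenever the count is smaller than expected.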
Kernel mode:
char_device = create_char_device(char_device_name,
                                 NULL,
                                 read_func,
                                 NULL,
                                 NULL);
int read_func(char *output_buffer, int output_buffer_length, loff_t *offset)
{
    int rc;
    if (*offset == 0)
    {
        spin_lock_irqsave(&lock, flags);
        while (get_available_bytes_to_read() == 0)
        {
            spin_unlock_irqrestore(&lock, flags);
            if (wait_event_interruptible(self->wait_queue, get_available_bytes_to_read() != 0))
            {
                // Got a signal; retry the read
                return -ERESTARTSYS;
            }
            spin_lock_irqsave(&lock, flags);
        }
        rc = copy_to_user(output_buffer, internal_buffer, bytes_to_copy);
        spin_unlock_irqrestore(&lock, flags);
    }
    else rc = 0;
    return rc;
}
Answer 0 (score: 0)
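It took quite a bit of debugging, but in the end Tsyvarev's hint (the comment about not calling copy_to_user while holding a spinlock) seems to have been the cause.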
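Our process had a background thread that occasionally launched a new process (fork + exec). When we disabled this thread, everything ran fine. The best theory we have is that the fork made all of our memory pages copy-on-write, so when we tried to copy into them the kernel had to do some work that cannot be done while a spinlock is held. Hopefully that makes at least some sense (although I would have guessed that this applies only to the child process and that the parent's pages would simply remain writable, but who knows...).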
We rewrote our code to be lockless and the problem went away.
Now we just need to verify that our lockless code really is safe on different architectures. Easy as pie.
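For readers who want to keep the lock, a different and commonly used pattern (not the lockless rewrite described above) is to copy the data into a kernel-side bounce buffer while holding the spinlock, release the lock, and only then call copy_to_user. This matters because copy_to_user may fault and sleep, for instance on a page that fork has write-protected for copy-on-write, which happens in the parent as well as the child. The following is only a sketch of a kernel-module fragment and is not compilable on its own; the buffer size, the available counter, and the standard file_operations read signature are assumptions, since the original goes through a create_char_device wrapper whose details are not shown, and the *offset handling from the question is omitted.

```c
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/wait.h>

/* Hypothetical device state; names and sizes are illustrative only. */
static DEFINE_SPINLOCK(lock);
static DECLARE_WAIT_QUEUE_HEAD(wait_queue);
static char internal_buffer[4096];
static size_t available;	/* valid bytes in internal_buffer, guarded by lock */

static ssize_t read_func(struct file *filp, char __user *output_buffer,
			 size_t len, loff_t *offset)
{
	unsigned long flags;
	char *bounce;
	size_t n;

	if (len == 0)
		return 0;
	if (len > sizeof(internal_buffer))
		len = sizeof(internal_buffer);

	/* Allocate before taking the lock: GFP_KERNEL allocations may sleep. */
	bounce = kmalloc(len, GFP_KERNEL);
	if (!bounce)
		return -ENOMEM;

	for (;;) {
		spin_lock_irqsave(&lock, flags);
		if (available)
			break;		/* leave the loop with the lock held */
		spin_unlock_irqrestore(&lock, flags);

		/* Sleep with no lock held; the producer is assumed to call wake_up_interruptible(). */
		if (wait_event_interruptible(wait_queue, READ_ONCE(available) != 0)) {
			kfree(bounce);
			return -ERESTARTSYS;
		}
	}

	/* Atomic section: kernel-to-kernel memcpy cannot page-fault. */
	n = min(len, available);
	memcpy(bounce, internal_buffer, n);
	memmove(internal_buffer, internal_buffer + n, available - n);
	available -= n;
	spin_unlock_irqrestore(&lock, flags);

	/* May fault and sleep, so it runs only after the lock is dropped. */
	if (copy_to_user(output_buffer, bounce, n)) {
		kfree(bounce);
		return -EFAULT;	/* note: the consumed bytes are lost in this sketch */
	}

	kfree(bounce);
	return n;
}
```

The trade-off is an extra copy and allocation per read in exchange for keeping the critical section short and free of anything that can sleep. Note also that copy_to_user returns the number of bytes that could not be copied, so its result should be mapped to -EFAULT or a byte count rather than returned directly.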