Kernel crash - NULL pointer dereference when calling DEVICE_WRITE from a KTHREAD in a USB device driver

Time: 2014-01-15 06:20:42

Tags: drivers usb multithreading

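I am writing a simple USB driver for a stepper motor, based on the USB Skeleton 2.2 driver, on kernel 3.8. The basic version works correctly. As a next step, I introduced a KTHREAD that calls DEVICE_WRITE (skel_write()), so that the driver stays available for other tasks and requests.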

Calling procedure : USER (request) -> DEVICE_IOCTL -> KTHREAD -> DEVICE_WRITE.

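In this setup, I call DEVICE_WRITE from the KTHREAD several times in a loop. The first calls work fine, but after some iterations the kernel crashes. If DEVICE_WRITE is called directly instead, it works fine. The log file shows the following error: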

Dec 30 01:15:14 mit kernel: [  962.316843] device_write(efed1180,2,10),ioused : 1
Dec 30 01:15:14 mit kernel: [  962.316900] data : 0, motor_cnt : 2, master_counter : 20
Dec 30 01:15:14 mit kernel: [  962.366498] data : 1, motor_cnt : 2, master_counter : 21
Dec 30 01:15:14 mit kernel: [  962.416116] Write over, going for sleep
Dec 30 01:15:14 mit kernel: [  962.416125] file : efed1180,data : 2,i : 11
Dec 30 01:15:14 mit kernel: [  962.416128] device_write(efed1180,2,10),ioused : 1
Dec 30 01:15:14 mit kernel: [  962.416166] BUG: unable to handle kernel NULL pointer dereference at   (null)
Dec 30 01:15:14 mit kernel: [  962.416254] IP: [] skel_write+0xd7/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [  962.416294] *pdpt = 0000000000000000* pde= f0002accf0002acc
Dec 30 01:15:14 mit kernel: [  962.416332] Oops: 0000 [#1] SMP
Dec 30 01:15:14 mit kernel: [  962.416363] Modules linked in: usbstep(OF) parport_pc(F) ppdev(F)   bnep rfcomm bluetooth snd_hda_codec_hdmi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_hda_codec_idt coretemp snd_hda_intel kvm snd_hda_codec snd_hwdep(F) snd_pcm(F) snd_page_alloc(F) joydev(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) hp_wmi lib80211_crypt_tkip snd_seq(F) snd_seq_device(F) snd_timer(F) sparse_keymap radeon wl(POF) lib80211 ttm drm_kms_helper cfg80211 drm hp_accel lis3lv02d mei input_polldev wmi i2c_algo_bit video(F) intel_ips mac_hid snd(F) lpc_ich soundcore(F) microcode(F) lp(F) parport(F) psmouse(F) serio_raw(F) r8169 ahci(F) libahci(F) [last unloaded: usbstep]
Dec 30 01:15:14 mit kernel: [  962.416866] Pid: 2997, comm: mitesh Tainted: PF          O 3.8.0-26-generic #38-Ubuntu Hewlett-Packard HP ProBook 4520s/1411
Dec 30 01:15:14 mit kernel: [  962.416928] EIP: 0060:[] EFLAGS: 00010287 CPU: 2
Dec 30 01:15:14 mit kernel: [  962.416960] EIP is at skel_write+0xd7/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [  962.416989] EAX: f0665b84 EBX: 00000014 ECX: 000000d0 EDX: 00000014
Dec 30 01:15:14 mit kernel: [  962.417024] ESI: f0665b40 EDI: 00000000 EBP: efddbf40 ESP: efddbf04
Dec 30 01:15:14 mit kernel: [  962.417059]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 30 01:15:14 mit kernel: [  962.417089] CR0: 8005003b CR2: 00000000 CR3: 019d1000 CR4: 000007f0
Dec 30 01:15:14 mit kernel: [  962.417124] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Dec 30 01:15:14 mit kernel: [  962.417158] DR6: ffff0ff0 DR7: 00000400
Dec 30 01:15:14 mit kernel: [  962.417181] Process mitesh (pid: 2997, ti=efdda000 task=f0bed9b0 task.ti=efdda000)
Dec 30 01:15:14 mit kernel: [  962.417223] Stack:
Dec 30 01:15:14 mit kernel: [  962.417236]  f0665b84 efddbf58 efddbf58 0000000a 00000001 efddbf58 00000000 efddbf40
Dec 30 01:15:14 mit kernel: [  962.417301]  c1609d81 00000002 f06c5d40 00000014 0000000c efddbf58 f1487408 efddbf6c
Dec 30 01:15:14 mit kernel: [  962.417398]  f8585546 00000000 efed1180 efddbf58 0000000b f6eb0032 aa092dff f6eb7ebc
Dec 30 01:15:14 mit kernel: [  962.417475] Call Trace:
Dec 30 01:15:14 mit kernel: [  962.417504]  [] ? printk+0x4d/0x4f
Dec 30 01:15:14 mit kernel: [  962.417559]  [] tele+0x86/0xc0 [usbstep]
Dec 30 01:15:14 mit kernel: [  962.417618]  [] ? skel_write+0x360/0x360 [usbstep]
Dec 30 01:15:14 mit kernel: [  962.417691]  [] kthread+0x94/0xa0
Dec 30 01:15:14 mit kernel: [  962.417744]  [] ? __hrtimer_start_range_ns+0x2e0/0x460
Dec 30 01:15:14 mit kernel: [  962.417819]  [] ret_from_kernel_thread+0x1b/0x28
Dec 30 01:15:14 mit kernel: [  962.417886]  [] ? kthread_create_on_node+0xc0/0xc0
Dec 30 01:15:14 mit kernel: [  962.417951] Code: c0 89 c6 0f 84 83 01 00 00 83 c3 0a b8 00 0e 00 00 81 fb 00 0e 00 00 b9 d0 00 00 00 0f 46 c3 89 45 f0 8d 46 44 8b 55 f0 89 04 24 <8b> 07 e8 52 9f ee c8 85 c0 89 45 e4 0f 84 0f 01 00 00 8d 47 54
Dec 30 01:15:14 mit kernel: [  962.418433] EIP: [] skel_write+0xd7/0x360 [usbstep] SS:ESP 0068:efddbf04
Dec 30 01:15:14 mit kernel: [  962.418530] CR2: 0000000000000000
Dec 30 01:15:14 mit kernel: [  962.433930] ---[ end trace 63245eeeb64414aa ]---

The code is as follows. KTHREAD:

int tele(void *__tele_data) {
        struct tele_data *tele_data = __tele_data;
        int i=0;
        char *dptr=NULL;
        char numb[4];
        sprintf(numb,"%d",tele_data->num);
        dptr=numb;
        for(i=0;i<30;i++) {
        is_ioctl_used=1;
                printk("file : %p,data : %s,i : %d\n", tele_data->file,dptr,i);
                skel_write(tele_data->file,(char *)dptr, 10, 0);
                printk("Write over, going for sleep\n");
        }
        return 0;
}

DEVICE_WRITE -

static ssize_t skel_write(struct file *file, const char *user_buffer,
                      size_t count, loff_t *ppos)
{
    struct usb_skel *dev;
    int retval = 0,i = 0,motor_count,dir=0;
    struct urb *urb = NULL;
    char *buf = NULL;
    char *buf1 = NULL;
    size_t writesize = min(count+10, (size_t)MAX_TRANSFER);
    printk(KERN_INFO "device_write(%p,%s,%d),ioused : %d\n", file, user_buffer, count,is_ioctl_used);
    dev = file->private_data;

    // verify that we actually have some data to write 
    if (count == 0)
            goto exit;

    /*
     * limit the number of URBs in flight to stop a user from using up all
     * RAM
     */

    if (!(file->f_flags & O_NONBLOCK)) {
            if (down_interruptible(&dev->limit_sem)) {
                    retval = -ERESTARTSYS;
                    goto exit;
            }
    } else {
            if (down_trylock(&dev->limit_sem)) {
                    retval = -EAGAIN;
                    goto exit;
            }
    }

    spin_lock_irq(&dev->err_lock);
    retval = dev->errors;
if (retval < 0) {
            // any error is reported once 
            dev->errors = 0;
            // to preserve notifications about reset 
            retval = (retval == -EPIPE) ? retval : -EIO;
    }
    spin_unlock_irq(&dev->err_lock);
    if (retval < 0)
            goto error;

    /* create a urb, and a buffer for it, and copy the data to the urb */
    buf1=(char *)kmalloc(sizeof(char)*20,GFP_KERNEL);  //Allocate 2nd buffer.
    if(is_ioctl_used) {   //Whether the write function is called from IOCTL or Directly (echo > /dev/stepper)
            sprintf(buf1,user_buffer);
    } else {
            if (copy_from_user(buf1, user_buffer,count)) {
                    retval = -EFAULT;
                    goto error;
            }
    }
    motor_count=simple_strtol(buf1,NULL,10);
    if(motor_count<0) {  //Rotation counts of stepper motor.
            motor_count=motor_count * -1;  //If motor_count<0 then rotate in anti-clock direction.
            dir=1;
    }
            urb = usb_alloc_urb(0, GFP_KERNEL);
            if (!urb) {
                    retval = -ENOMEM;
                    goto error;
            }

            buf = usb_alloc_coherent(dev->udev, writesize, GFP_KERNEL,
                            &urb->transfer_dma);
     if (!buf) {
                    retval = -ENOMEM;
                    goto error;
            }

            /* this lock makes sure we don't submit URBs to gone devices */
                    mutex_lock(&dev->io_mutex);
            if (!dev->interface) {          /* disconnect() was called */
                    mutex_unlock(&dev->io_mutex);
                    retval = -ENODEV;
                    goto error;
            }

            /* initialize the urb properly */
            usb_fill_int_urb(urb, dev->udev,
                            usb_sndintpipe(dev->udev, dev->bulk_out_endpointAddr),
                            buf, writesize, skel_write_bulk_callback, dev,dev->bInterval);
            urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
            usb_anchor_urb(urb, &dev->submitted);

    for(i=0;i<motor_count;i++) {  //Loop to rotate motor based on counts.
            printk("data : %d, motor_cnt : %d, master_counter : %d\n",ptr->data,motor_count,master_counter);
            if(dir==0) ptr=ptr->next;
            else ptr=ptr->prev;
            // Fill the buffers.
            buf[0]=0x01;
            buf[1]=0;
            buf[2]=ptr->data;

            /* send the data out the bulk port */
            retval = usb_submit_urb(urb, GFP_KERNEL);
    if (retval) {
                            dev_err(&dev->interface->dev,
                                            "%s - failed submitting write urb, error %d\n",
                                            __func__, retval);
                            mutex_unlock(&dev->io_mutex);
                            goto error_unanchor;

                    }
            if(++master_counter && master_counter > 47) master_counter=0;
            /*
             * release our reference to this urb, the USB core will eventually free
             * it entirely
             */
                            mdelay(50); //Delay is required to match with motor speed. 

            }
                    mutex_unlock(&dev->io_mutex);
                    usb_free_coherent(dev->udev, writesize, buf, urb->transfer_dma);
                    kfree(buf1);
                    usb_free_urb(urb);

                    is_ioctl_used=0;
            return writesize;
error_unanchor:
            usb_unanchor_urb(urb);
error:
    if (urb) {
            usb_free_coherent(dev->udev, writesize, buf, urb->transfer_dma);
            usb_free_urb(urb);
    }
    up(&dev->limit_sem);

exit:
    return retval;
}

I am new to kernel programming and may be missing something.

2 Answers:

Answer 0: (score: 1)

I don't know whether this is the root cause of your problem, but your tele() function seems to have several issues:

    int tele(void *__tele_data) {
            struct tele_data *tele_data = __tele_data;
            int i=0;
            char *dptr=NULL;
            char numb[4];
            sprintf(numb,"%d",tele_data->num);

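Here, you sprintf() a number into the numb buffer. What is the range of tele_data->num? Could it need more than 4 characters (including the terminating NUL character)? Also, you don't record the number of bytes printed into the buffer. It seems like you would want to know that for use below...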

            dptr=numb;

OK, now dptr points at numb. That means it points at a character buffer that holds at most 4 bytes, but...

            for(i=0;i<30;i++) {
                    is_ioctl_used=1;
                    printk("file : %p,data : %s,i : %d\n", tele_data->file,dptr,i);
                    skel_write(tele_data->file,(char *)dptr, 10, 0);

In the skel_write() line above, you ask it to write 10 bytes. That is 6 more than the buffer holds. So you could be smashing the stack here.
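The two points above (the 4-byte buffer and the unrecorded length) can be illustrated with a small user-space sketch. It is not the poster's driver code: format_num is a hypothetical helper, and it uses the C library's snprintf(). The kernel also provides snprintf() with the same return convention, so the same pattern should carry over, but treat that as an assumption to verify against your kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format `num` as decimal text into `buf`,
 * which has room for `cap` bytes including the terminating NUL.
 *
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the text did not fit and was truncated.
 */
static int format_num(char *buf, size_t cap, int num)
{
    int len;

    assert(buf != NULL && cap > 0);

    /* snprintf never writes more than cap bytes, unlike sprintf. */
    len = snprintf(buf, cap, "%d", num);

    /* A return value >= cap means the output was truncated. */
    if (len < 0 || (size_t)len >= cap)
        return -1;

    return len;
}
```

With a helper like this, tele() could check for truncation once and then pass the returned length, rather than the constant 10, as the count for the write.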

I'm not convinced this is your only problem, but it does look like a problem.

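A few other minor things to point out. You don't need the cast on dptr in the skel_write() call... it is already a char *. Be wary of casts, because they can hide an unintended type mismatch if the variable's type changes. Also, the indentation in your code is all over the place. I realize you are just learning, but get into the habit of good practice here. It was really hard to read through the skel_write() implementation. There may be other problems in there, and something as simple as proper indentation helps readers follow the flow and possibly spot the issue.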

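Finally, don't give up. Kernel programming is hard: there are many moving parts, concurrency, locking, caching, and a very asynchronous programming style. OTOH, you are close to the bare metal of the processor and the system, which is very rewarding.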

Answer 1: (score: 0)

Mit,

You may need to double-check how kthreads behave w.r.t. the operations you are performing.

See: In what context Kernel Thread runs in Linux?
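One operation worth checking is copy_from_user(): a kernel thread has no user address space, which is presumably why the question's skel_write() switches on the global is_ioctl_used flag. A possible alternative, sketched here in user-space C under stated assumptions, is to move the parsing into one function that only ever sees a buffer already in kernel memory, so the file-operation path and the kthread path can share it without a global flag. The names step_request and parse_step_request are hypothetical, and strtol() stands in for the kernel's string-to-integer helpers.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: what the motor loop needs to know. */
struct step_request {
    long steps;     /* number of steps, always >= 0 */
    int reverse;    /* 1 = rotate anticlockwise */
};

/*
 * Hypothetical core parser. `buf` holds `len` bytes of decimal text
 * and does not have to be NUL-terminated. Returns 0 on success or
 * -EINVAL on bad input.
 */
static int parse_step_request(const char *buf, size_t len,
                              struct step_request *req)
{
    char tmp[20];   /* same size as buf1 in the question */
    char *end;
    long value;

    assert(buf != NULL && req != NULL);

    /* Reject empty input and input that would overflow tmp. */
    if (len == 0 || len >= sizeof(tmp))
        return -EINVAL;

    memcpy(tmp, buf, len);
    tmp[len] = '\0';

    errno = 0;
    value = strtol(tmp, &end, 10);
    if (end == tmp || errno == ERANGE || value == LONG_MIN)
        return -EINVAL;

    /* A negative count means the opposite direction. */
    req->reverse = value < 0;
    req->steps = value < 0 ? -value : value;
    return 0;
}
```

In a driver built this way, the write() file operation would copy the user data into a kernel buffer first and then call the parser, while the kthread would call the parser directly with its own buffer and the real length.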