add_timer causing kernel stack dump with multiple PCI boards

Time: 2016-02-18 16:16:22

Tags: linux-kernel linux-device-driver kernel-module

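We are using FPGA cards with a PCI Express driver to move data with DMA engines. This all works with a single card in the machine, but fails with two cards. As an initial investigation, I have narrowed the error down to the add_timer call that sets up the polling mechanism. When the driver module is loaded with insmod, a stack trace is produced, apparently because the poll_timer is the same for both instances. The code has been reduced to: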

static int  dat_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    struct timer_list * timer = &poll_timer;
    int i;

    /* Start polling routine */
    log_normal(KERN_INFO "DEBUG ADD TIMER: Starting poll routine with %x\n", pdev);
    init_timer(timer);

    // random number added so that expires value is different for both instances of timer
    get_random_bytes(&i, 1);
    timer->expires=jiffies+HZ+i;
    timer->data=(unsigned long) pdev;
    timer->function = poll_routine;

    log_verbose("DEBUG ADD TIMER: Timer expires %x\n", timer->expires);
    log_verbose("DEBUG ADD TIMER: Timer data %x\n", timer->data);
    log_verbose("DEBUG ADD TIMER: Timer function %x\n", timer->function);

    // ***** THIS IS WHERE STACK TRACE OCCURS (WHEN CALLED FOR SECOND TIME)
    add_timer(timer);

    log_verbose("DEBUG ADD TIMER: Value of HZ is %d\n", HZ);
    log_verbose("DEBUG ADD TIMER: End of probe\n");

    return 0;
}

The stack trace produces:
list_add corruption. prev->next should be next (ffffffff81f76228), but was (null). (prev=ffffffffa050a3c0).
list_add double add: new=ffffffffa050a3c0, prev=ffffffffa050a3c0, next=ffffffff81f76228.

Looking at the printk output, it appears that add_timer is trying to add the same routine to the linked list twice. Is that correct?

DEBUG ADD TIMER: Timer expires fffd9cd3
DEBUG ADD TIMER: Timer data 6c0ac000
DEBUG ADD TIMER: Timer function **a0508150**
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe
DEBUG ADD TIMER: Starting poll routine with 6c0ad000
DEBUG ADD TIMER: Timer expires fffd9c7d
DEBUG ADD TIMER: Timer data 6c0ad000
DEBUG ADD TIMER: Timer function **a0508150**

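So my question is: how should I set up the timer for multiple instances of the same driver? (I am assuming that this is what happens when multiple boards are plugged into the machine.)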

Full stack trace:

DEBUG ADD TIMER: Inserting driver into kernel.
DEBUG ADD TIMER: Starting poll routine with 6c0ac000
DEBUG ADD TIMER: Timer expires fffd9cd3
DEBUG ADD TIMER: Timer data 6c0ac000
DEBUG ADD TIMER: Timer function a0508150
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe
DEBUG ADD TIMER: Starting poll routine with 6c0ad000
DEBUG ADD TIMER: Timer expires fffd9c7d
DEBUG ADD TIMER: Timer data 6c0ad000
DEBUG ADD TIMER: Timer function a0508150
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2201 at lib/list_debug.c:33 __list_add+0xa0/0xd0()
list_add corruption. prev->next should be next (ffffffff81f76228), but was           (null). (prev=ffffffffa050a3c0).
Modules linked in: xdma_v7(POE+) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crc32c_intel eeepc_wmi ghash_clmulni_intel asus_wmi ftdi_sio iTCO_wdt snd_hda_codec sparse_keymap raid0 iTCO_vendor_support
 snd_hda_core rfkill sb_edac ipmi_ssif video mxm_wmi edac_core snd_hwdep mei_me snd_seq snd_seq_device ipmi_msghandler snd_pcm mei acpi_pad tpm_infineon lpc_ich mfd_core snd_timer tpm_tis shpchp tpm snd soundcore i2c_i801 wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb serio_raw ptp pps_core dca i2c_algo_bit
CPU: 0 PID: 2201 Comm: insmod Tainted: P           OE   4.1.8-100.fc21.x86_64 #1
Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 WS/Z10PE-D8 WS, BIOS 1001 03/17/2015
 0000000000000000 00000000ec73155d ffff880457123928 ffffffff81792065
 0000000000000000 ffff880457123980 ffff880457123968 ffffffff810a163a
 0000000000000246 ffffffffa050a3c0 ffffffff81f76228 ffffffffa050a3c0
Call Trace:
 [<ffffffff81792065>] dump_stack+0x45/0x57
 [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a16c5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff810f8250>] ? vprintk_emit+0x3b0/0x560
 [<ffffffff813c7c30>] __list_add+0xa0/0xd0
 [<ffffffff81108412>] __internal_add_timer+0xb2/0x130
 [<ffffffff811084bf>] internal_add_timer+0x2f/0xb0
 [<ffffffff8110a1ca>] mod_timer+0x12a/0x210
 [<ffffffff8110a2c8>] add_timer+0x18/0x30
 [<ffffffffa050810f>] dat_probe+0xbf/0x100 [xdma_v7]
 [<ffffffff813f6da5>] local_pci_probe+0x45/0xa0
 [<ffffffff812a8da2>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
 [<ffffffff813f8109>] pci_device_probe+0xf9/0x150
 [<ffffffff814e7e59>] driver_probe_device+0x209/0x4b0
 [<ffffffff814e81db>] __driver_attach+0x9b/0xa0
 [<ffffffff814e8140>] ? __device_attach+0x40/0x40
 [<ffffffff814e5973>] bus_for_each_dev+0x73/0xc0
 [<ffffffff814e772e>] driver_attach+0x1e/0x20
 [<ffffffff814e72e0>] bus_add_driver+0x180/0x250
 [<ffffffffa000a000>] ? 0xffffffffa000a000
 [<ffffffff814e89d4>] driver_register+0x64/0xf0
 [<ffffffff813f662c>] __pci_register_driver+0x4c/0x50
 [<ffffffffa000a02c>] dat_init+0x2c/0x1000 [xdma_v7]
 [<ffffffff81002148>] do_one_initcall+0xd8/0x210
 [<ffffffff812094f9>] ? kmem_cache_alloc_trace+0x1a9/0x230
 [<ffffffff817911bc>] ? do_init_module+0x28/0x1cc
 [<ffffffff817911f5>] do_init_module+0x61/0x1cc
 [<ffffffff811270bb>] load_module+0x20db/0x2550
 [<ffffffff81122990>] ? store_uevent+0x70/0x70
 [<ffffffff8122e860>] ? kernel_read+0x50/0x80
 [<ffffffff81127766>] SyS_finit_module+0xa6/0xe0
 [<ffffffff8179892e>] system_call_fastpath+0x12/0x71
---[ end trace 340e5d7ba2d89081 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2201 at lib/list_debug.c:36 __list_add+0xcb/0xd0()
list_add double add: new=ffffffffa050a3c0, prev=ffffffffa050a3c0, next=ffffffff81f76228.
Modules linked in: xdma_v7(POE+) xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller crc32c_intel eeepc_wmi ghash_clmulni_intel asus_wmi ftdi_sio iTCO_wdt snd_hda_codec sparse_keymap raid0 iTCO_vendor_support
 snd_hda_core rfkill sb_edac ipmi_ssif video mxm_wmi edac_core snd_hwdep mei_me snd_seq snd_seq_device ipmi_msghandler snd_pcm mei acpi_pad tpm_infineon lpc_ich mfd_core snd_timer tpm_tis shpchp tpm snd soundcore i2c_i801 wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb serio_raw ptp pps_core dca i2c_algo_bit
CPU: 0 PID: 2201 Comm: insmod Tainted: P        W  OE   4.1.8-100.fc21.x86_64 #1
Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 WS/Z10PE-D8 WS, BIOS 1001 03/17/2015
 0000000000000000 00000000ec73155d ffff880457123928 ffffffff81792065
 0000000000000000 ffff880457123980 ffff880457123968 ffffffff810a163a
 0000000000000246 ffffffffa050a3c0 ffffffff81f76228 ffffffffa050a3c0
Call Trace:
 [<ffffffff81792065>] dump_stack+0x45/0x57
 [<ffffffff810a163a>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a16c5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff810f8250>] ? vprintk_emit+0x3b0/0x560
 [<ffffffff813c7c5b>] __list_add+0xcb/0xd0
 [<ffffffff81108412>] __internal_add_timer+0xb2/0x130
 [<ffffffff811084bf>] internal_add_timer+0x2f/0xb0
 [<ffffffff8110a1ca>] mod_timer+0x12a/0x210
 [<ffffffff8110a2c8>] add_timer+0x18/0x30
 [<ffffffffa050810f>] dat_probe+0xbf/0x100 [xdma_v7]
 [<ffffffff813f6da5>] local_pci_probe+0x45/0xa0
 [<ffffffff812a8da2>] ? sysfs_do_create_link_sd.isra.2+0x72/0xc0
 [<ffffffff813f8109>] pci_device_probe+0xf9/0x150
 [<ffffffff814e7e59>] driver_probe_device+0x209/0x4b0
 [<ffffffff814e81db>] __driver_attach+0x9b/0xa0
 [<ffffffff814e8140>] ? __device_attach+0x40/0x40
 [<ffffffff814e5973>] bus_for_each_dev+0x73/0xc0
 [<ffffffff814e772e>] driver_attach+0x1e/0x20
 [<ffffffff814e72e0>] bus_add_driver+0x180/0x250
 [<ffffffffa000a000>] ? 0xffffffffa000a000
 [<ffffffff814e89d4>] driver_register+0x64/0xf0
 [<ffffffff813f662c>] __pci_register_driver+0x4c/0x50
 [<ffffffffa000a02c>] dat_init+0x2c/0x1000 [xdma_v7]
 [<ffffffff81002148>] do_one_initcall+0xd8/0x210
 [<ffffffff812094f9>] ? kmem_cache_alloc_trace+0x1a9/0x230
 [<ffffffff817911bc>] ? do_init_module+0x28/0x1cc
 [<ffffffff817911f5>] do_init_module+0x61/0x1cc
 [<ffffffff811270bb>] load_module+0x20db/0x2550
 [<ffffffff81122990>] ? store_uevent+0x70/0x70
 [<ffffffff8122e860>] ? kernel_read+0x50/0x80
 [<ffffffff81127766>] SyS_finit_module+0xa6/0xe0
 [<ffffffff8179892e>] system_call_fastpath+0x12/0x71
---[ end trace 340e5d7ba2d89082 ]---
DEBUG ADD TIMER: Value of HZ is 1000
DEBUG ADD TIMER: End of probe

1 answer:

Answer 0 (score: 0)

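The problem is that the second call to dat_probe clobbers the poll_timer variable that was initialized and queued by the first call to dat_probe. You are trashing the pointers in the kernel's timer list.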
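To make the failure mode concrete, here is a small user-space sketch (not kernel code) of a circular doubly linked list with two sanity checks modeled on the warnings in the trace. The names node, list_add_tail_checked, shared_node_demo and per_device_demo are hypothetical, and the sketch assumes that re-initializing the shared timer clears its list links while it is still queued, which is consistent with the "(null)" and "prev == new" values in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct node {
    struct node *next, *prev;
};

void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Tail insert with two sanity checks modeled on the warnings in the trace.
 * Returns the number of failed checks; the node is linked only when it is 0.
 */
int list_add_tail_checked(struct node *new_node, struct node *head)
{
    struct node *prev, *next = head;
    int failed = 0;

    assert(new_node != NULL && head != NULL);
    prev = head->prev;

    if (prev->next != next) {
        printf("corruption: prev->next should be next, but was %p\n",
               (void *)prev->next);
        failed++;
    }
    if (new_node == prev || new_node == next) {
        printf("double add: new is already a neighbour\n");
        failed++;
    }
    if (failed)
        return failed;

    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    next->prev = new_node;
    return 0;
}

/* One shared node, re-initialized and queued again while still on the list. */
int shared_node_demo(void)
{
    struct node head, shared;
    int failed = 0;

    list_init(&head);
    shared.next = shared.prev = NULL;            /* first "init" */
    failed += list_add_tail_checked(&shared, &head);
    shared.next = shared.prev = NULL;            /* second "init" wipes live links */
    failed += list_add_tail_checked(&shared, &head);
    return failed;
}

/* One node per device: each is initialized and queued exactly once. */
int per_device_demo(void)
{
    struct node head, dev_a, dev_b;
    int failed = 0;

    list_init(&head);
    dev_a.next = dev_a.prev = NULL;
    dev_b.next = dev_b.prev = NULL;
    failed += list_add_tail_checked(&dev_a, &head);
    failed += list_add_tail_checked(&dev_b, &head);
    return failed;
}
```

Calling shared_node_demo() trips both checks on the second insert and returns 2, while per_device_demo() returns 0 because every queued node is a distinct object.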

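You need to get rid of the poll_timer variable and give each device its own dynamically allocated private data structure containing its own struct timer_list member. Call pci_set_drvdata to set the PCI device's private data pointer. Other PCI driver functions can then call pci_get_drvdata to retrieve that pointer.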
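A minimal sketch of that layout, assuming the 4.1-era timer API used in the question (setup_timer with an unsigned long argument; newer kernels use timer_setup instead). The struct name dat_dev, the dat_remove function and the body of poll_routine are assumptions for illustration and are not taken from the original driver. It only builds inside a kernel tree.

```c
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

/* Hypothetical per-device state: one instance per board. */
struct dat_dev {
    struct pci_dev *pdev;
    struct timer_list poll_timer;   /* each board owns its timer */
};

static void poll_routine(unsigned long data)
{
    struct dat_dev *dd = (struct dat_dev *)data;

    /* ... poll the DMA engine that belongs to dd->pdev ... */
    (void)dd;

    /*
     * Re-arming is omitted here. If polling should continue, re-arm with
     * mod_timer(&dd->poll_timer, ...) and make sure remove() stops the
     * re-arming before it calls del_timer_sync().
     */
}

static int dat_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    struct dat_dev *dd;

    dd = kzalloc(sizeof(*dd), GFP_KERNEL);
    if (!dd)
        return -ENOMEM;

    dd->pdev = pdev;
    pci_set_drvdata(pdev, dd);      /* other callbacks find dd via pdev */

    /* Initialize and queue this board's own timer, not a shared global. */
    setup_timer(&dd->poll_timer, poll_routine, (unsigned long)dd);
    mod_timer(&dd->poll_timer, jiffies + HZ);

    return 0;
}

static void dat_remove(struct pci_dev *pdev)
{
    struct dat_dev *dd = pci_get_drvdata(pdev);

    del_timer_sync(&dd->poll_timer); /* wait for a running handler */
    kfree(dd);
}
```

With this layout the random offset on the expiry time is no longer needed: two boards get two distinct struct timer_list objects, so queuing one never touches the other's list links.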