“Unable to handle kernel NULL pointer dereference at virtual address” - Signalling a kernel module | OOPS

Date: 2015-10-04 00:28:18

Tags: c linux ubuntu linux-kernel kernel-module

I am learning some basics of kernel modules and threads, so I tried writing a sample module to test them. So far, it loads successfully.

Module code:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/version.h>


static struct task_struct *thread_st;

// Function called by thread
static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
    return 0;
}



// Module initialisation
static int __init init_thread(void)
{
    printk(KERN_INFO "Creating Thread\n");

    thread_st = kthread_run(thread_fun, NULL, "mythread");
    if(thread_st)
        printk(KERN_INFO "Thread created successfully\n");
    else
        printk(KERN_INFO "Thread creation failed\n");
    return 0;

}




// Module exit
static void __exit cleanup_thread(void)
{
    printk(KERN_INFO "Cleaning up\n");
    if(thread_st)
    {
        kthread_stop(current);
        printk(KERN_INFO "Thread Stopped\n");
    }
}

module_init(init_thread);
module_exit(cleanup_thread);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pinkesh Badjatiya");
MODULE_DESCRIPTION("Simple Kernel Module");

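Now, once the module is loaded, I unload it with the following steps:

  1. Send the SIGKILL signal: sudo kill -9 [PID]
  2. Wait for dmesg to show 'Thread Stopping', which means kthread_should_stop() has returned true.
  3. Remove the module: sudo rmmod [MODULE_NAME]
  4. dmesg log: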

    [  492.979030] Creating Thread
    [  492.979753] Thread created successfully
    [  492.979776] Thread Running
    [  497.985420] Thread Running
    [  502.992223] Thread Running
    [  507.999007] Thread Running
    [  513.005837] Thread Running
    [  518.012585] Thread Running
    [  523.019354] Thread Running
    [  528.026195] Thread Running
    [  533.032919] Thread Running
    [  538.039795] Thread Running
    [  543.046588] Thread Running
    [  548.053383] Thread Stopping
    [  556.317200] Cleaning up
    [  556.317212] Thread Stopped
    

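    Now, when I replace the variable current with the structure pointer thread_st that I originally used (i.e. call kthread_stop(thread_st)), then load the module and remove it following the same steps as above, the kernel crashes with an Oops and fills the dmesg log.

    I also got a "Report Error" pop-up on Ubuntu.

    dmesg log: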

    [ 1269.832922] Creating Thread
    [ 1269.833888] Thread created successfully
    [ 1269.834217] Thread Running
    [ 1274.839425] Thread Running
    [ 1279.846211] Thread Running
    [ 1284.853017] Thread Running
    [ 1289.859819] Thread Running
    [ 1294.866589] Thread Running
    [ 1299.873353] Thread Stopping
    [ 1305.758783] Cleaning up
    [ 1305.758853] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [ 1305.762603] IP: [<ffffffff81096d6b>] exit_creds+0x1b/0x70
    [ 1305.766266] PGD 0 
    [ 1305.769967] Oops: 0000 [#3] SMP 
    [ 1305.774675] Modules linked in: kernel_thread_example(OE-) vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) cmac rmd160 crypto_null camellia_generic camellia_x86_64 cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common deflate cts ctr gcm ccm serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_avx_x86_64 twofish_x86_64_3way xts twofish_x86_64 twofish_common xcbc sha256_ssse3 sha512_ssse3 des_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp af_key xfrm_algo bnep rfcomm bluetooth 6lowpan_iphc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ath9k ath9k_common ath9k_hw crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer ath ghash_clmulni_intel cryptd mac80211 joydev serio_raw snd cfg80211 i915 lpc_ich shpchp soundcore drm_kms_helper drm mei_me mei i2c_algo_bit mac_hid video wmi parport_pc ppdev lp parport hid_generic usbhid hid psmouse ahci libahci atl1c [last unloaded: kernel_thread_example]
    [ 1305.817666] CPU: 3 PID: 4038 Comm: rmmod Tainted: G      D    OE 3.16.0-50-generic #66~14.04.1-Ubuntu
    [ 1305.822078] Hardware name: HCL Infosystems Limited HCL ME LAPTOP/HCL Infosystems Limited, BIOS 203.T01 03/19/2011
    [ 1305.826447] task: ffff8800a6221e90 ti: ffff880119700000 task.ti: ffff880119700000
    [ 1305.830740] RIP: 0010:[<ffffffff81096d6b>]  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
    [ 1305.834968] RSP: 0018:ffff880119703e90  EFLAGS: 00010246
    [ 1305.839081] RAX: 0000000000000000 RBX: ffff8800b6e065e0 RCX: 0000000000000000
    [ 1305.843133] RDX: ffffffff81c8ea00 RSI: ffff8800b6e065e0 RDI: 0000000000000000
    [ 1305.847062] RBP: ffff880119703e98 R08: 0000000000000086 R09: 0000000000000431
    [ 1305.850897] R10: 0000000000000000 R11: ffff880119703c0e R12: ffff8800b6e065e0
    [ 1305.854697] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f0325bb6240
    [ 1305.858456] FS:  00007f0325595740(0000) GS:ffff88011fa60000(0000) knlGS:0000000000000000
    [ 1305.862225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1305.866197] CR2: 0000000000000000 CR3: 00000000b6e23000 CR4: 00000000000407e0
    [ 1305.866199] Stack:
    [ 1305.866206]  ffff8800b6e065e0 ffff880119703eb8 ffffffff8106abf2 0000000000000000
    [ 1305.866211]  ffff8800b6e065e0 ffff880119703ee0 ffffffff81091868 0000000000000000
    [ 1305.866216]  ffffffffc0a61000 0000000000000800 ffff880119703ef0 ffffffffc0a5f086
    [ 1305.866217] Call Trace:
    [ 1305.866232]  [<ffffffff8106abf2>] __put_task_struct+0x52/0x140
    [ 1305.866241]  [<ffffffff81091868>] kthread_stop+0xd8/0xe0
    [ 1305.866249]  [<ffffffffc0a5f086>] cleanup_thread+0x23/0xf9d [kernel_thread_example]
    [ 1305.866259]  [<ffffffff810ebbb2>] SyS_delete_module+0x162/0x200
    [ 1305.866268]  [<ffffffff8176edcd>] system_call_fastpath+0x1a/0x1f
    [ 1305.866318] Code: ff ff 85 c0 0f 84 33 fe ff ff e9 0c fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 48 8b 87 c0 05 00 00 48 89 fb 48 8b bf b8 05 00 00 <8b> 00 48 c7 83 b8 05 00 00 00 00 00 00 f0 ff 0f 74 23 48 8b bb 
    [ 1305.866324] RIP  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
    [ 1305.866326]  RSP <ffff880119703e90>
    [ 1305.866328] CR2: 0000000000000000
    [ 1305.866378] ---[ end trace 0bd516c6629996c7 ]---
    

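    I can't figure out why this happens, and I searched online but couldn't find a reason.

    Also, is the variable current already declared in one of the headers above, and what is wrong with using thread_st, which was created above?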

2 Answers:

Answer 0 (score: 3)

From the description of the kthread_stop() function:

  

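If threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away.

This means that a kthread which may be terminated by kthread_stop() elsewhere must not simply exit on its own. It should exit only once it finds kthread_should_stop() to be true, or a reference to its task_struct must be grabbed (somehow) before it exits.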
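The lifetime rule quoted above can be illustrated with a small userspace model. This is an analogy, not kernel code: fake_task, get_fake_task() and put_fake_task() are hypothetical stand-ins for task_struct, get_task_struct() and put_task_struct(), and pthread_join() stands in for waiting until the thread has exited. The point it shows is that an extra reference taken before the thread can exit keeps the object valid for whoever stops it later.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct: only its usage count matters here. */
struct fake_task {
    atomic_int usage;
};

/* Counts how many fake_task objects have been freed so far. */
static atomic_int fake_task_frees;

/* Analogue of get_task_struct(): take an extra reference. */
static void get_fake_task(struct fake_task *t)
{
    atomic_fetch_add(&t->usage, 1);
}

/* Analogue of put_task_struct(): drop a reference, free on the last one. */
static void put_fake_task(struct fake_task *t)
{
    if (atomic_fetch_sub(&t->usage, 1) == 1) {
        free(t);
        atomic_fetch_add(&fake_task_frees, 1);
    }
}

/* A worker that exits on its own (as after SIGKILL) and drops its own
 * reference on the way out, roughly what do_exit() leads to. */
static void *worker(void *arg)
{
    put_fake_task(arg);
    return NULL;
}

/*
 * Starts a worker, lets it exit, then reports the usage count the
 * "stopper" sees. With hold_ref == false the object is already freed at
 * that point (the dangling thread_st case), so it is not touched and 0
 * is returned instead. Returns -1 on setup failure.
 */
static int usage_after_worker_exit(bool hold_ref)
{
    struct fake_task *t = malloc(sizeof *t);
    pthread_t tid;
    int usage;

    if (t == NULL)
        return -1;
    atomic_init(&t->usage, 1);       /* the reference owned by the worker */
    if (hold_ref)
        get_fake_task(t);            /* taken before the worker can run */

    if (pthread_create(&tid, NULL, worker, t) != 0) {
        free(t);
        return -1;
    }
    pthread_join(tid, NULL);         /* the worker has exited by now */

    if (!hold_ref)
        return 0;                    /* t is dangling; dereferencing it is the bug */

    usage = atomic_load(&t->usage);  /* safe: we still own a reference */
    put_fake_task(t);                /* like put_task_struct() after kthread_stop() */
    return usage;
}
```

In the module itself, the corresponding pattern would presumably be to call get_task_struct(thread_st) right after a successful kthread_run() and put_task_struct(thread_st) after kthread_stop() in the exit function.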

  

Wait for dmesg to show 'Thread Stopping', which means kthread_should_stop() has returned true.

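That inference would only hold without the allow_signal() call. kthread_should_stop() becomes true only when someone calls kthread_stop() for the given thread. When a signal is explicitly sent from user space (which allow_signal() permits), signal_pending(current) becomes true, but it does not reflect the kthread_should_stop() state.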

So both of your implementations are incorrect: they let the thread exit when a signal is explicitly sent from user space.
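A userspace model of the question's loop shows why "Thread Stopping" alone does not mean kthread_should_stop() returned true. It is only an analogy: should_stop and sig_pending are hypothetical atomic flags standing in for kthread_should_stop() and signal_pending(current), and a POSIX thread stands in for the kthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, independent flags standing in for kthread_should_stop()
 * and signal_pending(current). */
static atomic_bool should_stop;
static atomic_bool sig_pending;
/* Value of should_stop at the moment "Thread Stopping" would be printed. */
static atomic_bool stop_seen_at_exit;

/* Same shape as the question's thread_fun(), with a 1 ms nap instead of ssleep(5). */
static void *thread_fun_model(void *unused)
{
    struct timespec nap = { 0, 1000000 };

    (void)unused;
    while (!atomic_load(&should_stop)) {
        nanosleep(&nap, NULL);
        if (atomic_load(&sig_pending))
            break;                      /* the early exit on a signal */
    }
    atomic_store(&stop_seen_at_exit, atomic_load(&should_stop));
    return NULL;
}

/* Delivers only the "signal" (as kill -9 does) and reports whether the
 * stop flag was set when the loop ended: 1 = set, 0 = not set, -1 = error. */
static int stop_flag_after_signal(void)
{
    pthread_t tid;

    atomic_store(&should_stop, false);
    atomic_store(&sig_pending, false);
    if (pthread_create(&tid, NULL, thread_fun_model, NULL) != 0)
        return -1;
    atomic_store(&sig_pending, true);   /* like: sudo kill -9 [PID] */
    pthread_join(tid, NULL);
    return atomic_load(&stop_seen_at_exit) ? 1 : 0;
}
```

Only the "signal" is delivered, so the loop ends while the stop flag is still false, which matches the state of the thread in the question before rmmod is run.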

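Also, using thread_st inside the kthread function introduces a race condition: the thread function may start running before kthread_run() returns (and before its result is assigned to thread_st).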
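The same race exists with POSIX threads, which makes it easy to model: pthread_create() stores the new thread's ID only as it returns, so a new thread that reads that variable may see it before it is written. The userspace sketch below is an analogy, not module code: published_tid is a hypothetical stand-in for thread_st, and the worker avoids it by asking for its own identity with pthread_self(), which is what using current inside the kthread function would correspond to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for thread_st: written by pthread_create() only as
 * it returns, so the new thread may start before this holds its ID. */
static pthread_t published_tid;

static void *worker(void *arg)
{
    pthread_t *seen = arg;

    /* Do not read published_tid here (that is the race). Ask for our own
     * identity instead, like using current inside the kthread. */
    *seen = pthread_self();
    return NULL;
}

/* Returns true if the ID the worker saw for itself matches the one the
 * creator received. */
static bool worker_knows_itself(void)
{
    pthread_t seen;

    if (pthread_create(&published_tid, NULL, worker, &seen) != 0)
        return false;
    pthread_join(published_tid, NULL);  /* after this, both values are set */
    return pthread_equal(seen, published_tid) != 0;
}
```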

Update

You can make the thread wait, after printing "Thread Stopping", until kthread_stop() is actually called:

static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");

    // Wait until kthread_stop() has actually been called for this thread.
    while(!kthread_should_stop())
    {
        /*
         * Flush any pending signal.
         *
         * Otherwise, with a signal still pending, the interruptible sleep
         * below would return immediately instead of actually waiting.
         */
        flush_signals(current);
        /* Stopping the thread acts as a kind of interrupt; that's why an interruptible wait is used. */
        set_current_state(TASK_INTERRUPTIBLE);
        if(!kthread_should_stop())
            schedule();
        set_current_state(TASK_RUNNING);
    }

    return 0;
}
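The handshake in this updated thread_fun() can be modelled in userspace as well. This is a hypothetical analogy: model_thread, set_and_wake() and model_stop() are illustrative names, a condition-variable broadcast stands in for wake_up_process(), and pthread_join() stands in for kthread_stop() waiting until the thread exits. After the "signal" the worker parks until a stop is requested, so the stopper always finds it alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a kthread and its stop handshake. */
struct model_thread {
    pthread_t tid;
    pthread_mutex_t lock;
    pthread_cond_t wake;   /* broadcast plays wake_up_process() */
    bool should_stop;      /* plays kthread_should_stop() */
    bool signalled;        /* plays signal_pending(current) */
    bool parked;           /* worker reached the "wait for stop" loop */
};

static void *thread_fun_model(void *arg)
{
    struct model_thread *t = arg;

    pthread_mutex_lock(&t->lock);
    /* Main loop: leave early on a "signal", like the answer's code. */
    while (!t->should_stop && !t->signalled)
        pthread_cond_wait(&t->wake, &t->lock);

    /* "Thread Stopping": instead of exiting, sleep until a stop is requested. */
    while (!t->should_stop) {
        if (!t->parked) {
            t->parked = true;
            pthread_cond_broadcast(&t->wake);  /* let the caller know we got here */
        }
        pthread_cond_wait(&t->wake, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;           /* return normally; no do_exit() analogue */
}

/* Sets a flag under the lock and wakes everyone waiting on t->wake. */
static void set_and_wake(struct model_thread *t, bool *flag)
{
    pthread_mutex_lock(&t->lock);
    *flag = true;
    pthread_cond_broadcast(&t->wake);
    pthread_mutex_unlock(&t->lock);
}

/* Analogue of kthread_stop(): request a stop, wake the thread, wait for exit. */
static void model_stop(struct model_thread *t)
{
    set_and_wake(t, &t->should_stop);
    pthread_join(t->tid, NULL);
}

/* Signal first, wait for "Thread Stopping", then stop: the question's order. */
static bool stop_after_signal(void)
{
    struct model_thread t = { .should_stop = false };

    pthread_mutex_init(&t.lock, NULL);
    pthread_cond_init(&t.wake, NULL);
    if (pthread_create(&t.tid, NULL, thread_fun_model, &t) != 0) {
        pthread_cond_destroy(&t.wake);
        pthread_mutex_destroy(&t.lock);
        return false;
    }

    set_and_wake(&t, &t.signalled);   /* like: sudo kill -9 [PID] */

    pthread_mutex_lock(&t.lock);      /* wait for "Thread Stopping" */
    while (!t.parked)
        pthread_cond_wait(&t.wake, &t.lock);
    pthread_mutex_unlock(&t.lock);

    model_stop(&t);                   /* like: rmmod -> kthread_stop() */
    pthread_cond_destroy(&t.wake);
    pthread_mutex_destroy(&t.lock);
    return t.parked && t.should_stop;
}
```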

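Answer 1 (score: 2)

  1. current always points to the task that is currently running; it is a macro that comes in through the kernel headers, so it has to be used carefully. In the function below you are trying to stop the task that called cleanup_thread(), i.e. the rmmod process, because cleanup_thread() is the module exit function: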

    static void __exit cleanup_thread(void)
    {
        printk(KERN_INFO "Cleaning up\n");
        if(thread_st)
        {
            kthread_stop(current);
            printk(KERN_INFO "Thread Stopped\n");
        }
    }
    
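  2. A likely cause of the problem is that you first kill the thread with kill -9. The thread then dies and its task_struct is freed. But thread_st is still non-NULL: it is a dangling pointer, i.e. it points to memory that has already been freed.
  3. Then, when cleanup_thread() calls kthread_stop(thread_st), you are actually passing an invalid memory location, so the kernel crashes.

    Try setting thread_st to NULL before calling do_exit().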
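The suggestion to clear thread_st before the thread exits can be sketched in userspace too. This is an analogy under stated assumptions: fake_task, thread_st_model and thread_st_lock are hypothetical, and the mutex is what makes "check the pointer" in the cleanup path and "clear the pointer" in the worker mutually exclusive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the task that thread_st points at. */
struct fake_task {
    int id;
};

/* Hypothetical stand-ins for the global thread_st and a lock guarding it. */
static struct fake_task *thread_st_model;
static pthread_mutex_t thread_st_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker that exits on its own: clear the shared pointer first, free later. */
static void *worker(void *arg)
{
    struct fake_task *self = arg;

    pthread_mutex_lock(&thread_st_lock);
    thread_st_model = NULL;         /* "NULL thread_st before do_exit()" */
    pthread_mutex_unlock(&thread_st_lock);

    free(self);                     /* the object goes away only now */
    return NULL;
}

/* Analogue of cleanup_thread(): 1 if a live object was found, else 0. */
static int cleanup_model(void)
{
    int found;

    pthread_mutex_lock(&thread_st_lock);
    found = thread_st_model != NULL;  /* the model only checks the pointer */
    pthread_mutex_unlock(&thread_st_lock);
    return found;
}

/* The question's order: the worker exits first, then cleanup runs.
 * Returns what cleanup_model() reports, or -1 on setup failure. */
static int cleanup_after_self_exit(void)
{
    struct fake_task *t = malloc(sizeof *t);
    pthread_t tid;

    if (t == NULL)
        return -1;
    t->id = 1;
    thread_st_model = t;            /* published before the worker starts */
    if (pthread_create(&tid, NULL, worker, t) != 0) {
        thread_st_model = NULL;
        free(t);
        return -1;
    }
    pthread_join(tid, NULL);
    return cleanup_model();
}
```

In this order cleanup_model() finds NULL instead of a freed object. The model only checks the pointer; in the module, calling kthread_stop() while holding a lock that the thread must also take before exiting would deadlock, so the lock here illustrates the ordering rather than being a drop-in fix.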