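We developed an embedded system based on the ZYNQ7020 SoC, running PetaLinux 2015.4 with kernel version 4.0.0-xilinx. The system uses a 4 GB SanDisk SD card as additional storage.
The system runs stably most of the time. Occasionally, however, the kernel panics with an "Unable to handle kernel NULL pointer dereference" error. Our debugging shows that the fault occurs in the MMC sdhci driver. Details follow.
The raw kernel oops message is: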
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = dd6a4000
[00000008] *pgd=1d67c831, *pte=00000000, *ppte=00000000
Internal error: Oops - BUG: 17 [#1] PREEMPT SMP ARM
Modules linked in: ipv6
CPU: 0 PID: 915 Comm: bramservice Not tainted 4.0.0-xilinx #47
Hardware name: Xilinx Zynq Platform
task: de517040 ti: dd41a000 task.ti: dd41a000
PC is at sdhci_send_command+0x39c/0x9c8
LR is at sdhci_send_command+0x4a0/0x9c8
pc : [<c0387ae0>] lr : [<c0387be4>] psr: 800d0193
sp : dd41beb8 ip : c001c5d4 fp : 00000001
r10: 161c4000 r9 : 00000000 r8 : 00000000
r7 : df041008 r6 : dd4ec890 r5 : dd4ec8f8 r4 : de5ff300
r3 : 00000002 r2 : 00000003 r1 : 1e7f0600 r0 : 00000000
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 18c5387d Table: 1d6a404a DAC: 00000015
Process bramservice (pid: 915, stack limit = 0xdd41a210)
Stack: (0xdd41beb8 to 0xdd41c000)
bea0: 00000001 00000000
bec0: 00000001 00000400 de7f0600 1e7f0600 000005bd 00000000 00000000 de5ff300
bee0: 00000001 00000000 00000001 00000000 c0baa444 00000000 b54acd3c c03887f4
bf00: de5ff418 00000100 00000010 c00444b4 c0baa088 dd41a000 40000005 dd517640
bf20: 00000000 00000093 00000000 de4fa700 00000000 00000000 b54acd3c c00505a4
bf40: dd41a000 de4fa700 de4fa760 dd517640 00000093 de402400 00000000 c0050698
bf60: de4fa700 c0bb83c4 de4fa760 c0052fac 00000093 00000000 c0ba4bd4 c004fda0
bf80: 00000000 c0050064 f8f00100 c0baaf7c dd41bfb0 18c5387d 18c5387d c000860c
bfa0: 000aa2a4 200d0010 ffffffff c0011ac4 00255e58 00000046 00000070 00255e58
bfc0: 00000000 bec26a10 00000000 00000152 b648cf80 00000000 00000000 b54acd3c
bfe0: b644cdd8 b54acd30 000a7bbc 000aa2a4 200d0010 ffffffff ebfffffe ea0000c9
[<c0387ae0>] (sdhci_send_command) from [<c03887f4>] (sdhci_irq+0x23c/0x794)
[<c03887f4>] (sdhci_irq) from [<c00505a4>] (handle_irq_event_percpu+0x28/0xe0)
[<c00505a4>] (handle_irq_event_percpu) from [<c0050698>] (handle_irq_event+0x3c/0x5c)
[<c0050698>] (handle_irq_event) from [<c0052fac>] (handle_fasteoi_irq+0xa4/0x11c)
[<c0052fac>] (handle_fasteoi_irq) from [<c004fda0>] (generic_handle_irq+0x20/0x30)
[<c004fda0>] (generic_handle_irq) from [<c0050064>] (__handle_domain_irq+0x8c/0xb4)
[<c0050064>] (__handle_domain_irq) from [<c000860c>] (gic_handle_irq+0x38/0x5c)
[<c000860c>] (gic_handle_irq) from [<c0011ac4>] (__irq_usr+0x44/0x60)
Exception stack(0xdd41bfb0 to 0xdd41bff8)
bfa0: 00255e58 00000046 00000070 00255e58
bfc0: 00000000 bec26a10 00000000 00000152 b648cf80 00000000 00000000 b54acd3c
bfe0: b644cdd8 b54acd30 000a7bbc 000aa2a4 200d0010 ffffffff
Code: e58d1014 e5943178 e15b0003 aa000042 (e5993008)
---[ end trace ef6ec1f8e3ce8554 ]---
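The parenthesized word on the Code: line, e5993008, is the instruction that faulted. Below is a minimal sketch of decoding it by hand, assuming it is an A32 word load in plain immediate-offset form. The struct and function names are made up for illustration and are not part of any kernel or toolchain API.

```c
#include <stdint.h>

/* Decoded fields of an A32 "LDR Rt, [Rn, #+/-imm12]" instruction
 * (word load, immediate offset, no writeback). Illustrative type only. */
struct ldr_imm {
    int      is_ldr_imm; /* 1 if the word matches this encoding, else 0 */
    unsigned rn;         /* base register number */
    unsigned rt;         /* destination register number */
    int32_t  offset;     /* signed byte offset added to the base */
};

static struct ldr_imm decode_ldr_imm(uint32_t insn)
{
    struct ldr_imm d = {0, 0, 0, 0};

    /* Require bits 27..25 = 010, P (bit 24) = 1, B (bit 22) = 0,
     * W (bit 21) = 0 and L (bit 20) = 1. The U bit (23) is left free. */
    if ((insn & 0x0f700000u) != 0x05100000u)
        return d;

    d.is_ldr_imm = 1;
    d.rn = (insn >> 16) & 0xfu;
    d.rt = (insn >> 12) & 0xfu;
    d.offset = (int32_t)(insn & 0xfffu);
    if (!(insn & 0x00800000u))   /* U = 0 means the offset is subtracted */
        d.offset = -d.offset;
    return d;
}
```

Calling decode_ldr_imm(0xe5993008) gives rn = 9, rt = 3, offset = 8, that is, ldr r3, [r9, #8]. With r9 : 00000000 in the register dump, the load address is 0x8, which matches the reported fault address 00000008.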
We resolved the addresses against the symbol table and obtained the following call stack:
sdhci_adma_table_pre() drivers/mmc/host/sdhci.c:521
sdhci_prepare_data() drivers/mmc/host/sdhci.c:829
sdhci_send_command() drivers/mmc/host/sdhci.c:1049
sdhci_finish_command() drivers/mmc/host/sdhci.c:1113
sdhci_cmd_irq() drivers/mmc/host/sdhci.c:2428
sdhci_irq() drivers/mmc/host/sdhci.c:2638
handle_irq_event_percpu() kernel/irq/handle.c:143
handle_irq_event() kernel/irq/handle.c:192
handle_fasteoi_irq() kernel/irq/chip.c:536
generic_handle_irq_desc() include/linux/irqdesc.h:129
generic_handle_irq() kernel/irq/irqdesc.c:351
__handle_domain_irq() kernel/irq/irqdesc.c:388
handle_domain_irq() include/linux/irqdesc.h:147
gic_handle_irq() drivers/irqchip/irq-gic.c:291
__irq_usr() arch/arm/kernel/entry-armv.S:448
Line 521 of sdhci_adma_table_pre() evaluates sg_dma_len(sg), but sg is a NULL pointer.
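A NULL sg is consistent with the fault address 00000008 if sg_dma_len(sg) reads sg->length and two 32-bit fields precede it in struct scatterlist. The sketch below uses a user-space stand-in for that structure. It assumes a 32-bit build with CONFIG_DEBUG_SG and CONFIG_NEED_SG_DMA_LENGTH both disabled, which is not confirmed for this kernel configuration. It also shows one hypothetical way a for_each_sg-style walk can end up with a NULL cursor: the loop count is larger than the number of entries actually in the list. Everything prefixed with mock_ or MOCK_ is illustrative and not a kernel API, and this is not a claim about what actually happened on this board.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct scatterlist on 32-bit ARM (assumed config, see above).
 * Fixed-width fields are used so the layout is the same on any host. */
struct mock_sg {
    uint32_t page_link;   /* bit 1 set marks the last entry, as in the kernel */
    uint32_t offset;
    uint32_t length;      /* what sg_dma_len() reads under the assumed config */
    uint32_t dma_address;
};

#define MOCK_SG_LAST 0x02u

/* Simplified sg_next(): returns NULL after the last entry, ignores chaining. */
static const struct mock_sg *mock_sg_next(const struct mock_sg *sg)
{
    if (sg->page_link & MOCK_SG_LAST)
        return NULL;
    return sg + 1;
}

/* Walks `count` entries the way for_each_sg() does. Returns the index at
 * which the cursor was NULL (where a real driver would dereference it),
 * or -1 if all `count` entries were valid. */
static int first_null_index(const struct mock_sg *sgl, int count)
{
    const struct mock_sg *sg = sgl;
    int i;

    for (i = 0; i < count; i++) {
        if (sg == NULL)
            return i;
        sg = mock_sg_next(sg);
    }
    return -1;
}
```

In this stand-in, offsetof(struct mock_sg, length) is 8. For a two-entry list whose second entry carries MOCK_SG_LAST, first_null_index(list, 2) returns -1, while first_null_index(list, 3) returns 2: the third iteration would read length through a NULL pointer. A list pointer that is NULL from the start fails at index 0 in the same way.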
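We are using the kernel's stock sdhci driver code and have never modified the SD controller or driver parts, so we are puzzled about how this can happen. Is the problem caused by the SD card, or is there a known issue in this version of the kernel or driver? Any suggestions would be greatly appreciated.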