Python script fails randomly - what tools can I use to determine why?

Asked: 2016-06-06 20:53:12

Tags: python tracing watchdog raspberry-pi3

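I have a Python script that I wrote as part of a pool automation project. Over time I have made many improvements and added functionality. As a result, I never had the chance to let it run for long stretches until recently, when I got it (almost) to where I want it. Now that I have it running continuously, it randomly fails and gets restarted (via the watchdog support).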

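I run the script on a Raspberry Pi 3 via systemd, with watchdog support, because I want/need it to run all the time. The watchdog catches the script when it fails and restarts it just as it should, but I would rather figure out what is causing the script to fail.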
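
For context, the watchdog handshake with systemd can be sketched roughly as follows. This is a minimal Python 3 sketch, not the code from my script: the helper names `sd_notify`, `notify_ready`, and `ping_watchdog` are made up for illustration, and it assumes systemd passes the socket path in the `NOTIFY_SOCKET` environment variable (my program log reports `/run/systemd/notify` at start-up).

```python
import os
import socket


def sd_notify(message):
    """Send one datagram to systemd's notify socket.

    Returns False when NOTIFY_SOCKET is unset, for example when the
    script is started by hand rather than by systemd.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = os.fsencode(path)
    if address.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = b"\0" + address[1:]
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.sendto(message.encode("ascii"), address)
    finally:
        sock.close()
    return True


def notify_ready():
    # A Type=notify service reports readiness once start-up is complete.
    return sd_notify("READY=1")


def ping_watchdog():
    # Must be sent more often than WatchdogSec (70s in my unit file),
    # otherwise systemd treats the service as hung and restarts it.
    return sd_notify("WATCHDOG=1")
```

If the main loop ever blocks for longer than the watchdog interval (a slow database query or serial read, say), no ping goes out and systemd restarts the service even though Python never raised an exception.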

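The script connects to a MySQL database, gets some information about the pool level and how many watts my pool pump is using, and then determines whether we need to fill the pool. If we do, it uses a relay to open a sprinkler valve connected to the pool; if not, it does nothing. It also checks whether the sprinklers are running, whether the pool pump is running, and whether someone has thrown the physical cutoff switch. It drives a number of status LEDs, reads a few switches, and talks to an LCD screen over a serial connection to the Pi.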
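
The decision described above can be sketched roughly like this. It is a simplified illustration rather than the code in my script: the function name `should_fill`, the `PUMP_RUNNING_WATTS` threshold, and the order of the checks are all assumptions made for the example.

```python
# Hypothetical threshold: above this many watts the pump counts as running.
PUMP_RUNNING_WATTS = 100


def should_fill(level_ok, pump_watts, sprinklers_running, kill_switch_thrown):
    """Return True only when the pool is low and nothing blocks a fill."""
    if kill_switch_thrown:
        # The physical cutoff switch overrides everything else.
        return False
    if sprinklers_running:
        # Assumption: do not open the fill valve while the sprinklers run.
        return False
    if pump_watts > PUMP_RUNNING_WATTS:
        # Assumption: do not fill while the pool pump is running.
        return False
    return not level_ok
```

With the values from my log (level OK, 12 watts), `should_fill(True, 12, False, False)` evaluates to `False`, so the relay stays off.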

Apart from sshd and the usual system services, this script is pretty much the only thing running on the Pi: no Apache, no Node-RED, no FTP, and so on.

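I keep an ssh session open to the Pi, and it does not drop even when the script fails. A continuous ping to the Pi shows zero packet loss, even while the script fails. When the script fails and restarts, my syslog shows the following: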

Jun  6 08:08:56 scruffy systemd[1]: Unit pool_control.service entered failed state.
Jun  6 08:08:57 scruffy systemd[1]: pool_control.service holdoff time over, scheduling restart.
Jun  6 08:08:57 scruffy systemd[1]: Stopping Installing Python script for Pool Fill Control /w watchdog...
Jun  6 08:08:57 scruffy systemd[1]: Starting Installing Python script for Pool Fill Control /w watchdog...
Jun  6 08:08:58 scruffy systemd[1]: Started Installing Python script for Pool Fill Control /w watchdog.
Jun  6 08:08:58 scruffy kernel: [34864.219647] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.

dmesg shows this when the script fails and restarts:

[    8.938912] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.
[34864.219647] gpiomem-bcm2835 3f200000.gpiomem: gpiomem device opened.

My program log does not show anything unusual:

2016-06-06 13:26:24,387 INFO Notify socket = /run/systemd/notify
2016-06-06 13:26:24,616 DEBUG PushBullet Notification Sent - Pool fill control started successfully
2016-06-06 13:26:24,617 INFO pool_fill_control.py V2.6 (2016-06-05) started
2016-06-06 13:26:25,182 DEBUG Sprinklers are not running (RACHIO).
2016-06-06 13:26:25,183 DEBUG SPRINKLER_RUN_LED should be OFF. This is a BLUE LED
2016-06-06 13:26:25,184 DEBUG Watchdog Ping Sent
2016-06-06 13:26:25,611 DEBUG get_pool_level returned 1
2016-06-06 13:26:25,764 DEBUG pool_pump_running_watts returned 12 watts in use by pump.
2016-06-06 13:26:25,765 DEBUG PUMP_RUN_LED should be OFF. This is the YELLOW LED
2016-06-06 13:26:25,766 DEBUG POOL_FILLING_LED should be OFF. This is a BLUE LED
2016-06-06 13:26:25,766 DEBUG Pool Level OK (PFC_LEVEL_OK) sent to MightyHat

Here is the output of top while the script is running:

top - 13:29:36 up 15:01,  3 users,  load average: 0.05, 0.07, 0.05
Tasks: 119 total,   1 running, 118 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  1.2 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:    947760 total,   390032 used,   557728 free,   114444 buffers
KiB Swap:   102396 total,        0 used,   102396 free.    97648 cached Mem

And meminfo:

root scruffy: log #  cat /proc/meminfo 
MemTotal:         947760 kB
MemFree:          558160 kB
MemAvailable:     864020 kB
Buffers:          114460 kB
Cached:            97640 kB
SwapCached:            0 kB
Active:           202888 kB
Inactive:          31192 kB
Active(anon):      23672 kB
Inactive(anon):     6140 kB
Active(file):     179216 kB
Inactive(file):    25052 kB
Unevictable:        1744 kB
Mlocked:            1744 kB
SwapTotal:        102396 kB
SwapFree:         102396 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:         23844 kB
Mapped:            19188 kB
Shmem:              6424 kB
Slab:             140780 kB
SReclaimable:     132312 kB
SUnreclaim:         8468 kB
KernelStack:        1000 kB
PageTables:          668 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      576276 kB
Committed_AS:      92620 kB
VmallocTotal:    1114112 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:           8192 kB
CmaFree:            3736 kB

Here is some system information:

root scruffy: log #  uptime
13:41:58 up 15:14,  3 users,  load average: 0.02, 0.04, 0.05

root scruffy: log #  uname -a
Linux scruffy 4.4.9-v7+ #884 SMP Fri May 6 17:28:59 BST 2016 armv7l GNU/Linux

Here is the systemd unit file that starts and stops the script:

# This script starts and stops our pool fill control python script

[Unit]
Description=Installing Python script for Pool Fill Control /w watchdog
Requires=basic.target
After=multi-user.target

[Service]
Type=notify
WatchdogSec=70s
ExecStart=/usr/bin/python /root/pool_control/pool_fill_control.py
ExecStop=/root/pool_control/setupgpio.sh
Restart=always

# The number of times the service is restarted within a time period can be set
# If that condition is met, the RPi can be rebooted
#
StartLimitBurst=4
StartLimitInterval=180s
# actions can be none|reboot|reboot-force|reboot-immediate
StartLimitAction=none

# The following are defined in the /etc/systemd/system.conf file and are
# global for all services
#
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#
# They can also be set on a per process here:
# if they are not defined here, they fall back to the system.conf values
TimeoutStartSec=2s
TimeoutStopSec=2s

[Install]
WantedBy=multi-user.target

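I have tried running it on a fresh install of Jessie and moving it to another Pi, all with the same result: after an indeterminate amount of time, the script fails and the watchdog restarts it.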

The script in question is quite long, so I am not sure what the proper procedure is for posting it here, but I do have it on GitHub:

https://github.com/rjsears/Pool_Fill_Control/blob/master/pool_fill_control.py

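I am looking for guidance on how to troubleshoot the code to determine what is causing it to fail, or on whether I have some glaring error in the code that would jump right out at someone with more Python experience. I do not have that much experience, and this is my first (what I consider real) Python script.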
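
To illustrate the kind of tooling I have in mind: since my program log shows nothing at the moment of failure, one option might be to make sure that any unhandled exception or fatal signal leaves a traceback behind. The following is a minimal sketch under assumptions. It targets Python 3 (where `faulthandler` is in the standard library), whereas my unit file simply calls `/usr/bin/python`, and the logger name, `log_uncaught`, and `install_crash_logging` are hypothetical names, not something already in my script.

```python
import faulthandler
import logging
import sys

# Hypothetical logger name; any configured logger would do.
logger = logging.getLogger("pool_fill_control")


def log_uncaught(exc_type, exc_value, exc_tb):
    """Write an unhandled exception, with its traceback, to the log."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Leave Ctrl-C to the default handler.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return
    logger.critical("Unhandled exception",
                    exc_info=(exc_type, exc_value, exc_tb))


def install_crash_logging(fault_log_path):
    """Log unhandled exceptions and dump tracebacks on fatal signals."""
    sys.excepthook = log_uncaught
    fault_file = open(fault_log_path, "a")
    # Dumps the Python traceback on SIGSEGV, SIGFPE, SIGABRT, SIGBUS
    # and SIGILL into the given file.
    faulthandler.enable(file=fault_file)
    return fault_file  # the caller keeps this so the file stays open
```

Calling `install_crash_logging("/root/pool_control/fault.log")` (an example path) near the top of the script would leave a traceback in that file if the process is killed by one of those signals. As far as I understand, systemd uses SIGABRT when a watchdog timeout expires, so this might also show where the script was stuck, but I have not confirmed that on my system.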

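Eventually I would like to interact with it through an internal web page that replicates the physical functions (buttons, LEDs), but I want the script to work properly before going any further.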

Any help or guidance is greatly appreciated!

0 answers:

No answers