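I have a pythonw application that runs under a daemon process which monitors the state of a child (worker) process.
So when the system is healthy there are two processes. Once the worker process dies, the daemon spawns a new worker process within 10 seconds.
But after deploying this application to ~40 servers, I found that sometimes both processes die together, and I cannot trace or reproduce this.
Here is the daemon code, which is simple: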
import subprocess
import time
import os
import sys
import logging

logging.basicConfig(
    filename=os.path.join(os.getcwd(), 'boot.log'),
    level=logging.INFO,
    filemode='w',
    format='%(asctime)s - %(levelname)s: %(message)s')


def main():
    _process = None
    # Command line for the worker; extra daemon arguments are passed through.
    argv = [os.path.join(os.getcwd(), 'pythonw.exe'), "worker.py"]
    if sys.argv[1:]:
        argv += sys.argv[1:]
    while 1:
        _process = None
        logging.info("create a new worker instance!")
        _process = subprocess.Popen(argv)
        # Block until the worker exits, then wait 10 seconds before restarting.
        _process.wait()
        time.sleep(10)
        logging.info("worker instance has terminated!")


if __name__ == '__main__':
    main()
More clues:
Here is the log:
2014-02-18 15:42:10,015 - INFO: create a new bootuser instance!
2014-02-21 15:42:22,226 - INFO: worker instance has terminated!
2014-02-21 15:42:22,226 - INFO: create a new worker instance!
2014-02-24 15:42:33,365 - INFO: worker instance has terminated!
2014-02-24 15:42:33,365 - INFO: create a new worker instance!
2014-02-27 15:42:14,336 - INFO: worker instance has terminated!
2014-02-27 15:42:14,336 - INFO: create a new worker instance!
2014-02-28 15:42:25,384 - INFO: worker instance has terminated!
2014-02-28 15:42:25,384 - INFO: create a new worker instance!
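Note: the worker process contains a self-updater that terminates the worker itself.
I can't determine how often this happens; it seems to be random.
This application runs as a service under the SYSTEM account (on Windows 2008 R2).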
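Since the log above records neither the worker's exit code nor any failure inside the daemon itself, one way to gather more evidence might look like the sketch below. It is a minimal illustration, not a diagnosis: the names `supervise`, `configure_logging`, `restart_delay` and `max_runs` are hypothetical, and opening the log with `filemode='a'` is an assumption made so that a restart of the daemon does not truncate earlier entries (the original `filemode='w'` overwrites `boot.log` every time the daemon starts).

```python
import logging
import subprocess
import sys
import time


def configure_logging(path):
    # Assumption: append rather than overwrite, so entries written before a
    # daemon restart are still available afterwards.
    logging.basicConfig(filename=path, level=logging.INFO, filemode='a',
                        format='%(asctime)s - %(levelname)s: %(message)s')


def supervise(argv, restart_delay=10, max_runs=None):
    """Run argv repeatedly and log every exit code.

    max_runs is a hypothetical knob that bounds the loop (useful for trying
    the function out); None means restart forever, like the original daemon.
    Returns the list of exit codes that were observed.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        logging.info("create a new worker instance!")
        try:
            process = subprocess.Popen(argv)
            code = process.wait()
        except Exception:
            # Write the traceback to the log before the daemon goes down,
            # so a failure in the daemon itself leaves a trace.
            logging.exception("daemon failed while starting or waiting for the worker")
            raise
        logging.info("worker instance has terminated with exit code %s", code)
        exit_codes.append(code)
        time.sleep(restart_delay)
    return exit_codes
```

In the daemon this could be used as `configure_logging('boot.log')` followed by `supervise([path_to_pythonw, 'worker.py'] + sys.argv[1:])`, where `path_to_pythonw` stands for the interpreter path built in the original code. The exit code would at least help distinguish a termination by the self-updater from a crash, and the logged traceback would show whether `Popen` itself failed.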