Why did my program exit?

Date: 2016-05-30 05:59:55

Tags: python background-process exit exit-code

I have a Python script running in the background via nohup ./myprogram.py 1>console.out &. The program logs continuously to a log file; processing takes a long time. After it had run for 2 days (Saturday and Sunday), I saw:

# myprogram and myprogram2 are both running in background
# myprogram2 clearly has finished 
[1]  + 25159 exit 1     nohup ./myprogram.py 1>console.out & 
[2]  + 25442 done       nohup ./myprogram2.py 1>console2.out &
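For context, the "exit 1" status above is exactly what a Python process reports when it dies from an uncaught exception, while "done" means it finished with exit status 0. A quick way to reproduce this (assuming python3 is on the PATH; the original script is Python 2, but the behavior is the same):

```shell
# An uncaught Python exception makes the interpreter exit with status 1,
# which the shell then reports as "exit 1" for the background job.
python3 -c 'raise RuntimeError("boom")'
echo "exit code: $?"    # prints: exit code: 1
```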

myprogram's log:

2016-05-27 16:55:06 - sources.edf - INFO - processing day 1 ...
2016-05-27 16:55:06 - sources.edf - INFO - processing day 2 ...
...
2016-05-27 16:55:06 - sources.edf - INFO - processing day n ...

and then it stopped (there should be a day n + 1 and more).

Sadly, I had just overwritten console.out (I restarted the script, so the dump was overwritten before I could look at it)... but since I restarted from day n, it seems the program can run without any error/exception.

I know this description is vague, maybe too vague to pinpoint any reason for the exit. I just need some clues about it. I'm not a complete beginner, but I lack experience, so any plausible guess is appreciated.

Simplified source code:

import os
import sys
import logging
import logging.config as lconfig
from optparse import OptionParser
from contextlib import closing
from datetime import datetime, timedelta
from collections import defaultdict

import psycopg2
from configobj import ConfigObj

if __name__ == "__main__":
    ## setup optparser and parse argvs and return opts and args
    conf = ConfigObj(opts.config, list_values=False)[args[0]]
    __import__(conf["module"])
    ## myprogram and myprogram2 run the same source code,
    ## same module. only the data is different
    ## source will provide Mapper and iterator as APIs
    source = sys.modules[conf["module"]]
    ## extract start and end date, configure log

    # set up regions info and mapper
    ## connect to db and read countries and exchange list
    with closing(psycopg2.connect(conf["dsn"])) as db:
        cursor = db.cursor()
        regions = source.setup_region(conf['Regions'], cursor)
        ## find all wanted exchanges: (exch, region)
        exchanges = source.setup_exchanges(conf['Exchanges'], cursor)
        mapper = source.Mapper(exchanges, cursor, conf)

    iterator = source.iterator(conf, start, end)

    logger = logging.getLogger()
    logger.info('START')
    with regions:
        for filename, raw_rec in iterator:
            logger.info('Processing file {0}'.format(filename))
            try:
                record = source.Record(filename, raw_rec)
            except Exception as e:
                logger.warn("record parsing error: %s" % e)
                continue
            stks = mapper.find(record)
            if not stks:
                continue
            regs = defaultdict(set)
            for stk in stks:
                regnm = exchanges[stk[2]]
                regs[regnm].add(stk)
            for reg,secs in regs.iteritems():
                info = regions[reg]
                outf = info.get_file(record.get_tm())
                source.output(outf, record, secs, info.tz, conf)

    logger.info('END')

The log just stopped in the middle of accessing some file...

1 Answer:

Answer 0 (score: 0):

A database connection can be lost at any time, and your code does not handle that. Also, machines need to be rebooted from time to time, e.g. due to power outages or security updates.
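One way to survive a transient connection loss is to retry the connect call a few times before giving up. This is a minimal sketch of a generic retry helper (the decorator name and parameters are illustrative, not from the original script):

```python
import time


def retry(attempts=5, delay=0.0, exceptions=(Exception,)):
    """Retry a callable up to `attempts` times before re-raising."""
    def deco(fn):
        def wrapper(*args, **kwargs):
            for i in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if i == attempts - 1:
                        raise  # out of attempts: propagate the error
                    time.sleep(delay)  # wait before the next attempt
        return wrapper
    return deco


# Hypothetical usage with psycopg2, so a transient OperationalError
# does not kill the whole multi-day job:
#
# @retry(attempts=3, delay=2.0, exceptions=(psycopg2.OperationalError,))
# def get_db(dsn):
#     return psycopg2.connect(dsn)
```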

Obviously, there may be other reasons your code can break, which is why you need to run it as a service if you want it to be reliable.

Look into running this Python script as a systemd service, and configure that service to restart automatically on failure. This keeps it from staying dead even if there is a bug that crashes it: if it crashes, systemd will restart it.
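A minimal unit file for this might look like the following (all paths and names here are illustrative; adjust them to your setup), e.g. saved as /etc/systemd/system/myprogram.service:

```ini
[Unit]
Description=myprogram batch processor
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/python /opt/myprogram/myprogram.py myconfig
# Restart automatically if the process exits with a non-zero status
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Then enable and start it with systemctl enable myprogram followed by systemctl start myprogram; stdout/stderr go to the journal, viewable with journalctl -u myprogram, so crash output is never lost to an overwritten console.out.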

There are plenty of resources documenting this, but you can start with https://learn.adafruit.com/running-programs-automatically-on-your-tiny-computer/systemd-writing-and-enabling-a-service