Question

I wrote a program that runs forever using a while loop, with the sleep interval supplied by a .cfg file read by the config parser.

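It ran fine for about twenty-six days. Then it stopped working, although the process of course stayed up because it is started as a service. Also, at the time I hadn't thought about wrapping the main loop in a try/except block and logging failures with import syslog.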
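For reference, this is roughly what such a wrapper could look like. It is only a minimal sketch: run_forever, log_to_syslog, and the max_iterations parameter are names made up for illustration and are not part of my program.

```python
import time
import traceback


def log_to_syslog(message):
    """Send an error message to the system log (Unix only)."""
    import syslog  # imported lazily because the module exists only on Unix
    syslog.syslog(syslog.LOG_ERR, message)


def run_forever(poll_once, poll_interval, log=log_to_syslog, max_iterations=None):
    """Call poll_once() repeatedly and log any exception it raises.

    max_iterations is a hypothetical knob that exists only so the loop can
    be exercised in a test; leave it as None to loop indefinitely.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            poll_once()
        except Exception:
            # Record the full traceback so the failure can be diagnosed later.
            log("polling cycle failed:\n" + traceback.format_exc())
        iterations += 1
        time.sleep(poll_interval)
    return iterations
```

The body of the while loop would become the poll_once callable. Note that this sketch keeps looping after a failure; re-raising after logging would be the other reasonable choice if the service manager is expected to restart the program.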

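The code sample below contains only the main block. I left out the rest because most of it is just the typical task queue and result queue setup for the multiprocessing module.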

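What could cause this behavior? Were my network device objects garbage collected because they are not instantiated inside the while loop? Or is this simply a bad way to write and design long-running Python programs?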

import multiprocessing
import time

# Names not defined in this excerpt (config, headers, user, passwd,
# poll_interval, get_password, send_email, Device, Transport, DeviceUri,
# Consumer, Monitor) belong to the parts of the program that are not shown.

if __name__ == '__main__':

    # Accumulates the text of the results read from the results queue
    monitor_results = ''

    # Our task is to monitor, and this will hold our tasks
    monitors = []

    # List of network devices, represented as objects, that will be monitored
    device_list = []

    # The addresses of the devices are provided by the config parser's
    # .cfg file
    device_addresses = config['monitored']['devices'].split(',')

    for address in device_addresses:
        password = get_password(address)
        device_list.append(Device(address, 'admin', password))

    # Build one request per device; these are created once, outside the loop
    for d in device_list:
        path = ['sys', 'clock']
        request = Transport(headers, timeout=20)
        request.http.credentials.add(user, passwd)
        request.url = DeviceUri(d.mgmt_address, path).uri
        monitors.append(request)

    while True:

        # Fresh queues and worker processes are created on every cycle
        tasks = multiprocessing.JoinableQueue(maxsize=len(monitors) + 1)
        results = multiprocessing.Queue()

        num_consumers = multiprocessing.cpu_count() * 2
        consumers = [Consumer(tasks, results) for i in range(num_consumers)]

        for w in consumers:
            w.start()

        for monitor in monitors:
            tasks.put(Monitor(monitor))

        # One None per consumer tells each worker to shut down
        for i in range(num_consumers):
            tasks.put(None)

        tasks.join()

        count = 0

        while not results.empty():
            result = results.get()
            if result is not None:
                monitor_results += result + '\n'
                count += 1

        if count > 0:
            mail_result = send_email(monitor_results)

        # Reset the monitor results or it will keep sending all previous results
        monitor_results = ''

        time.sleep(poll_interval)

Answer 1

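I found the answer. I was rotating the program's logs through logrotate.d, and the rotation was removing the log file rather than truncating it. When the timer started the next loop iteration, the program could not find the log file to write to and exited. I therefore reconfigured the logrotate.d entry to use 'copytruncate' instead of create.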
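An alternative to changing the logrotate configuration, which I did not use, is to make the program itself tolerate the log file being moved or removed. The sketch below assumes the log is written through Python's logging module (nothing above says how the log was actually written) and uses the standard library's logging.handlers.WatchedFileHandler, which reopens the file when it notices the path no longer points at the file it had open. The function names, the logger name, and the file names are made up for illustration.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotation_safe_logger(log_path, name="monitor"):
    """Return a logger whose file handler reopens log_path after rotation."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.handlers.WatchedFileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def demo_rotation():
    """Simulate a rotation that moves the log file away between two writes."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "monitor.log")
        logger = make_rotation_safe_logger(log_path)

        logger.info("before rotation")
        # Stand-in for what a rotation without copytruncate does to the file.
        os.rename(log_path, log_path + ".1")
        logger.info("after rotation")

        with open(log_path + ".1") as f:
            old_contents = f.read()
        with open(log_path) as f:
            new_contents = f.read()

        # Close the handler before the temporary directory is removed.
        for handler in list(logger.handlers):
            handler.close()
            logger.removeHandler(handler)
        return old_contents, new_contents
```

WatchedFileHandler is intended for Unix-like systems. With copytruncate the file is never replaced, so this handler is unnecessary there; it only matters when the rotation moves or deletes the file.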

Why did the program stop running unexpectedly?
