Question

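I am running a script that extracts data from JSON files that are updated every 1 to 2 minutes. The basic idea is that the script performs the extraction, sleeps for 1 minute, and then performs the extraction again, in an infinite loop.

It ran for more than a month and then suddenly stopped one day without any error message. I restarted it and it worked fine. A few days later, however, it stopped again for no apparent reason.

I don't know what the problem is, so here is my script. Below is the Python file I wrote.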

from requests.auth import HTTPBasicAuth
import sys
import requests
import re
import time
import datetime
import json

from CSVFileGen1 import csv_files_generator1
from CSVFileGen2 import csv_files_generator2
from CSVFileGen3 import csv_files_generator3
from CSVFileGen4 import csv_files_generator4

def passpara():
    current_time = datetime.datetime.now()
    current_time_string = current_time.strftime('%Y-%m-%d %H:%M:%S')
    sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
    FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
    FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
    FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
    FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
    try:
        r1 = requests.get('https://www...=JSON')
        json_text_no_lines1 = r1.text
        csv_files_generator1(current_time, json_text_no_lines1, FileLocation1)
    except requests.exceptions.RequestException as e:
        print 'request1 error'
        print e
    try:
        r2 = requests.get('https://www...=JSON')
        json_text_no_lines2 = r2.text
        csv_files_generator2(current_time, json_text_no_lines2, FileLocation2)
    except requests.exceptions.RequestException as e:
        print 'request2 error'
        print e
    try:
        r3 = requests.get('https://www...=JSON')
        json_text_no_lines3 = r3.text
        csv_files_generator3(current_time, json_text_no_lines3, FileLocation3)
    except requests.exceptions.RequestException as e:
        print 'request3 error'
        print e
    try:
        r4 = requests.get('https://www...JSON')
        json_text_no_lines4 = r4.text
        csv_files_generator4(current_time, json_text_no_lines4, FileLocation4)
    except requests.exceptions.RequestException as e:
        print 'request4 error'
        print e
    print current_time_string + ' Data Operated. '

while True:
    passpara()
    time.sleep(60)

This is CSVFileGen1, which the first script calls. It parses the JSON and saves the information to a CSV file.

import json
import datetime
import time
import os.path
import sys
from datetime import datetime
from dateutil import tz


def meter_per_second_2_mile_per_hour(input_meter_per_second):
    return input_meter_per_second * 2.23694

def csv_files_generator1(input_datetime, input_string, target_directory):

    try:
        real_json = json.loads(input_string)
        # get updatetime string
        updatetime_epoch = real_json['updateTime']
        update_time = datetime.fromtimestamp(updatetime_epoch/1000)
        updatetime_string = update_time.strftime('%Y%m%d%H%M%S')
        file_name = update_time.strftime('%Y%m%d%H%M')
        dir_name = update_time.strftime('%Y%m%d')
        if not os.path.exists(target_directory + '\\' + dir_name):
            os.makedirs(target_directory + '\\' + dir_name)
        if not os.path.isfile(target_directory + '\\' + dir_name + '\\' + file_name):
            ......  # some detailed information removed for simplicity
    except ValueError, e:
        print e

Answer 1

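At first glance, I think this is caused by sys.path filling up (as litelite mentioned). I think you can safely move this block of code outside the function so that it does not run on every iteration (append to sys.path only once):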

sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'

So your code would look like this:

sys.path.append('C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool')
FileLocation1 = 'C:\\semester3\\data_copy\\www\\output\\test1'
FileLocation2 = 'C:\\semester3\\data_copy\\www\\output\\test2'
FileLocation3 = 'C:\\semester3\\data_copy\\www\\output\\test3'
FileLocation4 = 'C:\\semester3\\data_copy\\www\\output\\test4'
while True:
    passpara()
    time.sleep(60)
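
If the append has to stay close to code that runs repeatedly, another way to keep sys.path from growing is to guard it with a membership check. The following is only a minimal sketch, written for Python 3; the helper name add_search_path is hypothetical and is not part of the original script:

```python
import sys

TOOL_DIR = 'C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool'


def add_search_path(path):
    """Append path to sys.path only if it is not already present.

    Returns True when the path was added, False when it was already there.
    (Hypothetical helper, shown for illustration only.)
    """
    if path in sys.path:
        return False
    sys.path.append(path)
    return True


# Calling the helper many times still leaves a single entry in sys.path.
for _ in range(1000):
    add_search_path(TOOL_DIR)
```

With this guard, it no longer matters how often the surrounding function is called, since the list stops growing after the first call.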

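When I tried a program that appends to sys.path indefinitely, it used a large amount of RAM. You may want to check your script's memory usage, because a Python script can hang when it does not have enough memory. After running that program for a few minutes, my Chrome window crashed because Python was using about 10 GB of RAM (all of the available RAM).

Note that my program has no time.sleep(). The result of running for a few minutes without any pause may reflect what happens when running once every 60 seconds for a month.

My program was: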

import sys
while True:
    sys.path.append("C:\\semester3\\data_copy\\WAZE\\output_scripts\\TNtool")

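An interesting side note: simply incrementing a variable in a while loop does not use up a lot of RAM quickly. This is mainly because the variable is overwritten each time and takes up no additional memory. In your case, sys.path is a list, and appending to it indefinitely leads to additional RAM usage. Example program: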

count = 0
while True:
    count += 1

Appending to a list, on the other hand, uses a lot of RAM, as expected:

count = []
while True:
    count.append(1)
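
The difference between the two loops can be observed without exhausting memory by bounding the number of iterations and asking Python for the object sizes. This is only a rough sketch for Python 3: the function name list_size_after is hypothetical, and sys.getsizeof reports just the list's own storage (its array of references), not the objects it points to.

```python
import sys


def list_size_after(n_appends):
    """Return the size in bytes of a list after n_appends appends.

    sys.getsizeof counts only the list object itself (its array of
    references), not the elements stored in it.
    """
    items = []
    for _ in range(n_appends):
        items.append(1)
    return sys.getsizeof(items)


# A counter is rebound to a new int on every pass, so its size stays small.
counter = 0
for _ in range(100000):
    counter += 1

small = list_size_after(1000)
large = list_size_after(100000)

print('int counter:', sys.getsizeof(counter), 'bytes')
print('list after 1,000 appends:', small, 'bytes')
print('list after 100,000 appends:', large, 'bytes')
```

The list's size keeps growing with the number of appends, while the counter's size does not depend on how many times the loop ran.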

Answer 2

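I believe your question has already been answered with a likely reason for why your script fails, so I won't repeat that answer.

However, I will offer an alternative solution. Instead of keeping the script running continuously for days, remove the infinite loop and have it run every minute using Task Scheduler (Windows) or cron (Linux). This has a couple of immediate benefits: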

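Memory is cleared after every run;
Recovery from an unexpected error can happen within 60 seconds, rather than whenever you notice that the script has stopped running.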
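
A single-run version of the script might be organized roughly as below. This is a sketch under stated assumptions, written for Python 3: run_once, main, and fetch_feed_1 are hypothetical names, and fetch_feed_1 is an empty placeholder standing in for one of the original request-and-convert steps (requests.get followed by csv_files_generator1).

```python
import sys
import traceback


def run_once(tasks):
    """Run each (name, callable) pair once and return the number of failures."""
    failures = 0
    for name, task in tasks:
        try:
            task()
        except Exception:
            # Log the error and continue, so one bad feed does not block the rest.
            failures += 1
            sys.stderr.write('%s failed:\n' % name)
            traceback.print_exc()
    return failures


def main(tasks):
    """Return a process exit code: 0 if every task succeeded, 1 otherwise."""
    return 0 if run_once(tasks) == 0 else 1


def fetch_feed_1():
    """Hypothetical placeholder for one download-and-convert step."""
    pass


# No while loop and no time.sleep: the scheduler is responsible for repetition.
TASKS = [('feed1', fetch_feed_1)]
EXIT_CODE = main(TASKS)
# In a standalone script, finish with sys.exit(EXIT_CODE) so that the
# scheduler can see whether the run succeeded.
```

On Linux, a crontab entry of the form `* * * * * /usr/bin/python /path/to/script.py` runs a script every minute; both paths here are placeholders, not values taken from the question.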

Python script stops for no reason after a month (no error message)
