How do I restart a tweepy script if it errors out?

Asked: 2014-05-12 05:29:20

Tags: python restart tweepy

I have a python script that continuously stores tweets related to tracked keywords to a file. However, the script keeps crashing with the error appended below. How can I edit the script so that it restarts automatically? I have looked at a number of solutions, including this one (Restarting a program after exception), but I'm not sure how to implement it in my script.

import sys
import tweepy
import json
import os

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
# directory that you want to save the json file
os.chdir(r"C:\Users\json_files")  # raw string so the backslashes are not treated as escapes
# name of json file you want to create/open and append json to
save_file = open("12may.json", 'a')

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(CustomStreamListener, self).__init__()

        # self.list_of_tweets = []

    def on_data(self, tweet):
        print tweet
        save_file.write(str(tweet))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=["test"])

===========================================================================

Traceback (most recent call last):
  File "C:\Users\tweets_to_json.py", line 41, in <module>
    sapi.filter(track=["test"])
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 316, in filter
    self._start(async)
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 235, in _start
    self._run()
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 165, in _run
    self._read_loop(resp)
  File "C:\Python27\lib\site-packages\tweepy-2.3-py2.7.egg\tweepy\streaming.py", line 206, in _read_loop
    for c in resp.iter_content():
  File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\models.py", line 541, in generate
    chunk = self.raw.read(chunk_size, decode_content=True)
  File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\packages\urllib3\response.py", line 171, in read
    data = self._fp.read(amt)
  File "C:\Python27\lib\httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "C:\Python27\lib\httplib.py", line 603, in _read_chunked
    value.append(self._safe_read(amt))
  File "C:\Python27\lib\httplib.py", line 660, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(0 bytes read, 1 more expected)

4 Answers:

Answer 0 (score: 14)

Figured out how to incorporate a while/try loop by writing a new function for the stream:

def start_stream():
    while True:
        try:
            sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
            sapi.filter(track=["Samsung", "s4", "s5", "note", "3", "HTC", "Sony", "Xperia", "Blackberry", "q5", "q10", "z10", "Nokia", "Lumia", "Nexus", "LG", "Huawei", "Motorola"])
        except Exception:  # a bare except here would also swallow KeyboardInterrupt
            continue

start_stream()

I tested the automatic restart by manually interrupting the program with Cmd + C. That said, I'd be glad to hear of better ways to test this kind of functionality.
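The while/try pattern above generalizes to a small retry wrapper. A minimal, stdlib-only sketch (the function names and the flaky example are illustrative, not part of tweepy):

```python
import time

def run_forever(fn, delay=1.0, max_restarts=None):
    """Call fn repeatedly, restarting it whenever it raises.

    delay: seconds to sleep between restarts (avoids hammering the API).
    max_restarts: give up after this many failures (None = retry forever).
    """
    restarts = 0
    while True:
        try:
            return fn()
        except KeyboardInterrupt:
            raise  # let Ctrl+C / Cmd+C actually stop the script
        except Exception:
            restarts += 1
            if max_restarts is not None and restarts > max_restarts:
                raise
            time.sleep(delay)

# Demo: a function that fails twice before succeeding.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection dropped")
    return "ok"

print(run_forever(flaky, delay=0))  # prints: ok
```

With the stream, `fn` would be a callable that builds the `Stream` and calls `filter`; the delay keeps a persistent failure from hammering Twitter's endpoint.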

Answer 1 (score: 1)

It is better to use a recursive call rather than an infinite loop: wrap the filter call in a function that, on failure, calls itself to restart the stream.
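The code sample that originally accompanied this answer did not survive on this page. Below is a stdlib-only sketch of the recursive pattern it describes, with a depth cap added because each failed restart adds a stack frame (all names here are illustrative; with tweepy the wrapped callable would construct the `Stream` and call `filter`):

```python
def start_stream(stream_fn, retries_left=900):
    """Run stream_fn, restarting recursively whenever it raises.

    retries_left caps the recursion depth: Python's default recursion
    limit (~1000 frames) makes unbounded recursion a real hazard here.
    """
    try:
        return stream_fn()
    except Exception:
        if retries_left <= 0:
            raise
        return start_stream(stream_fn, retries_left - 1)

# Demo: a stand-in stream that drops three times before succeeding.
calls = []
def flaky_stream():
    calls.append(1)
    if len(calls) < 4:
        raise RuntimeError("stream dropped")
    return "streaming"

print(start_stream(flaky_stream))  # prints: streaming
```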

Answer 2 (score: 0)

One option is to try the multiprocessing module. I think so for two reasons.

  1. The ability to run the process for a set amount of time without "kill"-ing the whole script/process.
  2. You can put it in a for loop and have it simply restart whenever it dies or whenever you choose to kill it.

I took a completely different approach, but that's partly because I'm saving my tweets at regular (or supposedly regular) intervals. @Eugeune Yan, I think try/except is a simple and elegant way to handle the problem. Although, hopefully someone will comment on this; you never really know when or whether the method fails, but idk if that really matters (and it would be easy to write a few lines to make it do so).

    import tiipWriter #Twitter & Textfile writer I wrote with Tweepy.
    from add import ThatGuy # utility to supply log file names that won't overwrite old ones.
    import multiprocessing
    
    
    if __name__ == '__main__':
            #number of time increments script needs to run        
            n = 60
            dir = "C:\\Temp\\stufffolder\\twiitlog"
            list = []
            print "preloading logs"
            ThatGuy(n,dir,list) #Finds any existing logs in the folder and one-ups it
    
            for a in list:
                print "Collecting Tweets....."
                # this is my twitter/textfile writer process
                p = multiprocessing.Process(target=tiipWriter.tiipWriter,args = (a,)) 
                p.start()
                p.join(1800) # num of seconds the process will run
                if p.is_alive():
                    print " \n Saving Twitter Stream log   @  " + str(a)
                    p.terminate()
                    p.join()
                a = open(a,'r')
                a.close()
                if a.closed == True:
                    print "File successfully closed"
                else: a.close()
                print "jamaica" #cuz why not
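The timeout-and-terminate portion of the script above can be reduced to a few lines. A minimal sketch using only the standard library (the helper name is illustrative):

```python
import multiprocessing
import time

def capped_run(target, seconds, args=()):
    """Run target in a child process for at most `seconds`, then
    terminate it -- the same join/is_alive/terminate dance as above."""
    p = multiprocessing.Process(target=target, args=args)
    p.start()
    p.join(seconds)       # returns when target exits or the timeout elapses
    if p.is_alive():      # still running => the timeout was hit
        p.terminate()
        p.join()
    return p.exitcode     # negative means the child was killed by a signal

def forever():
    while True:
        time.sleep(0.05)

if __name__ == "__main__":
    print(capped_run(forever, 0.3) < 0)  # True: the worker was terminated
```

A streaming worker would take the dump-file path as `args`, exactly as the `tiipWriter` process does above.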
    

Answer 3 (score: 0)

I wrote a two-process stream with tweepy. It downloads the data, compresses it, and dumps it to files that rotate every hour. The program is restarted every hour, and it can periodically check the streaming process to see whether any new tweets have been downloaded. If not, it restarts the whole system.

The code can be found here. Note that it uses pipes for the compression; if compression is not needed, the source is easy to modify.
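The "have any new tweets arrived?" check that drives the restart can be approximated by watching the dump file's modification time. A small stdlib sketch (the helper name and threshold are made up for illustration):

```python
import os
import time

def stream_is_stale(path, max_idle_seconds):
    """Heuristic watchdog check: if the dump file has not been written
    to (mtime unchanged) for max_idle_seconds, assume the stream died."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return True  # no file yet: treat as stale and (re)start the stream
    return age > max_idle_seconds

# Demo with a temporary file.
import tempfile
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"tweet\n")
    path = f.name

print(stream_is_stale(path, 60))                # False: just written
print(stream_is_stale(path + ".missing", 60))   # True: file absent
os.remove(path)
```

A supervising loop would call this every few minutes and relaunch the streaming process whenever it returns True.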
