Question

我正在使用Twitter Streaming API来获取实时推文数据。

我可以将数据打印到控制台。但是我想要的是将数据保存到文件中，并且该数据不应早于5分钟。

我如何像记录日志文件一样连续滚动存储最近5分钟内数据的文件。

同时应该可以读取文件。

在Python中有什么方法可以做到这一点吗？

我没有碰到过这样的事情，我们可以提到文件可以保存特定数据的持续时间。

Answer 1

这应该使您入门。只要年龄超过300秒，它就会从列表中杀死该推文。

import time;
TIME_LIMIT = 300; # 5 mins in seconds


# Get the tweet text and the time stamp
def GetTweet():
    #...
    # Wait until there is a new tweet from Donald :)
    text = 'your function that gets the text goes here';

    # Get the time stamp
    time_ = time.time();
    return text, time_


# Kill all the old data that is older than X seconds
def KillOldTweet(tweets):
    time_now = time.time();  # Capture the current time

    # For every tweet stored remove the old one
    for i, tweet in enumerate(tweets):
        time_diff = time_now - tweet[1]; # get the time difference
        if time_diff > TIME_LIMIT: # if older than X secods, kill that tweet from the list
            tweets.pop(i);
            pass;      
        pass;

    return tweets; # return the list

# Updates the file with the list of text of tweets
def UpdateFile(tweets):
    with open('output.txt', 'w') as file:  # open the file and close it after writing
        texts = [ i for i, j in tweets ]; # unzip;
        out_text = str(texts); # convert to string
        print(out_text); #
        file.write(out_text); # overwrite to file
        pass;
    pass;


tweets = [];
while(1):
    # Get the new tweet and append to the list
    text, time_ = GetTweet(); # Wait until a new tweet arrived
    tweet = (text, time_); # zip it into a tuple
    tweets.append(tweet); # append it to list


    # Kill the old tweets from the list
    tweets = KillOldTweet(tweets)

    # Update the file with the fresh tweets
    UpdateFile(tweets);

    # Sleep for 1 second.
    time.sleep(1);

    pass;

但是我建议您使用套接字模块，而不要写入文本文件。或使用pickle模块轻松地从文件中解包

Answer 2

搜索“ python logrotate”会发现：https://docs.python.org/3.7/library/logging.handlers.html#logging.handlers.TimedRotatingFileHandler。可能值得尝试。

如果您希望自己编写一些代码，则可以管理内存中的tweet（例如，使用collections.deque，以便轻松弹出旧tweet并追加新tweet），然后不时刷新一次到文件中。 ..（（或使用套接字，泡菜，只是变量名即可将此数据传递给您的分析函数，如其他答案中所述）。

Answer 3

将数据和实际时间保存在文件中，并检查实际时间是否相差5分钟。使用时间。或使用睡眠功能并每5分钟擦除一次旧数据。

如何创建不包含5分钟以上信息的文件？

3 个答案: