Specifying the exact start and end time for tweet collection with Python Tweepy?

Posted: 2014-10-04 08:12:05

Tags: python-2.7 twitter tweepy

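I have a technical problem. My Python script below normally works (with the times in 'yyyy-mm-dd' format). However, during periods of extremely heavy tweet activity, e.g. when more than 500,000 tweets are collected in one day, my computer runs out of memory and I have to force-stop the program.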
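For scale, a rough calculation using the script's own pacing (a 20-minute pause after every 4,000 tweets, ignoring the time the requests themselves take): a day with 500,000 tweets needs

$$\frac{500{,}000}{4{,}000} = 125 \text{ pauses}, \qquad 125 \times 20\,\text{min} = 2{,}500\,\text{min} \approx 41.7\,\text{h},$$

so a single heavy day takes close to two days of wall-clock time to collect, which leaves a long window for the run to be interrupted.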

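I can work around this by looking at the time of the last tweet in the CSV file written before the program stopped; in this case it was 18:44:00. I have tried many time formats (e.g. the 'yyyy-mm-dd hh:mm:ss' format below), but none of them actually works.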
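Instead of reading the last timestamp off the CSV by hand, it can be read programmatically. A minimal sketch, assuming each row is written as in the script below (name, screen name, created_at) and that the third column holds str(tweet.created_at), i.e. 'yyyy-mm-dd hh:mm:ss'. `last_created_at` is a hypothetical helper name. Rows that do not parse, such as a line cut off by the forced stop, are skipped.

```python
import csv
from datetime import datetime

# Format produced by str(tweet.created_at), e.g. '2014-09-18 18:44:00'
CREATED_AT_FMT = '%Y-%m-%d %H:%M:%S'

def last_created_at(lines):
    """Return created_at of the last parseable row, or None if there is none.

    `lines` is any iterable of CSV lines (an open file works). Assuming search
    results arrive newest first, this is the oldest tweet saved so far.
    """
    last = None
    for row in csv.reader(lines):
        try:
            last = datetime.strptime(row[2], CREATED_AT_FMT)
        except (IndexError, ValueError):
            continue  # blank or truncated row, e.g. cut off by a forced stop
    return last
```

Called on the CSV of the stopped run (opened in 'rb' mode on Python 2), this should return the 18:44:00 timestamp mentioned above.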

import tweepy
import time
import csv

ckey = ""
csecret = ""
atoken = ""
asecret = ""

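OAUTH_KEYS = {'consumer_key':ckey, 'consumer_secret':csecret,
    'access_token_key':atoken, 'access_token_secret':asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
# The access token must be set as well, otherwise requests are not user-authenticated
auth.set_access_token(OAUTH_KEYS['access_token_key'], OAUTH_KEYS['access_token_secret'])
api = tweepy.API(auth)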

# Search (REST API, not streaming) for tweets matching searchTerms2 between startSince and endUntil
# Reference of search (q) operator: https://dev.twitter.com/rest/public/search

# Common parameters: Changeable only here
startSince = '2014-09-18 00:00:00'
endUntil = '2014-09-18 18:44:00'
suffix = '_18SEP2014.csv'

############################
### Lung cancer starts #####
searchTerms2 = '"lung cancer" OR "lung cancers" OR "lungcancer" OR "lungcancers" OR \
    "lung tumor" OR "lungtumor" OR "lung tumors" OR "lungtumors" OR "lung neoplasm"'

# Request an effectively unlimited number of items (which *should* cover all tweets)
# Pause after every 4,000 tweets (because 5,000-6,000 per cycle exceeded the Twitter rate limit)
# Then wait 20 min before the next request (because the Twitter rate-limit window is 15 min)

prefix = 'TweetData_lungCancer'
wholeFileName = prefix + suffix

# Iterate the cursor manually: a TweepError (e.g. hitting the rate limit) is
# raised while the next page is fetched, i.e. on the "for" line itself, so an
# except clause inside a for-loop body never sees it.
tweets = tweepy.Cursor(api.search, q=searchTerms2,
    since=startSince, until=endUntil).items(999999999) # changeable here

counter2 = 0
while True:
    try:
        tweet = next(tweets)
    except tweepy.TweepError:
        time.sleep(60*20) # rate limited: wait 20 min, then try again
        continue
    except StopIteration:
        break # no more search results

    # print "Name:", tweet.author.name.encode('utf8')
    # print "Screen-name:", tweet.author.screen_name.encode('utf8')
    # print "Tweet created:", tweet.created_at

    placeHolder = [tweet.author.name.encode('utf8'),
                   tweet.author.screen_name.encode('utf8'),
                   tweet.created_at]

    try:
        with open(wholeFileName, "ab") as f: # changeable here
            writeFile = csv.writer(f)
            writeFile.writerow(placeHolder)
    except IOError:
        time.sleep(60*2.5)
        continue

    counter2 += 1

    if counter2 == 4000:
        time.sleep(60*20) # wait for 20 min every time 4,000 tweets are extracted
        counter2 = 0

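The Twitter REST API v1.1 documentation for search describes `until` as date-only (YYYY-MM-DD, returning tweets created before that date), which would explain why the 'yyyy-mm-dd hh:mm:ss' strings do not work. One workaround, sketched here as an assumption rather than a tested fix: send date-only values to the API and apply the exact times locally to `tweet.created_at`. `api_date_bounds` and `in_window` are hypothetical helper names.

```python
from datetime import datetime, timedelta

# The format the question tried for startSince / endUntil
EXACT_FMT = '%Y-%m-%d %H:%M:%S'

def api_date_bounds(start, end):
    """Split exact 'yyyy-mm-dd hh:mm:ss' bounds into date-only strings for the
    search call plus datetimes for filtering locally.

    Assumption: `until` takes only a date and excludes that date, so it is set
    to the day after `end` to keep the whole last day in the results.
    """
    start_dt = datetime.strptime(start, EXACT_FMT)
    end_dt = datetime.strptime(end, EXACT_FMT)
    since_date = start_dt.strftime('%Y-%m-%d')
    until_date = (end_dt + timedelta(days=1)).strftime('%Y-%m-%d')
    return since_date, until_date, start_dt, end_dt

def in_window(created_at, start_dt, end_dt):
    """True if created_at (assumed naive UTC, as tweepy returns it) lies in
    the half-open interval [start_dt, end_dt)."""
    return start_dt <= created_at < end_dt
```

In the loop, the date strings would go where startSince and endUntil are passed now, and any tweet for which in_window(tweet.created_at, start_dt, end_dt) is false would be skipped. Because the interval is half-open, other tweets posted in the same second as the 18:44:00 tweet could be missed; using <= instead would save that tweet twice. If the CSV also stored tweet.id, passing max_id (the smallest saved id minus 1) to the search should avoid re-downloading the tweets newer than 18:44:00, which this date-level filter still fetches and discards.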
0 answers