Tweepy: Ignoring previously fetched tweets to improve performance

Time: 2019-01-22 03:55:37

Tags: python api twitter tweepy

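Question: I am trying to pull tweets with tweepy through a Cursor. I want to make sure I do not pull tweets that I have already pulled before.

Here is the working code: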

import csv  # needed for csv.writer below
import tweepy
import pandas as pd
import numpy as np

ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""
CONSUMER_KEY = ""
CONSUMER_SECRET = ""

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# Creation of the actual interface, using authentication
api = tweepy.API(auth, wait_on_rate_limit=True)

csvFile = open(r'filename', 'a')

#Use csv writer
headers = ['UserName', 'Tweet', 'TweetId', 'tweet_date', 'source', 'fav_count', 'retweet_count', 'coordinates', 'geo']

# definitions for writing to CSV
csvWriter = csv.writer(csvFile, lineterminator='\n')
# write the headers once
csvWriter.writerow(headers)


handles = ['pycon', 'gvanrossum']
previousTweets = [
    '222288832031240000',
    '222287080586362000',
    '222277240178741000',
    '221414283844653000',
    '221188011906445000',
    '205274818877210000',
]


for handle in handles:
    for status in tweepy.Cursor(api.user_timeline, screen_name=handle, tweet_mode="extended").items():
        # status.id is an int, so compare its string form against the string IDs in previousTweets
        if status.id_str not in previousTweets:
            csvWriter.writerow([status.user.name.encode('utf-8'), status.full_text.encode('utf-8'), status.id, status.created_at, status.source,
                                status.favorite_count, status.retweet_count, status.coordinates, status.geo])
    # report progress once each handle's timeline has been processed
    print(handle)

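This takes a long time, and it becomes unusable once the previousTweets list holds more than 75 tweets. Does anyone know a better way to filter out old tweets when using the Tweepy Cursor function?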

1 answer:

Answer 0 (score: 2)

You can pass the since_id parameter to the cursor. This fetches only statuses that are more recent than the specified ID (http://docs.tweepy.org/en/v3.5.0/api.html#API.user_timeline).
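A minimal sketch of how since_id could be wired into the loop from the question. The helper names newest_id and fetch_new_statuses are hypothetical, not part of tweepy. The sketch assumes tweet IDs grow over time, so the largest stored ID is the newest tweet already seen. Because since_id applies to one timeline at a time, it is probably safest to track the newest ID separately for each handle rather than in one mixed list.

```python
def newest_id(ids):
    """Return the largest tweet ID in ids as an int, or None if ids is empty."""
    return max((int(i) for i in ids), default=None)


def fetch_new_statuses(api, handle, since_id=None):
    """Return an iterator over handle's statuses that are newer than since_id.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    import tweepy  # imported here so newest_id() works without tweepy installed

    kwargs = {"screen_name": handle, "tweet_mode": "extended"}
    if since_id is not None:
        # Ask Twitter to return only statuses with an ID greater than since_id.
        kwargs["since_id"] = since_id
    return tweepy.Cursor(api.user_timeline, **kwargs).items()


# Example wiring (assumes `api` exists and that seen_ids_by_handle is a
# hypothetical dict you maintain yourself, mapping handle -> list of seen IDs):
#
# for handle, seen in seen_ids_by_handle.items():
#     for status in fetch_new_statuses(api, handle, newest_id(seen)):
#         ...  # write the row to the CSV as before
```

With this approach the filtering happens on Twitter's side, so the cursor stops paging once it reaches tweets you already have instead of walking the whole timeline and discarding rows locally.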