Extracting a week of tweets with Tweepy

Time: 2019-04-09 11:29:41

Tags: python loops csv dataframe tweepy

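I want to store tweets as a CSV file. I used tweepy and managed to save them to a CSV, but it only pulls one day of data. I want to extract and store a week of data without having to run the extraction every day.

This is what I did: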

def tweets_to_data_frame(public_tweets):
    df = pd.DataFrame(data=[tweet.text for tweet in public_tweets], columns=['Tweets'])
    df['len'] = np.array([len(tweet.text) for tweet in public_tweets])
    df['date'] = np.array([tweet.created_at for tweet in public_tweets])
    df['retweets'] = np.array([tweet.retweet_count for tweet in public_tweets])
    df['lang'] = np.array([tweet.lang for tweet in public_tweets])
    return df

public_tweet= api.search('donald trump')
df = tweets_to_data_frame(public_tweet)
df.to_csv('donaldtrump.csv')
df.head(15)
    Tweets  len date    retweets    lang
0   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:23 67  en
1   RT @errollouis: "If the House ever gets his re...   140 2019-04-09 11:08:23 7927    en
2   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:22 73  en
3   RT @Newsweek: Trump claimed he wouldn't have t...   140 2019-04-09 11:08:21 7   en
4   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:20 67  en
5   The real reason Donald Trump just fired the he...   112 2019-04-09 11:08:19 0   en
6   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:19 73  en
7   RT @BobbyEberle13: Ilhan Omar is now praying f...   140 2019-04-09 11:08:18 457 en
8   The guy met the queen last time out and lots o...   140 2019-04-09 11:08:17 0   en
9   RT @PalmerReport: Donald Trump’s deconstructio...   135 2019-04-09 11:08:17 107 en
10  RT @ByronYork: Donald Trump has been paying ta...   139 2019-04-09 11:08:16 1232    en
11  RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:16 67  en
12  RT @SayWhenLA:  YUGE !!\n\nPresident Donald J...  140 2019-04-09 11:08:15 1316    en
13  "As long as you're going to be thinking anyway...   100 2019-04-09 11:08:15 0   en
14  RT @TheLastRefuge2: Diana West Discusses The R...   140 2019-04-09 11:08:15 113 en

What I want is a week of data.

My idea was:

def tweets_to_data_frame1(public_tweets):
    for tweets in tweepy.Cursor(api.search,q = (public_tweets),count=100,
                           since = "2019-04-04",
                           until = "2019-04-07").items():
        df = pd.DataFrame(data=[tweets.text for tweet in tweets], columns=['Tweets'])
        df['len'] = np.array([len(tweets.text) for tweet in tweets])
        df['date'] = np.array([tweets.created_at for tweet in tweets])
        df['retweets'] = np.array([tweets.retweet_count for tweet in tweets])
        df['lang'] = np.array([tweets.lang for tweet in tweets])

        return df

df1 = tweets_to_data_frame1('donald trump')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-96745c16c99c> in <module>
----> 1 df1 = tweets_to_data_frame1('donald trump')

<ipython-input-23-e5866a4adb3f> in tweets_to_data_frame1(public_tweets)
      3                            since = "2019-04-04",
      4                            until = "2019-04-07").items():
----> 5         df = pd.DataFrame(data=[tweets.text for tweet in tweets], columns=['Tweets'])
      6 
      7         #df['id'] = np.array([tweet.id for tweet in tweets])

TypeError: 'Status' object is not iterable

Expected result:

Tweets  len date    retweets    lang
0   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:23 67  en
1   RT @errollouis: "If the House ever gets his re...   140 2019-04-09 11:08:23 7927    en
2   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:22 73  en
3   RT @Newsweek: Trump claimed he wouldn't have t...   140 2019-04-09 11:08:21 7   en
4   RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:20 67  en
5   The real reason Donald Trump just fired the he...   112 2019-04-09 11:08:19 0   en
6   RT @BillKristol: "This is what Kirstjen Nielse...   140 2019-04-09 11:08:19 73  en
7   RT @BobbyEberle13: Ilhan Omar is now praying f...   140 2019-04-09 11:08:18 457 en
8   The guy met the queen last time out and lots o...   140 2019-04-09 11:08:17 0   en
9   RT @PalmerReport: Donald Trump’s deconstructio...   135 2019-04-09 11:08:17 107 en
10  RT @ByronYork: Donald Trump has been paying ta...   139 2019-04-09 11:08:16 1232    en
11  RT @mehdirhasan: Stephen Miller’s Jewish uncle...   140 2019-04-09 11:08:16 67  en
12  RT @SayWhenLA:  YUGE !!\n\nPresident Donald J...  140 2019-04-09 11:08:15 1316    en
13  "As long as you're going to be thinking anyway...   100 2019-04-09 11:08:15 0   en
14  RT @TheLastRefuge2: Diana West Discusses The R...   140 2019-04-09 11:08:15 113 en

but covering a full week.

1 answer:

Answer 0 (score: 0)

So I think the problem is here:

for tweets in tweepy.Cursor(api.search,q = (public_tweets),count=100,since = "2019-04-04",until = "2019-04-07").items():

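tweepy.Cursor(...).items() yields the tweets one at a time (strictly speaking it is an iterator rather than a list). So each value of the tweets variable is a single tweet, i.e. one Status object. You then use a list comprehension over it, which means you are trying to iterate over that single tweet. That is exactly what the error message is telling you.

Instead, you could do: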
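The failure can be reproduced without calling the API. The sketch below uses a hypothetical stand-in class named Status (not tweepy's real class) that only carries a text attribute, and two illustrative helper functions: one mirrors the failing comprehension from the question, the other iterates over a collection of tweets instead.

```python
class Status:
    """Hypothetical stand-in for a single tweet; not tweepy's real class."""

    def __init__(self, text):
        self.text = text


def texts_from_single_tweet(tweets):
    # Mirrors the failing line: 'tweets' is ONE Status object here,
    # so "for tweet in tweets" tries to iterate over a single tweet.
    return [tweets.text for tweet in tweets]


def texts_from_many_tweets(tweets):
    # 'tweets' is an iterable of Status objects, so the comprehension works.
    return [tweet.text for tweet in tweets]


one_tweet = Status("hello")
try:
    texts_from_single_tweet(one_tweet)
except TypeError as exc:
    error_message = str(exc)  # same kind of message as in the traceback

texts = texts_from_many_tweets(iter([Status("a"), Status("b")]))
print(error_message)
print(texts)
```

The first call fails for the same reason as the loop body in the question; the second succeeds because the comprehension receives the whole stream of tweets rather than one element of it.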

# Materialize the iterator once so it can be reused for every column
tweets = list(tweepy.Cursor(...).items())
df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
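Since the question builds five columns, one possible approach is to walk the tweets a single time and collect one record per tweet, which also works when the input is a one-shot iterator. This is only a sketch: tweets_to_records is a hypothetical helper name, and the SimpleNamespace objects are stand-ins that carry the same attributes the question reads (text, created_at, retweet_count, lang), not real tweepy objects.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_to_records(tweets):
    """Build one dict per tweet in a single pass over any iterable of tweets."""
    return [
        {
            "Tweets": tweet.text,
            "len": len(tweet.text),
            "date": tweet.created_at,
            "retweets": tweet.retweet_count,
            "lang": tweet.lang,
        }
        for tweet in tweets
    ]


# Hypothetical stand-in tweets, wrapped in iter() to behave like a one-shot cursor.
fake_tweets = iter([
    SimpleNamespace(text="first tweet", created_at=datetime(2019, 4, 5, 10, 0, 0),
                    retweet_count=3, lang="en"),
    SimpleNamespace(text="second", created_at=datetime(2019, 4, 6, 12, 30, 0),
                    retweet_count=0, lang="en"),
])

records = tweets_to_records(fake_tweets)
print(records[0])
```

With pandas installed, passing the result to pd.DataFrame(records) should give the same Tweets, len, date, retweets and lang columns as in the question, and df.to_csv can then be used as before.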

By the way, I would also rename the public_tweets argument of def tweets_to_data_frame1(public_tweets): in this case the argument is just a search query string, so the name is misleading.
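On the week-long range itself: the question hard-codes since and until as date strings. A small sketch for computing such a window is shown below. week_window is a hypothetical helper name, and the sketch assumes the YYYY-MM-DD format used in the question. Two things to keep in mind: Twitter's standard search documents until as returning tweets created before the given date, and its index only reaches back about seven days, so older ranges may return nothing.

```python
from datetime import date, timedelta


def week_window(end_day):
    """Return (since, until) date strings spanning the 7 days before end_day."""
    since = end_day - timedelta(days=7)
    # isoformat() on a date gives YYYY-MM-DD, matching the strings in the question.
    return since.isoformat(), end_day.isoformat()


since, until = week_window(date(2019, 4, 9))
print(since, until)
```

The two strings could then be passed as the since and until values in the Cursor call from the question; whether both keywords are accepted depends on the tweepy version in use.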