Question

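This is driving me crazy. As you can see below, I'm trying to use a simple while loop to run several tweepy searches and append the results to a data frame. For some reason, after pulling the first set of 100 tweets, it just repeats that set instead of performing a new search. Any advice would be greatly appreciated.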

import sys
import csv
import pandas as pd
import tweepy
from tweepy import OAuthHandler

consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

num_results = 200
result_count = 0
last_id = None 
df = pd.DataFrame(columns=['Name', 'Location', 'Followers', 'Text', 'Coordinates'])

while result_count <  num_results: 
    result = api.search(q='',count=100, geocode= "38.996918,-104.995826,190mi", since_id = last_id) 
    for tweet in result:
        user = tweet.user
        last_id = tweet.id_str
        name = user.name
        friends = user.friends_count
        followers = user.followers_count
        text = tweet.text.encode('utf-8')
        location = user.location
        coordinates = tweet.coordinates
        df.loc[result_count] = pd.Series({'Name':name, 'Location':location, 'Followers':followers, 'Text':text, 'Coordinates':coordinates})
        print(text)
        result_count += 1

# Save to CSV
print("Writing all tables to CSV...")
df.to_csv('out.csv')
print("CSV Export Complete.")

Answer 1

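The API.search method returns tweets that match the specified query. It is not the Streaming API, so it returns all of its data at once.
Also, in your query parameters you added count, which specifies the number of statuses to retrieve.

So the problem is that your query returns the first 100 results of the full set on every iteration.

I suggest changing your code to something like this: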

# One search call, without count
result = api.search(q='', geocode="38.996918,-104.995826,190mi", since_id=last_id)
for tweet in result:
    user = tweet.user
    last_id = tweet.id_str
    name = user.name
    friends = user.friends_count
    followers = user.followers_count
    text = tweet.text.encode('utf-8')
    location = user.location
    coordinates = tweet.coordinates
    df.loc[result_count] = pd.Series({'Name':name, 'Location':location, 'Followers':followers, 'Text':text, 'Coordinates':coordinates})
    print(text)
    result_count += 1  # advance the row index so each tweet gets its own row

Let me know how it goes.
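
If you do want to keep paging until you reach num_results, one common approach with the standard search endpoint is to page backwards with max_id rather than since_id. since_id only sets a lower bound on tweet IDs, so passing the ID of the last (oldest) tweet of a page asks for everything newer than it, which is the same newest batch again. The sketch below is only an illustration under assumptions: collect_tweets, search_page, FakeTweet and fake_search are hypothetical names that are not part of tweepy, and the fake search stands in for the real API so the loop can be run offline.

```python
from collections import namedtuple
from typing import Callable, List, Optional


def collect_tweets(search_page: Callable[[Optional[int], int], list],
                   total: int, page_size: int = 100) -> List:
    """Collect up to `total` tweets by walking backwards with max_id.

    `search_page(max_id, count)` is a hypothetical callable that returns
    one page of tweets, newest first, with IDs <= max_id (or the newest
    tweets when max_id is None).
    """
    collected = []
    max_id = None  # None means "start from the newest tweets"
    while len(collected) < total:
        wanted = min(page_size, total - len(collected))
        page = search_page(max_id, wanted)
        if not page:
            break  # nothing older is available, stop instead of looping forever
        collected.extend(page)
        # Next page: only tweets strictly older than the oldest one seen so far
        max_id = min(tweet.id for tweet in page) - 1
    return collected


# Offline stand-in for the API: 250 fake tweets, newest (highest ID) first
FakeTweet = namedtuple("FakeTweet", "id text")
_ALL_TWEETS = [FakeTweet(i, "tweet %d" % i) for i in range(250, 0, -1)]


def fake_search(max_id, count):
    older = [t for t in _ALL_TWEETS if max_id is None or t.id <= max_id]
    return older[:count]


tweets = collect_tweets(fake_search, total=200)
print(len(tweets), tweets[0].id, tweets[-1].id)
```

To try it against the real API, search_page could be a small wrapper around the api object from the question, for example one that calls api.search(q='', count=count, geocode="38.996918,-104.995826,190mi") and adds max_id=max_id only when it is not None. That assumes a tweepy 3.x style API.search; in tweepy 4 the method was renamed to search_tweets.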
